[bitc-dev] ETL was: BitC surface syntax
Plotnikov, Constantine A
cap at isg.axmor.com
Wed Sep 6 01:31:28 CDT 2006
I would like you to consider the ETL approach to constructing a syntax
for the Bit-C language (online docs are at
http://etl.sourceforge.net/etl-java-0_2_0/doc/readme.html; the parser
can be downloaded from
http://sourceforge.net/project/showfiles.php?group_id=153075&package_id=178546).
Note that ETL requires a semicolon as the statement terminator, for
several software engineering reasons. Among other things, it makes
statement termination deterministic, which is important for error
recovery in the face of extensions. If you do not like it, it is
relatively easy to create a variant of ETL that uses a newline as the
statement terminator and even supports automatic continuation of an
incomplete expression, as is done in E. At one stage of ETL
development there was a Python-like phrase syntax, but it has been
dropped. I am more interested in testing the overall approach than in
testing a particular choice of phrase syntax and lexical syntax, so I
would like Bit-C to use the same approach even with a different phrase
syntax and lexical syntax. However, such a decision would prevent tool
and parser sharing between ETL and Bit-C.
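To illustrate why a deterministic terminator helps error recovery, here
is a minimal sketch in Python. It is not the ETL parser and does not use
ETL's actual phrase syntax; the toy keywords (let, print) and the
function names are assumptions made up for the example. The point is
only that statement boundaries are found without knowing the grammar, so
a statement from an unknown extension is reported as an error while the
statements around it are still parsed.

```python
def split_statements(source):
    """Split source into statements at top-level semicolons.

    Semicolons nested inside { ... } blocks belong to inner statements,
    so they do not terminate the outer one.
    """
    statements, current, depth = [], [], 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        if ch == ";" and depth == 0:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)  # unterminated trailing text
    return statements


# Hypothetical toy grammar: only these statement keywords are known.
KNOWN_KEYWORDS = {"let", "print"}


def parse(source):
    """Check each statement independently of its neighbours."""
    results = []
    for text in split_statements(source):
        words = text.split()
        keyword = words[0] if words else ""
        status = "ok" if keyword in KNOWN_KEYWORDS else "error"
        results.append((status, text))
    return results
```

With this sketch, parse("let x = 1; frobnicate y z; print x;") marks
only the middle statement as an error; the statements before and after
it are unaffected.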
The advantages of using ETL for Bit-C would be the following:
1. It will be easy to define a kernel language that does not use any
syntactic sugar.
2. It will be possible to extend the language with new constructs.
3. If the compiler is constructed carefully enough, even end users of
the compiler could extend the language. This enables an open source way
of development even for the language itself. Language users could create
language libraries like they currently create component libraries. Such
a language library would consist of a new grammar, a plugin that
supports this grammar in the compiler, and possibly a runtime utility
library used by the compiler plugin. Successful extensions might be
integrated back into the main language or be distributed as contributed
extensions along with the compiler.
4. It will be possible to embed domain-specific languages into the
language (for example, state machines), and these DSLs could be used to
simplify proofs and to reduce the number of bugs in the program (because
a DSL allows following the DRY principle further). Additional theorems
could be generated from embedded DSL fragments automatically. Note that
such DSLs could be project-specific and even module-specific. I expect
that an operating system will have some concepts that are not easily or
efficiently expressed in general-purpose languages. It would also be
possible to do an optimized expansion of a DSL that uses unsafe
constructs if that expansion is proved to be functionally equivalent to
some expansion that uses safe constructs. Also, DSL fragments do not
have to be embedded into the main language; they could exist as separate
files and be processed by other tools. For example, an Interface
Definition Language could also be expressed as a DSL. Such DSLs could
share some things with the main language, for example a constant
expression sub-language.
5. Language definition and language interpretation are separated from
each other, so it is possible to publish grammars and implement them
differently using completely different tools. Generic ETL tools like
text editors will be reusable as is, and it will be easy to write
source-examining tools in other programming languages (for example
Java). The situation will be similar to the current situation with XML,
for which almost any language can be used. For example, there is a tool,
written in Java, that generates syntax-highlighted HTML from source
code. It could be used for any text written in an ETL-based language.
6. ETL-based syntax is easy to express using traditional LL(1) parser
generators. An ETL grammar definition could be converted to ANTLR (there
is no such tool yet, but it should be easy to create one). It is also
relatively easy to write a hard-coded parser, provided that the lexer
and phrase parser exist. The ETL parser for Java uses a hard-coded
parser to parse the grammar for grammars (it is used to parse only this
grammar).
7. ETL syntax is friendly to the pull parsing model, so it could be used
for interactive shells.
8. ETL phrase syntax was designed with easy automatic error recovery in
mind and is friendly to mixing different languages together.
9. A generic ETL parser is relatively easy to implement. The latest,
almost complete Java reimplementation took me about three months. In
functional languages with pattern matching and higher-order functions,
it should be simpler.
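As an illustration of point 7, the pull parsing model can be sketched
with a Python generator. This is not the ETL parser's API; the function
name pull_statements is hypothetical, and the sketch ignores block
nesting and string literals for brevity. The consumer (for example an
interactive shell) asks for the next statement, and input is read only
until a terminating semicolon has been seen.

```python
def pull_statements(lines):
    """Yield one complete statement at a time from an iterable of lines.

    Input is consumed lazily: a line is read only when the caller asks
    for a statement that is not yet complete.
    """
    pending = ""
    for line in lines:
        pending += line + "\n"
        while ";" in pending:
            statement, pending = pending.split(";", 1)
            if statement.strip():
                # Collapse line breaks and runs of spaces for display.
                yield " ".join(statement.split())
    # Text left in `pending` here has no terminator yet; a shell would
    # keep prompting for a continuation line instead of failing.


# Simulated shell session: the first statement spans two lines and the
# last line is still incomplete.
session = ["let x =", "  1; print x;", "let y"]
for statement in pull_statements(session):
    print("got:", statement)
```

Because the terminator is explicit, the shell can tell an incomplete
statement from a finished one without any knowledge of the grammar.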
The current parser is for Java. I can implement a parser for some
additional programming language like Haskell, or I could write a paper
describing the parser implementation approach, if that might help
adoption of the ETL approach for Bit-C.
Constantine