[bitc-dev] ETL was: BitC surface syntax

Plotnikov, Constantine A cap at isg.axmor.com
Wed Sep 6 01:31:28 CDT 2006


I would like you to consider the ETL approach to constructing a syntax
for the Bit-C language (online docs are at
http://etl.sourceforge.net/etl-java-0_2_0/doc/readme.html; the parser
can be downloaded from
http://sourceforge.net/project/showfiles.php?group_id=153075&package_id=178546).


Note that ETL requires a semicolon as the statement terminator, for
several software engineering reasons. Among other things, it makes
statement termination deterministic, which is important for error
recovery in the face of extensions. If you do not like it, it is
relatively easy to create a variant of ETL that uses a newline as the
statement terminator, and even to support automatic continuation of
incomplete expressions, as is done in E. At one stage of ETL
development there was a Python-like phrase syntax, but it has been
dropped. I'm more interested in testing the overall approach than in
testing a particular choice of phrase syntax and lexical syntax, so I
would like Bit-C to use the same approach even with a different phrase
syntax and lexical syntax. However, such a decision would prevent tool
and parser sharing between ETL and Bit-C.
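
To illustrate why a mandatory terminator helps error recovery, here is a
minimal sketch in Python. It is not ETL's implementation: the toy
statement form `let NAME = NUMBER;` and the function name
parse_statements are assumptions made up for this example, and splitting
on ';' ignores string literals and nested blocks. The point is only that
a malformed statement can be skipped up to its ';' without affecting the
statements that follow.

```python
import re

# Hypothetical toy grammar, not ETL: each statement is `let NAME = NUMBER`.
STATEMENT = re.compile(r"let\s+[A-Za-z_]\w*\s*=\s*\d+")


def parse_statements(source):
    """Check each ';'-terminated statement independently.

    Returns (accepted, errors). A malformed statement is recorded and
    skipped; checking resumes right after its terminating ';'.
    """
    accepted, errors = [], []
    # Everything after the last ';' is an unterminated statement
    # (or just trailing whitespace).
    *terminated, tail = source.split(";")
    for index, chunk in enumerate(terminated):
        text = chunk.strip()
        if STATEMENT.fullmatch(text):
            accepted.append(text)
        else:
            errors.append((index, text))
    if tail.strip():
        errors.append((len(terminated), tail.strip() + " <missing ';'>"))
    return accepted, errors


# The typo in the second statement does not disturb the third one.
accepted, errors = parse_statements("let a = 1; lte b = 2; let c = 3;")
```

With a newline-terminated or terminator-free syntax, the recovery point
after the bad statement would depend on the grammar of each extension
rather than on one fixed token.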

The advantages of using ETL for Bit-C would be the following:
1. It will be easy to define a kernel language that does not use any 
syntax sugar.

2. It will be possible to extend the language with new constructs.

3. If the compiler is constructed carefully enough, even end users of
the compiler could extend the language. This enables an open source way
of development even for the language itself. Language users could create
language libraries like they currently create component libraries. Such
a language library would consist of a new grammar, a plugin that
supports this grammar in the compiler, and possibly a runtime utility
library used by the compiler plugin. Successful extensions might be
integrated back into the main language or be distributed as contributed
extensions along with the compiler.

4. It will be possible to embed domain-specific languages into the
language (for example, state machines), and these DSLs could be used to
simplify proofs and to reduce the number of bugs in the program (because
a DSL allows following the DRY principle further). Additional theorems
could be generated from embedded DSL fragments automatically. Note that
such DSLs could be project specific and even module specific. I expect
that an operating system will have some concepts that are not easily or
efficiently expressed in general purpose languages. It would also be
possible to do an optimized expansion of a DSL that uses unsafe
constructs, if such an expansion is proved to be functionally equivalent
to some expansion that uses safe constructs. Also, DSL fragments do not
have to be embedded into the main language; they could exist as separate
files and be processed by other tools. For example, an Interface
Definition Language could also be expressed as a DSL. Such DSLs could
share some things with the main language, for example a constant
expression sub-language.

5. Language definition and language interpretation are separated from
each other. So it is possible to publish grammars and implement them
differently using completely different tools. Generic ETL tools like
text editors will be reusable as is, and it will be easy to write tools
that examine source code using other programming languages (for example,
Java). The situation will be similar to the current situation with XML,
which can be processed from almost any language. For example, there is a
tool, written in Java, that generates syntax-highlighted HTML from
source code. It can be used for any text written in an ETL-based
language.

6. ETL-based syntax is easy to express using traditional LL(1) parser
generators. An ETL grammar definition could be converted to ANTLR (there
is no such tool yet, but it should be easy to create one). It is also
relatively easy to write a hard-coded parser, provided that a lexer and
a phrase parser exist. The ETL parser for Java uses a hard-coded parser
to parse the grammar for grammars (it is used to parse only this
grammar).

7. ETL syntax is friendly to the pull parsing model, so it could be used
for interactive shells.

8. ETL phrase syntax was designed with easy automatic error recovery in
mind and is friendly to mixing different languages together.

9. A generic ETL parser is relatively easy to implement. The latest,
almost complete Java reimplementation took me about 3 months. In
functional languages with pattern matching and higher-order functions,
it should be simpler.
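
As a rough illustration of point 7, the pull model means that the
consumer (for example, an interactive shell) asks the parser for the
next statement, and the parser reads only as much input as it needs to
answer. The sketch below uses a Python generator for this; the names
pull_statements and feed are hypothetical, the splitting on ';' is a
simplification, and none of this reflects the actual ETL parser API.

```python
def pull_statements(lines):
    """Yield one complete statement at a time from an iterable of lines.

    The consumer drives the parser: no line is read from `lines` until
    the next statement is actually requested.
    """
    buffer = ""
    for line in lines:
        buffer += line + "\n"
        # A statement is complete once its terminating ';' has arrived.
        while ";" in buffer:
            statement, buffer = buffer.split(";", 1)
            if statement.strip():
                # Collapse internal whitespace and newlines.
                yield " ".join(statement.split())


read = []  # records which input lines have been consumed so far


def feed():
    """Stand-in for a shell prompt that hands over one line at a time."""
    for line in ["print", "  1 + 2;", "x = 3; y = 4;"]:
        read.append(line)
        yield line


statements = pull_statements(feed())
first = next(statements)    # reads two lines, stops at the first ';'
lines_read_for_first = list(read)
rest = list(statements)     # pulls the remaining statements
```

A shell built this way can show a continuation prompt whenever the
parser asks for another line before it has produced a statement.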

The current parser is for Java. I can implement a parser for some
additional programming language like Haskell, or I could write a paper
describing the parser implementation approach, if that might help
adoption of the ETL approach for Bit-C.

Constantine





