[bitc-dev] Quick haskell question

Mark P. Jones mpj at cs.pdx.edu
Tue Aug 10 22:57:08 PDT 2010


> The discussion of infix is clear that precedence is associated with the entity rather than the token

Right.

> I am struck that this implies a significant intertwining of state between the parser and the lexer. In particular, if the Lexer is to correctly report operator precedence to the parser, it needs the ability to look up the "<<" operator in the appropriate lexical environment.

No.  Operator precedence is not a lexical property.  A lexer
can tell you that it's seen an "<<" operator symbol, but it
has no notion of scope.  Indeed, the fixity declaration for
a symbol may not appear until after that symbol has been used
in the source text, so it is impossible for the lexer to report
accurate fixity information to the parser.

> From an implementation perspective, I can see two ways to approach this, and I would appreciate explanation from someone who knows what is actually going on in Haskell or ML:
> 
> The first approach is to restrict fixity introduction to top-level forms and process top-level forms one at a time, each in the environment resulting from the previous form.

Early versions of Haskell did something like this, allowing only
top-level fixity declarations and requiring that they all appear
at the very beginning of the source file.  This was workable, but
awkward and overly rigid from a programmer perspective, and
typically results in some ugly special cases from a
compiler/implementor perspective.

> The second approach is to cheat. We build a temporary environment at parse time containing only the mixfix operators and their precedence, without regard to their type. This construction is performed by the parser and carried up and down through the parse tree processing. The one complication with this is getting it to unwind correctly in the presence of parse errors, but that should not (in principle) be insurmountable. This approach has the property that it can successfully admit local fixity declarations.

Again, I think you're going to have problems with this because you
will, in general, have to parse an expression before you can be
sure that you have all the necessary fixity information.

> Which approach is used in Haskell?

I can't speak for all implementations but I know at least a few
where this is handled, not in lexical analysis or parsing, but
instead as part of static analysis.  Apologies if the following
goes in to more detail than you need, but here's how it works:

- The lexer just returns operator symbols, etc. as it sees them,
  without attempting to add any precedence or other details.

- The parser recognizes expressions of the following form as a
  special case, but does not attempt to resolve precedence etc.

        expr0 op1 expr1 op2 .... opN exprN

  For example, you could represent this kind of structure as a
  pair of type (Expr, [(Op, Expr)]); the first component provides
  the initial expr0, the second is a non-empty list of operator
  and argument pairs.  I'll call this kind of structure a
  "deferred infix" in what follows.   The parser also recognizes
  any fixity declarations in the program, and includes them in the
  abstract syntax for the appropriate list of decls, but again
  does not attempt to do anything more with them at this stage.

- When static analysis begins, the compiler can build environment
  structures that accurately reflect the structure and lexical
  scoping of the input programs.  These environments capture the
  the names (and, optionally, declared types) of the entities that
  are in scope at any given point in the program.  As static
  analysis encounters each new group of declarations, it transfers
  information from fixity declarations to the corresponding
  environment structure.  And then, when static analysis
  encounters one of the "deferred infix" expressions like the one
  above, it can use the environment not only to look up each of
  the operator names (to make sure they are actually defined
  somewhere in the program), but also to determine the associated
  fixities, if any.  After that, it's a relatively simple matter
  to use those fixity details to eliminate "deferred infix" nodes
  from the AST and then move on to type checking and beyond.

> Is it even desirable to admit local fixity declarations, or is it just an unnecessary complication? I think that local fixity declarations are useful, exactly because the don't corrupt the global scope.

I've probably made the above sound complicated, but in my
experience:

- There are probably very few programs that use or need local
  fixity declarations, so it's not a big deal for most Haskell
  programmers (although it's nice that you can put fixity
  decls with the rest of the code that they refer to instead
  of having to lift them to the top of the file).

- From an implementor's perspective, this was a definite win,
  rationalizing the code for handling infix operators, and
  eliminating awkward special cases.

Hope this helps!

Mark




More information about the bitc-dev mailing list