[bitc-dev] Re-examining pair consing

Jonathan S. Shapiro shap at eros-os.com
Tue Jul 8 09:51:34 CDT 2008

In an earlier version of BitC, functions were specified as taking
exactly one argument, and the application syntax

  (f a b c)

were defined (respectively) as syntactic sugar for

  (f ())
  (f (pair a (pair b c)))

I abandoned this approach for three reasons:

  1. C procedures can take zero arguments, and I was uncomfortable at
     the time with recognizing unit types as a compatibility hack. And
     yes, somebody did suggest this to me at the time, and yes I was
     being foolish, now that I understand what is actually needed.

  2. I was concerned about optimization. In particular, if emitting
     the necessary pair-consing operations in C led to a lot of data
     copies that would then need to be removed, the performance impact
     of this could be fairly severe. I was particularly concerned that
     this would be an issue for the C code generator.

     I have since come to realize that this was a misplaced concern.
     It is perfectly okay for the back end to *type* the application
     as a pair-consed operation but *emit* the operation as a
     conventional procedure call. The only real issue is what to do
     when one sees something of the form

        (f single-object-of-the-pair-consed-type)

     and of course, the pair-consed type can be unpacked into elements
     for calling purposes if doing so seems sensible.

  3. I was slightly concerned about calling convention compatibility. In
     particular, the alignment constraints for structures do not always
     match those for procedure parameters on some platforms. At the
     time, I had not yet recognized the possibility that we could unpack
     the parameters at call time.

  4. In the presence of a type variable in the last parameter, position,
     implicit pair-consing defeats the ability to check that the call
     site has the correct number of parameters. That is, given:

       (proclaim f: (fn (int 'a) 'b))

     we effectively get a compile-time varargs expansion:

        (f 1 #\c)
        (f 1 #\c #\d)   'a bound to (pair char char)

  5. In the presence of pair-consing, the presence of by-reference
     types becomes problematic. At the moment, by-ref is restricted to
     appear only in procedure argument types, and it is only safe there
     because we can presume a form of region-based safety that is
     guaranteed by the nature of the calling convention.

I suppose that in hindsight points 1..3 this should all have been
obvious, but it wasn't.

Having the quasi-varargs behavior available is actually very useful in
some cases (consider printf), and I would therefore like to re-consider
the pair-consing definition of procedure argument composition. If we do
this, I think that there are two issues we need to resolve:

  1. How do we deal with the fact that by-ref is being generalized
     and must not escape?

  2. How do we preserve some ability to check application arity (the
     number of arguments given) so that we can head off certain kinds
     of errors? In a polymorphic language we really are likely to see
     trailing arguments of alpha type, which is the case where this
     can arise.

Concerning arity checking, there is no "correct" solution. Any syntactic
assist by the compiler is necessarily just that: an assist. Given:

  (proclaim f: (fn (int 'a) 'b))

there isn't any semantic difference between:

  (f 1 #\c #\c)
  (f 1 (pair #\c #\c))
  (f (pair 1 (pair #\c #\c)))

But I am considering the following syntax modification:

   1. In the lambda form, add the ability to write:

          (lambda (x y ...) body)

      with the intended meaning that there is no maximal
      expected arity for this procedure.

   2. In the application:

        (f arg arg ... arg)

      we only pair-cons up to the expected arity. If additional
      parameters remain, it is a compile-time error.

   3. Possibly add new syntax

      (apply-1 f arg)

      in which the underlying parameter pair type is merely checked
      for compatibility without checking for arity.

      I would actually prefer to omit this.

I would appreciate reactions to this proposal, in particular:

  1. In the experience of Haskell/ML programmers, is checking
     arity worth the bother? Am I possibly worrying about nothing?

     The problem I see is that in ML/Haskell this is effectively done
     by the type checker, but in our case the implicit pair-consing
     defeats that.

  2. Does it seem likely that adding apply-1 is useful/important?

  3. Is the introduction of "varargs" as part of the parameter type
     likely to create more problems than it solves?


More information about the bitc-dev mailing list