[bitc-dev] Re-examining pair consing
Jonathan S. Shapiro
shap at eros-os.com
Tue Jul 8 09:51:34 CDT 2008
In an earlier version of BitC, functions were specified as taking
exactly one argument, and the application syntax
(f a b c)
were defined (respectively) as syntactic sugar for
(f (pair a (pair b c)))
I abandoned this approach for three reasons:
1. C procedures can take zero arguments, and I was uncomfortable at
the time with recognizing unit types as a compatibility hack. And
yes, somebody did suggest this to me at the time, and yes I was
being foolish, now that I understand what is actually needed.
2. I was concerned about optimization. In particular, if emitting
the necessary pair-consing operations in C led to a lot of data
copies that would then need to be removed, the performance impact
of this could be fairly severe. I was particularly concerned that
this would be an issue for the C code generator.
I have since come to realize that this was a misplaced concern.
It is perfectly okay for the back end to *type* the application
as a pair-consed operation but *emit* the operation as a
conventional procedure call. The only real issue is what to do
when one sees something of the form
and of course, the pair-consed type can be unpacked into elements
for calling purposes if doing so seems sensible.
3. I was slightly concerned about calling convention compatibility. In
particular, the alignment constraints for structures do not always
match those for procedure parameters on some platforms. At the
time, I had not yet recognized the possibility that we could unpack
the parameters at call time.
4. In the presence of a type variable in the last parameter, position,
implicit pair-consing defeats the ability to check that the call
site has the correct number of parameters. That is, given:
(proclaim f: (fn (int 'a) 'b))
we effectively get a compile-time varargs expansion:
(f 1 #\c)
(f 1 #\c #\d) 'a bound to (pair char char)
5. In the presence of pair-consing, the presence of by-reference
types becomes problematic. At the moment, by-ref is restricted to
appear only in procedure argument types, and it is only safe there
because we can presume a form of region-based safety that is
guaranteed by the nature of the calling convention.
I suppose that in hindsight points 1..3 this should all have been
obvious, but it wasn't.
Having the quasi-varargs behavior available is actually very useful in
some cases (consider printf), and I would therefore like to re-consider
the pair-consing definition of procedure argument composition. If we do
this, I think that there are two issues we need to resolve:
1. How do we deal with the fact that by-ref is being generalized
and must not escape?
2. How do we preserve some ability to check application arity (the
number of arguments given) so that we can head off certain kinds
of errors? In a polymorphic language we really are likely to see
trailing arguments of alpha type, which is the case where this
Concerning arity checking, there is no "correct" solution. Any syntactic
assist by the compiler is necessarily just that: an assist. Given:
(proclaim f: (fn (int 'a) 'b))
there isn't any semantic difference between:
(f 1 #\c #\c)
(f 1 (pair #\c #\c))
(f (pair 1 (pair #\c #\c)))
But I am considering the following syntax modification:
1. In the lambda form, add the ability to write:
(lambda (x y ...) body)
with the intended meaning that there is no maximal
expected arity for this procedure.
2. In the application:
(f arg arg ... arg)
we only pair-cons up to the expected arity. If additional
parameters remain, it is a compile-time error.
3. Possibly add new syntax
(apply-1 f arg)
in which the underlying parameter pair type is merely checked
for compatibility without checking for arity.
I would actually prefer to omit this.
I would appreciate reactions to this proposal, in particular:
1. In the experience of Haskell/ML programmers, is checking
arity worth the bother? Am I possibly worrying about nothing?
The problem I see is that in ML/Haskell this is effectively done
by the type checker, but in our case the implicit pair-consing
2. Does it seem likely that adding apply-1 is useful/important?
3. Is the introduction of "varargs" as part of the parameter type
likely to create more problems than it solves?
More information about the bitc-dev