[bitc-dev] Type Classes: Pragmatics Questions
Mark P Jones
mpj at cse.ogi.edu
Wed Jul 20 02:36:40 EDT 2005
> One claim about type classes is that it is often possible to derive the
> concrete type at compile time, and if you know this you can partially
> evaluate out the wrapper procedure and the runtime dictionary
> 1. I would really appreciate it if someone with greater experience in
> this than we have could give some intuition about how much trouble we
> should expect in establishing concrete integral types in the normal
This could indeed be a problem if you don't take steps to address it.
Suppose, for example, that you write code to do some calculation with
32-bit integers, but the type inference engine finds that it can give
a more general type that would allow your code to be used, as is, on
rationals, bignums, and complex numbers. While it's nice to have that
flexibility, you didn't actually need or want it. Even worse ... you
hoped that your code would compile down into a short sequence of simple
machine language instructions, but instead you get something that is
doing all kinds of nefarious (and expensive) dictionary manipulations
and dispatching on higher-order functions.
One way to deal with this is to prevent users from defining overloaded
functions unless they include some explicit indication in their code
that this is really what they want. An idiomatic way to do this in a
Haskell-like language would be to require users to provide explicit
type signatures for overloaded functions. Thus the following definition
would be rejected:
-- inferred type is (Num a) => a -> a
f x = x + 1
But you could get the compiler to accept it in its most general
form, indicating explicitly that you are aware of the presence of
dictionaries in the implementation, by adding the type signature:
f :: (Num a) => a -> a
f x = x + 1
Or, alternatively, you could add a type signature that picks out
a specific type, as in the following:
f :: Int32 -> Int32
f x = x + 1
Haskell doesn't do this in full generality, but it does have a
feature called the "monomorphism restriction" that uses this rule
in the special case where the left hand side of a function definition
is a single variable, as in the following:
f = \x -> x + 1
Let's just say that the monomorphism restriction isn't much appreciated
by the Haskell community ... beginners find it hard to understand, and
more experienced users often find it to be a pain because they don't
want to write down a type signature. It's not too uncommon to see
programmers add a "dummy" argument to a function definition, simply to
avoid this problem.
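To make the workaround concrete, here is a small sketch (names are
illustrative, and this assumes GHC-style defaulting with the
restriction in effect): a variable-only binding falls under the
monomorphism restriction and gets pinned to a single type, while the
eta-expanded form keeps its general overloaded type.

```haskell
-- Pattern binding: the monomorphism restriction applies, so f is
-- not generalized; defaulting pins it to Integer -> Integer here.
f = \x -> x + 1

-- Function binding (the "dummy argument" trick is this same idea):
-- the restriction does not apply, so g keeps the general type
-- (Num a) => a -> a.
g x = x + 1

main :: IO ()
main = do
  print (f 1)              -- usable only at f's one defaulted type
  print (g (1 :: Double))  -- g still works at other Num types
  print (g (1 :: Int))
```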
It might be worth mentioning that other choices are available on the
language design front. For example, instead of asking for a type
signature, you could add an extra "overloaded" keyword and only allow
the definition of "f" above to be overloaded if it is written as:
overloaded f x = x + 1
I think these are representative of your options if you want to solve
the problem as part of the language design. However, if you want to
tackle this as part of the language implementation, then you have other
options to consider ...
> 2. It is moderately distressing that performance of abstract containers
> should rely so intensively on optimization. Is there any published,
> coherent discussion of how to compile this sensibly and efficiently?
I'll leave you to determine whether it is coherent or not, but I had
a paper in PEPM 94 called "Dictionary-free Overloading by Partial
Evaluation" that showed how to implement type class overloading without
any runtime dictionaries by generating specialized versions of
overloaded functions. (The earlier tech report version of this work
may be more interesting because it has material that I had to leave
out of the conference version for lack of space. You'll find the
tech report at http://www.cse.ogi.edu/~mpj/pubs/RR-959.pdf, and the
PEPM version at http://www.cse.ogi.edu/~mpj/pubs/pepm94.html.)
You might think that this would cause a "code explosion" because now
your compiled code contains multiple versions of the functions in the
original program. However:
- in practice, a decent inliner would have inlined many of those
functions anyway ...
- intuitively, the bigger/more complex a function gets, the less
polymorphic it is likely to be, and hence the less likely you are
to need multiple copies at different types ...
- when you eliminate dictionaries, you also eliminate a whole bunch
of unnecessary dependencies, which means that you can also do a
better job of tree shaking (i.e., removing dead code from the
final program). For example, a program that uses only addition
doesn't need to include code for all the other operations that
would be packaged up (but never used) in a dictionary ...
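To illustrate the transformation being specialized away, here is a
sketch in plain Haskell (all names hypothetical) of the standard
dictionary-passing translation: the class becomes a record of
operations, an overloaded function takes that record as an extra
argument, and a specializer replaces the pair with a direct,
dictionary-free version at each concrete type.

```haskell
-- A Num-like class modeled as a record of operations.
data NumDict a = NumDict { add :: a -> a -> a, one :: a }

-- Dictionary-passing translation of:   f x = x + 1
fDict :: NumDict a -> a -> a
fDict d x = add d x (one d)

-- The dictionary an instance declaration would build for Int.
intDict :: NumDict Int
intDict = NumDict { add = (+), one = 1 }

-- What a specializer generates for f at Int: the record fields are
-- inlined, so no dictionary survives to runtime.
fInt :: Int -> Int
fInt x = x + 1

main :: IO ()
main = do
  print (fDict intDict 41)  -- via a runtime dictionary
  print (fInt 41)           -- specialized, dictionary-free
```

Tree shaking then falls out naturally: `fInt` mentions only `(+)` at
`Int`, so none of the other operations a real dictionary would carry
need to be linked in.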
My implementation didn't do any fancy optimizations but, nevertheless,
in every case, specialization actually produced smaller programs, a
result that I attributed to the improved tree shaking mentioned above.
Truth and honesty time: this ain't perfect either:
- it makes your implementation more complex, which isn't so good
if you're aiming for high assurance and don't plan to skip the
assurance argument for your compiler itself ...
- there are (mostly pathological) cases in Haskell where you can't
specialize away dictionaries at compile-time ... this occurs in
programs that require a potentially infinite sequence of dictionaries
to be constructed at runtime. I can give examples, but perhaps we'll
do that on demand, and in a separate email ...
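(A well-known instance of this problem, not taken from the original
post, is polymorphic recursion over a "nested" datatype: each
recursive call needs a dictionary at a strictly larger type, so no
finite set of specialized copies covers every call.)

```haskell
-- A nested datatype: the recursion changes the type parameter.
data Nested a = Nil | Cons a (Nested [a])

-- The recursive call needs Show [a] given Show a, then Show [[a]],
-- and so on without bound; the dictionaries must be built at
-- runtime, and compile-time specialization cannot enumerate them.
showNested :: Show a => Nested a -> String
showNested Nil         = "[]"
showNested (Cons x xs) = show x ++ " : " ++ showNested xs

main :: IO ()
main = putStrLn (showNested (Cons (1 :: Int) (Cons [2,3] Nil)))
```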
- specialization is a whole program analysis, which means that the
technique might have problems scaling to deal with large systems.
(This doesn't rule out separate compilation, but it does mean that
you are going to have to rely on some fairly smart linker/whole
program optimizer ...) Fortunately for us, our interest is in
compiling a *micro*kernel ... with any luck, our programs won't
be too big (assurance would become a problem too in that case)
and whole program analysis should still be a viable option...
Hope this helps!
All the best,