[bitc-dev] Extent of type class instances (MPJ look)
Mark P. Jones
mpj at cs.pdx.edu
Mon Jun 5 03:40:55 EDT 2006
> We have reached the point where we need to make a decision about the
> extent of type class instances. There appear to be three possible
> choices:
I will try to explain the choices that were made in the design of
Haskell. I think they were reasonable choices, motivated largely
by practical considerations, but I think one could also choose to
draw the lines in different places (as I once tried to do long ago
with Gofer).
> 1. Instances are purely lexically scoped. The set of available instances
> is determined by the set of instances that are lexically visible at the
> point of instantiation (where "visible" means "you imported the
> interface that defined it" -- instance declarations escape their
> exporting interfaces as soon as the interface is imported)
This is pretty much what Haskell does. Here's a brief quote from the
Haskell report (Section 5.4, p74) on this topic:
"Instance declarations cannot be explicitly named on import or export
lists. All instances in scope within a module are always exported and
any import brings all instances in from the imported module. Thus, an
instance declaration is in scope if and only if a chain of import
declarations leads to the module containing the instance declaration."
> 2. Instances are scoped according to the outermost current unit of
> compilation. An instance can appear later than its use, and this will be
> resolved successfully.
I'm not sure I fully understand this suggestion. Are you perhaps thinking
of a system that would allow something like the following pair of modules?
module N where
class C a where c :: a -> a
twice :: C a => a -> a
twice x = c (c x)
twiceFalse = twice False
module M where
import N
instance C Bool where c = not
main = print twiceFalse
Note here that the definition of twiceFalse in module N seems to require
an instance of C for Bool, but there is no such instance in N.
Technically, it is still possible to type check the code in N; the principal
type of twiceFalse is: C Bool => Bool -> Bool.
A Haskell compiler, however, would reject the definition of N. I believe
this choice was made as a pragmatic compromise between the desire to be as
flexible as possible and the desire to allow early detection of errors.
In other words, the designers thought that code like this was most likely
to be a programmer error (forgetting to define or import an instance of C
for Bool in N). While it would be possible to assign the C Bool => ...
type to twiceFalse, that could prevent the (presumed) error from being
detected until the top of the call graph (intermediate points on the call
graph would also inherit the extra C Bool constraint).
So, to get this to work in Haskell, you would either move the instance
of C Bool from M to N, or else the definition of twiceFalse from N to M.
I'll say again that this is a pragmatic compromise, and a somewhat
arbitrary and arguably inconsistent decision in the language design:
- Haskell rejects the original version because it determines that the
C Bool constraint inferred for twiceFalse is not satisfiable given
the instances that are in scope in N;
- but Haskell accepts the modified program that we get by moving the
definition of twiceFalse into N, even though the type of twice in
N has a constraint (C a) that is not satisfiable in that module
(because there are no instances of C in scope in N).
The decision to accept the code in the second case, of course, is not
arbitrary: it's a fairly common idiom to define "generic" operators
that work on all instances of a given class, even if there are no
concrete instances of the class in scope at that point.
> 3. Instances have universal scope. Only one instance for a particular
> class over a particular set of types may exist in an entire program.
Haskell works that way:
"A type may not be declared as an instance of a particular class
more than once in the program." (Haskell Report, Section 4.3.2, p52)
A compiler typically enforces this somewhat indirectly as a result of
the "viral" nature of instance declarations. If a Main program contains
two modules that define "instance C T" for some class C and type T,
then there will be some module M in the program in which both instances
are visible (M could be Main, for example). The type checker will fail
to type check M and the compiler will not compile the program until one
of the instance definitions has been removed.
The Haskell approach is consistent with the view of type classes as a
mechanism for supporting *overloading* in which, for any given type T,
there is *at most one* implementation of the operations in class C.
From a Haskell perspective, the idea is that a class declaration and
a set of instances together define a *single* function whose meaning
is uniquely determined by the type at which it is instantiated. As such,
it doesn't make sense to have multiple instances of a class at a single
type because then type information alone would not be sufficient to
determine which one should be used at a particular point in the program.
There have been calls for Haskell to allow local definitions
(or redefinitions) of type class instances. For example, given a
library function sort :: (Ord a) => [a] -> [a], some have suggested
that it would be handy if you could use different instances of Ord Int
in different parts of a program so that data could be sorted in
correspondingly different ways. Others have argued that the inclusion
of such a function in a standard library is poor design, and that the
library should instead provide a sort function that is parameterized
on the choice of comparison operator:
sortBy :: (a -> a -> Bool) -> [a] -> [a]
Give this, we can then define sort = sortBy (<=) as the generic sort
function to be used when the (uniquely defined, overloaded) <= operator
is good enough, and we can resort to using the parameterized version
explicitly when we need something more. This gives the best of both
worlds except that it is specific to sort.
This topic has been debated on and off for many years, but Haskell has
stuck to the original notion of type classes as a means for supporting
overloading rather than a device for flexible, implicit parameterization.
There have also been some proposals that aim for a middle ground by, for
example, providing special syntax to indicate when a type-based decision
about a choice of instance should be overridden. To date, however, none
of these has been widely adopted. I'm not sure, for example, if any have
been tested in practical implementations. If you're looking for a
solution that is "tried and tested", the current Haskell approach is
probably your only choice. If you're looking to explore some different
parts of the design space, there's plenty of territory left to explore.
All the best,
Mark
More information about the bitc-dev
mailing list