[bitc-dev] Problem: mutable literals
Jonathan S. Shapiro
shap at eros-os.org
Fri May 19 01:06:01 EDT 2006
At the moment, we have a problem in the specification concerning the
string type: it is entirely missing! A discussion with Mark Miller leads
me to realize that we may have another lurking problem with sequence
initializers more generally. This in turn led to a discussion with
Swaroop that made me question whether we may not have a problem with
mutable literals in general.
Believe it or not, what follows has implications for string literal
representation!
First, consider:
(let ((x:(mutable int32) 1)) ...)
This is okay because the LHS is a *copy* of the RHS.
But the following may not be okay:
(let ((v (vector (the (mutable int32) 1))) ...)
The problem is that THE is a type constraint, not a copy operator. This
raises the question: can literals be given a mutable type? If not, how
do we initialize vectors of mutable element type? If so, how should we
understand initializers for reference types having mutable content?
For integers, characters, and other value types, it doesn't really hurt
anything to say "yes", because the re-instantiation is free. In all of
these cases, we could interpret (for example) the literal #\c as an
application that returns a value. Each appearance produces a new copy of
#\c.
But for reference types, things become a bit more complex. When we see:
(let ((v (vector 1 2 3))) ...)
our intuition is that every invocation will create a new vector. Note,
however, that this is unnecessary. In this case, we are producing a
"pure" vector (one with immutable content), and it would be conceptually
reasonable for the compiler to partially evaluate the constructor and
return the same vector at each invocation.
Assuming we allow mutable literals, this is NOT an acceptable
optimization for:
(let ((v (vector (the (mutable int) 1) 2 3))) ...)
if we partially evaluate this construction (and therefore share the
vector), the procedure ceases to be pure -- it is as if the procedure is
closed over a mutable variable.
For vectors of mutable elements, I think it is very clear that we must
run the constructor each time. It is already possible to express
captured sharing by writing something like:
(define f (let ((captured-v (vector ...))) (lambda ...)))
however, if we allow constructors over mutable elements to be partially
evaluated it becomes impossible to write pure functions that have
certain kinds of internal mutability.
HOWEVER: the partial evaluation of *immutable* content is actually
important. When we see:
(let ((v (vector (the int 1) 2 3))) ...)
is the compiler obligated to allocate a new vector on each entry?
It is tempting to imagine that no operator will reveal the partial
evaluation, but consider:
(define v-save:(mutable (vector int)) (vector 1))
(define (f)
(let* ((v (vector 1 2 3))
(result (eq? v v-save)))
(set! v-save v)
result))
If partial evaluation is permitted, the first invocation of F will
return #f, but the second will return #t. However, in the minor
alternative:
(define v-save:(mutable (vector (mutable int))) (vector 1))
(define (f)
(let* ((v (vector (the (mutable int) 1) 2 3))
(result (eq? v v-save)))
(set! v-save v)
result))
no invocation of F will ever return #t.
For vectors, this doesn't really look all that exciting, but now
consider:
(define (f) (let ((x "abc")) x))
do we really want to mandate that a new string be allocated on each
entry to this procedure? Conversely, do we wish to mandate that it must
not? If we choose to re-use the string, are we doing it because string
literals are somehow different, or is this a more general property of
expressions?
If we permit reuse, is it restricted to the same occurrence of the
literal? What is the result of:
(let ((s1 "a")
(s2 "a")) (eq? s1 s2))
What about if they appear in separate files?
Here are the options that I can see:
1. Special case strings. Strings have constant members (no string-set!),
and behave similarly to bignums. Strings are NOT a reference type.
2. Specify that an implementation MAY, but is not required, to partially
evaluate any constructor returning a reference type whose content is
deeply immutable.
3. More generally, specify that an implementation is entitled to do this
for any expression evaluation generally, provided the content of the
returned value is deeply immutable.
4. State that an implementation MUST do (2) for initialization forms.
5. State that an implementation MUST do (2 and 3) for initialization
forms -- this may be a pragmatic requirement for programs that are
obliged to run in constant space.
My inclination is to permit the compiler to partially evaluate these
cases, but not (for the moment) to require it unless compiling for a
static-space target application.
Note this means that for two objects of type (ref 'a) allocated by the
same lexical occurrence of an initializer, where 'a is deeply immutable,
it is not defined whether EQ returns #f or #t. Where 'a is NOT deeply
immutable, it is guaranteed that EQ returns #f.
Okay, so why does all of this matter for strings?
If strings are a funny vector-like sequence type over char, then there
is no STRING-SET!, and the pressure to choose a representation in order
to deal with constant-space requirements mostly goes away. It is still
possible to introduce a mutable string-like type using type classes, but
string literals contain constant characters.
Okay. At this point I should stop and solicit reactions.
shap
More information about the bitc-dev
mailing list