[bitc-dev] Semantics of unboxed
Mark Miller
markm at cs.jhu.edu
Thu Oct 21 07:56:04 EDT 2004
EricNorthup wrote:
> MarkM wrote:
>>Jonathan S. Shapiro wrote:
>>>If this is really correct, then we can omit all consideration of
>>>"boxedness" from the language semantics...
>>
>>Until the language has side effects, EQ, or infinite rational trees
>>(cyclic containment). Obviously, it would be strange to support
>>unboxed cyclic containment.
>
> Can you elaborate on how side-effects muddle the picture?
I should have said "local side effects", and it depends on the semantics of
the constructs that introduce the side effects. I'm not sure of the semantics
you guys have in mind, so I'll illustrate the semantics I was assuming with a
C++/Java-like-notation example. For explicitness, I will qualify all variable
declarations with either "final" (not directly assignable to, as in Java) or
"var" (directly assignable to).
struct Foo {
final int a;
var int b;
};
final Foo foo1 = { 3, 5 }; // foo1.a is 3, foo1.b is 5
final Foo foo2 = foo1; // foo2.a is 3, foo2.b is 5
foo1.b := 6 // foo1.b is now 6
If foo2.b now 5 or 6? If we think of the above code in C++ terms, then Foo is
an unboxed type, the assignment copied the struct, and foo2.b is still 5.
If we change "struct" to "class" and think of the above code in Java terms,
the initialization of foo1 allocated a new Foo, but foo2 was initialized to
share (by pointing) this same Foo structure. foo2.b would then be 6.
If we wish to preserve the more Java/Smalltalk/ML/Scheme-like semantics of
boxed structures, and view the unboxed annotations as an optimization, I
suggest that it should be a static error to declare the above Foo object
unboxed, since it contains an assignable field. unboxed should only be
applicable to types all of whose direct (one level) fields are final. So let's
declare a similar Bar structure:
struct Bar {
final int a;
final int b;
};
var Bar bar1 = { 3, 5 }; // bar1.a is 3, bar1.b is 5
final Bar bar2 = bar1; // bar2.a is 3, bar2.b is 5
In order to support the efficient updating of unboxed structs with a semantics
explainable in terms of boxed objects, if we had a derivation operator like
the "but" operator of
<http://www.agorics.com/Library/agoricpapers/ie/ieappendix.html>, so that the
expression (making up a truly horrible syntax)
bar1 but .b is 6
evaluates to a Bar object just like bar1, but with its .b field set to 6; and
if we interpret
bar1 but= .b is 6
to mean
bar1 := bar1 but .b is 6
then, whether Bar is boxed or unboxed, afterwards bar1.b is 6 and bar2.b is
still 5. Implementationally, the unboxed implementation copied a Bar when it
initialized bar2, and the "assignment" was then able to update bar1
efficiently in place. The boxed implementation copied a Bar structure (into a
newly heap-allocated Bar) during the assignment. Both implement the same
semantics.
Even for the boxed implementation, one could imagine an implementation that
could detect statically that, at the time of the assignment, bar1 was the only
remaining live reference to this Bar structure, in which case it could still
destructively update it in place, since afterwards the old Bar would no longer
be reachable. Various languages (like Vijay's "Linear Janus") have employed
such optimizations to good effect. ACL2's stobj stuff can be seen as a special
case of this.
> And what you mean by "EQ" ?
Classically, "pointer equality", though we need a description in terms of the
program's semantics rather than an assumed set of implementation techniques.
(Lisps and even Scheme try to split this hair even finer, and have half a
dozen different equality constructs to mix different degrees of
semantics-based vs implementation-based considerations into the definition of
the construct. I consider this to be an unmitigated disaster.)
My favorite interpretation of EQ is in terms of "creation identity": It's as
if each object, on creation, were given an implicit extra final field which
was a unique, comparable, and unforgeable identity token. (In an
implementation of heap allocated objects, the object's memory address is
typically taken to encode this implicit field.) So we could then interpret
EQ(foo1,foo2)
as
identityEQ(getIdentity(foo1), getIdentity(foo2))
where identityEQ is defined only over these identity objects. Note that
creation identity is somewhere between purely functional and imperative: it is
side effect free but not transparently memo-izable. For example, in Java
new Bar(3,5) == new Bar(3,5)
must return false, even though the two Bar objects are forever identical in
all other respects, because they have different creation identities.
*If* we wish to strictly preserve the semantics of a boxed system for any
program we accept, then I propose that declaring Foo to be unboxed causes it
to lack such an identity field, in which case EQ(foo1,foo2) would be
statically rejected. (I do *not* suggest adding an extra creation identity
field to unboxed types in order to simulate the EQ behavior of boxed types,
but see the discussion of "unboxed Selfish" below.)
Another option is to not have any operation in the language that obtains or
tests the creation identity of anything. If you don't have a desperate need
for such a primitive, this is the cleanest option.
Yet another choice is to have no EQ primitive in the language, but instead
introduce Henry Baker's similar EGAL primitive
<http://home.pipeline.com/~hbaker1/ObjectIdentity.html>. This is approximately
what E does. In E terminology, objects are either Selfish (carry an implicit
creation identity used for comparison, normally implemented according to the
object's address) or Selfless (compare by a recursive comparison of their
contents). In a C-like notation, given
Bar* bar1Ptr = ...;
Bar* bar2Ptr = ....;
EGAL(bar1Ptr, bar2Ptr) would mean approximately
if (Bar is a Selfless type) {
*bar1Ptr == *bar2Ptr
} else if (Bar is a Selfish type) {
bar1Ptr == bar2Ptr
} else {
error
}
In the BitC context, we should assume the above type dispatch happens statically.
If unboxed implicitly implies Selfless, this would mean that the unboxed
annotation by itself changes the semantics of a program. For BitC, I suggest
instead that all types by default be neither Selfish nor Selfless, but rather
that both are introduced by explicit annotation that is considered to effect
the program's semantics. By the above definition, EGAL of unannotated types
would be a static error. We could either make it a static error to declare a
type both unboxed and Selfish, or we could implement unboxed-Selfish by adding
an extra final creation-identity field to the implementation of such
structures (to simulate pointer equality behavior).
> I took what Shap wrote to mean "the langauge semantics can describe all
> containment relationships as being 'boxed' -- and thus programmers don't
> specify boxed/unboxed -- but the compiler is free to unbox things when
> it is safe to do so." And not a statement that "everything can be unboxed".
My points above regard only types explicitly marked unboxed by the programmer.
Of course, I agree that internal optimizations of unannotated types must not
effect the semantics of a program. Scheme, Lisp, and Smalltalk each provide
the same wonderfully horrible example of what not to do. Given that i and j
are both integers, in Scheme or Lisp implementations
(EQ (+ i j) (+ i j))
or in Smalltalk implementations
(i + j) == (i + j)
will give different responses depending on whether the sum is big (implicitly
boxed) or small (implicitly unboxed). The languages are specified to allow but
not require this horrible behavior, leaving the language spec
non-deterministic, which is even worse.
Do we wish BitC to eventually become an object-capability language? I'm *not*
recommending this, unless we wish to become a language lab *instead of* an OS
lab, which I'm also not recommending. I've already got an object-capability
language. I need an object-cap OS more than I need yet another object-cap
language.
If BitC does wish to grow into an object-cap language, we should recognize
that EQ or EGAL is much like the identity comparison provided by KeyKOS's
DISCRIM, and we should read
<http://www.erights.org/elib/equality/grant-matcher/> before proceeding.
--
Text by me above is hereby placed in the public domain
Cheers,
--MarkM
More information about the bitc-dev
mailing list