[bitc-dev] Semantics of unboxed

Mark Miller markm at cs.jhu.edu
Thu Oct 21 07:56:04 EDT 2004


EricNorthup wrote:
> MarkM wrote:
>>Jonathan S. Shapiro wrote:
>>>If this is really correct, then we can omit all consideration of
>>>"boxedness" from the language semantics...
>>
>>Until the language has side effects, EQ, or infinite rational trees
>>(cyclic containment). Obviously, it would be strange to support 
 >>unboxed cyclic containment.
> 
> Can you elaborate on how side-effects muddle the picture?

I should have said "local side effects", and it depends on the semantics of 
the constructs that introduce the side effects. I'm not sure of the semantics 
you guys have in mind, so I'll illustrate the semantics I was assuming with a 
C++/Java-like-notation example. For explicitness, I will qualify all variable 
declarations with either "final" (not directly assignable to, as in Java) or 
"var" (directly assignable to).

struct Foo {
     final int a;
     var int b;
};

final Foo foo1 = { 3, 5 }; 	// foo1.a is 3, foo1.b is 5

final Foo foo2 = foo1; 		// foo2.a is 3, foo2.b is 5

foo1.b := 6			// foo1.b is now 6

If foo2.b now 5 or 6? If we think of the above code in C++ terms, then Foo is 
an unboxed type, the assignment copied the struct, and foo2.b is still 5.

If we change "struct" to "class" and think of the above code in Java terms, 
the initialization of foo1 allocated a new Foo, but foo2 was initialized to 
share (by pointing) this same Foo structure. foo2.b would then be 6.

If we wish to preserve the more Java/Smalltalk/ML/Scheme-like semantics of 
boxed structures, and view the unboxed annotations as an optimization, I 
suggest that it should be a static error to declare the above Foo object 
unboxed, since it contains an assignable field. unboxed should only be 
applicable to types all of whose direct (one level) fields are final. So let's 
declare a similar Bar structure:

struct Bar {
     final int a;
     final int b;
};

var Bar bar1 = { 3, 5 }; 	// bar1.a is 3, bar1.b is 5

final Bar bar2 = bar1; 		// bar2.a is 3, bar2.b is 5

In order to support the efficient updating of unboxed structs with a semantics 
explainable in terms of boxed objects, if we had a derivation operator like 
the "but" operator of 
<http://www.agorics.com/Library/agoricpapers/ie/ieappendix.html>, so that the 
expression (making up a truly horrible syntax)

     bar1 but .b is 6

evaluates to a Bar object just like bar1, but with its .b field set to 6; and 
if we interpret

     bar1 but= .b is 6

to mean

     bar1 := bar1 but .b is 6

then, whether Bar is boxed or unboxed, afterwards bar1.b is 6 and bar2.b is 
still 5. Implementationally, the unboxed implementation copied a Bar when it 
initialized bar2, and the "assignment" was then able to update bar1 
efficiently in place. The boxed implementation copied a Bar structure (into a 
newly heap-allocated Bar) during the assignment. Both implement the same 
semantics.

Even for the boxed implementation, one could imagine an implementation that 
could detect statically that, at the time of the assignment, bar1 was the only 
  remaining live reference to this Bar structure, in which case it could still 
destructively update it in place, since afterwards the old Bar would no longer 
be reachable. Various languages (like Vijay's "Linear Janus") have employed 
such optimizations to good effect. ACL2's stobj stuff can be seen as a special 
case of this.


> And what you mean by "EQ" ?

Classically, "pointer equality", though we need a description in terms of the 
program's semantics rather than an assumed set of implementation techniques. 
(Lisps and even Scheme try to split this hair even finer, and have half a 
dozen different equality constructs to mix different degrees of 
semantics-based vs implementation-based considerations into the definition of 
the construct. I consider this to be an unmitigated disaster.)

My favorite interpretation of EQ is in terms of "creation identity": It's as 
if each object, on creation, were given an implicit extra final field which 
was a unique, comparable, and unforgeable identity token. (In an 
implementation of heap allocated objects, the object's memory address is 
typically taken to encode this implicit field.) So we could then interpret

     EQ(foo1,foo2)

as

     identityEQ(getIdentity(foo1), getIdentity(foo2))

where identityEQ is defined only over these identity objects. Note that 
creation identity is somewhere between purely functional and imperative: it is 
side effect free but not transparently memo-izable. For example, in Java

     new Bar(3,5) == new Bar(3,5)

must return false, even though the two Bar objects are forever identical in 
all other respects, because they have different creation identities.

*If* we wish to strictly preserve the semantics of a boxed system for any 
program we accept, then I propose that declaring Foo to be unboxed causes it 
to lack such an identity field, in which case EQ(foo1,foo2) would be 
statically rejected. (I do *not* suggest adding an extra creation identity 
field to unboxed types in order to simulate the EQ behavior of boxed types, 
but see the discussion of "unboxed Selfish" below.)

Another option is to not have any operation in the language that obtains or 
tests the creation identity of anything. If you don't have a desperate need 
for such a primitive, this is the cleanest option.

Yet another choice is to have no EQ primitive in the language, but instead 
introduce Henry Baker's similar EGAL primitive 
<http://home.pipeline.com/~hbaker1/ObjectIdentity.html>. This is approximately 
what E does. In E terminology, objects are either Selfish (carry an implicit 
creation identity used for comparison, normally implemented according to the 
object's address) or Selfless (compare by a recursive comparison of their 
contents). In a C-like notation, given

     Bar* bar1Ptr = ...;
     Bar* bar2Ptr = ....;

EGAL(bar1Ptr, bar2Ptr) would mean approximately

     if (Bar is a Selfless type) {
         *bar1Ptr == *bar2Ptr
     } else if (Bar is a Selfish type) {
         bar1Ptr == bar2Ptr
     } else {
         error
     }

In the BitC context, we should assume the above type dispatch happens statically.

If unboxed implicitly implies Selfless, this would mean that the unboxed 
annotation by itself changes the semantics of a program. For BitC, I suggest 
instead that all types by default be neither Selfish nor Selfless, but rather 
that both are introduced by explicit annotation that is considered to effect 
the program's semantics. By the above definition, EGAL of unannotated types 
would be a static error. We could either make it a static error to declare a 
type both unboxed and Selfish, or we could implement unboxed-Selfish by adding 
an extra final creation-identity field to the implementation of such 
structures (to simulate pointer equality behavior).


> I took what Shap wrote to mean "the langauge semantics can describe all
> containment relationships as being 'boxed' -- and thus programmers don't
> specify boxed/unboxed -- but the compiler is free to unbox things when
> it is safe to do so."  And not a statement that "everything can be unboxed".

My points above regard only types explicitly marked unboxed by the programmer. 
Of course, I agree that internal optimizations of unannotated types must not 
effect the semantics of a program. Scheme, Lisp, and Smalltalk each provide 
the same wonderfully horrible example of what not to do. Given that i and j 
are both integers, in Scheme or Lisp implementations

     (EQ (+ i j) (+ i j))

or in Smalltalk implementations

      (i + j) == (i + j)

will give different responses depending on whether the sum is big (implicitly 
boxed) or small (implicitly unboxed). The languages are specified to allow but 
not require this horrible behavior, leaving the language spec 
non-deterministic, which is even worse.


Do we wish BitC to eventually become an object-capability language? I'm *not* 
recommending this, unless we wish to become a language lab *instead of* an OS 
lab, which I'm also not recommending. I've already got an object-capability 
language. I need an object-cap OS more than I need yet another object-cap 
language.

If BitC does wish to grow into an object-cap language, we should recognize 
that EQ or EGAL is much like the identity comparison provided by KeyKOS's 
DISCRIM, and we should read 
<http://www.erights.org/elib/equality/grant-matcher/> before proceeding.

-- 
Text by me above is hereby placed in the public domain

     Cheers,
     --MarkM



More information about the bitc-dev mailing list