[bitc-dev] Box, Unbox, and Ref

Jonathan S. Shapiro shap at eros-os.org
Mon Feb 14 22:30:49 EST 2005


Scott, Swaroop, and I had a lengthy conversation today about the
operators BOX and UNBOX. It started with the following question:

  BOX is currently defined to return a boxed reference to a copy
  of its input.

  Does UNBOX need to copy its input, or can it return a reference
  in-place?

Not surprisingly, the conversation became somewhat involved. Let me see
if I can go through the issues:


** BOX

The signature of the BOX operation is

	box: (fn ('T) -> (box 'T))

That is, it accepts an arbitrary type 'T and returns an object whose
type is the corresponding boxed type. Note that the type 'T may already
be boxed. BOX is therefore a constructor of boxes around arbitrary
types.

The input argument *may* be on the stack. This introduces the following
choice of requirements:

  EITHER

    We must restrict the propagation of the reference to ensure
    that the reference does not outlive the unboxed object

  OR

    We must copy the object into the heap to ensure that the
    reference *cannot* outlive the object.

For BOX, we chose the second path. BOX returns a boxed reference to a
COPY of its input.

** UNBOX

The signature of the UNBOX operation is

	unbox: (fn ('T') -> (unbox 'T))

That is, the type 'T must either be intrinsically a boxed type or it
must be the result of a BOX operation.

The issue of copy behavior for UNBOX is subtle. Somewhat surprisingly,
the UNBOX operation does NOT need to copy its return value. This is
because argument passing, return values, and assignment all have copy
semantics. The expression

  (let ((ux (unbox x)))
    ... body ...)

rewrites to

  ((lambda (ux) ... body ...) (unbox x))

and the act of passing the argument guarantees the copy.

The other issue we need to be concerned about is capture. That is, is it
okay when we execute something like:

  (let ((ux (unbox x)))
    (lambda (arg) ...body...))

Note that the lambda escapes as a return value and is closed over an
unboxed variable. Here again, though, the unboxed reference is to a
stack-allocated object in the closure, which is perfectly okay -- the
LET operation has made a copy by virtue of the argument pass-by-value
rule.


** Non-Inverse Property

It is NOT the case (and not intended to be the case) that

	(unbox (box x)) == x

It is also not true that:

	(box (unbox x)) == x

Both are false because BOX is a copying operation.

** Missing case

Unfortunately, there is a bad missing case in all of the above. Consider
the situation where X is a reference to a reference value with a mutable
field 'a', such as a structure. The operations described above mean that

  (set! (unbox x).a 5)

and

  (let ((ux (unbox x)))
    (set! ux.a 5))

do not have the same effect. The second case performs a by-value copy in
the let binding, and the mutation is performed on the copy. This is, at
best, counter-intuitive.

This suggests that we need to consider introducing a notion of "by-
reference" parameter passing. The difficulty is that we would like to
have a mutable structure that can be passed to a manipulating function,
where mutations done in the receiving function are visible after exit.

One way to do this is to say that the type of a lambda argument can be
marked "by reference":

  (lambda (x : (by-reference (unbox x)))
    ... body ...)

This would apply equally to any non-core form that is rewritten into a
lambda. It is NOT permitted to declare a non-parameter type as BY-
REFERENCE.

If a parameter or local is BY-REFERENCE, the restriction is that it
cannot be closed over by a lambda that is returned or is passed as a
procedure parameter. Strictly speaking this restriction is too strong,
but it is statically checkable and I want to make sure that the
restriction can be statically checked. The purpose of this restriction
is to ensure that the reference created by the BY-REFERENCE cannot
outlive its unboxed object.

Note that since assignment has copy semantics, it is *okay* to assign a
by-reference value to a not-by-reference value, and it is okay to return
a by-reference value because return values similarly have copy
semantics.

If we introduce this, my personal view is that transparency should be
reserved. A mutable argument should not be passed by reference without
explicit identification at the call sight. Therefore, given the
procedure definition

  (define (f (x : (by-reference T)) ...body...)

the call site should also indicate that the argument is intended to be
passed by reference. This can be either by directly passing a type that
is already a by-reference type, or by explicitly wrapping the argument,
as in:

  (f (by-reference some-arg))

In practice, BY-REFERENCE is very inconvenient to type, and we will
probably abbreviate it as REF.


One way to think about BY-REFERENCE is that it is like BOX except that
it does not copy it's input. It is nearly truefollows that

  (unbox (by-reference x)) === x
but not
  (by-reference (unbox x)) === x

the reason for the latter is that the output of

  (by-reference (unbox x))

has restrictions on capture that X alone does not have.


shap



More information about the bitc-dev mailing list