[bitc-dev] Representation and memory model

Jonathan S. Shapiro shap at eros-os.org
Mon Aug 29 15:39:04 EDT 2005

The subject line of Constantine's note note was badly chosen. Please
note that I have changed it, and please try to use the new subject line.

I want to try to provide a partial response to Constantine.

On Sun, 2005-08-28 at 18:41 +0700, Constantine Plotnikov wrote:
> I hope that that the section will answer the following questions:
> - What are different memory classes supported? (so far single heap and 
> per thread stack as I understand, but there might be other memory 
> classes like constant pool)

There is a single heap. There is no language-supported concurrency, so
there is one stack and zero threads.

The language specification does not require, suggest, or imply the
creation of a constant pool. The compiler is free to unify identical
literal constants, but is not required to do so.

> - When stack and heap memory are alloacted? (list of memory allocation 
> primitives possibly with just references to other sections of the 
> specification)

This is a tricky question. My view is that we *must* specify this for
the "manual storage" operating mode, but that we should not specify this
for the "managed storage" mode.

(1) The specification does not require a stack. Procedure frames are
allocated on entry to the procedure. In implementations using a stack,
this corresponds to allocation of stack space.

(2) Heap allocation is performed by any expression that applies a
constructor to arguments IF the return type of that constructor is a
reference type.

Note the phrase "... to arguments ...". Even for reference types, no
heap allocation is performed for arity zero constructions. That is,
there is only one instance of NIL for a given list type.

(3) Finally, heap allocation may be performed (in effect) at static
compile time as a consequence of type specialization (for arity zero
constructors), and procedure specialization. HOWEVER, when compiling for
use with managed storage, the compiler is free to defer these
allocations to run time.

I should note that I have come to the conclusion that we need to think
through three modes of execution:

  1. Explicit: the only runtime storage allocation arises from cases
     (1, 2) above, which are explicitly under user control.

  2. Non-collecting: case (3) is permitted, but only when the resulting
     storage is not collectible. This would permit runtime procedure
     instantiation and runtime dictionary construction.

  3. Collecting -- cases (1,2,3) are permitted.

The reason we need to consider the "non-collecting" case is to force us
to think through some issues; I don't intend to implement this.

Note that "explicit" mode is definitely going to require whole program

> - When and how memory is deallocated? (spec seems to assume garbage 
> collector, but it is not spelled out explicitly)

In explicit mode: when you say so.
In collecting mode: when we feel like it, and the spec will not say (at
this time).

However: I've already built a simple mark/sweep allocator, and it
appears to be feasible to build several storage managers that can use
mostly the same code implementation. More precisely, there is an
optimization threshold below which this remains feasible.

> - Any rules for referecing memory locations? For example, could value at 
> botton of stack refer to value at top of stack (with boundary cases like 
> letrec)?

There is no mechanism in the language that would permit this. We briefly
contemplated a RESTRICTED type modifier that would allow upward
references into the stack. It was a mess, and we took it out.

> - What is cannonical mapping from values to bytes?

This is actually specified already, with some exceptions:

1. The size of the WORD type is unspecified
2. Byte order is machine dependent
3. Layout of stack frames is unspecified -- in practice it is not useful
   to specify this, since the actual stack frame size and construction
   rely very heavily on the degree of optimization.

> Parts of this information is spreaded across the specification, other 
> seems to be assumed. I would like to see such information in one place.

Yes. Part of my hesitation to specify too early is that I want to know
that we can implement the requirements.

Also: you missed one issue that is really important: when are
stack-based pointers nullified?

It is very easy to have retained garbage resulting from stack frame
pointers that are no longer in scope but have not been explicitly
cleared. This is further complicated in BitC by the fact that NULL is
not always a legal pointer value (compare REF to OPTION), so there are
cases where the nullification MUST be inserted by the compiler. Matters
are still further complicated by the fact that the compiler may do
temporary consolidation, which makes deciding when to nullify
challenging for the user to get right.

There is a conceptually straightforward resolution to this, but we
haven't implemented it yet.

More information about the bitc-dev mailing list