[coyotos-dev] Interpreter techniques
Jonathan S. Shapiro
shap at eros-os.com
Wed Feb 7 20:28:16 CST 2007
This may be a well-known trick, but we didn't know about it. Credit for
the core idea goes to Jonathan Adams.
In one kind of interpreter, execution proceeds by first building an AST
and then having the interpreter walk the AST. Interp() proceeds in
recursive-descent form (ignoring procedure calls) with each call to
Interp() producing some object of tagged union type (perhaps named
Value) giving the result. Our original implementation of Interp() in
mkimage returned a GCPtr<Value>.
In the current mkimage, you can see the Value object at:
http://www.opencm.org/opencm/~/PUBLIC/coyotos/DEV/ccs/1100/nmkimage/Value.hxx
A difficulty with this approach is dealing with assignment. To handle
assignment in this way, you generally need to introduce some sort of
Value type that denotes locations of values rather than the values
themselves. Once this is introduced, you discover that you now need to
implicitly dereference these when they appear in an r-value position in
order to obtain the current value at that location. Moderately awkward,
and even more so when you notice that there is no straightforward way in
C to express the location of a bitfield.
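To make the awkwardness concrete, here is a minimal sketch of the location-as-Value approach. It is not the mkimage code: the names Value::IntLoc, rvalue() and NativeCap are hypothetical, and the real Value type is richer than this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical native object with bitfields (not taken from mkimage).
struct NativeCap {
  uint32_t type  : 6;
  uint32_t restr : 5;
  uint32_t count : 21;
};

// A tagged-union Value that can also denote a location (assumed layout).
struct Value {
  enum Kind { Int, IntLoc } kind;
  union {
    uint32_t  i;    // Int: the value itself
    uint32_t *loc;  // IntLoc: where the value lives
  };
};

// The implicit dereference that every r-value use now has to perform.
inline uint32_t rvalue(const Value &v) {
  return v.kind == Value::IntLoc ? *v.loc : v.i;
}

// The bitfield problem: there is nothing to store in `loc` for cap.restr.
//   NativeCap cap;
//   uint32_t *p = &cap.restr;  // ill-formed: cannot take the address of a bit-field
```

A location Value for a bitfield would have to carry something like a base pointer plus shift and width, and every consumer of Value would need to understand it.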
Today, we introduced the following trick into the mkimage interpreter:
Interp() returns a structure that is a pair. The pair consists of two
pointers "rval" and "setter". The "rval" is a pointer to the Value
object that was previously being returned directly. The interesting part
is the setter field.
The setter field points to an instance of class Setter, which has a single
method set(GCPtr<Value>). In essence, the setter instance encapsulates a
location, but by using a set() method it is able to deal sensibly with
bitfields. The idea here is that there are multiple subclasses of
Setter, and each is closed over the necessary state to implement setting
on whatever the target field is. In our case, the reason for multiple
setters is that we are manipulating some native object fields directly.
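A rough sketch of the shape described above, under stated assumptions: std::shared_ptr stands in for GCPtr, Value is reduced to a single integer, and InterpResult, NativeCap, RestrSetter, SlotSetter and evalRestrField() are hypothetical names, not the ones in Interp.cxx or Setter.hxx.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

struct Value { uint32_t i; };             // simplified stand-in for the tagged union
using ValuePtr = std::shared_ptr<Value>;  // stand-in for GCPtr<Value>

struct Setter {
  virtual ~Setter() = default;
  virtual void set(ValuePtr v) = 0;
};

// Hypothetical native object whose fields the script manipulates directly.
struct NativeCap {
  uint32_t type  : 6;
  uint32_t restr : 5;
};

// Closed over the native object; knows how to store into one bitfield.
struct RestrSetter : Setter {
  NativeCap &cap;
  explicit RestrSetter(NativeCap &c) : cap(c) {}
  void set(ValuePtr v) override { cap.restr = v->i & 0x1fu; }
};

// Closed over an ordinary interpreter variable slot.
struct SlotSetter : Setter {
  ValuePtr &slot;
  explicit SlotSetter(ValuePtr &s) : slot(s) {}
  void set(ValuePtr v) override { slot = v; }
};

// What Interp() returns: the value, plus a way to assign to its location.
struct InterpResult {
  ValuePtr rval;
  std::shared_ptr<Setter> setter;
};

// Evaluating the hypothetical field expression `cap.restr`.
inline InterpResult evalRestrField(NativeCap &cap) {
  return { std::make_shared<Value>(Value{cap.restr}),
           std::make_shared<RestrSetter>(cap) };
}
```

The caller never sees whether the location is a bitfield, a whole word, or an interpreter variable; it only sees set().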
In this scheme, the behavior of assignment is to simply assign via the
Setter object. Expressions that return l-values return their value in
the rval slot and the corresponding location in the setter slot.
Expressions that do NOT return l-values return UnitSetter in the setter
slot (which fails on all assignments).
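Assignment under this scheme might look like the sketch below. The assumptions are mine, not mkimage's: std::shared_ptr stands in for GCPtr, Value is reduced to an int, UnitSetter reports failure by throwing, the result of an assignment is itself treated as a non-l-value, and VarSetter, evalVar(), evalLiteral() and evalAssign() are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

struct Value { int i; };                  // simplified stand-in for the tagged union
using ValuePtr = std::shared_ptr<Value>;  // stand-in for GCPtr<Value>

struct Setter {
  virtual ~Setter() = default;
  virtual void set(ValuePtr v) = 0;
};

// Returned by expressions that are not l-values; every assignment fails.
struct UnitSetter : Setter {
  void set(ValuePtr) override { throw std::runtime_error("not an l-value"); }
};

// Closed over an interpreter variable slot.
struct VarSetter : Setter {
  ValuePtr &slot;
  explicit VarSetter(ValuePtr &s) : slot(s) {}
  void set(ValuePtr v) override { slot = v; }
};

struct InterpResult {
  ValuePtr rval;
  std::shared_ptr<Setter> setter;
};

// A variable reference is an l-value: current value plus a real setter.
inline InterpResult evalVar(ValuePtr &slot) {
  return { slot, std::make_shared<VarSetter>(slot) };
}

// A literal is not an l-value: it gets the UnitSetter.
inline InterpResult evalLiteral(int n) {
  return { std::make_shared<Value>(Value{n}), std::make_shared<UnitSetter>() };
}

// Assignment uses only the setter of the left side and the rval of the right.
inline InterpResult evalAssign(const InterpResult &lhs, const InterpResult &rhs) {
  lhs.setter->set(rhs.rval);
  return { rhs.rval, std::make_shared<UnitSetter>() };
}
```

With these definitions, evalAssign(evalVar(x), evalLiteral(42)) updates x, while evalAssign(evalLiteral(1), evalLiteral(2)) throws, and no r-value position ever needs a special dereference step.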
You can see this scheme in action by looking at:
http://www.opencm.org/opencm/~/PUBLIC/coyotos/DEV/ccs/1100/nmkimage/Interp.cxx
the corresponding .hxx, and Setter.[ch]xx in that same directory.
It isn't elegant or fast in execution terms, but it was a very quick
evolution relative to the original interpreter, and it's a scheme that
works very well in a scripting context where get/set on funny native
fields may be required.
shap