[bitc-dev] Addition of DEFREPR
Jonathan S. Shapiro
shap at eros-os.org
Sun May 21 11:21:29 EDT 2006
On Sun, 2006-05-21 at 09:38 -0400, Swaroop Sridhar wrote:
> Jonathan S. Shapiro wrote:
> > I would like your opinion on whether this can be made to work, and how
> > much effort it would take. I'm sure it will change before we implement
> > it, so I wouldn't go rushing ahead. I'm particularly interested in two
> > parts:
> >
> > How much of a pain in the ass are the constructors going to be
> > from the compiler perspective?
>
> In all passes other than the code generator, the defrepr is can be
> treated as the following transformation:
>
> (defrepr dr
> ((tag swizzled unswizzled)
> (tag left right)
> (case (swizzled (i:int32 j:int32))
> (unswizzled (b:bool)))
> (invariant d:double)
> (case ((swizzled left) (k:int32)))))
>
> ||
> V
>
> (defunion dr
> (dr-swizzled-right i:int32 j:int32 d:double)
> (dr-swizzled-left i:int32 j:int32 d:double k:int32)
> (dr-unswizzled-right b:bool d:double)
> (dr-unswizzled-left b:bool d:double))
>
> The constructor applications will also work naturally:
>
> (dr swizzled right 5 6 2.0 7)
>
> ||
> V
>
> (dr-swizzled-right 5 6 2.0 7)
Small nit: the separator must be something other than '-', because
dr-swizzled-right is a legal identifier. One solution might be to use:
__con-dr-swizzled-right
but I seem to remember that we had to deal with this once before...
More generally, I like your idea for a different reason. If we add a
mechanism to DEFSTRUCT to describe "fill" regions (which we need to do
anyway), then you can use the same internal renamings for the anonymous
structure types.
>
> So, *probably* the right thing to do is to transform -- a copy of --
> defreprs into defunions (or defstructs if there are only invariants) and
> let the compiler alone...
I think that the following is a (small) counterexample to what you say:
(defrepr dr
((tag swizzled unswizzled)
d:double
(tag left right)
(case (swizzled (i:int32 j:int32))
(unswizzled (b:bool)))
(case ((swizzled left) (k:int32)))))
[Note that the 'd' field has moved up between the tags. Also note that
the (invariant ...) form was an error in the spec. Such forms have no
special annotation.]
Because in this case the constructor would be written:
(dr swizzled 2.0 right 5 6 7)
However, this may not actually be important, because the ordering
between the tags and the values is independent. That is, the left to
right sequencing of tag-ids must be preserved, and the left to right
sequencing of values must be preserved, but (at least conceptually) we
*could* decide that the following two constructor calls are equivalent
in meaning to the one I gave above:
(dr swizzled right 2.0 5 6 7)
(dr 2.0 5 6 7 swizzled right)
However, I think this can only be rationalized if we define the tags to
be some sort of label (i.e something other than expressions), because it
violates the usual rules for application.
The other possibility that occurs to me is that the tag field can be
treated as an invariant enumeration field whose enumeration elements are
the labels. These labels are promoted to the declaring scope (which I do
not like, but that is a completely separate issue). At that point, we
can treat 'swizzled' and 'right' as normal values, and we would get:
(__con_dr-swizzled-right swizzled 2.0 right 5 6 7)
and all fields would type check properly at construction time.
> Only the code generator will have to emit the
> structure/union and constructors correctly. This may seem like an
> explosion of constructors...
I am not greatly concerned about that, since the overwhelming majority
of these will inline successfully.
> , but it will hopefully be contained in practice.
>
> > How much of a pain in the ass are the anonymous stucture types (see
> > reprcase) going to be?
>
> By anonymous I think you mean a gen-sym structure.
I mean a structure that has no type name as far as the user is
concerned. Indeed the compiler will need to assign it an internally
generated name.
> In BitC all
> structures must have a name because structure types are unified based on
> their names. In order to achieve this, probably the right thing to do is
> to change the above transformation as:
>
> (defunion dr
> (dr-swizzled-right (__dr-swizzled-right i:int32 j:int32 d:double))
> (dr-swizzled-left (__dr-swizzled-left i:int32 j:int32 d:double
> k:int32))
> (dr-unswizzled-right (__dr-unswizzled-right b:bool d:double))
> (dr-unswizzled-left (__dr-unswizzled-left b:bool d:double)))
>
> where each of the __dr... forms are structures whose definitions the
> compiler will add prior to the defrepr.
For typing purposes this seems fine (again with the caveat that we need
to deal with FILL, but that is minor). For code generation purposes this
is definitely not correct, because it causes the tag field to move, and
the tag field is part of the layout.
Here is another possible way to handle this (I am simply thinking out
loud here):
1. Adopt the view that the tags are arity-0 enumeration constructors, as
I described above.
2. Assign field names to the tags, so my example above would now change
to:
(defrepr dr
(swizzled: (tag yes no)
d:double
direction: (tag left right)
(case ((swizzled yes) (i:int32 j:int32))
((swizzled no)(b:bool)))
(case ((or (swizzled yes)
(direction left)) (k:int32)))))
[Yes. Syntax is horrible and needs refinement. There is a hidden
simplification here, which is that we can very easily go from this to:
swizzled (tag bool)
and just do explicit value matching. I don't actually thing that 'TAG'
should be part of a type, but let us stick with this for the
discussion.]
3. Take your strategy, but drop the wrapping defunion. Instead, convert
them into:
(defstruct __con_dr-swizzled-right
swizzled:__tag_dr_swizzled
d:double
direction:__tag_dr_direction
i:int32 j:int32))
[I'm only doing your first example, but I'm sure you get the idea]
4. Now introduce a special rule into the type checker stating that if
"dr" is a DEFREPR type and "__con_dr_xxxxx" is a structure type, then
dr <- __con_dr_xxxxx
is a legal assignment/binding (but not the other way).
If I got all of that right (and I probably made a mistake somewhere)
this still seems to keep the internal damage in the compiler fairly well
contained, but it keeps all of the tags in the right places.
> So,
> (reprcase e ((id swizzled right id.d))
> (otherwise 0.0))
>
> ||
> V
>
> (reprcase e ((id:__dr-swizzled-right swizzled right id.d))
> (otherwise 0.0))
>
> As you have noted in the spec, we may do this for all unions as well.
Not quite, but I figured out this morning how to deal with the last
complication, and it becomes very easy if the tag slots have names.
Given a DEFREPR of the form:
(defrepr something
...
tagname : (tag id1 id2)
(case ((tagname id1) ...)
...)
...)
we can add a declaration of some sort indicating that the tag can be
Cardelified. Perhaps:
(defrepr something
...
tagname : (tag id1 id2)
(declare (cardelli tagname))
(case ((tagname id1) ...)
...)
...)
[That is the first time I can think of a proper name being turned into a
keyword. I suspect Luca would be horrified. :-)]
This applies only to the CASE form that positionally follows the tag.
> > also, your opinion on what scope the case-id tags should have would be
> > appreciated.
> In the above scheme, the case tags are scoped within the defrepr,
> because they are always mentioned along with the defrepr's name. So, it
> is as if we wrote these tags as dr-swizzled-right, etc, which are
> guaranteed to be unique at top level. Right?
I also thought that this could be made to work, but it has strange
consequences. Consider:
(let ((yes 1))
(dr yes 2.0 left 5 6))
where the intention is that this will turn into:
(__con_dr-swizzled-right __tag_dr-swizzled-yes 2.0
__tag-dr-direction-left 5 6)
but it is not clear why the original "yes" tag is not hidden here. I
believe that this is essentially the same ambiguity that arises for
constructors in applicative position elsewhere in the type checker.
Can you articulate a clear-cut story for this, or if not, should we
consider making the initializer labels bound in the defining scope.
My intuition is that we want to go for types on the tags and value-based
comparison. This has the effect of making the tag initializers
positional in the constructors. I do not have any strong feeling about
this, except to say that I suspect this is more a matter of usability
and "least surprise" than it is a fundamental complication.
> Is the rule:
> > A case form may make reference to any tagid that is mentioned in any
> > lexically preceding tag form within the same defrepr
>
> necessary? Conceptually, the compiler can process all tag forms first,
> so that all legs can make use all tags. I think, in the case of GDT
> entries, some fields come before the tag.
Hmm. Now that is an *excellent* question, and I need to think about
that. It follows from the ability to flatten to structures that you hvae
identified, and that I had completely failed to see. Even if we decide
to do named fields for tags, we can adopt the rule that they "hoist to
the left in preorder" for purpose of forming constructor arguments.
However, I think that we must impose the requirement that a CASE form
can only reference a tag that appears in a lexically containing BODY
form. Otherwise it will not be possible to do case-dispatch reliably in
REPRCASE.
I will now disclose an ugly secret: the omission of ':' from identifiers
was intentional. It allows us to do labeled arguments for constructors.
Also for procedures with only a minor change in the (fn...) syntax.
> The specification does not say that the tag field must be placed at the
> location where it is declared, although I think this is the intention.
That is definitely the intention -- indeed the requirement.
> We may have to include a tag-type for each tag declaration.
Agreed.
> Also, the
> bits of a single tag may be broken into two parts and stored at
> different locations within the structure (gate descriptor?) we may have
> to make provisions for this also.
No. This can be captured by breaking the tag into two parts and using
nested CASE declarations.
However, I would also consider allowing AND and OR in the case-matching
form, which has the hidden advantage of simplifying the emission of the
case-checking code.
> In the definition of defrepr body,
>
> > (tag tagid1 ... tagidn)
> > nm[:ty]
> > (case (tagid (body))
> > ...
> > ((tagid1 ... tagidj) (body))
> > ...
> > (tagid (body))
>
> Is the nm supposed to be an invariant field name? If so, I guess it
> should be written as:
>
> (invariant nm:type)
>
> where the type is not optional.
Yes, it is an invariant. The INVARIANT keyword in the example was a
mistake.
The type is optional or non-optional exactly as it is for DEFSTRUCT. I
followed the defstruct grammar here, but it is possible that the
specification is stale in this regard. I think what happens with
defstruct is that we assign an unmatchable type, which is also okay
here.
shap
More information about the bitc-dev
mailing list