[coyotos-dev] First Class Messages

Jonathan S. Shapiro shap at eros-os.org
Thu Jan 19 00:18:02 EST 2006


This note describes a new possible messaging abstraction for Coyotos:
first-class receive blocks (FCRBs). It has some basis in scheduler
activation events as in nemesis, and also some similarity to the
first-class messages of Hydra.


WHAT IS A FIRST CLASS RECEIVE BLOCK?

An FCRB is a user-allocated, kernel-managed structure that is bound to a
particular process. An FCRB can be named by a capability. The effect of
invoking this capability is to deliver [*] the FCRB to it's bound
receiving process [by issuing an activation]. An FCRB can also be placed
on a kernel queue. If an in-kernel event occurs that causes the members
of the queue to be awakened, all FCRBs that are currently queued on that
queue are delivered to their respective processes. If an FCRB is
enqueued and also has an outstanding sender capability, invoking the
sender capability causes the FCRB to become dequeued.

  [*] Subject to some possible restrictions, details below.

Note that our current focus in FCRBs is reliable asynchronous message
delivery. We are not currently considering Psyche-like or Nemesis-like
thread scheduling. We think that can be done, but it will require
further thought beyond what I am describing here.

An FCRB carries the following application-manipulated state:

  1. A process capability to the bound process, set at
     initialization time by the receiver.
  2. A one-word "event type", set at initialization time. The three
     least significant bits of the event type are reserved with the
     following meanings:

        Bit 0: Indicates that the FCRB is a "one shot" FCRB.
        Bit 1: Indicates that no PP match is required.
        Bit 2: Indicates whether the FCRB denotes a long or short
               receive block.
  3. A protected payload, which may be altered using the
     FCM's control capability.
  4. [Possibly] R words of sender-provided payload.
     The value of R is architecturally fixed, probably 2 or 4.
  
  In addition, the following state exists that is manipulated only
  by the kernel:

  5. The allocation count (same as in EROS, KeyKOS, CapROS)
  6. A capability identifying the object, if any, that the FCM
     is currently blocked on. In the internal representation this
     may be interchanged with a queue link pointer chain.

The following behavioral description is conceptual. In the common case,
it can be optimized down to something very efficient by eliminating most
of these steps.

When an FCRB capability is invoked, the following things occur:

If the "PP match" field is set, the PP of the invoked capability is
compared to the PP stored in the FCRB. If these do not match the
sender's capability behaves as if an invalid capability had been
invoked.

The PP value of the invoked send capability is now copied to the FCRB PP
slot.

Next, the "short message" bit is examined. If this is a short message,
the first R data register arguments provided by the sender are stored to
the sender payload slots.

Next, the FCRB PP field is updated as follows:

  PP := PP | (evt_type & 1)

That is, the least bit of the PP is forced to 1 if the FCRB is a "one
shot" receive buffer. This ensures that attempted retransmissions are
discarded, but requires that the application use only even PP values and
set the "match required" bit on the FCRB.

Immediate delivery is then attempted. Immediate delivery is possible if
the bound receive process is currently in the "activations enabled"
state. The receiver PC/SP is saved in the usual way for activations. In
addition, R+2 other registers are saved. These registers are then made
to hold the protected payload value, the event type, and the
sender-provided payload. Finally the activation handler PC/SP pair is
loaded, the receiving process is switched to the "activations disabled"
state, and the receiving process is dispatched. Once the activation
handler entry point is dispatched, the event has been delivered.

If the receiving process is in the "activations disabled" state at the
time of firing, the FCRB is placed into the global ready queue. When it
comes up for scheduling, the dispatcher will again attempt redelivery.
This will continue until the FCRB is delivered successfully.

   [The details of this need refinement. There should be a way to
    queue the FCRB on the receiving process without losing prompt
    delivery if the receiver wants it. Still need to work through the
    details. If the FCRB goes on the ready queue, then it probably
    needs a schedule capability slot.]

In order to invoke the sender's capability to an FCRB, the allocation
count of the sender capability must match the allocation count of the
FCRB (else the capability is invalid). In addition, the protected
payload of the sender capability must match the protected payload of the
FCRB (else the invocation is silently discarded). The effect of such
invocation is to "fire" the FCRB as previously described. Multiple
firings between deliveries are collapsed into one received message at
the receiver. If an FCRB is fired a second time, any sender-provided
payload associated with the second firing is discarded.


The mechanism described to this point has two primary uses:

  1. Delivery of one-shot preemptive messages with very small payloads.
     This use-case may not be motivated, given the additions I will
     make below.

  2. Delivery of repeatable preemptive messages, with the caveat
     that multiple firings may be collapsed (comparable to UNIX
     signals).

Note that, with care, FCRBs can be reused without reallocation. Short
messages do not make provision for any reply capability.

For those who remember ancient email trails, FCRBs are similar in some
ways to the "small message" activity structure variation that was
contemplated at one point in EROS, with the corrections that FCRBs are
first-class data structures, are allocated using space banks, and are
intended to be paid for from the resources of the receiver.

Here come the interesting parts....

LONG MESSAGES

If the message type is "long message", the value of the message type
field is interpreted as a pointer value (as though the least three bits
were zeroed). This pointer names the virtual address relative to the
bound receive process's address space of an "extended receive
descriptor". Readers familiar with L4 should think of these as the
"receive items" and "receive virtual registers" of the L4 UTCB, with the
important distinction that there can be multiple outstanding receive
UTCBs.

EROS and KeyKOS readers won't know what the hell I'm talking about with
these UTCBs. The "extended receive descriptor" is the location where the
address and size of the received data string are defined, and also the
location where any incoming data registers beyond the N provided in the
event structure are stored. The design deviates from EROS/KeyKOS
slightly: the term "registers" here is software defined. The OS
architecture defines a fixed number of "virtual registers". Of these,
the first N (as described for event messages above) will be delivered to
the receiving process in hardware registers (and are also *sent* from
hardware registers).

I provisionally believe that this implementation can be nearly as fast
as a synchronous IPC implementation.

I shall describe various use cases below.

RECEIVE QUEUES

A receive queue is a queue named by a capability. The controlling
capability permits FCRBs to be enqueued on the message queue. Invoking
the sender capability proceeds by selecting an (implementation
determined) FCRB from the queue and then proceeding as though this FCRB
had been directly invoked by the sender. *however*, for this purpose it
is the protected payload of the receive queue send capability that is
compared to the stored protected payload in the FCRB.

If no FCRB is enqueued, and the sender is willing to block, the sender
blocks on the receive queue until an FCRB is enqueued.

The receive queue subsumes the endpoints of the previous design, but is
purely stateless.

The protocol for invoking a receive queue capability is identical to
that for invoking an FCRB, with the noted exception of how the PP value
is handled.


THE "DISPATCH" MESSAGE TYPE

The FCRB mechanism allows multiple FCRBs to name the same bound receiver
process. It is therefore possible to use an FCRB to record the fact that
a process has been placed on a ready queue (or even more than one).
Depending on security considerations, this may or may not want to cause
an activation, but it can be used to dispatch the process from the
scheduler.

For checkpointing purposes, it follows that the list of enqueued FCRBs
replaces the running process list of KeyKOS/EROS.


CAPABILITY INVOCATION PRIMITIVE:

Given FCRBs, the capability invocation mechanism needs to be revised.
The send phase is nearly unchanged:

  [Blocking] [Replyable] SEND invoked-cap
       R<=N<=MAX data regs, 
       C<=M<=MAX caps
       PP-VALUE

If the [Replyable] control bit is set, the first capability argument is
overwritten by the kernel with a send capability to the receive FCRB
with the specified PP-VALUE in the capability's protected payload field.

The receive phase description is now an ASYNCHRONOUS receive phase:

  ASYNC-RECEIVE on FCRB-OWNER-CAP with EVENT-TYPE

with the interpretation that the previously given PP-VALUE will be
stored into the FCRB, and the specified EVENT-TYPE will likewise be
stored into the FCRB. In the general case, the EVENT-TYPE will actually
be the address of the extended receive descriptor plus some control
bits.

Finally, there is an optional bit indicating whether the invoker wishes
to immediately continue execution or wishes to cease executing
instructions until the next incoming activation:

  [Run]


USE-CASES:

Start Capabilities:

A KeyKOS/EROS start capability is captured by an FCRB with an arbitrary
PP-VALUE. The FCRB send capability holds a protected payload that
corresponds to the KeyKOS "data byte" or the EROS "key info" field. The
corresponding FCRB control bits are set to "not one shot", "no match",
and "long message".

Resume Capabilities:

A KeyKOS/EROS resume capability is captured by an FCRB with an even PP
value. The FCRB send capability must hold a matching PP value. The FCRB
control bits are set to "one shot", "match", and "long message". Whether
an invocation of this capability is prompt is determined by the
"blocking" control bit of the invocation.

This mechanism relies on the application to increment the PP-VALUE by
two when performing the CALL-like operation (described next).

CALL operation:

The application must maintain a user-mode value for use as the resume
capability PP. This value starts at 0 and is incremented by 2 on every
call. If this value overflows, the FCRB must be severed and a new FCRB
acquired. Ignoring this case, the CALL operation is simulated as
follows:

  Increment ResumePP by 2
  Perform a blocking replyable send, supplying the incremented ResumePP
  Perform an AsyncReceive on a supplied FCRB control capability
     specifying a long-message event type.
  !Run

RETURN operation:

  Perform a non-blocking, non-replyable send
  Perform an async receive on the incoming request FCRB control
    capability, specifying a long-message event type.
  !Run

FORK/SEND operation:

  Perform a blocking, non-replyable send
  Do not perform an asynchronous receive
  Run


Okay. I know that is bloody complicated. I'ld really like people to give
it a close look, because we are considering this pretty seriously. I am
*certain* that I have missed many details. One is that I've omitted any
discussion of where the send payload goes. The answer is "registers and
sender stack". Probably some others. At the moment I want to know if the
basic concept is sound or what is wrong with it.

Particular thanks to Mothy Roscoe for making me buy him beers at various
conferences while he has patiently described Nemesis to me.


shap



More information about the coyotos-dev mailing list