[coyotos-dev] Scratch Pages, Fault on First Reference

Jonathan S. Shapiro shap at eros-os.org
Fri Jan 13 22:25:07 EST 2006


I left this off of my list by accident, but it is by far the easiest to
explain.

Consider a movie player. It is a relatively small application that
manipulates a relatively large frame buffer cache. While in use, the
frames of this cache will likely be pinned.

Note that the bits in this cache are "ephemeral". If needed, they can be
regenerated by reprocessing the video stream. If the machine crashes for
some reason there is really no reason to restore them -- we have lost
our real-time movie delivery in any case. In consequence, there is
absolutely no reason to checkpoint these bits.

There is a similar problem in password handling -- in situations where
we will have an in-memory clear-text password, that page should never be
written to the disk, so that the password cannot be recovered by disk
forensics.

The initial proposal was for the application to have a way to say "I am
designating the following data page(s) as exempt from checkpoint." On
reflection, I believe that this should be specified at page allocation
time:

	spacebank->allocScratchPage() => pgCap

This allocates a disk page in the usual way, which (also in the usual
way) is initially zero. In contrast to other pages, this page has an
attribute bit stating that its content is to be discarded rather than
written back to disk. That is, there is no requirement that the state of this
page be recoverable -- on the contrary, the requirement is that any
changes to its state must NOT be recoverable.

At checkpoint time, such pages are simply ignored altogether. They do
not become part of the snapshot image.

When the ager reclaims a scratch page, it is treated as though it had
been read-only, and is not written to disk -- not even to the checkpoint
area.

A scratch page can be destroyed using the usual mechanisms.
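
The rules above can be modeled in a few lines of C. This is only a toy
sketch, not Coyotos code: the Page struct and the functions page_alloc,
page_store, checkpoint, ager_reclaim and page_fault_in are hypothetical
names invented for illustration, and the separate checkpoint area is
collapsed into a single "disk" copy per page. It also assumes that a
reclaimed scratch page comes back as zeros when faulted in again; the
proposal itself only requires that changes not be recoverable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Hypothetical model of a page; not the real Coyotos object layout. */
typedef struct Page {
    bool scratch;                    /* attribute bit, fixed at allocation */
    bool dirty;                      /* modified since last write to disk  */
    bool resident;                   /* currently backed by a memory frame */
    unsigned char mem[PAGE_BYTES];   /* in-memory frame                    */
    unsigned char disk[PAGE_BYTES];  /* stand-in for the on-disk image     */
} Page;

/* Allocate a page: zero on disk and in memory, as for any new page. */
void page_alloc(Page *p, bool scratch) {
    memset(p, 0, sizeof *p);
    p->scratch = scratch;
    p->resident = true;
}

/* Store one byte into a resident page and mark it dirty. */
void page_store(Page *p, size_t off, unsigned char v) {
    assert(p->resident && off < PAGE_BYTES);
    p->mem[off] = v;
    p->dirty = true;
}

/* Checkpoint: scratch pages are ignored altogether.
 * Returns how many pages became part of the snapshot. */
size_t checkpoint(Page *pages[], size_t n) {
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        Page *p = pages[i];
        if (p->scratch || !p->resident || !p->dirty)
            continue;
        memcpy(p->disk, p->mem, PAGE_BYTES);
        p->dirty = false;
        written++;
    }
    return written;
}

/* Ager: a scratch page is treated as though it were read-only, so it is
 * dropped without any write. Returns true if the page was written out. */
bool ager_reclaim(Page *p) {
    bool wrote = false;
    if (!p->scratch && p->dirty) {
        memcpy(p->disk, p->mem, PAGE_BYTES);
        wrote = true;
    }
    memset(p->mem, 0, PAGE_BYTES);   /* the frame is reused elsewhere */
    p->dirty = false;
    p->resident = false;
    return wrote;
}

/* Bring a page back into memory from its on-disk image. */
void page_fault_in(Page *p) {
    memcpy(p->mem, p->disk, PAGE_BYTES);
    p->resident = true;
}
```

In this model an ordinary page survives both a checkpoint and a reclaim,
while a scratch page written with the same data reads back as zeros once
the ager has taken its frame.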



The scratch page itself isn't hard to explain, but it has some
interesting consequences if it isn't pinned.

Consider an application that has some large piece of state that can be
efficiently regenerated. The idea here is that you'd like to hang on to
it in the absence of memory pressure, but it's low priority to keep it
in memory. The application uses scratch pages for this purpose, and
simply does not pin them.

Suppose further that your luck runs out, and sure enough the ager whacks
one of these frames. The problem now is that there is some object in the
cache that is at least partially corrupted (in the sense that the bits
are gone), but the application doesn't get any notice of this.

One possible mechanism for handling this is "fault on first reference".
There is always some reference that constructs the first valid page
table entry for the scratch page. When this entry is constructed, a
"first reference fault" (FRF) is delivered to the referencing process.
The FRF is a new class of page fault. It advises the application
that the state of this page has been dropped since it was last
referenced. How this should be handled is left to the application
runtime to decide.
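
To make the mechanism concrete, here is a toy sketch in C of how a
first reference fault might reach an application. Everything in it is
hypothetical: ScratchPage, frf_handler, reference, ager_reclaim,
CacheCtx and regenerate are illustrative names, not Coyotos interfaces,
and the "page table entry" is reduced to a single flag. The sketch
assumes the FRF is delivered only when the entry is rebuilt after the
ager has dropped the page, not on the very first touch after
allocation; the proposal leaves that detail open. The handler shown
simply regenerates the content, which is one possible policy for the
application runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FRAME_BYTES 64

/* Hypothetical scratch page with a one-bit stand-in for its mapping. */
typedef struct ScratchPage {
    bool pte_valid;       /* is there a valid page table entry?        */
    bool state_dropped;   /* content discarded since the last reference */
    unsigned char data[FRAME_BYTES];
} ScratchPage;

/* Application-supplied handler invoked when an FRF is delivered. */
typedef void (*frf_handler)(ScratchPage *pg, void *ctx);

/* Ager reclaims the frame: content is gone and the mapping is invalid. */
void ager_reclaim(ScratchPage *pg) {
    memset(pg->data, 0, sizeof pg->data);
    pg->pte_valid = false;
    pg->state_dropped = true;
}

/* Reference one byte of the page. If this reference constructs the
 * first valid entry after the state was dropped, deliver the FRF to the
 * referencing process before the access completes. */
unsigned char *reference(ScratchPage *pg, size_t off,
                         frf_handler handler, void *ctx) {
    assert(off < FRAME_BYTES);
    if (!pg->pte_valid) {
        pg->pte_valid = true;
        if (pg->state_dropped) {
            pg->state_dropped = false;
            if (handler)
                handler(pg, ctx);
        }
    }
    return &pg->data[off];
}

/* Example runtime policy: rebuild the page content and count rebuilds. */
typedef struct CacheCtx {
    int regenerations;
    unsigned char fill;   /* stands in for "reprocess the source data" */
} CacheCtx;

void regenerate(ScratchPage *pg, void *ctx) {
    CacheCtx *c = (CacheCtx *)ctx;
    memset(pg->data, c->fill, sizeof pg->data);
    c->regenerations++;
}
```

With this shape the application sees exactly one notification per drop:
the first reference after a reclaim runs the handler, and later
references go straight through until the ager takes the frame again.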


At the moment, it is unclear to me whether "fault on first reference"
should be a property of a page or a protection-like attribute of a
mapping. I think this will turn out to be related to the PinSet/FrameSet
discussion which will follow shortly.


shap


