[coyotos-dev] Safe states for userspace

Jonathan S. Shapiro shap at eros-os.com
Wed Aug 1 09:57:07 EDT 2007


On Wed, 2007-08-01 at 12:04 +0200, Pierre THIERRY wrote:
> As I was presenting my paper on persistence in a workshop of ECOOP, the
> mention of orthogonal persistence triggered a question about the
> relation between checkpointing and userspace applications: the
> checkpoint is guaranteed to save only consistent states of the system,
> but is there a way for an application to declare when it has reached
> such a consistent state itself?
> 
> Else, what would happen when the application is in an inconsistent state,
> the system is checkpointed, shut down, and restarted?

In a systemwide checkpoint, there is no such thing as an inconsistent
application state. Any point where the application can be preempted is
good enough. So long as the application restarts with its state intact,
it will simply proceed on to the next instruction as usual.

Exceptions:
  Wall clock times cannot be checkpointed.
  Transactions that cross system boundaries have to be handled with
    care by applications.

Here is a uniprocessor intuition for how to think about a systemwide
checkpoint:

  You preempt the current process. At this moment, any "ready" process
    is resumable.
  Instead of resuming the next ready process, you make a copy of the
    whole system memory and write it down.

The rest is refinements:

  + In reality, you make that copy of system memory lazily.
  + In reality, there is a lot of unmodified state in memory, and
    kernel state that can be reconstructed on demand. You don't write
    that down.
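
As a rough illustration of the intuition above (a toy model, not how
Coyotos or EROS actually implements it), the sketch below runs a
round-robin scheduler, copies the whole "system memory" at a preemption
point, and shows that a system restored from that copy finishes with the
same results. The names Process, System, checkpoint, and restore are
hypothetical, and the copy is eager rather than lazy.

```python
import copy


class Process:
    """Toy process: sums the integers below `limit`, one step per instruction."""

    def __init__(self, limit):
        self.pc = 0        # next "instruction" to execute
        self.acc = 0       # the process's only piece of memory
        self.limit = limit

    def step(self):
        # One indivisible instruction; preemption only happens between steps.
        self.acc += self.pc
        self.pc += 1

    def done(self):
        return self.pc >= self.limit


class System:
    """Toy uniprocessor: a table of processes plus a ready queue of names."""

    def __init__(self, procs, ready=None):
        self.procs = procs
        self.ready = list(procs) if ready is None else ready

    def checkpoint(self):
        # "Make a copy of the whole system memory and write it down."
        return copy.deepcopy((self.procs, self.ready))

    @classmethod
    def restore(cls, snapshot):
        procs, ready = copy.deepcopy(snapshot)
        return cls(procs, ready)

    def run(self, quantum=3, checkpoint_after=None):
        """Run everything to completion. If `checkpoint_after` is N, take a
        snapshot right after the Nth preemption, before resuming anyone."""
        snapshot, preemptions = None, 0
        while self.ready:
            name = self.ready.pop(0)
            proc = self.procs[name]
            for _ in range(quantum):
                if proc.done():
                    break
                proc.step()
            if not proc.done():
                self.ready.append(name)   # preempted: still resumable
            preemptions += 1
            if preemptions == checkpoint_after:
                snapshot = self.checkpoint()
        return snapshot

    def results(self):
        return {name: proc.acc for name, proc in self.procs.items()}


def demo():
    live = System({"a": Process(10), "b": Process(7)})
    snap = live.run(checkpoint_after=2)   # snapshot taken mid-computation
    revived = System.restore(snap)        # "shut down and restart"
    revived.run()
    return live.results(), revived.results()
```

Calling demo() returns two result tables, one from the uninterrupted run
and one from the run restored from the snapshot. Neither process did
anything to declare a "safe state"; the preemption boundary was enough.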

There is a catch: it is MUCH harder if you do not have atomic system
calls. In order to checkpoint without recording kernel state, you need
to be able to take any process that is currently in a system call and
force it to restart the system call. The atomicity is important because
you need to know that the effects of partially completed calls won't be
a problem.
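
To make the atomicity point concrete, here is a small sketch under
simplifying assumptions: a "system call" is modeled as a Python generator
whose yield marks a point where the caller may be preempted, and a
checkpoint records only the visible state, not the in-flight call. The
functions atomic_transfer, leaky_transfer, and
checkpoint_mid_call_then_restart are invented for this illustration and
do not correspond to any Coyotos interface.

```python
import copy


def atomic_transfer(state, src, dst, amount):
    """All-or-nothing call: work on a private copy, publish once at the end."""
    staged = dict(state)
    staged[src] -= amount
    yield                     # preemption point: nothing is visible yet
    staged[dst] += amount
    state.update(staged)      # single commit


def leaky_transfer(state, src, dst, amount):
    """Non-atomic call: the debit is visible before the credit happens."""
    state[src] -= amount
    yield                     # preemption point: partial effect is visible
    state[dst] += amount


def checkpoint_mid_call_then_restart(call):
    """Checkpoint while `call` is in flight, then restart the call from the
    beginning on the restored state (no kernel state was recorded)."""
    state = {"x": 100, "y": 0}
    in_flight = call(state, "x", "y", 30)
    next(in_flight)                      # enter the call, stop mid-way
    snapshot = copy.deepcopy(state)      # checkpoint sees visible state only
    restored = copy.deepcopy(snapshot)   # reboot from the checkpoint
    for _ in call(restored, "x", "y", 30):
        pass                             # re-issue the call and let it finish
    return restored
```

With atomic_transfer the restarted call yields {"x": 70, "y": 30}, as if
the checkpoint had never happened. With leaky_transfer the debit that was
already visible at checkpoint time is applied a second time, giving
{"x": 40, "y": 30}, which is exactly the kind of partially completed
effect that makes restart unsafe.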

> But in Coyotos, if the application crashes because of an inconsistent
> state, won't its storage be reclaimed? In which case, well, game is
> over.

Applications don't crash because of checkpointing, and no, their storage
is not automatically reclaimed.


