[coyotos-dev] Fixing the kernel heap size

Jonathan S. Shapiro shap at eros-os.com
Thu Dec 20 14:16:57 EST 2007


After discussion with JWA and some thought, some conclusions about
Godfrey's "use larger pages" idea. Summary: I think that it works.

First, to elaborate, what Godfrey was proposing is the following:

  1. Observe that device page frames are a very peculiar special
     case, generally mapped by hardware-aware clients, and usually
     presented at the physical bus layer at relatively large alignment
     boundaries (at least in comparison to hardware page size) and
     in relatively large, contiguous amounts.

  2. Observe that Coyotos imposes a *least* hardware page size, but
     has provision -- not currently used -- in the architecture for
     larger software-defined page sizes. These can be used to ensure
     large hardware page mappings in some cases, but that is orthogonal
     to the discussion at hand.

  3. Observe that we generally know, from prior knowledge of the
     physical motherboard, the total *number* N of slots where a board
     might be inserted. The source of pressure on the kernel heap lies
     in the fact that we don't have a useful upper bound on the number
     of hardware pages that each board can present at the bus. PCI video
     cards currently exist that present memory maps from as little as
     64 Kbytes to as much as 750 Mbytes.

  4. Observe that when large memory boards are presented, they generally
     don't have some weird number of frames. If your board exports a big
     memory, it will be measured in multiples of megabytes, not
     multiples of kilobytes. That is, there won't be some small,
     quasi-random amount of memory left over at the end.

  5. Godfrey's Proposal: require that device memory be presented to the
     OS layer in units of larger soft-pages, thereby reducing the number
     of StructPage structures required to manage them.

     My refinement: require that a given board export its memory
     using at most K page frames. This merely re-frames Godfrey's
     proposal in a way that lets us fix the bound K.
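
A minimal sketch of how the bound K could translate into a soft-page
size for one device range. The function name and the 4 Kbyte minimum
page are assumptions for illustration, not taken from the Coyotos
sources; it simply picks the smallest power-of-two page size that
covers the range in at most K pages.

```c
#include <stdint.h>

/* Hypothetical names; not from the Coyotos source tree. */
#define MIN_PAGE_L2 12u   /* assumes a 4 Kbyte least hardware page */

/* Return the smallest log2 page size (l2g) such that a device range of
 * `bytes` bytes is covered by at most `k` soft-pages. Assumes k >= 1. */
static unsigned device_range_l2g(uint64_t bytes, uint64_t k)
{
    unsigned l2g = MIN_PAGE_L2;

    while (l2g < 63) {
        uint64_t mask = (UINT64_C(1) << l2g) - 1;
        /* Page count at this size, rounding any partial page up. */
        uint64_t pages = (bytes >> l2g) + ((bytes & mask) != 0);
        if (pages <= k)
            break;
        l2g++;
    }
    return l2g;
}
```

Under these assumptions a 64 Kbyte range with K=16 stays at l2g=12,
while a 750 Mbyte range (taken as 750 * 2^20 bytes) with K=16 comes
out at l2g=26, i.e. 64 Mbyte soft-pages.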

Okay, so what are the issues in implementing this?

  1. Support (in the sense of machine independent mapping traversal
     support) for these pages already exists.

  2. These pages are non-persistent, so we do not need to consider
     any concerns about fragmentation in the persistent store.

  3. We require that all capabilities to a given page frame
     be of a single size to avoid some aliasing-related problems.
     That is not difficult, but it probably requires us to add an
     l2g field to the StructPage structure. We currently have
     room to do that with no change in structure size.

  4. At the device range layer, we must now associate an l2g requirement
     with each device memory range. This is certainly not a problem.

  5. It is possible to establish a 4 Kbyte local window or background
     window that overlays some inner sub-page of a large page. The
     only bit of this that is tricky is to ensure that we build hardware
     mappings for the correct sub-page region. Some code is needed in
     the machine dependent translation logic to deal with this, but
     all of the needed information is currently being generated by
     the MI space walker.

     Note that this is actually good. It means that we can simulate the
     smaller page size when this is desired for reasons of application
     layer mapping alignment requirements that may be imposed by
     compatibility issues.

  6. In the (application-level) device mapper, when we advise
     the kernel about the existence of a device range, we need to make
     a good policy decision about how to set the page size. If there
     is a hardware-provided large page size that we can use, that is
     a good choice. This is not difficult.
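
To illustrate the sub-page case in item 5, here is a rough sketch of
the address arithmetic the machine dependent layer would need when a
4 Kbyte window overlays part of a large page. The function and its
parameters are hypothetical; the real translation logic would take
this information from the MI space walker.

```c
#include <stdint.h>

#define HW_PAGE_L2 12u   /* assumes 4 Kbyte hardware pages */

/* Given the physical base of a large soft-page, its log2 size (l2g,
 * assumed < 64), and a byte offset into it, return the physical
 * address of the hardware frame that should be mapped. */
static uint64_t subpage_frame_pa(uint64_t page_base, unsigned l2g,
                                 uint64_t offset)
{
    /* Confine the offset to this soft-page. */
    uint64_t in_page = offset & ((UINT64_C(1) << l2g) - 1);
    /* Round down to the containing hardware frame. */
    uint64_t frame = in_page & ~((UINT64_C(1) << HW_PAGE_L2) - 1);

    return page_base + frame;
}
```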

Picking a number completely out of my ass, I propose that we begin by
trying K=8 or K=16. For a modern PC, you're looking at 8 slots or fewer,
so this imposes a total load of (64 bytes * 8 slots * 16 StructPages) or
8 KILOBYTES (total). That's small enough that we should just pre-reserve
it and declare victory. Heck, let's be generous and go to K=64 for 32
kilobytes. Memory is cheap. Let's splurge! :-)
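
For anyone who wants to play with the numbers, the arithmetic is just
the following. The 64-byte StructPage figure is the one used above;
the names are illustrative only.

```c
#include <stddef.h>

/* Per-StructPage cost assumed in this message. */
#define STRUCTPAGE_BYTES 64u

/* Kernel heap to pre-reserve for device pages: N slots, each allowed
 * at most K StructPages. */
static size_t device_heap_reserve(size_t n_slots, size_t k)
{
    return (size_t)STRUCTPAGE_BYTES * n_slots * k;
}
```

With N=8 this gives 8192 bytes at K=16 and 32768 bytes at K=64.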

Now, if you have one of those multiple PCI bus multiprocessors, you're
obviously going to have to tune N upwards, but if all five owners of
those machines have to make customer support calls I can live with the
cost of that.

Does anybody see a fatal flaw here?

Does anybody feel that 8 Kbytes is too much to spend? :-)


shap


