[coyotos-dev] Fixing the kernel heap size
Jonathan S. Shapiro
shap at eros-os.com
Thu Dec 20 14:16:57 EST 2007
After discussion with JWA and some thought, some conclusions about
Godfrey's "use larger pages" idea. Summary: I think that it works.
First, to elaborate, what Godfrey was proposing is the following:
1. Observe that device page frames are a very peculiar special
case, generally mapped by hardware-aware clients, and usually
presented at the physical bus layer at relatively large alignment
boundaries (at least in comparison to hardware page size) and
in relatively large, contiguous amounts.
2. Observe that Coyotos imposes a *least* hardware page size, but
has provision -- not currently used -- in the architecture for
larger software-defined page sizes. These can be used to ensure
large hardware page mappings in some cases, but that is orthogonal
to the discussion at hand.
3. Observe that we generally know by prior knowledge about the
physical motherboard the total *number* N of slots where a board
might be inserted. The source of pressure on the kernel heap lies
in the fact that we don't have a useful upper bound on the number
of hardware pages that each board can present at the bus. PCI video
cards currently exist that present memory maps from as little as
64 Kbytes to as much as 750 Mbytes.
4. Observe that when large memory boards are presented, they generally
don't have some weird number of frames. If your board exports a big
memory, it will be measured in multiples of megabytes, not
multiples of kilobytes. That is, there won't be some small amount
of memory at the end of some quasi-random size.
5. Godfrey's Proposal: require that device memory be presented to the
OS layer in units of larger soft-pages, thereby reducing the number
of StructPage structures required to manage them.
My refinement: require that any given board export its memory
using at most K page frames. This merely re-frames Godfrey's
proposal so that the per-board frame count is bounded by a
constant K that we get to choose.
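
To make the "at most K frames" rule concrete, here is a rough sketch of how a
device mapper might pick the smallest soft-page size for a range. Everything
named here is hypothetical and not taken from the Coyotos sources:
min_l2g_for_range, MIN_PAGE_L2, the assumption that soft-page sizes are powers
of two expressed as a log2 value (l2g), and the assumption of a 4 Kbyte least
hardware page. Alignment of the range is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: log2 of the least hardware page size (assumes 4 Kbytes). */
#define MIN_PAGE_L2 12u

/* Return the smallest l2g (log2 of the soft-page size) such that a device
 * range of `bytes` bytes is covered by at most `k` soft-page frames. */
static unsigned min_l2g_for_range(uint64_t bytes, uint64_t k)
{
    assert(bytes > 0 && k > 0);

    unsigned l2g = MIN_PAGE_L2;
    while (l2g < 63) {
        uint64_t page = (uint64_t)1 << l2g;
        uint64_t frames = (bytes - 1) / page + 1;  /* round up, no overflow */
        if (frames <= k)
            break;
        l2g++;
    }
    return l2g;
}
```

Under these assumptions a 64 Kbyte range with K=16 stays at 4 Kbyte pages
(l2g=12), while a 750 Mbyte range with K=16 needs 64 Mbyte soft pages (l2g=26).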
Okay, so what are the issues in implementing this?
1. Support (in the sense of machine independent mapping traversal
support) for these pages already exists.
2. These pages are non-persistent, so we do not need to consider
any concerns about fragmentation in the persistent store.
3. We require that all capabilities to a given page frame
be of a single size to avoid some aliasing-related problems.
That is not difficult, but it probably requires us to add an
l2g field to the StructPage structure. We currently have
room to do that with no change in structure size.
4. At the device range layer, we must now associate an l2g requirement
for each device memory range. This is certainly not a problem.
5. It is possible to establish a 4 Kbyte local window or background
window that overlays some inner sub-page of a large page. The
only bit of this that is tricky is to ensure that we build hardware
mappings for the correct sub-page region. Some code is needed in
the machine dependent translation logic to deal with this, but
all of the needed information is currently being generated by
the MI space walker.
Note that this is actually good. It means that we can simulate the
smaller page size when this is desired for reasons of application
layer mapping alignment requirements that may be imposed by
compatibility issues.
6. At the (application level) device mapper level, when we advise
the kernel about the existence of a device range, we need to make
a good policy decision about how to set the page size. If there
is a hardware-provided large page size that we can use, that is
a good choice. This is not difficult.
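
For issue 5 above, the tricky part is purely address arithmetic. The sketch
below shows one way the machine dependent layer might compute the physical
address of the 4 Kbyte sub-page to map, given the large page's base and the
offset at which the window lands inside it. The function name, the HW_PAGE_L2
constant, and the assumption that large pages are naturally aligned are all
mine, for illustration only; the real translation code may look quite
different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: log2 of the hardware page size (assumes 4 Kbytes). */
#define HW_PAGE_L2 12u

/* Given a large page of size 2^l2g at physical address `base`, return the
 * physical address of the hardware-sized sub-page containing `offset`. */
static uint64_t subpage_phys_addr(uint64_t base, unsigned l2g, uint64_t offset)
{
    assert(l2g >= HW_PAGE_L2 && l2g < 64);

    uint64_t page_size = (uint64_t)1 << l2g;
    assert((base & (page_size - 1)) == 0);  /* assumes natural alignment */
    assert(offset < page_size);             /* window must lie inside page */

    uint64_t hw_mask = ((uint64_t)1 << HW_PAGE_L2) - 1;
    return base + (offset & ~hw_mask);      /* round down to sub-page start */
}
```

For example, with a 16 Mbyte page (l2g=24) at 0xE0000000, an offset of 0x12345
selects the sub-page at 0xE0012000.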
Picking a number completely out of my ass, I propose that we begin by
trying K=8 or K=16. For a modern PC, you're looking at 8 slots or fewer,
so K=16 imposes a total load of (64 bytes * 8 slots * 16 StructPages) or
8 KILOBYTES (total). That's small enough that we should just pre-reserve
it and declare victory. Heck, let's be generous and go to K=64 for 32
kilobytes. Memory is cheap. Let's splurge! :-)
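
The arithmetic above is easy to check. A throwaway sketch, using the 64 byte
StructPage size quoted in this message (the function name is made up):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of kernel heap to pre-reserve for device-memory StructPages:
 * one StructPage per frame, at most k frames per slot, n_slots slots. */
static size_t reserved_bytes(size_t struct_page_bytes, size_t n_slots, size_t k)
{
    assert(struct_page_bytes > 0 && n_slots > 0 && k > 0);
    return struct_page_bytes * n_slots * k;
}
```

With 64 byte StructPages and 8 slots, K=16 gives 8192 bytes and K=64 gives
32768 bytes. The cost scales linearly in both the slot count and K.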
Now, if you have one of those multiple PCI bus multiprocessors, you're
obviously going to have to tune N upwards, but if all five owners of
those machines have to make customer support calls I can live with the
cost of that.
Does anybody see a fatal flaw here?
Does anybody feel that 8 Kbytes is too much to spend? :-)
shap