[coyotos-dev] The joys of hardware bugs

Jonathan S. Shapiro shap at eros-os.com
Sat Sep 22 22:47:33 EDT 2007


Over the past three days I have been wrestling with a fairly obscure
bug. Here was the scenario:

Through stupidity, we were not zeroing the page directory for kernel
space. The page table was fully initialized, as was the PDPT (where
applicable), but the page directory was not. Even so, every entry that
we actually *used* in the page directory was getting set up properly.
Unused entries could contain garbage bits. This is obviously a bug, but
given that we never actually traversed those uninitialized entries the
behavior was surprising.
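
As a minimal sketch of the fix (this is not the actual Coyotos code;
pgdir_init, pgdir_nonzero_entries, and the constant names are
hypothetical, and a 32-bit legacy directory is assumed), clearing the
whole directory page before installing the entries that are actually
used might look like this:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE    4096u
#define PDE_COUNT    (PAGE_SIZE / sizeof(uint32_t)) /* 1024 legacy entries */
#define PDE_PRESENT  0x1u
#define PDE_WRITABLE 0x2u

/* Hypothetical helper: zero the entire directory page first, then
 * install the single mapping we actually use. */
static void pgdir_init(uint32_t *pgdir, size_t slot, uint32_t pt_phys)
{
    memset(pgdir, 0, PAGE_SIZE);
    pgdir[slot] = (pt_phys & 0xFFFFF000u) | PDE_PRESENT | PDE_WRITABLE;
}

/* Count entries that hold any non-zero bits at all. After pgdir_init
 * this should equal the number of entries deliberately installed. */
static size_t pgdir_nonzero_entries(const uint32_t *pgdir)
{
    size_t n = 0;
    for (size_t i = 0; i < PDE_COUNT; i++)
        if (pgdir[i] != 0)
            n++;
    return n;
}
```

A check like pgdir_nonzero_entries is the sort of thing one could put
in an assertion right before the mappings go live.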

Under QEMU and Bochs things worked fine when legacy mapping was used.
When PAE mapping was enabled, then under obscure conditions, QEMU would
issue an exception through vector 0xf. Note that this is a reserved
exception vector, so this should never happen. Note further that the
exception was not delivered until interrupts were first enabled.
Definitely an emulation bug. Bochs did not complain.

Real hardware in the office would reboot the minute the PAE mappings
went live, and would do random display weirdness when the legacy mapping
system was used (the same page directory page is used in both cases, and
we failed to zero it in both cases).

Real hardware at home would run quite a while, and then would fail an
assertion check, complaining that the 'user accessible' bit was set in
an entry. This was what finally allowed me to track the problem down.

Now the truly strange part of all this is that every entry we were
actually *using* was being initialized. So the question is: why should
the hardware object?

The answer: for PAE I was failing to zero the upper word of the page
directory entry for the first kernel page table. Under the emulators, it
chanced that the region of memory in question held zeros. On the real
machines it held junk.
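
To illustrate the failure mode (a sketch only, not the actual Coyotos
code; the function and constant names are hypothetical): a PAE page
directory entry is 64 bits wide, so a setter that stores only the low
32 bits leaves whatever was already in the upper word. Those upper bits
include physical address bits, so junk there changes where the entry
points.

```c
#include <stdint.h>

#define PDE_PRESENT   0x1ull
#define PDE_WRITABLE  0x2ull
#define PAE_ADDR_MASK 0x000FFFFFFFFFF000ull /* address bits 12..51 */

/* Models the bug: only the low 32 bits are written, so the upper word
 * keeps whatever junk was already in memory. */
static void pae_pde_set_low_only(uint64_t *pde, uint32_t pt_phys)
{
    uint32_t low = (pt_phys & 0xFFFFF000u) | 0x3u; /* present + writable */
    *pde = (*pde & 0xFFFFFFFF00000000ull) | low;
}

/* Models the fix: the whole 64-bit entry is written in one store, so
 * every bit not explicitly set ends up zero. */
static void pae_pde_set(uint64_t *pde, uint64_t pt_phys)
{
    *pde = (pt_phys & PAE_ADDR_MASK) | PDE_PRESENT | PDE_WRITABLE;
}
```

Starting from an entry that holds junk, the first setter leaves the
upper word untouched while the second yields a clean entry. That is
the difference between the emulators (where the memory happened to be
zero) and the real machines.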

But I *still* don't know why I got weirdness for legacy mode, where no
such error was occurring. It's "fixed" now that the entire page
directory is properly initialized, but the best one can say is that the
behavior was decidedly weird.


shap


