[bitc-dev] Libraries and separate compilation

Jonathan S. Shapiro shap at eros-os.com
Thu Mar 27 15:23:25 EDT 2008


I have been thinking about separate compilation, trying to decide what
to do about it. The current compiler supports a fully static compile,
and it provides a form of "tree shaking" as a side effect of its compile
methodology. But that is a whole-program strategy, and it will not
really work for dynamic libraries.

BitC does not really support separate compilation. We can check a single
unit of compilation up to the point where we know that the rest of the
compile will proceed without error, but there are two problems with
separate compilation:

  1. Multiple units of compilation may instantiate the same procedure.
     If the linker does not support link-once semantics, and the whole
     program is not code generated at once, this can lead to code space
     explosion.

     It MAY be possible to handle this using .gnu.linkonce sections in
     binutils ld, but support for link-once behavior seems to be
     incomplete even in that linker.

  2. At link time, we need to cross-check that incompatible typeclass
     instantiations are not present.

     Note that violations of this rule are the only errors that can
     arise at link time. All other errors are generated when the unit
     of compilation is compiled.
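
The cross-check in (2) can be sketched as a table merge. This is only
an illustration, not the BitC implementation: it assumes that
"incompatible" means two units resolved the same typeclass at the same
type to different instance definitions, and the names UnitSummary,
check_instances, and IncompatibleInstances are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnitSummary:
    """Hypothetical per-unit record of the typeclass instances it relied on."""
    name: str
    instances: dict  # {(typeclass, type_name): module defining the instance}


class IncompatibleInstances(Exception):
    """Raised when two units disagree about an instance (assumed link error)."""


def check_instances(units):
    """Merge the instance tables of all units, or raise on a disagreement."""
    merged = {}  # (typeclass, type_name) -> (defining module, first unit seen)
    for unit in units:
        for key, definer in unit.instances.items():
            if key in merged and merged[key][0] != definer:
                prev_definer, prev_unit = merged[key]
                raise IncompatibleInstances(
                    f"{key}: {prev_unit} uses {prev_definer}, "
                    f"{unit.name} uses {definer}"
                )
            merged.setdefault(key, (definer, unit.name))
    return {key: definer for key, (definer, _) in merged.items()}
```

Everything else about a unit is assumed to have been checked when the
unit was compiled, so this merge is the only place a link-time error
can originate in the sketch.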

For dynamic libraries, the problem is a bit more complicated. Any
polymorphic procedure or type that can (transitively) be instantiated
through the dynamic library interface may be instantiated by the user
with types that were not visible when the dynamic library was compiled.
In some cases we may actually know enough to do all possible
instantiations, but in others we can't.

There seem to be four possible resolutions to this:

  1. Include the ASTs (or equivalent) for the affected polymorphic
     procedures and types in the dynamic library. Generate the code
     at dynamic load time or at run time.

     Issue: run-time subsystem complexity.
     Issue: The obvious run-time code generator is LLVM, which is
            written in C++, so it would be difficult to ensure that
            the runtime is safe. On the other hand, there really
            doesn't appear to be any better alternative.

  2. Include the ASTs (or equivalent) in the static archive
     library that accompanies the dynamic library. Generate
     any missing code at static link time or (if the linker
     supports link-once semantics) at module compile time.

     Issue: This has an unintended consequence: any instance that
            is pre-instantiated by the library now becomes part of
            the link layer interface contract, because the runtime
            system cannot instantiate it at run time. If a future
            version of the dynamic library fails to provide that
            instance, dynamic-link-time symbol resolution will fail.

     Issue: This approach leads to problems of version drift. If the
            implementation in the library changes, the application
            binary will still contain code instantiated from an older
            version of the library. Depending on the rationale for
            the change, this can be a source of bugs.

            For this reason, I think that this is not a good approach.

  3. Use a compilation strategy in which all argument sizes are
     fully parameterized. This is likely to yield low performance.

  4. Compile down to a native-code "template" in which sizes and
     external references remain unresolved. Implement a low-level
     run-time instantiator for these templates.
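
To make option 4 concrete, here is a toy model of a template
instantiator. It is a sketch under loud assumptions: the bytes are
opaque placeholders rather than real machine code, every hole is
assumed to be a 4-byte little-endian slot, and the names Hole,
Template, and instantiate are hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass
class Hole:
    offset: int  # byte offset of a 4-byte little-endian slot in the template
    kind: str    # "size" (type parameter size) or "extern" (symbol address)
    name: str    # which type parameter or external symbol fills the slot


@dataclass
class Template:
    code: bytes  # placeholder bytes standing in for unresolved native code
    holes: list  # list of Hole


def instantiate(template, sizes, externs):
    """Copy the template and patch every hole with a concrete value."""
    image = bytearray(template.code)
    for hole in template.holes:
        if hole.kind == "size":
            value = sizes[hole.name]
        elif hole.kind == "extern":
            value = externs[hole.name]
        else:
            raise ValueError(f"unknown hole kind: {hole.kind}")
        struct.pack_into("<I", image, hole.offset, value)
    return bytes(image)


# One template, patched for a type parameter T of size 8 and a
# hypothetical external symbol "copy" at address 0x1000.
TEMPLATE = Template(
    code=b"\x90" + b"\x00" * 4 + b"\x90" + b"\x00" * 4,
    holes=[Hole(1, "size", "T"), Hole(6, "extern", "copy")],
)
IMAGE = instantiate(TEMPLATE, sizes={"T": 8}, externs={"copy": 0x1000})
```

The point of the design is that the instantiator only patches slots; it
never needs the AST or a full code generator at run time.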

My current inclination is to say that we should run all of the compiler
correctness checks when we compile each unit of compilation, but we
should then emit an "object file" that contains the resulting AST rather
than object code. We should then perform code generation at link time.

More precisely: we should emit object code at compile time wherever we
know that it will be needed at link time, but we should also emit into
a separate section any ASTs that might require further expansion at
static or dynamic link time.  We should then implement a bitc linker
that "thins" the emitted code (implementing link-once) before passing
the result on to the native linker.
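
A rough model of that pipeline, for illustration only. It assumes an
object file carries three things: code already emitted per
instantiation, ASTs for polymorphic procedures, and a list of
instantiations that were referenced but not emitted. ObjectFile,
codegen, and bitc_link are hypothetical names, and codegen is a string
stand-in for real code generation.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    name: str
    # (procedure, type_args) -> code emitted at compile time
    code: dict = field(default_factory=dict)
    # procedure name -> AST stand-in, kept in a separate "section"
    asts: dict = field(default_factory=dict)
    # (procedure, type_args) keys referenced here but not emitted
    needed: set = field(default_factory=set)


def codegen(ast, type_args):
    """Stand-in for code generation from an AST at a given instantiation."""
    return f"{ast}<{','.join(type_args)}>"


def bitc_link(objects):
    """Thin duplicate instantiations, then expand any that are still missing."""
    linked = {}
    asts = {}
    for obj in objects:
        asts.update(obj.asts)
        for key, code in obj.code.items():
            linked.setdefault(key, code)  # link-once: keep the first copy only
    for obj in objects:
        for key in sorted(obj.needed):
            if key not in linked:
                proc, type_args = key
                linked[key] = codegen(asts[proc], type_args)
    return linked  # what would be handed to the native linker
```

For example, if lib.o and app.o both emitted map at int32 and app.o
also needs map at float, the result holds exactly one copy of the
former and a freshly generated copy of the latter.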


shap


