[bitc-dev] Choice of runtime/VM
Ben Kloosterman
bklooste at gmail.com
Thu Jul 29 02:34:51 PDT 2010
There is a question and a (mistaken, or at least incomplete) assumption
about sources of performance loss in safe languages. I don't want to take
this religious war up at the moment, so let me make some general assertions:
* In the current BitC benchmarks, which are probably not predictive,
the major losses are on I/O and bounds checks. The I/O issue has a
workaround in ArrayRef, but safe I/O tends to involve more copies than
unsafe I/O. The bounds check issue can be largely removed when we introduce
dependent vector bounds in BitC v2. In the interim there is a bunch we can
do that will improve it when we do a for-real code generator.
* GC is not, on average, a source of performance overhead.
Unfortunately, there is no GC that is good for all classes of applications,
and there are pathological examples for any given GC. That's a real problem.
I don't have any silver bullets. MMKit is promising, because at least it
offers a common framework for code generation.
* JIT compilation is a nice thing to have, but it entails a warm-up cost,
and a lot of applications exit before they warm up. What this mostly tells
us is that we need to look harder at byte-code design. Byte code that is
easy to emit (i.e. stack machine code) does not necessarily offer a
low-latency compile strategy offering decent performance.
Re the summary: I would agree that bounds checks cost more than the GC. I
would not call the GC cost insignificant, though, if you include the mark,
sweep and compaction costs (which a lot of benchmarks leave out). Last I
looked (which may be out of date) it varied between 1% and 20%, with about
4% the norm. That is not insignificant, and GC pauses can affect the total
time. That said, concurrent GCs should remove the majority of this cost.
JIT versus AOT doesn't really matter. A JIT can do better inlining and
bounds check elimination, but AOT can do better compilation; horses for
courses here.
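As a rough illustration of what bounds check elimination buys, here is a minimal sketch in C. The vec_i32 type and the function names are hypothetical and do not come from BitC or any real runtime. The first loop pays one range check per element, the way a naive safe-language code generator would. The second hoists a single check out of the loop, which is the kind of rewrite a JIT or an AOT optimizer tries to prove safe.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical length-carrying vector, standing in for a safe-language array. */
typedef struct {
    size_t len;
    const int32_t *data;
} vec_i32;

/* Checked element read: returns 0 on success, -1 if i is out of range. */
static int vec_get(const vec_i32 *v, size_t i, int32_t *out) {
    if (i >= v->len) return -1;
    *out = v->data[i];
    return 0;
}

/* Naive version: one bounds check on every iteration. */
int sum_checked(const vec_i32 *v, size_t n, int64_t *sum) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t x;
        if (vec_get(v, i, &x) != 0) return -1;
        acc += x;
    }
    *sum = acc;
    return 0;
}

/* Hoisted version: checking n <= len once covers every i < n,
 * so the loop body needs no per-element check. */
int sum_hoisted(const vec_i32 *v, size_t n, int64_t *sum) {
    if (n > v->len) return -1;
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) acc += v->data[i];
    *sum = acc;
    return 0;
}
```

Both functions reject an out-of-range n and leave *sum untouched in that case, so the hoisted form keeps the safety guarantee while doing the check once.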
GC: This partly depends on the choice of target environment. CLR more or
less defines the GC for us. There are at least two precise collectors on
LLVM, and either the Harmony GCv5 collector or the dis collector might be
adaptable for our purposes.
What is worth discussing, though, is the write barrier used in most
generational GCs. It makes heap writes more expensive, but in return it
allows a nursery (fast allocation of small objects), separate collection
generations, collections that do not visit every heap page, and
significantly lower memory bandwidth. The impact is that a lot of micro
benchmarks that measure continuous write performance will look much worse,
but the fast nursery often makes allocation faster, and in most programs
the additional writes can be moved around (just not in benchmarks). You may
correctly say micro benchmarks like Sieve don't count, but sadly people
still rate languages on them.
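To make the trade-off concrete, here is a minimal sketch of one common write barrier design, card marking, in C. It is a toy under stated assumptions, not the barrier of the CLR, Mono or any other real collector: the heap size, the card size and all names are hypothetical. Every reference store also dirties a one-byte card entry (the extra write), and a minor collection then scans only the dirty cards instead of the whole old generation.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_WORDS 4096                      /* toy old-generation size, in pointer slots */
#define CARD_SHIFT 6                         /* assumed card size: 64 slots per card */
#define NUM_CARDS  (HEAP_WORDS >> CARD_SHIFT)

static void   *old_gen[HEAP_WORDS];          /* stand-in for the old generation */
static uint8_t card_table[NUM_CARDS];        /* 1 = card may hold a pointer into the nursery */

/* Write barrier: every reference store into the old generation goes
 * through here. The caller must pass slot < HEAP_WORDS (no check in this toy). */
void write_ref(size_t slot, void *value) {
    old_gen[slot] = value;
    card_table[slot >> CARD_SHIFT] = 1;      /* the additional write the barrier costs */
}

/* Minor collection helper: visit only slots on dirty cards, then clean
 * those cards. Returns the number of slots visited. */
size_t scan_dirty_cards(void (*visit)(void **slot)) {
    size_t visited = 0;
    for (size_t c = 0; c < NUM_CARDS; c++) {
        if (!card_table[c]) continue;        /* clean card: skipped without touching the heap */
        size_t base = c << CARD_SHIFT;
        for (size_t i = 0; i < ((size_t)1 << CARD_SHIFT); i++) {
            if (visit) visit(&old_gen[base + i]);
            visited++;
        }
        card_table[c] = 0;
    }
    return visited;
}
```

A tight loop of stores, as in a Sieve-style micro benchmark, pays the card write on every iteration, while a program with few old-to-young stores scans only a handful of cards per minor collection.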
Target: Still not decided. Native code is important to us, so probably not
CLR first (though that's a good exercise). LLVM most likely, or possibly a
port of the Go back end. I'd actually prefer the Go back end, but it
doesn't currently support read barriers, write barriers, or GC root
tagging/preservation. In spite of these deficiencies, the Go back-end (a
derivative of the Plan9 back-end) is in C, which would make it much easier
to "lift" it into BitC. I do know that Rob et al would be interested in
adding support for these things to the Go back end, and would probably take
a well-crafted patch back into the origin tree.
Precise collection can also be obtained by other means without help from the
back end, most notably by means of the Henderson techniques.
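For readers who have not met the Henderson approach: the front end emits code that keeps its own linked list of frames (a "shadow stack") recording where each function's GC roots live, so the collector can find roots precisely without any support from the back end. A minimal sketch in C follows. The gc_frame layout and the function names are hypothetical, and counting roots stands in for a real collection.

```c
#include <stddef.h>

/* One shadow-stack frame: a link to the caller's frame plus the
 * location of this function's root slots. */
typedef struct gc_frame {
    struct gc_frame *prev;
    size_t nroots;
    void **roots;                /* points at an array of root slots in the C frame */
} gc_frame;

static gc_frame *gc_top = NULL;  /* innermost active frame */

/* What a collector would do at a safe point: walk every frame and
 * visit every root. Here we only count the non-NULL roots. */
size_t gc_count_live_roots(void) {
    size_t n = 0;
    for (gc_frame *f = gc_top; f != NULL; f = f->prev)
        for (size_t i = 0; i < f->nroots; i++)
            if (f->roots[i] != NULL) n++;
    return n;
}

/* Shape of the code a compiler might emit for a function with two roots. */
size_t example_callee(void *arg) {
    void *slots[2] = { arg, NULL };
    gc_frame frame = { gc_top, 2, slots };
    gc_top = &frame;                        /* push on entry */
    size_t seen = gc_count_live_roots();    /* stand-in for a collection point */
    gc_top = frame.prev;                    /* pop on exit */
    return seen;
}

size_t example_caller(void *a, void *b) {
    void *slots[2] = { a, b };
    gc_frame frame = { gc_top, 2, slots };
    gc_top = &frame;
    size_t seen = example_callee(a);        /* callee's frame links back to ours */
    gc_top = frame.prev;
    return seen;
}
```

The cost is that roots live in memory slots the C compiler cannot keep in registers across calls, which is the usual price of getting precise collection out of an uncooperative back end.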
The situation for LLVM has evolved considerably in the past year; matters
may not be as bad now as Ben is assuming.
Not sure why native is an issue, as Mono can output native code, though you
can't with the Windows CLR.
I also need to decide between LLVM and Mono.
My issues with LLVM are:
- No hooks for concurrent GCs (nor do they intend to add them, last I
heard on their board; they want more people to use GCs before committing,
though things are beginning to move). The most recent status is at
http://llvm.org/releases/2.5/docs/GarbageCollection.html#feature
- They are compiler focused
- No runtime
The fact that Mono now supports LLVM means you get:
- The Mono runtime, which ensures fewer managed/native calls. If I just
wrap C libs, performance is poor.
- The Mono GC
- Easy introduction of LLVM stuff.
The fact that the guys at Mono (and MS) have spent years on GCs which
evolve slowly makes me suspect that bolting in a GC is a much harder problem
than it looks, though with the hooks it seems simple enough ("seems" is a
dangerous word). Mono 2.8, with improvements in bounds checking and a
generational GC (3 years in development now), is the first Mono I would even
consider. The performance difference between the MS compacting, generational
GC from .NET 1.0 (10 years ago) and Mono's Boehm collector or Mono's own
(slightly better than Boehm) is quite significant (keenly awaiting the first
2.8 benchmarks with the new GC).
The Library: The BitC library is a really interesting question. We (CS as a
field) have very little experience with languages that want to mix pure and
imperative programming, so the right structure for the library isn't
immediately clear. The early BitC applications from our perspective are
mostly "zero footprint" applications. I'm starting to toy with building
pieces to self-host the compiler, but we're going to figure this out as we
go along. I look forward to exploring the ride with all of you.
So I have no real answer for you, except that the BitC library is going to
grow in the same way that every other library in history has grown. First we
screw around with the language for a while. Then we stop, establish a
consensus about what seems to work, and attempt an orderly borrowing of good
ideas from elsewhere. Then we'll undergo ad hoc growth just like
everyone else. If we get the architecture decisions close to right in the
consensus phase we'll survive. If not, we won't.
Yes, I was considering the same, but I kind of feel that unless you have a
decent runtime it's hard to get traction, and decent means web calls, TCP,
threading, web apps, GUI apps, XML etc., which is a monumental task. It is
not as simple as creating native wrappers; languages like OCaml have evolved
their runtime over many years. The 2 common successful GC languages to date,
C# and Java, both had extensive default libraries at launch, and these libs
have little native interface. Having a good, large base lib has to date
helped these languages, and they avoid the C/C++ lib mess.
If you go with Mono you could do something like F# and get an extensive
runtime complete with native interfaces, but also have your own runtime even
if you use LLVM as the compiler (and you can easily change or obsolete bits
using partial classes). BUT you need to be very careful what to support and
what to deprecate. Otherwise, if the language is successful, you also get
some abysmal features, since the .NET lib has some good design but also bad
bits where native wrappers were done for convenience.
Anyway, I'd better stop distracting you; at least you know we are paying
attention :)
Regards,
Ben