[bitc-dev] initial interest

Jonathan S. Shapiro shap at eros-os.org
Sat Dec 10 15:27:43 EST 2005


On Sat, 2005-12-10 at 15:44 +0000, David Hopwood wrote:
> Jonathan S. Shapiro wrote:
> > A bunch of issues are getting mixed up here:
> > 
> >   1. What is the size of "char"?
> >   2. What is the internal-to-memory representation of strings?
> >   3. What is the *external* representation of strings during
> >      serialization?
> > 
> > Answers:
> > 
> >   1. Char *must* be 32 bits, because char needs to be able to
> >      represent all code points. I chose very explicitly NOT to
> >      support a legacy character type, because it will invite
> >      people to write code badly.
> > 
> >   2. String internal representation is not specified, but the
> >      plan is to use either UTF8 or some variation of the ICU
> >      strategy.
> > 
> >   3. External representation of strings is UTF32.
> 
> UTF-32 is an extremely inefficient encoding, whether used internally
> or externally. The general trend in protocol design is to use UTF-8
> externally.

Yes. Excuse me. That was a typo! The external string representation is
UTF-8. The internal representation of *individual characters* is UTF-32.
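
To make the distinction concrete, here is a rough sketch in Python (not
BitC; Python's str merely stands in for a string type, and the sample
text is made up for illustration). It treats each character as a single
code point that fits in 32 bits, then compares the serialized size of
the same string in UTF-8 and UTF-32:

```python
# Illustrative only: Python's str stands in for a string type here.
# The sample text is hypothetical; escapes are used so the code points are unambiguous.
text = "na\u00efve \U0001F600"

# Each character is one Unicode code point. The largest code point is
# U+10FFFF, so a 32-bit char can hold any of them.
code_points = [ord(ch) for ch in text]
assert all(cp <= 0x10FFFF for cp in code_points)

# External (serialized) forms of the same string.
utf8 = text.encode("utf-8")       # variable width: 1 to 4 bytes per code point
utf32 = text.encode("utf-32-be")  # fixed width: 4 bytes per code point, no BOM

utf8_len = len(utf8)
utf32_len = len(utf32)
print(len(text), utf8_len, utf32_len)
```

For text that is mostly ASCII, the UTF-8 form is far smaller than the
UTF-32 form, which is the efficiency concern raised above, while each
individual character still fits in one 32-bit value.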


