[bitc-dev] initial interest
Jonathan S. Shapiro
shap at eros-os.org
Sat Dec 10 15:27:43 EST 2005
On Sat, 2005-12-10 at 15:44 +0000, David Hopwood wrote:
> Jonathan S. Shapiro wrote:
> > A bunch of issues are getting mixed up here:
> >
> > 1. What is the size of "char"?
> > 2. What is the internal-to-memory representation of strings?
> > 3. What is the *external* representation of strings during
> > serialization?
> >
> > Answers:
> >
> > 1. Char *must* be 32 bits, because char needs to be able to
> > represent all code points. I chose very explicitly NOT to
> > support a legacy character type, because it will invite
> > people to write code badly.
> >
> > 2. String internal representation is not specified, but the
> > plan is to use either UTF8 or some variation of the ICU
> > strategy.
> >
> > 3. External representation of strings is UTF32.
>
> UTF-32 is an extremely inefficient encoding, whether used internally
> or externally. The general trend in protocol design is to use UTF-8
> externally.
Yes. Excuse me. That was a typo! External string representation is
UTF-8. Internal representation of *individual characters* is UTF-32.
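
To make the distinction concrete, here is a rough sketch in Python (not BitC, and not a statement of how the BitC runtime does it). The helper names and the sample string are made up for illustration. It treats each character as one code point that fits in 32 bits, writes the string out as UTF-8, and compares the size with a UTF-32 encoding of the same text.

```python
# Illustrative sketch only; helper names are hypothetical, not BitC APIs.

def to_code_points(s):
    """Return each character as one integer code point (fits in 32 bits)."""
    return [ord(c) for c in s]

def serialize_utf8(s):
    """External representation: variable-width UTF-8 bytes."""
    return s.encode("utf-8")

def serialize_utf32(s):
    """For comparison: fixed 4 bytes per code point, big-endian, no BOM."""
    return s.encode("utf-32-be")

# 'naive' with a diaeresis on the i, a space, then U+1D11E (MUSICAL SYMBOL
# G CLEF), which lies outside the 16-bit range.
text = "na\u00efve \U0001D11E"

points = to_code_points(text)
utf8 = serialize_utf8(text)
utf32 = serialize_utf32(text)

print(len(points), "code points")
print(len(utf8), "bytes as UTF-8")
print(len(utf32), "bytes as UTF-32")
```

The last character has a code point above 0xFFFF, so a 16-bit char could not hold it in one unit, while the UTF-8 form of the whole string is still well under half the size of the UTF-32 form.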