[bitc-dev] initial interest
David Hopwood
david.nospam.hopwood at blueyonder.co.uk
Sat Dec 10 10:44:38 EST 2005
Jonathan S. Shapiro wrote:
> A bunch of issues are getting mixed up here:
>
> 1. What is the size of "char"?
> 2. What is the internal-to-memory representation of strings?
> 3. What is the *external* representation of strings during
> serialization?
>
> Answers:
>
> 1. Char *must* be 32 bits, because char needs to be able to
> represent all code points. I chose very explicitly NOT to
> support a legacy character type, because it will invite
> people to write code badly.
>
> 2. String internal representation is not specified, but the
> plan is to use either UTF8 or some variation of the ICU
> strategy.
>
> 3. External representation of strings is UTF32.
UTF-32 is an extremely inefficient encoding, whether used internally
or externally: every code point takes four bytes, even plain ASCII.
The general trend in protocol design is to use UTF-8 externally.
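
To put rough numbers on the size difference, here is a minimal sketch in
Python (not BitC; the helper name encoded_sizes is made up for illustration)
that compares the serialized byte length of the same string in UTF-8 and in
UTF-32 without a byte-order mark:

```python
def encoded_sizes(text: str) -> dict:
    """Return code point count and byte lengths in UTF-8 and UTF-32 (no BOM).

    Hypothetical helper for illustration only; it is not part of any
    BitC library or serialization format.
    """
    return {
        "code_points": len(text),
        "utf8": len(text.encode("utf-8")),
        # "utf-32-le" fixes the byte order, so no BOM is prepended.
        "utf32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # ASCII text, CJK text, and one code point outside the BMP.
    for sample in ("hello", "\u65e5\u672c\u8a9e", "\U0001F600"):
        print(repr(sample), encoded_sizes(sample))
```

For ASCII-heavy data UTF-32 is four times the size of UTF-8; for CJK text the
ratio drops to 4:3, and only for code points outside the BMP do the two come
out equal.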
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>