[bitc-dev] String encoding, again
Jonathan S. Shapiro
shap at eros-os.org
Wed Mar 23 23:45:37 PDT 2011
On Wed, Mar 23, 2011 at 11:27 PM, Ben Kloosterman <bklooste at gmail.com> wrote:
> Allows string to be programmer friendly and Ustring for indexing
> performance. Why define the views ? Some string implementations may not
> support direct indexing ie only vs higher levels or is GetUString8() ,
> GetUString16() or GetUString32() regarded as a view ?
My thought was that one might call String.GetUCS1(ndx), String.GetUCS2(ndx),
and so forth, and that there would be suitable conversion functions to
extract a UString from a String.
There clearly need to be a bunch of converters and accessors on both String
and UString, and I haven't yet taken the time to work those out.
> Question why not define Ustring as a vector /array?
The constant-time access constraint effectively means that I did. But I'm
still reserving the possibility that vectors may need to be chunked in order
to support real-time collection. Also, vectors are mutable where strings are
not. All that being said, I suspect that a common representation of String
and UString will be Vector<UCS2> and Vector<UCS4> respectively, with String
using a UTF16 encoding.
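
To make the distinction concrete, here is a rough sketch in Python (not BitC, and not a worked-out API). It assumes String is held as a sequence of UTF-16 code units and UString as an immutable sequence of UCS4 code points. The names string_to_ustring and ustring_get are hypothetical, invented for this illustration only.

```python
# Sketch only: String modeled as a sequence of UTF-16 code units,
# UString modeled as an immutable tuple of UCS4 code points.
# All names here are hypothetical, not part of any BitC library.

def string_to_ustring(utf16_units):
    """Extract a UString (tuple of code points) from UTF-16 code units."""
    out = []
    i = 0
    n = len(utf16_units)
    while i < n:
        unit = utf16_units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= utf16_units[i + 1] <= 0xDFFF:
            # A surrogate pair encodes one code point above U+FFFF.
            low = utf16_units[i + 1]
            out.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP code point (an unpaired surrogate is passed through as-is).
            out.append(unit)
            i += 1
    return tuple(out)  # a tuple, because strings are immutable


def ustring_get(ustring, ndx):
    """Constant-time access: one code point per slot, so plain indexing works."""
    return ustring[ndx]


# "a", U+1D11E (encoded as the surrogate pair D834 DD1E), "b"
units = [0x0061, 0xD834, 0xDD1E, 0x0062]
ustr = string_to_ustring(units)
```

Four code units become three code points, so index 2 in the UString is "b", whereas index 2 in the UTF-16 form lands in the middle of a surrogate pair. That is the reason constant-time character indexing is promised on UString and not on String.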
> String should wrap or at least support Ustring
Actually, String should *not* wrap UString. Having an identical underlying
representation and encoding is an intentionally permissible approach, but
the working assumption here is that String has an underlying representation
that typically will not match UString. It is not a wrapper object.
> The vector is used for interop...
The intent is that String is used for interop. The definition given is
weaselly enough that almost any underlying encoding can be jiggered into
working, which leaves us free to use .Net strings directly (for example).