[bitc-dev] String encoding, again

Ben Kloosterman bklooste at gmail.com
Thu Mar 24 00:12:38 PDT 2011

On Wed, Mar 23, 2011 at 11:27 PM, Ben Kloosterman <bklooste at gmail.com>
wrote:

This allows String to be programmer-friendly and UString to be used for
indexing performance. Why define the views? Some string implementations may
not support direct indexing, i.e. only via higher levels. Or are
GetUString8(), GetUString16(), and GetUString32() regarded as views?


My thought was that one might call String.GetUCS1(ndx), String.GetUCS2(ndx),
and so forth, and that there would be suitable conversion functions to
extract a UString from a String.

There clearly need to be a bunch of converters and accessors on both String
and UString, and I haven't yet taken the time to work those out.
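As a rough illustration of the accessor and converter idea, here is a minimal Python sketch. It assumes a String held as UTF-16 code units and a UString held as one code point per slot. The class layout and the names get_ucs2 and to_ustring are hypothetical stand-ins for the String.GetUCS2(ndx) accessor and the String-to-UString conversion mentioned above; they are not a worked-out BitC API.

```python
class UString:
    """Hypothetical UString: one code point per slot, so indexing is O(1)."""

    def __init__(self, code_points):
        self._cps = tuple(code_points)  # immutable sequence of UCS4 values

    def __len__(self):
        return len(self._cps)

    def __getitem__(self, ndx):
        return self._cps[ndx]


class String:
    """Hypothetical String stored as UTF-16 code units (an assumption)."""

    def __init__(self, text):
        data = text.encode("utf-16-le")
        # Pair up bytes into 16-bit code units (little-endian).
        self._units = tuple(
            data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)
        )

    def get_ucs2(self, ndx):
        """Return the UTF-16 code unit at ndx; this may be half a surrogate pair."""
        return self._units[ndx]

    def to_ustring(self):
        """Decode surrogate pairs and return a UString of code points."""
        cps = []
        i = 0
        while i < len(self._units):
            u = self._units[i]
            if 0xD800 <= u <= 0xDBFF and i + 1 < len(self._units):
                lo = self._units[i + 1]
                if 0xDC00 <= lo <= 0xDFFF:
                    cps.append(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00))
                    i += 2
                    continue
            cps.append(u)  # BMP code point or unpaired surrogate, kept as-is
            i += 1
        return UString(cps)
```

In this sketch, get_ucs2 indexes code units while the extracted UString indexes code points, so the two disagree on positions once a character outside the BMP appears.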

 

 

To improve programmer habits, you may want to disallow String.GetUCS1(ndx)
and only allow getting the array.

Question: why not define UString as a vector/array?


The constant-time access constraint effectively means that I did. But I'm
still reserving the possibility that vectors may need to be chunked in order
to support real-time collection. 
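To make the chunking point concrete, here is a minimal Python sketch of a vector stored as fixed-size chunks that still offers constant-time indexing. The class name ChunkedVector and the chunk_size parameter are hypothetical, and the tiny default chunk size is only for illustration; nothing here reflects an actual BitC runtime layout.

```python
class ChunkedVector:
    """Hypothetical vector stored as fixed-size chunks instead of one block."""

    def __init__(self, items, chunk_size=4):
        items = list(items)
        self._chunk_size = chunk_size
        self._length = len(items)
        # Split the elements into chunks of at most chunk_size entries each.
        self._chunks = [
            items[i:i + chunk_size] for i in range(0, len(items), chunk_size)
        ]

    def __len__(self):
        return self._length

    def __getitem__(self, ndx):
        if not 0 <= ndx < self._length:
            raise IndexError(ndx)
        # One division locates the chunk and the offset, so access stays O(1).
        chunk, offset = divmod(ndx, self._chunk_size)
        return self._chunks[chunk][offset]
```

Because every chunk has a bounded size, a collector could in principle move or scan one chunk at a time, while callers still see a single indexable sequence.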

 

 

Why not push that GC interaction to String and leave UString lightweight? If
you really want to do it, do it. You may need to write a 64K string for
interop anyway.

Also, vectors are mutable where strings are not. All that being said, I
suspect that a common representation of String and UString will be
Vector<UCS2> and Vector<UCS4> respectively, with String using a UTF16
encoding.
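The difference between the two representations can be sketched in Python as follows. The helper names utf16_units and ucs4_points are hypothetical; they model Vector<UCS2> with a UTF-16 encoding and Vector<UCS4> as immutable tuples, which is an assumption made only for this illustration.

```python
def utf16_units(text):
    """Model of Vector<UCS2> with UTF-16 encoding: a tuple of 16-bit code units."""
    data = text.encode("utf-16-le")
    return tuple(
        int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)
    )


def ucs4_points(text):
    """Model of Vector<UCS4>: a tuple with exactly one entry per code point."""
    return tuple(ord(ch) for ch in text)
```

For text containing a character outside the BMP, the UTF-16 form has more elements than the UCS4 form, which is why only the latter gives constant-time access by code point.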

 

String should wrap or at least support UString.


Actually, String should not wrap UString. Having an identical underlying
representation and encoding is an intentionally permissible approach, but
the working assumption here is that String has an underlying representation
that typically will not match UString. It is not a wrapper object.

 

If by wrapper you mean that String should not always contain 0.1 UString,
then I agree. I would state, though, that it must be possible to extract a
UString from a String.

The vector is used for interop...


The intent is that String is used for interop. The definition given is
weaselly enough that almost any underlying encoding can be jiggered into
working, which leaves us free to use .NET strings directly (for example).

shap
