[bitc-dev] Encoding of string literals
Dominique Quatravaux
dom at kilimandjaro.dyndns.org
Fri May 19 05:53:38 EDT 2006
Jonathan S. Shapiro wrote:
>The difficulty with this encoding is that there is no portable way in C
>to write a literal initializer for it.
>
Surely you mean a *human-readable* literal initializer? AFAICT one can
definitely initialize an arbitrary array of bytes, or of longs (which
are at least 32 bits wide), in C in a portable fashion.
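
A minimal sketch of such a portable initializer, assuming a length-plus-units representation. The `utf32_string` type, the `utf32_at` helper and every identifier here are hypothetical, not taken from the BitC code base. Because each code point is written as a number, nothing depends on the C compiler's source or execution character set:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string representation: a length plus UTF-32 code units. */
typedef struct {
    size_t length;
    const uint_least32_t *units;
} utf32_string;

/* "Beyonc\x{E9}" spelled as numeric code points, one per character. */
static const uint_least32_t beyonce_units[] = {
    0x42, 0x65, 0x79, 0x6F, 0x6E, 0x63, 0xE9
};

/* Fully static initializer: no run-time copying is needed. */
static const utf32_string beyonce = {
    sizeof beyonce_units / sizeof beyonce_units[0],
    beyonce_units
};

/* Return the code unit at index i; i must be within the string. */
static uint_least32_t utf32_at(const utf32_string *s, size_t i)
{
    assert(i < s->length);
    return s->units[i];
}
```

The price is exactly the legibility issue raised above: the initializer is portable, but a human cannot read the string without a comment next to it.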
> This drove us to copy the strings
>at run time.
>
Let this slow thinker get this straight. You mean that the generated C
code has, say, UTF-8 strings in it that are converted into UTF-32 at
compiled-program startup time?
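
For concreteness, a startup-time conversion of that kind could look roughly like the sketch below. This is only an illustration of the idea being asked about, not the actual BitC runtime; `utf8_to_utf32` and `beyonce_utf8` are made-up names, and the decoder deliberately skips some validation (overlong forms and surrogates are not rejected):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode n bytes of UTF-8 from src into at most cap UTF-32 units in dst.
 * Returns the number of code points written, or (size_t)-1 if the input
 * is malformed or truncated, or if dst is too small. */
static size_t utf8_to_utf32(const unsigned char *src, size_t n,
                            uint_least32_t *dst, size_t cap)
{
    size_t in = 0, out = 0;
    assert(src != NULL && dst != NULL);
    while (in < n) {
        unsigned char b = src[in++];
        uint_least32_t cp;
        int extra;                      /* continuation bytes expected */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;         /* invalid lead byte */
        while (extra-- > 0) {
            if (in >= n || (src[in] & 0xC0) != 0x80)
                return (size_t)-1;      /* truncated or bad continuation */
            cp = (cp << 6) | (uint_least32_t)(src[in++] & 0x3F);
        }
        if (out >= cap)
            return (size_t)-1;          /* destination too small */
        dst[out++] = cp;
    }
    return out;
}

/* "Beyonc\x{E9}" as UTF-8 bytes; the e-acute takes two bytes (C3 A9). */
static const unsigned char beyonce_utf8[] = {
    0x42, 0x65, 0x79, 0x6F, 0x6E, 0x63, 0xC3, 0xA9
};
```

Every literal in the program would pay this cost once at startup, which is what a static UTF-32 initializer avoids.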
In another mail you wrote:
>It appears (to me) that the only safe encoding
>if we care about EBCDIC-based C compilers is to emit *everything* using
>octal escapes, and perhaps emit comments for the sake of the human
>reader.
>
>For character literals this will work fine, but for string literals it
>is a complete nuisance.
>
I cannot (yet?) see what is wrong with this approach. As an aid towards
legibility of the intermediate code (which need not even be a design
goal imho, but oh well) you could stash all the UTF-32 string literals
as a kind of symbol table at the bottom of the emitted C file, e.g.
(sorry for my pidgin C):
static const STRING_T
  literal_number_2_from_bitc_source_file_at_line_312; // "Beyonc\x{E9}"
...
static void func343(void) {
  const STRING_T *string1 =
    &literal_number_2_from_bitc_source_file_at_line_312; // "Beyonc\x{E9}"
}
...
/** Here starts the table of all string literals in use in
    bitc_source_file **/
static const STRING_T
  literal_number_2_from_bitc_source_file_at_line_312 =
  // This is supposed to represent "Beyonc\x{E9}" in UTF-32 encoding
  { 0x42L, .....
  };
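
On the emitter side, the all-octal-escapes idea quoted earlier is mechanical to implement. The following is a rough sketch under the assumption of 8-bit bytes; `emit_octal_literal` is a hypothetical helper, not part of any existing BitC tool. It always writes three octal digits per byte, so a digit that follows in the data can never be absorbed into the preceding escape:

```c
#include <assert.h>
#include <stddef.h>

/* Write bytes[0..n) into out as a C string literal made only of
 * three-digit octal escapes, e.g. {0x42, 0xE9} becomes "\102\351"
 * (quotes included). Returns the number of characters written, not
 * counting the terminating NUL, or 0 if out is too small. */
static size_t emit_octal_literal(const unsigned char *bytes, size_t n,
                                 char *out, size_t cap)
{
    size_t need = 2 + 4 * n + 1;    /* two quotes, 4 chars per byte, NUL */
    size_t pos = 0, i;
    assert(bytes != NULL && out != NULL);
    if (cap < need)
        return 0;
    out[pos++] = '"';
    for (i = 0; i < n; i++) {
        out[pos++] = '\\';
        out[pos++] = (char)('0' + ((bytes[i] >> 6) & 7));
        out[pos++] = (char)('0' + ((bytes[i] >> 3) & 7));
        out[pos++] = (char)('0' + (bytes[i] & 7));
    }
    out[pos++] = '"';
    out[pos] = '\0';
    return pos;
}
```

The emitter can print the human-readable form in a comment next to each table entry, so the nuisance stays confined to the table at the bottom of the file.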
--
<< Not everything there is perfect, but gardeners are certainly honored there >>
Dominique Quatravaux <dom at kilimandjaro.dyndns.org>