[bitc-dev] Encoding of string literals

Dominique Quatravaux dom at kilimandjaro.dyndns.org
Fri May 19 05:53:38 EDT 2006


Jonathan S. Shapiro wrote:

>The difficulty with this encoding is that there is no portable way in C
>to write a literal initializer for it.
>
Surely you mean a *human-readable* literal initializer? afaict one can
definitely initialize an arbitrary string of bytes or 32-bit longs in C
in a portable fashion.
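
To make that concrete, here is a minimal sketch of what I have in mind. The names `beyonce_utf8` and `beyonce_utf32` are made up for illustration, and `uint32_t` assumes a C99 `<stdint.h>` (a C89 compiler would need something like `unsigned long` instead). Only numeric values appear, so the compiler's execution character set should not matter:

```c
#include <stddef.h>
#include <stdint.h>

/* "Beyonce" with an acute accent on the final e, as UTF-8 bytes.
   U+00E9 encodes as the two bytes 0xC3 0xA9. */
static const unsigned char beyonce_utf8[] = {
    0x42, 0x65, 0x79, 0x6F, 0x6E, 0x63, 0xC3, 0xA9
};

/* The same string as UTF-32 code points, one per array element. */
static const uint32_t beyonce_utf32[] = {
    0x42, 0x65, 0x79, 0x6F, 0x6E, 0x63, 0xE9
};

/* Element counts, computed by the compiler rather than typed by hand. */
static const size_t beyonce_utf8_len = sizeof beyonce_utf8;
static const size_t beyonce_utf32_len =
    sizeof beyonce_utf32 / sizeof beyonce_utf32[0];
```

Not human-readable, granted, but a comment next to each array can carry the readable form.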

> This drove us to copy the strings
>at run time.
>
Let this slow thinker get this straight. You mean that the generated C
code has, say, UTF-8 strings in it that are converted into UTF-32 at
compiled-program startup time?

In another mail you wrote:

>It appears (to me) that the only safe encoding
>if we care about EBCDIC-based C compilers is to emit *everything* using
>octal escapes, and perhaps emit comments for the sake of the human
>reader.
>
>For character literals this will work fine, but for string literals it
>is a complete nuisance.
>
I cannot (yet?) see what is wrong with this approach. As an aid towards
legibility of the intermediate code (which need not even be a design
goal imho, but oh well) you could stash all the UTF-32 string literals
as a kind of symbol table at the bottom of the emitted C file, e.g.
(sorry for my pidgin C):

    static const STRING_T
        literal_number_2_from_bitc_source_file_at_line_312; // "Beyonc\x{E9}"
 
    ...

    static void func343(void) {
         const STRING_T *string1 =
             &literal_number_2_from_bitc_source_file_at_line_312; // "Beyonc\x{E9}"
    }

    ...

    /** Here starts the table of all string literals in use in
        bitc_source_file **/
    static const STRING_T
        literal_number_2_from_bitc_source_file_at_line_312 =
           // This is supposed to represent "Beyonc\x{E9}" in UTF-32 encoding
           { 0x42L, .....
           };
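
Spelled out a bit further, the layout above might look like the sketch below. I am assuming a `STRING_T` that is just a length plus a pointer to UTF-32 code points; that struct, the `literal_number_2_chars` array and the return value of `func343` are my own inventions for the example, and the real BitC runtime representation may well differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string representation: a length and a pointer to
   UTF-32 code points. */
typedef struct {
    size_t length;
    const uint32_t *chars;
} STRING_T;

/* Forward (tentative) definition so that code can refer to the literal
   before the table at the bottom of the file. This is valid C, but not
   valid C++. */
static const STRING_T
    literal_number_2_from_bitc_source_file_at_line_312; /* "Beyonc\x{E9}" */

/* Returns the length only so that the sketch has something observable. */
static size_t func343(void) {
    const STRING_T *string1 =
        &literal_number_2_from_bitc_source_file_at_line_312; /* "Beyonc\x{E9}" */
    return string1->length;
}

/** Here starts the table of all string literals in use in
    bitc_source_file **/
static const uint32_t literal_number_2_chars[] = {
    /* B     e     y     o     n     c     e-acute */
    0x42, 0x65, 0x79, 0x6F, 0x6E, 0x63, 0xE9
};

static const STRING_T
    literal_number_2_from_bitc_source_file_at_line_312 = {
        sizeof literal_number_2_chars / sizeof literal_number_2_chars[0],
        literal_number_2_chars
    };
```

The function bodies stay short and the numeric noise lives in one place at the end of the file, with a comment beside each literal for the human reader.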

-- 
<< Tout n'y est pas parfait, mais on y honore certainement les jardiniers >>

			Dominique Quatravaux <dom at kilimandjaro.dyndns.org>



