[bitc-dev] Newline conventions
David Hopwood
david.nospam.hopwood at blueyonder.co.uk
Sat Feb 18 15:44:09 EST 2006
Jonathan S. Shapiro wrote:
> Before somebody feels compelled to point out how horrible it is, let me
> say that I am already coming to *hate*
>
> #\{linefeed}
>
> More importantly, I am coming to hate strings like
>
> "Hello, world\{linefeed}"
>
> The entire current convention for character syntax is a nightmare. After
> reviewing the conventions used by Scheme, here is the revised plan that
> will be coming up in the next revision of the language specification:
>
> Characters:
>
> #\X is the character X provided X is printable
> #\U+XXXX is a unicode code point
> #\tab
> #\newline
> #\space
>
> Strings, between the outer double quotes:
>
> X is a character if X is printable
> \n -- newline
> \r -- carriage return
> \t -- horizontal tab
> \\ -- backslash
> \f -- formfeed
> \b -- backspace (?)
> \" -- double quote embedded in the string
> \U+XXXX -- unicode code point.
Unicode code points go up to U+10FFFF, so the U+ notation used in the Unicode
Standard allows U+XXXXX and U+XXXXXX as well as U+XXXX (but never fewer than
4 hex digits).
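As an illustration of why a variable-length U+ notation is harmless for character literals, here is a minimal Python sketch. It assumes the reader has already split the input into tokens, so the literal ends where its token ends; the function name `parse_char_literal` is hypothetical and not part of any BitC implementation.

```python
import re

# Assumption: the token has already been delimited by the reader
# (whitespace or a closing paren ends it), so the hex run cannot
# swallow any text that follows the literal.
_CODE_POINT = re.compile(r"#\\U\+([0-9A-Fa-f]{4,6})")

def parse_char_literal(token: str) -> str:
    """Decode a hypothetical '#\\U+XXXX' token with 4 to 6 hex digits."""
    m = _CODE_POINT.fullmatch(token)
    if m is None:
        raise ValueError(f"not a code point literal: {token!r}")
    cp = int(m.group(1), 16)
    # Six hex digits can exceed the Unicode range, so check explicitly.
    if cp > 0x10FFFF:
        raise ValueError(f"code point out of range: {token!r}")
    return chr(cp)
```

Here `parse_char_literal("#\\U+0041")` yields `"A"`, while `"#\\U+41"` (too few digits) and `"#\\U+110000"` (past U+10FFFF) are rejected.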
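This does not create any ambiguity for characters, since a character
literal ends where its token ends, but it does for embedded Unicode escapes
in strings, where an escape can be followed directly by ordinary text that
happens to consist of hex digits. There are several options: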
1. Only support code points up to U+FFFF in strings.
2. Use a longest-match rule, so that "\U+10ABCD" is a string with a
single character, and it would be necessary to write
"\U+10AB\U+0043\U+0044" for the 3-character string "ႫCD".
3. Use \uXXXX and \U00XXXXXX. This is ugly, but consistent with existing
practice: \uXXXX is shared by Java, C, C++, Python and JavaScript, and the
eight-digit \U form by C, C++ and Python.
4. Use a syntax with an explicit end-delimiter for Unicode escapes
(I would suggest either space or ';').
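The difference between options 2, 3 and 4 can be made concrete with a small Python sketch that decodes only the Unicode escapes in a string body, plus `\\`, so that an escaped backslash is never mistaken for the start of an escape. Everything here is hypothetical: the function names, and the choice of ';' as the option-4 terminator (a space would work the same way).

```python
import re

def _cp(hex_digits: str) -> str:
    """Turn a run of hex digits into a character, rejecting > U+10FFFF."""
    cp = int(hex_digits, 16)
    if cp > 0x10FFFF:
        raise ValueError(f"code point out of range: U+{hex_digits}")
    return chr(cp)

# Each pattern matches '\\' first, so an escaped backslash is consumed
# before the Unicode alternative can misread what follows it.

# Option 2: longest match. The {4,6} quantifier is greedy, so it takes
# as many hex digits (up to 6) as are available.
_OPT2 = re.compile(r"\\\\|\\U\+([0-9A-Fa-f]{4,6})")

# Option 3: fixed width, \uXXXX (4 digits) or \UXXXXXXXX (8 digits).
_OPT3 = re.compile(r"\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

# Option 4: explicit terminator (';' here). An unterminated escape is
# simply left unmatched; a real lexer would report it as an error.
_OPT4 = re.compile(r"\\\\|\\U\+([0-9A-Fa-f]{4,6});")

def _decode(pattern: re.Pattern, body: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group(0) == "\\\\":  # escaped backslash: emit a single '\'
            return "\\"
        digits = next(g for g in m.groups() if g is not None)
        return _cp(digits)
    return pattern.sub(repl, body)

def decode_opt2(body: str) -> str:
    return _decode(_OPT2, body)

def decode_opt3(body: str) -> str:
    return _decode(_OPT3, body)

def decode_opt4(body: str) -> str:
    return _decode(_OPT4, body)
```

With these definitions, option 2 silently turns the body \U+0043D into the single character U+043D rather than "CD", whereas options 3 and 4 cannot fall into that trap, because the fixed width or the terminator decides where the escape ends.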
1 and 2 are pretty awful, so I think it has to be 3 or 4.
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>