[bitc-dev] BitCC-0.9.1 strings

Jonathan S. Shapiro shap at eros-os.org
Sat Feb 18 01:15:08 EST 2006


The string support in BitCC-0.9.1 is sadly lacking, and it will
improve over the next few days. Hopefully by Monday we will have
proper Unicode support in both the input lexer and the emitted programs.

As of a few minutes ago, stdio.read-char and stdio.write-char handle
multibyte UTF-8 characters correctly. This means that character I/O will
work properly for UTF-8 input. As a consequence of doing this, I now
know how to pattern match UTF-8 encoded strings in the BitCC lexer, and
I'll tackle that over the weekend.
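
For illustration only, here is roughly what reading one multibyte
UTF-8 character involves. This is a Python sketch, not the BitC stdio
code; the name read_char and the error handling are assumptions made
for the example. The lead byte says how many continuation bytes
follow, and each continuation byte contributes six bits.

```python
import io

def read_char(stream):
    """Read one UTF-8 encoded code point from a binary stream.

    Returns the code point as an int, or None at end of input.
    Hypothetical helper: it rejects malformed lead and continuation
    bytes, but does not check for overlong three- and four-byte forms
    or for surrogate code points.
    """
    lead = stream.read(1)
    if not lead:
        return None
    b0 = lead[0]
    if b0 < 0x80:                # 0xxxxxxx: one byte (ASCII)
        return b0
    elif 0xC2 <= b0 <= 0xDF:     # 110xxxxx: two-byte sequence
        need, cp = 1, b0 & 0x1F
    elif 0xE0 <= b0 <= 0xEF:     # 1110xxxx: three-byte sequence
        need, cp = 2, b0 & 0x0F
    elif 0xF0 <= b0 <= 0xF4:     # 11110xxx: four-byte sequence
        need, cp = 3, b0 & 0x07
    else:
        raise ValueError("invalid UTF-8 lead byte: 0x%02X" % b0)
    tail = stream.read(need)
    if len(tail) != need:
        raise ValueError("truncated UTF-8 sequence")
    for b in tail:
        if b & 0xC0 != 0x80:     # continuation bytes are 10xxxxxx
            raise ValueError("invalid UTF-8 continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    return cp
```

Calling read_char repeatedly on io.BytesIO("a\u20ac".encode("utf-8"))
yields one code point per call, then None at end of input.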

For those who may be wondering, the reason that string is not just
(vector char) is that most characters in typical text can be encoded
in 8 bits, while char is a 32-bit type. The intention here is that the
string type should have the option to use a compact encoding. For now,
in the interest of fixing the current issues preventing proper Unicode
string handling, I've decided that we can live with the
space-inefficient implementation of strings in the bootstrap compiler.
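
To put rough numbers on the space cost, the Python sketch below
compares four bytes per character against a UTF-8 encoding of the
same text. The function name storage_bytes is made up for the
example, and UTF-8 is only one possible compact encoding; nothing
here says which one the string type will eventually use.

```python
def storage_bytes(text):
    """Return (bytes as 32-bit chars, bytes as UTF-8) for text.

    Hypothetical helper for comparison only. len(text) counts code
    points, so the first figure models a vector of 32-bit chars.
    """
    as_vector_char = 4 * len(text)
    as_utf8 = len(text.encode("utf-8"))
    return as_vector_char, as_utf8
```

For pure ASCII text the 32-bit representation is four times larger;
for text outside the ASCII range the gap narrows.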

When I fix the lexer I will probably update the character literal syntax
so that Unicode code points are entered as

	#\{U+XXXX}

following the usual U+ notation for Unicode code points. There is no
benefit to making up a new syntax here.
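
As a sketch of what recognizing such a literal could look like (in
Python, not the BitCC lexer), the function below extracts the code
point. Accepting four to six hex digits in either case and rejecting
values above U+10FFFF are assumptions of the example; the proposal
above only shows the XXXX form. The name parse_char_literal is
likewise hypothetical.

```python
import re

# Matches  #\{U+XXXX}  where XXXX is 4 to 6 hex digits (an assumption).
_CHAR_LITERAL = re.compile(r"#\\\{U\+([0-9A-Fa-f]{4,6})\}")

def parse_char_literal(text):
    """Return the code point named by a #\\{U+XXXX} literal."""
    m = _CHAR_LITERAL.fullmatch(text)
    if m is None:
        raise ValueError("not a code point literal: %r" % text)
    cp = int(m.group(1), 16)
    if cp > 0x10FFFF:            # beyond the Unicode code space
        raise ValueError("code point out of range: %r" % text)
    return cp
```

For example, parse_char_literal(r"#\{U+0041}") returns 0x41.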


shap


