UTF2(5) | File Formats Manual | UTF2(5) |
utf2 - Universal character set Transformation Format encoding of runes
ENCODING "UTF2"
New applications should not use UTF2.
The UTF2 encoding is based on a proposed X-Open multibyte FSS-UCS-TF (File
System Safe Universal Character Set Transformation Format) encoding as used
in Plan 9 from Bell Labs. Although it is capable of representing more than
16 bits, the current implementation is limited to 16 bits as defined by the
Unicode Standard.
The UTF2 representation is backwards compatible with ASCII, so 0x00-0x7f
refer to the ASCII character set. The multibyte encoding of runes between
0x0080 and 0xffff consists entirely of bytes whose high order bit is set.
The actual encoding is represented by the following table:
[0x0000 - 0x007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x0080 - 0x07ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x0800 - 0xffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
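The table can be read as a small encoding routine. The following is a minimal sketch, not the system's implementation: the function name utf2_encode and its signature are hypothetical, and it assumes the caller supplies a buffer of at least three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of any system library): encode one 16-bit
 * rune into buf using the shortest form from the table above.
 * buf must have room for 3 bytes. Returns the number of bytes written.
 */
size_t utf2_encode(uint16_t rune, unsigned char *buf)
{
    assert(buf != NULL);

    if (rune <= 0x007f) {
        /* 0bbbbbbb: plain ASCII, one byte */
        buf[0] = (unsigned char)rune;
        return 1;
    }
    if (rune <= 0x07ff) {
        /* 110bbbbb, 10bbbbbb: top 5 bits, then low 6 bits */
        buf[0] = (unsigned char)(0xc0 | (rune >> 6));
        buf[1] = (unsigned char)(0x80 | (rune & 0x3f));
        return 2;
    }
    /* 1110bbbb, 10bbbbbb, 10bbbbbb: top 4 bits, middle 6, low 6 */
    buf[0] = (unsigned char)(0xe0 | (rune >> 12));
    buf[1] = (unsigned char)(0x80 | ((rune >> 6) & 0x3f));
    buf[2] = (unsigned char)(0x80 | (rune & 0x3f));
    return 3;
}
```

Every byte written by the two multibyte branches has its high order bit set, which is the property that keeps ASCII bytes such as '/' and NUL from appearing inside a multibyte sequence.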
If more than one representation of a value exists (for example, 0x00; 0xC0 0x80; 0xE0 0x80 0x80), the shortest representation is always used when encoding, but the longer ones will be correctly decoded.
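The decoding behavior described here can be sketched as follows. This is an illustration under stated assumptions, not the actual library code: utf2_decode and its return convention (bytes consumed, or 0 on malformed or truncated input) are hypothetical. It deliberately accepts the longer forms of a value, as the text above describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one rune from the n bytes at s.
 * Returns the number of bytes consumed, or 0 if the input is
 * malformed or truncated. Longer-than-necessary forms are accepted.
 */
size_t utf2_decode(const unsigned char *s, size_t n, uint16_t *rune)
{
    assert(s != NULL && rune != NULL);

    if (n >= 1 && (s[0] & 0x80) == 0x00) {
        /* 0bbbbbbb */
        *rune = s[0];
        return 1;
    }
    if (n >= 2 && (s[0] & 0xe0) == 0xc0 && (s[1] & 0xc0) == 0x80) {
        /* 110bbbbb, 10bbbbbb */
        *rune = (uint16_t)(((s[0] & 0x1f) << 6) | (s[1] & 0x3f));
        return 2;
    }
    if (n >= 3 && (s[0] & 0xf0) == 0xe0 &&
        (s[1] & 0xc0) == 0x80 && (s[2] & 0xc0) == 0x80) {
        /* 1110bbbb, 10bbbbbb, 10bbbbbb */
        *rune = (uint16_t)(((s[0] & 0x0f) << 12) |
                           ((s[1] & 0x3f) << 6) |
                           (s[2] & 0x3f));
        return 3;
    }
    return 0;
}
```

With this sketch, the byte sequences 0x00, 0xC0 0x80, and 0xE0 0x80 0x80 all decode to the rune 0x0000, consuming one, two, and three bytes respectively. Strict UTF-8 decoders reject such overlong forms, so this leniency is specific to the behavior described here.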
The final three encodings provided by X-Open:
[00000000.000bbbbb.bbbbbbbb.bbbbbbbb] -> 11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> 111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> 1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
which provide for the entire proposed ISO-10646 31-bit standard, are currently not implemented.
October 11, 2002 | Mac OS X 12 |