Character Data
ASCII
- maps characters to 7-bit numbers (0-127)
- dates from 1963
- code ranges:
decimal | binary          | characters
0-31    | 0000000-0011111 | control
32-47   | 0100000-0101111 | punctuation
48-57   | 0110000-0111001 | numerals
58-64   | 0111010-1000000 | punctuation
65-90   | 1000001-1011010 | uppercase
91-96   | 1011011-1100000 | punctuation
97-122  | 1100001-1111010 | lowercase
123-126 | 1111011-1111110 | punctuation
127     | 1111111         | Delete
- to type control chars 0-31, hold the Ctrl key and press one of: @ a-z [ \ ] ^ _
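The Ctrl mapping can be sketched in a few lines. The keys involved occupy ASCII codes 64-95 (@ through _, with letters folded to uppercase), and holding Ctrl subtracts 64. The function name `ctrl` is made up for this illustration; real terminals and keyboards may handle some combinations differently.

```python
def ctrl(key: str) -> int:
    """Return the ASCII control code (0-31) for Ctrl+key (hypothetical helper)."""
    code = ord(key.upper())        # letters a-z fold to A-Z (65-90)
    if not 64 <= code <= 95:       # only @ A-Z [ \ ] ^ _ map to control codes
        raise ValueError(f"no control code for {key!r}")
    return code - 64               # equivalent to code & 0x1F in this range


print(ctrl("@"))    # 0  (NUL)
print(ctrl("i"))    # 9  (horizontal tab)
print(ctrl("["))    # 27 (escape)
print(ctrl("_"))    # 31
```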
ISO-8859-1
- 8-bit extension to ASCII
- adds Latin characters and punctuation for most European languages
- dates from 1985 (as ECMA-94; published as ISO 8859-1 in 1987)
- many variants exist for other scripts and languages (such as ISO-8859-2, ISO-8859-5, and ISO-8859-15)
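A small sketch of the 8-bit extension, using Python's built-in "iso-8859-1" codec. The helper name `latin1_bytes` is an assumption for this example. It shows that ASCII text encodes to the same bytes, that an accented letter takes one byte above 127, and that characters outside the set cannot be encoded.

```python
def latin1_bytes(text: str) -> list[int]:
    """Encode text as ISO-8859-1 and return the byte values (hypothetical helper)."""
    return list(text.encode("iso-8859-1"))


print(latin1_bytes("cafe"))        # [99, 97, 102, 101], identical to ASCII
print(latin1_bytes("caf\u00e9"))   # [99, 97, 102, 233], e-acute is one byte

# The euro sign is not part of ISO-8859-1, so encoding it fails.
try:
    latin1_bytes("\u20ac")
except UnicodeEncodeError:
    print("not representable in ISO-8859-1")
```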
Unicode
- assigns a numeric code point to every character; code points are stored as one or more bytes (UTF-8, UTF-16, UTF-32)
- dates from 1991
- code points
- represented by U+hexnum
- Basic Multilingual Plane: U+0000 through U+FFFF
- Supplementary Multilingual Plane: U+10000 through U+1FFFF
- Supplementary Ideographic Plane: U+20000 through U+2FFFF
- other supplementary planes exist, up to U+10FFFF
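The U+hexnum notation and the plane of a code point can be computed directly. Each plane spans 0x10000 code points, so the plane number is the code point shifted right by 16 bits. The function name `describe` is hypothetical.

```python
def describe(ch: str) -> tuple[str, int]:
    """Return (U+hexnum label, plane number) for one character (hypothetical helper)."""
    cp = ord(ch)                       # the character's code point
    return f"U+{cp:04X}", cp >> 16     # at least 4 hex digits; 65536 code points per plane


print(describe("A"))             # ('U+0041', 0)  Basic Multilingual Plane
print(describe("\U0001F600"))    # ('U+1F600', 1) Supplementary Multilingual Plane
print(describe("\U00020000"))    # ('U+20000', 2) Supplementary Ideographic Plane
```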
- UTF-8
- Unicode Transformation Format, variable width, 1-4 bytes
- 0-127: single byte, same as ASCII
- 128-191: 2nd, 3rd, or 4th byte of a multibyte sequence
- 192-223: 1st byte of a 2-byte sequence
- 224-239: 1st byte of a 3-byte sequence
- 240-247: 1st byte of a 4-byte sequence
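The byte ranges above are enough to classify every byte of a UTF-8 stream. The following is a minimal sketch with made-up names (`classify_utf8_byte` and its labels); it only looks at ranges and does not validate that sequences are well formed.

```python
def classify_utf8_byte(b: int) -> str:
    """Label a byte value by the UTF-8 range it falls in (hypothetical helper)."""
    if b <= 127:
        return "single"         # same as ASCII
    if b <= 191:
        return "continuation"   # 2nd, 3rd, or 4th byte of a sequence
    if b <= 223:
        return "lead2"          # 1st byte of a 2-byte sequence
    if b <= 239:
        return "lead3"          # 1st byte of a 3-byte sequence
    if b <= 247:
        return "lead4"          # 1st byte of a 4-byte sequence
    return "invalid"            # 248-255 are not used by UTF-8


# One character of each width: 'a', e-acute, euro sign, an emoji.
for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    print(list(encoded), [classify_utf8_byte(b) for b in encoded])
```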
- UTF-16
- Unicode Transformation Format, variable width, one or two 2-byte (16-bit) words
- a byte order mark (U+FEFF) should be prepended, or the byte order given explicitly as UTF-16BE or UTF-16LE
- U+0000 through U+D7FF and U+E000 through U+FFFF: Basic Multilingual Plane, single word
- U+D800 through U+DBFF: supplementary planes, high (first) word
- U+DC00 through U+DFFF: supplementary planes, low (second) word
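A sketch of how a supplementary-plane code point splits into two 16-bit words: subtract 0x10000, then put the top 10 bits in a word starting at 0xD800 and the bottom 10 bits in a word starting at 0xDC00. The function name `to_surrogates` is an assumption for this example; the result is checked against Python's own "utf-16-be" codec.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into (high word, low word) (hypothetical helper)."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("BMP code points fit in a single word")
    offset = cp - 0x10000               # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits    -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")          # D83D DE00

# Same bytes as the built-in big-endian codec (no byte order mark).
expected = high.to_bytes(2, "big") + low.to_bytes(2, "big")
print("\U0001F600".encode("utf-16-be") == expected)   # True

# Byte order matters for every word, including BMP characters.
print("A".encode("utf-16-be"))          # b'\x00A'
print("A".encode("utf-16-le"))          # b'A\x00'
```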
- UTF-32
- Unicode Transformation Format, fixed width, one 4-byte word per code point
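Because UTF-32 is fixed width, each 4-byte word is simply the code point itself. A minimal sketch, with the helper name `utf32_words` made up for illustration and big-endian byte order chosen explicitly to avoid a byte order mark:

```python
def utf32_words(text: str) -> list[int]:
    """Return the 4-byte words of text encoded as UTF-32BE (hypothetical helper)."""
    data = text.encode("utf-32-be")     # explicit byte order, so no BOM is added
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


text = "A\U0001F600"
print([f"U+{w:04X}" for w in utf32_words(text)])   # ['U+0041', 'U+1F600']
print(len(text.encode("utf-32-be")))               # 8 bytes: 4 per code point
```

The trade-off is size: the same two characters take 5 bytes in UTF-8 and 6 bytes in UTF-16.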