COMSC-171

Character Data

ASCII

maps characters to 7-bit numbers (0-127)
dates from 1963
decimal   binary            characters
0-31      0000000-0011111   control
32-47     0100000-0101111   space, punctuation
48-57     0110000-0111001   numerals 0-9
58-64     0111010-1000000   punctuation
65-90     1000001-1011010   uppercase A-Z
91-96     1011011-1100000   punctuation
97-122    1100001-1111010   lowercase a-z
123-126   1111011-1111110   punctuation
127       1111111           DEL (delete)
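to type control chars 0-31, use the Ctrl key with @, a-z, [, \, ], ^ or _ (Ctrl-@ = 0, Ctrl-A through Ctrl-Z = 1-26, Ctrl-[ = 27 (Escape), Ctrl-\ = 28, Ctrl-] = 29, Ctrl-^ = 30, Ctrl-_ = 31)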
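The ASCII ranges and the Ctrl-key rule can be checked with a short Python sketch. The helper names `ascii_class`, `ctrl_code` and `CTRL_KEYS` are hypothetical, and the Ctrl mapping assumes the classic terminal convention that Ctrl keeps only the low 5 bits of the key's code, which reproduces the 0-31 list for @, a-z, [, \, ], ^ and _.

```python
import string

# Hypothetical set of keys that yield a control code when combined with Ctrl.
CTRL_KEYS = set("@[\\]^_") | set(string.ascii_letters)

def ascii_class(code: int) -> str:
    """Return the table category for a 7-bit ASCII code (0-127)."""
    if not 0 <= code <= 127:
        raise ValueError(f"{code} is not a 7-bit ASCII code")
    if code < 32:
        return "control"
    if code == 127:
        return "delete"
    if 48 <= code <= 57:
        return "numeral"
    if 65 <= code <= 90:
        return "uppercase"
    if 97 <= code <= 122:
        return "lowercase"
    return "punctuation"  # everything left over, including space (32)

def ctrl_code(key: str) -> int:
    """Control code for Ctrl+key, assuming Ctrl keeps only the low 5 bits."""
    if key not in CTRL_KEYS:
        raise ValueError(f"Ctrl+{key!r} has no ASCII control code")
    return ord(key) & 0x1F

for key in "@AZ[\\]^_":
    print(f"Ctrl-{key} -> {ctrl_code(key)}")
```

Masking with `0x1F` explains why upper- and lowercase letters give the same control code: 'A' (65) and 'a' (97) differ only in bit 5, which the mask discards.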

ISO-8859-1

8-bit extension to ASCII
adds accented Latin letters, symbols, and punctuation for most Western European languages (printable additions at 160-255)
dates from 1985
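many variants exist, e.g. other ISO-8859 parts for other alphabets (ISO-8859-5 is Cyrillic) and vendor versions such as Windows-1252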
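Python's built-in `latin-1` codec (its name for ISO-8859-1) gives a quick way to see how the code extends ASCII: every one of the 256 byte values decodes to the character whose Unicode code point has the same value, and the lower half is identical to ASCII. The sample character is only an illustration.

```python
# Sketch: ISO-8859-1 is a single-byte code; byte value n decodes to code point n.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

# Each byte becomes the character with the same number (U+0000 through U+00FF).
assert all(ord(ch) == b for ch, b in zip(text, all_bytes))

# The lower half (0-127) is plain ASCII.
assert all_bytes[:128].decode("ascii") == text[:128]

# An accented letter from the upper half takes one byte in Latin-1,
# but two bytes in UTF-8.
print("é".encode("latin-1"))  # b'\xe9'
print("é".encode("utf-8"))    # b'\xc3\xa9'
```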

Unicode

one character set for the world's writing systems, stored in multi-byte encodings (UTF-8, UTF-16, UTF-32)
dates from 1991
code points
written U+ followed by at least 4 hex digits, e.g. U+0041 is A
Basic Multilingual Plane U+0000 through U+FFFF
Supplementary Multilingual Plane U+10000 through U+1FFFF
Supplementary Ideographic Plane U+20000 through U+2FFFF
other supplementary planes exist: 17 planes in all (0 through 16), up to U+10FFFF
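Because each plane holds 0x10000 code points, the plane number is simply the code point shifted right by 16 bits. This Python sketch formats characters in U+ notation and names their plane; `u_notation`, `plane`, `describe` and `PLANE_NAMES` are hypothetical helpers, and the sample characters are arbitrary picks from planes 0, 1 and 2.

```python
# Hypothetical lookup table for the planes named in the notes.
PLANE_NAMES = {
    0: "Basic Multilingual Plane",
    1: "Supplementary Multilingual Plane",
    2: "Supplementary Ideographic Plane",
}

def u_notation(ch: str) -> str:
    """Format a character's code point as U+ with at least 4 hex digits."""
    return f"U+{ord(ch):04X}"

def plane(ch: str) -> int:
    """Each plane spans 0x10000 code points, so the plane is cp >> 16."""
    return ord(ch) >> 16

def describe(ch: str) -> str:
    p = plane(ch)
    name = PLANE_NAMES.get(p, "other supplementary plane")
    return f"{u_notation(ch)} plane {p} ({name})"

for ch in ["A", "\u20ac", "\U0001F600", "\U00020000"]:
    print(describe(ch))
```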
UTF-8
Unicode Transformation Format, variable width, 1-4 bytes
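0-127: single byte, same as ASCII
128-191: continuation (2nd, 3rd, or 4th) byte of a multibyte sequence
192-223: 1st byte of a 2-byte sequence (192 and 193 never occur in valid UTF-8)
224-239: 1st byte of a 3-byte sequence
240-247: 1st byte of a 4-byte sequence (only 240-244 occur, since code points end at U+10FFFF)
248-255: never used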
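The byte ranges above follow from how UTF-8 packs code point bits into lead and continuation bytes. The following Python sketch encodes by hand and checks the result against Python's own `utf-8` codec; `utf8_encode` and `byte_role` are hypothetical helpers written for illustration, not a library API.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as 1-4 UTF-8 bytes."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"cannot encode U+{cp:04X}")
    if cp < 0x80:      # up to 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:     # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def byte_role(b: int) -> str:
    """Classify a byte value by the ranges in the notes."""
    if b < 128:
        return "single byte (ASCII)"
    if b < 192:
        return "continuation byte"
    if b < 224:
        return "lead byte of 2"
    if b < 240:
        return "lead byte of 3"
    if b < 248:
        return "lead byte of 4"  # only 240-244 appear in valid UTF-8
    return "never used"

for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    data = utf8_encode(ord(ch))
    assert data == ch.encode("utf-8")  # agrees with Python's codec
    print(f"U+{ord(ch):04X}", data.hex(" "), [byte_role(b) for b in data])
```

Because a lead byte's value alone reveals the sequence length, a decoder can resynchronize after a damaged byte by skipping continuation bytes (128-191) until it finds a new lead byte.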
UTF-16
Unicode Transformation Format, variable width, one or two 16-bit words per character
byte order mark (U+FEFF) prepended to show the byte order, unless UTF-16BE or UTF-16LE is specified
U+0000 through U+D7FF and U+E000 through U+FFFF: Basic Multilingual Plane, one word equal to the code point
U+D800 through U+DBFF: supplementary planes, high (first) surrogate word
U+DC00 through U+DFFF: supplementary planes, low (second) surrogate word
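A supplementary-plane code point is stored as a surrogate pair: subtract 0x10000 to get a 20-bit value, then put its top 10 bits in a high word (D800-DBFF) and its bottom 10 bits in a low word (DC00-DFFF). This Python sketch uses hypothetical helpers `to_utf16_words` and `from_surrogates` and checks them against Python's `utf-16-be`/`utf-16-le` codecs.

```python
def to_utf16_words(cp: int) -> list[int]:
    """Split a code point into one or two 16-bit UTF-16 words."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"cannot encode U+{cp:04X}")
    if cp < 0x10000:
        return [cp]                   # BMP: one word, equal to the code point
    v = cp - 0x10000                  # 20-bit offset into the supplementary planes
    return [0xD800 + (v >> 10),       # high word: top 10 bits
            0xDC00 + (v & 0x3FF)]     # low word: bottom 10 bits

def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

ch = "\U0001F600"
words = to_utf16_words(ord(ch))
print([f"{w:04X}" for w in words])   # ['D83D', 'DE00']
assert from_surrogates(*words) == ord(ch)

# Serializing the same words in each byte order matches the named encodings.
be = b"".join(w.to_bytes(2, "big") for w in words)
le = b"".join(w.to_bytes(2, "little") for w in words)
assert be == ch.encode("utf-16-be")
assert le == ch.encode("utf-16-le")

# Plain "utf-16" prepends a byte order mark in the machine's native order;
# the -be/-le variants state the order and write no mark.
assert "A".encode("utf-16")[:2] in (b"\xfe\xff", b"\xff\xfe")
assert "A".encode("utf-16-be") == b"\x00A"
```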
UTF-32
Unicode Transformation Format, fixed width, one 4-byte word per character, equal to the code point
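In UTF-32 each 4-byte word simply holds the code point, which makes indexing easy at the cost of space. This Python sketch reads the words back as integers and compares encoded sizes for a sample string, picked (as an illustration) to contain one character from each UTF-8 length class. The big-endian codec names are used so that no byte order mark is added.

```python
s = "A\u00e9\u20ac\U0001F600"  # 1-, 2-, 3- and 4-byte characters in UTF-8

utf32 = s.encode("utf-32-be")
# Each 4-byte big-endian word, read back as an integer, is the code point.
words = [int.from_bytes(utf32[i:i + 4], "big") for i in range(0, len(utf32), 4)]
assert words == [ord(c) for c in s]

for enc in ["utf-8", "utf-16-be", "utf-32-be"]:
    print(enc, len(s.encode(enc)), "bytes")
# utf-8 10 bytes      (1 + 2 + 3 + 4)
# utf-16-be 10 bytes  (2 + 2 + 2 + 4)
# utf-32-be 16 bytes  (4 x 4)
```

Which format is smallest depends on the text: UTF-8 wins for mostly ASCII, while UTF-16 can be smaller for text dominated by 3-byte UTF-8 characters such as CJK.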