Computers use binary code (a series of 0s and 1s) to represent all types of data, including text. Each character you see on the screen is actually stored as a unique binary number inside the computer. This assignment of binary numbers to characters is managed by character sets.
Text Representation
Hello → [H] [e] [l] [l] [o]
H | e | l | l | o
01001000 | 01100101 | 01101100 | 01101100 | 01101111
Each letter in "Hello" is stored as its own binary code. The same letter always maps to the same code, which is why both occurrences of "l" are 01101100.
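The conversion above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `text_to_binary` is made up for illustration, and it assumes every character has a code below 256 so that it fits in 8 bits.

```python
def text_to_binary(text):
    """Return the 8-bit binary string for each character in text."""
    codes = []
    for ch in text:
        number = ord(ch)  # numeric code assigned to the character
        if number > 255:
            # Assumption of this sketch: only characters that fit in one byte.
            raise ValueError(f"{ch!r} does not fit in 8 bits")
        codes.append(format(number, "08b"))  # zero-padded to 8 binary digits
    return codes


for ch, bits in zip("Hello", text_to_binary("Hello")):
    print(ch, bits)
```

Running it prints one line per letter, matching the codes shown for "Hello".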
Character Sets: ASCII and Unicode
Character sets are standards that assign these binary numbers to characters so that computers can communicate text accurately.
ASCII (American Standard Code for Information Interchange)
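ASCII is one of the earliest character sets used to encode text data in computers. It assigns a unique 7-bit binary number to each character, allowing for 128 possible characters (2^7 = 128). In practice each code is usually stored in an 8-bit byte with a leading 0, which is why the binary values below are shown with eight digits. The characters include: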
- Upper and lowercase English letters
- Digits (0 to 9)
- Punctuation marks
- Control characters (like newline and carriage return)
ASCII Table (simplified)
Character | Decimal | Binary
A | 65 | 01000001
B | 66 | 01000010
a | 97 | 01100001
b | 98 | 01100010
0 | 48 | 00110000
1 | 49 | 00110001
Example: ASCII Representation
- 'A' = 65 in decimal = 01000001 in binary
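The rows of the ASCII table can be generated rather than looked up. The sketch below uses Python's built-in `ord` and `chr`; the helper name `ascii_row` is hypothetical and chosen only for this example.

```python
def ascii_row(ch):
    """Return (character, decimal code, 8-digit binary) for one ASCII character."""
    code = ord(ch)
    if code > 127:
        # ASCII only defines codes 0 through 127.
        raise ValueError(f"{ch!r} is not an ASCII character")
    return ch, code, format(code, "08b")


for ch in "ABab01":
    print(ascii_row(ch))

# chr() performs the reverse lookup, from code to character.
print(chr(65))
```

The explicit range check is a design choice for the sketch: it makes the 128-character limit of ASCII visible instead of silently accepting other characters.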
Unicode
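Unicode was developed to address the limitations of ASCII and to provide a universal character set that can represent text from all writing systems worldwide. Unicode assigns each character a number called a code point, and encoding forms define how that number is stored. The most common are UTF-8 (one to four 8-bit bytes per character), UTF-16 (one or two 16-bit units), and UTF-32 (always 32 bits), so a character occupies between 8 and 32 bits depending on the encoding and the character.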
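How many bytes a character needs in each encoding form can be checked directly with Python's `str.encode`. This is a small sketch: the function name `encoded_sizes` is invented for the example, and the big-endian codec variants are used only so that no byte order mark is counted.

```python
def encoded_sizes(ch):
    """Return the number of bytes ch occupies in UTF-8, UTF-16, and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-be" variants skip the byte order mark, so only the
        # character itself is counted.
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


# "A", Devanagari letter A (U+0905), Hiragana letter A (U+3042),
# and an emoji outside the first 65,536 code points (U+1F600).
for ch in ["A", "\u0905", "\u3042", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The letter "A" takes a single byte in UTF-8, the two non-Latin letters take three, and the emoji takes four bytes in every encoding form.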
Unicode Features:
- Includes over 140,000 characters
- Supports almost all languages and symbols
- Compatible with ASCII (the first 128 characters of Unicode are the same as ASCII)
Unicode Table (simplified)
Character | Unicode | Hexadecimal | Binary
A | U+0041 | 0041 | 00000000 01000001
あ (Japanese) | U+3042 | 3042 | 00110000 01000010
अ (Devanagari) | U+0905 | 0905 | 00001001 00000101
Example: Unicode Representation
- 'A' = U+0041
- 'अ' = U+0905
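The code point notation and the 16-bit binary form used in the Unicode table can be derived from a character with `ord`. The helper names `code_point` and `binary16` below are assumptions made for this sketch, and `binary16` only handles code points that fit in 16 bits.

```python
def code_point(ch):
    """Return the U+XXXX notation for a single character."""
    return f"U+{ord(ch):04X}"


def binary16(ch):
    """Return the code point as two space-separated groups of 8 bits."""
    number = ord(ch)
    if number > 0xFFFF:
        # Assumption of this sketch: the code point fits in 16 bits.
        raise ValueError(f"{ch!r} needs more than 16 bits")
    bits = format(number, "016b")
    return f"{bits[:8]} {bits[8:]}"


# "A", Hiragana letter A (あ), Devanagari letter A (अ)
for ch in ["A", "\u3042", "\u0905"]:
    print(code_point(ch), binary16(ch))

# ASCII compatibility: "A" has the same code, and the same single byte,
# in ASCII and in UTF-8.
print(ord("A"), "A".encode("ascii") == "A".encode("utf-8"))
```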
Unicode allows for over 140,000 characters, covering most of the world's writing systems.