Text Representation in Computers

Computers use binary code (a series of 0s and 1s) to represent all types of data, including text. Each character you see on the screen is actually stored as a unique binary number inside the computer. This assignment of binary numbers to characters is managed by character sets.

 

Text Representation

Hello → [H] [e] [l] [l] [o]

H          e          l          l          o
01001000   01100101   01101100   01101100   01101111

 

Each letter in "Hello" is converted into its own 8-bit binary code. The same letter always gets the same code, which is why both l's are stored as 01101100.
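The conversion above can be reproduced in a few lines of Python. This is a minimal sketch rather than part of the original notes: the helper name to_binary is made up for illustration, and it assumes the text contains only characters whose codes fit in 8 bits (such as plain English letters).

```python
def to_binary(text):
    """Return one 8-bit binary string per character.

    Assumption: every character has a code below 256, so 8 bits are enough.
    """
    # ord() gives the character's numeric code; "08b" formats that number
    # in binary, padded with leading zeros to 8 digits.
    return [format(ord(ch), "08b") for ch in text]


for ch, bits in zip("Hello", to_binary("Hello")):
    print(ch, bits)
# H 01001000
# e 01100101
# l 01101100
# l 01101100
# o 01101111
```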

 

Character Sets: ASCII and Unicode

Character sets are standards that assign these binary numbers to characters so that computers can communicate text accurately.

ASCII (American Standard Code for Information Interchange)

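ASCII is one of the earliest character sets used to encode text data in computers. It assigns a unique 7-bit binary number to each character, allowing for 2^7 = 128 possible characters. In practice each code is usually stored in an 8-bit byte with a leading 0, which is why the tables here show eight binary digits. The 128 characters include: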

  • Upper and lowercase English letters
  • Numbers
  • Punctuation marks
  • Control characters (like newline and carriage return)

ASCII Table (simplified)

Character   Decimal   Binary
A           65        01000001
B           66        01000010
a           97        01100001
b           98        01100010
0           48        00110000
1           49        00110001

 

Example: ASCII Representation

  • 'A' = 65 in decimal = 01000001 in binary
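The rows of the simplified ASCII table can be checked with Python's built-in ord() and chr() functions. The sketch below is only illustrative: the helper name ascii_row is hypothetical, and the range check reflects the 128-character limit described above.

```python
def ascii_row(ch):
    """Return (character, decimal code, 8-digit binary) for one ASCII character."""
    code = ord(ch)  # character -> decimal code
    if code > 127:
        # ASCII only defines codes 0 to 127 (7 bits).
        raise ValueError(f"{ch!r} is not an ASCII character")
    return ch, code, format(code, "08b")


for ch in "ABab01":
    print(*ascii_row(ch))
# A 65 01000001
# B 66 01000010
# a 97 01100001
# b 98 01100010
# 0 48 00110000
# 1 49 00110001

# chr() goes the other way: decimal code -> character
print(chr(65))  # A
```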

Unicode

Unicode was developed to address the limitations of ASCII and to provide a universal character set that can represent text from all writing systems worldwide. Unicode gives each character a number called a code point, which is then stored using one of several encoding forms. The most common are UTF-8 (8, 16, 24, or 32 bits per character), UTF-16 (16 or 32 bits), and UTF-32 (always 32 bits).
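To see how the storage size depends on the encoding form, the sketch below counts the bytes Python produces for a few characters. The helper name encoded_sizes is an assumption for illustration. The "-be" codec variants are used so that no byte order mark is added to the count.

```python
def encoded_sizes(ch):
    """Return the number of bytes used for ch in UTF-8, UTF-16, and UTF-32."""
    # "utf-16-be" and "utf-32-be" encode without a byte order mark (BOM),
    # so the lengths reflect the character alone.
    return {name: len(ch.encode(name)) for name in ("utf-8", "utf-16-be", "utf-32-be")}


print(encoded_sizes("A"))           # {'utf-8': 1, 'utf-16-be': 2, 'utf-32-be': 4}
print(encoded_sizes("\u3042"))      # Hiragana a: {'utf-8': 3, 'utf-16-be': 2, 'utf-32-be': 4}
print(encoded_sizes("\U0001F600"))  # an emoji: {'utf-8': 4, 'utf-16-be': 4, 'utf-32-be': 4}
```

One byte is 8 bits, so these results span the 8-bit to 32-bit range mentioned above.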

Unicode Features:

  • Includes over 140,000 characters
  • Supports almost all languages and symbols
  • Compatible with ASCII (the first 128 characters of Unicode are the same as ASCII)
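The ASCII compatibility point can be verified directly. As a rough check, and with a function name invented for this example, the sketch below confirms that each of the first 128 code points encodes to the same single byte in ASCII and in UTF-8.

```python
def ascii_matches_utf8():
    """Check that codes 0 to 127 produce identical bytes in ASCII and UTF-8."""
    return all(
        chr(code).encode("ascii") == chr(code).encode("utf-8") == bytes([code])
        for code in range(128)
    )


print(ascii_matches_utf8())  # True
```

This byte-for-byte match holds for UTF-8 only. UTF-16 and UTF-32 keep the same code point numbers but store them in wider units.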

 

Unicode Table (simplified)

Character         Unicode   Hexadecimal   Binary
A                 U+0041    0041          00000000 01000001
あ (Japanese)     U+3042    3042          00110000 01000010
अ (Devanagari)    U+0905    0905          00001001 00000101

 

Example: Unicode Representation

  • 'A' = U+0041
  • 'अ' = U+0905
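The U+ notation is simply the character's code point written in hexadecimal. The following sketch rebuilds the Unicode and Binary columns of the simplified table. The helper name describe is hypothetical, and it assumes code points that fit in 16 bits, as all three table entries do. The binary shown is the code point value itself, not its UTF-8 encoding.

```python
def describe(ch):
    """Return (U+XXXX label, 16-bit binary split into two bytes) for ch.

    Assumption: the code point is at most 0xFFFF, so 16 bits are enough.
    """
    code = ord(ch)
    bits = format(code, "016b")
    return f"U+{code:04X}", bits[:8] + " " + bits[8:]


print(describe("A"))       # ('U+0041', '00000000 01000001')
print(describe("\u3042"))  # ('U+3042', '00110000 01000010')
print(describe("\u0905"))  # ('U+0905', '00001001 00000101')
```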

Unicode allows for over 140,000 characters, covering most of the world's writing systems.


 

© 2019-2023 O’Level Academy. All Rights Reserved