Characters

We have seen that a binary value can mean many things. For example,

| Bits | Meaning | Description |
|---|---|---|
| 1001 0001 | \(145_{10}\) | Positive Integer |
| 1001 0001 | \(-111_{10}\) | Two’s Complement |
| 1001 0001 | \(91_{16}\) | Hexadecimal |
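The three readings of the same byte can be checked with a short Python sketch. The variable names are illustrative only, and the two's complement rule is written out by hand for a single 8-bit value.

```python
# One byte, three readings. All names here are illustrative.
bits = "10010001"

# Read the bits as a plain positive (unsigned) integer.
unsigned = int(bits, 2)

# Two's complement for 8 bits: if the top bit is set, subtract 2**8.
signed = unsigned - 256 if unsigned >= 128 else unsigned

# The same bits written in base 16.
hex_text = format(unsigned, "X")

print(unsigned)   # 145
print(signed)     # -111
print(hex_text)   # 91
```

The bits never change; only the rule used to read them does.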

A single byte can mean many things. A binary value has no built-in meaning, so we can assign almost any meaning to it. We could assign colors, sounds, or letters to these binary values.

One of the most common things we need a computer to deal with is text. To store text, we need a mapping between letters and numbers. The classic mapping is called ASCII, which stands for American Standard Code for Information Interchange. ASCII is a 7-bit code, so every ASCII character fits in a single 8-bit byte. Modern computers can work with more than 8 bits. The Unicode standard defines far more characters and has 8-bit, 16-bit, and 32-bit encodings (UTF-8, UTF-16, and UTF-32), but its first 128 characters exactly match ASCII.
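The overlap between ASCII and Unicode can be checked directly. The sketch below uses Python's built-in `chr`, `ord`, and `str.encode`. UTF-8 is used as an example of a Unicode encoding, which is an assumption of this sketch and not something the text requires.

```python
# Every ASCII code (0-127) names the same character in Unicode,
# and UTF-8 stores each of them as one byte with that same value.
for code in range(128):
    ch = chr(code)                              # Unicode character for this code
    assert ch.encode("ascii") == bytes([code])
    assert ch.encode("utf-8") == bytes([code])

# A character outside ASCII needs more than one byte in UTF-8.
e_acute = "\u00e9"                              # the letter e with an acute accent
utf8_len = len(e_acute.encode("utf-8"))
print(ord(e_acute), utf8_len)                   # 233 2
```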

The ASCII character set has 128 characters, numbered 0 to 127. There are extended sets that also assign characters to positions 128-255, but we will not be using them. Using all 256 values is often called the extended ASCII set. The base ASCII characters are shown below.

Notice that not all the numbers are for printing characters. The number 127 is given to the delete character. The number 13 is used to tell a printer to perform a carriage return. These special characters have meanings when printing, typing, or sending text over a network.
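Python's built-in `ord` and `chr` convert between a character and its number, which makes it easy to look at the special characters mentioned above. This is a small sketch; the variable names are illustrative.

```python
print(ord("A"))      # 65
print(chr(97))       # a

carriage_return = chr(13)    # ASCII 13
delete = chr(127)            # ASCII 127

print(carriage_return == "\r")   # True
print(delete == "\x7f")          # True

# Control characters are not printable; ordinary letters are.
print(carriage_return.isprintable(), delete.isprintable(), "A".isprintable())
# False False True
```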

ASCII Table

Since characters are just binary numbers, we can do math on them. For example, adding 32 to an uppercase letter changes it to lowercase, because each lowercase letter sits exactly 32 positions after its uppercase counterpart in the ASCII table. We will use the subscript \(c\) to clarify when we are talking about ASCII characters.

\( \begin{align} A_{c} + 32_{10} =& a_{c} \end{align} \)

All the operations we saw for binary numbers can also be used to manipulate characters.
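As a minimal sketch of character arithmetic, the code below changes case by adding 32 and then by using bit operations. The helper `to_lower_ascii` is a hypothetical name introduced here; it only changes the letters A to Z, since adding 32 to other characters does not produce a lowercase letter.

```python
upper = "A"
lower = chr(ord(upper) + 32)         # 65 + 32 = 97
print(lower)                         # a

# 32 is 0010 0000 in binary, so bit operations do the same job.
print(chr(ord("A") | 0b00100000))    # a  (set the bit: to lowercase)
print(chr(ord("a") & 0b11011111))    # A  (clear the bit: to uppercase)
print(chr(ord("a") ^ 0b00100000))    # A  (flip the bit: swap case)


def to_lower_ascii(ch):
    """Lowercase one character, but only if it is an uppercase ASCII letter."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + 32)
    return ch


print("".join(to_lower_ascii(ch) for ch in "Hello, World!"))  # hello, world!
```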

A sequence of characters is called a string. It is common to specify strings using double quotes. For example, “apple” would be a string of characters. A string can also be read as a number, since it is an ordered sequence of numbers. We can convert every character to a number.

| Characters | a | p | p | l | e |
|---|---|---|---|---|---|
| Base-10 | 97 | 112 | 112 | 108 | 101 |
| Hexadecimal | 61 | 70 | 70 | 6C | 65 |
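The conversion above can be reproduced in a few lines of Python. This is a sketch with illustrative variable names, using only built-in functions.

```python
word = "apple"

# One number per character.
codes = [ord(ch) for ch in word]
# The same numbers written as two-digit hexadecimal.
hex_codes = [format(code, "02X") for code in codes]

print(codes)       # [97, 112, 112, 108, 101]
print(hex_codes)   # ['61', '70', '70', '6C', '65']

# The bytes of the string, printed as one run of hex digits.
print(word.encode("ascii").hex())   # 6170706c65
```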

Each character in the string is an 8-bit number. We already saw that the largest base-10 number that could be stored in 8 bits was 255, so each character is one of 256 possible values. That means a string is really just a base-256 number, with each character acting as one digit! We can figure out what number “apple” represents.

\( \begin{align} \text{apple}_{256} =& 97*256^{4} + 112*256^{3} + 112*256^{2} + 108*256^{1} + 101*256^{0} \\ =& 418,498,243,685 \end{align} \)
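The same calculation can be sketched in Python. The loop treats each character as one base-256 digit, and the built-in `int.from_bytes` is used as a cross-check. Reading the first character as the most significant digit (big-endian order) matches the equation above.

```python
word = "apple"

# Build the number one base-256 digit at a time, left to right.
value = 0
for ch in word:
    value = value * 256 + ord(ch)

print(value)   # 418498243685

# Cross-check with the built-in conversion from bytes to an integer.
assert value == int.from_bytes(word.encode("ascii"), "big")

# Going back: turn the number into bytes, then into text.
n_bytes = (value.bit_length() + 7) // 8
restored = value.to_bytes(n_bytes, "big").decode("ascii")
print(restored)   # apple
```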

All words, books, etc. are just big base-256 numbers. If you generated every number in a certain range, you would also generate every book that fits within a maximum number of characters.
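The idea can be tried at a very small scale. The sketch below counts through every number that fits in two characters and checks that a chosen two-letter string appears. The list name `all_pairs` and the sample string are illustrative choices, not part of the original text.

```python
# Every integer from 0 to 256**2 - 1, written as a 2-byte string.
all_pairs = [n.to_bytes(2, "big") for n in range(256 ** 2)]

print(len(all_pairs))            # 65536

# Any 2-character ASCII string must be somewhere in the list.
print(b"hi" in all_pairs)        # True

# Its position is its base-256 value: 104 * 256 + 105.
print(all_pairs.index(b"hi"))    # 26729

# Shorter strings show up too, padded with a leading zero byte.
print(b"\x00a" in all_pairs)     # True
```

Each extra character multiplies the size of the list by 256, so this only works in practice for very short strings.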

The Rape of Lucrece is one of William Shakespeare’s long narrative poems. It has 82,820 characters in the Project Gutenberg edition. A text of that length is an 82,820-digit base-256 number, so if you were to generate all integers between 0 and \(256^{82820}-1\) you would produce the poem itself and every shorter text along the way.

The Library of Babel takes this concept and generates all possible books with a certain number of characters.