What Happens to My Code

Digital Communication - from Morse to Unicode

Early Encodings Timeline

1844 – Morse Code : First long-distance digital communication. Each character represented by a pattern of dots and dashes.
1870s – Baudot Code : 5-bit code used in teleprinters. Allowed automated text transmission.
1963 – ASCII : 7-bit American Standard Code for Information Interchange. Dominant in early computing.
1991 – Unicode (UTF-8) : Universal encoding covering all languages and symbols, backwards compatible with ASCII.

Encoding Comparison: The Letter "A"

| System | Year | Encoding of "A" | Bits |
|---|---|---|---|
| Morse code | 1844 | · − | variable length |
| Baudot code | 1870s | 00011 | 5 bits |
| ASCII | 1963 | 1000001 | 7 bits |
| UTF-8 | 1991 | 01000001 | 8 bits (same as ASCII for Latin letters) |
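The table rows can be reproduced in a few lines of Python. The Morse entry is a hand-made lookup; the ASCII and UTF-8 bit patterns come straight from the character's code.

```python
# The letter "A" in three of the encodings from the table.
MORSE = {"A": ".-"}  # variable-length: no fixed bit width

ascii_bits = format(ord("A"), "07b")               # 7-bit ASCII: 1000001
utf8_bits = format("A".encode("utf-8")[0], "08b")  # one 8-bit byte in UTF-8: 01000001

print(MORSE["A"], ascii_bits, utf8_bits)
```

For Latin letters, the UTF-8 byte is simply the ASCII code with a leading zero bit.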

Visual Examples

Morse code

This Morse code table shows all letters, digits, and some punctuation. Anything beyond that was of little use on the simple equipment of the time.

Morse code

Morse code is sent over a telegraph line using a key on one end and a sounder on the other. A key is essentially a switch with a spring; a sounder is an electromagnet with a metal bar that makes a clicking sound.
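The dot/dash patterns form a variable-length code, which a small encoder can illustrate. This sketch covers only a handful of letters, not the full table shown above.

```python
# Minimal Morse encoder sketch; only a few letters included for illustration.
MORSE = {
    "E": ".", "T": "-", "A": ".-", "N": "-.",
    "S": "...", "O": "---",
}

def to_morse(text):
    # One space separates the patterns for consecutive letters.
    return " ".join(MORSE[c] for c in text.upper())

print(to_morse("SOS"))  # ... --- ...
```

Note that frequent letters like E and T get the shortest patterns, which keeps transmissions short.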

Baudot code

Baudot invented a device that looked like a five-key piano. The timing of the key presses had to be exactly right; a clicking sound told the operator when to release the keys. The speed at which codes could be entered became known as the baud rate.

Baudot clavier

Notice that the code itself is stateful. By default you would be entering letters, but after the figures command, subsequent codes represent digits or punctuation. Instruction codes like these are known as control codes. Another example is the bell code, which actually rang a bell.
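The letters/figures state machine can be sketched in a few lines. The code values below are taken from the later ITA2 variant of the Baudot alphabet, and only three characters are included; a real decoder would carry the full tables.

```python
# Simplified, stateful Baudot-style decoder (small ITA2-flavoured excerpt).
LETTERS = {0b00011: "A", 0b11001: "B", 0b01110: "C"}
FIGURES = {0b00011: "-", 0b11001: "?", 0b01110: ":"}
FIGS, LTRS = 0b11011, 0b11111  # shift control codes

def decode(codes):
    table = LETTERS  # default state: letters
    out = []
    for code in codes:
        if code == FIGS:
            table = FIGURES   # switch to figures until told otherwise
        elif code == LTRS:
            table = LETTERS
        else:
            out.append(table[code])
    return "".join(out)

# A, shift to figures, same code now means "-", shift back, A again.
print(decode([0b00011, 0b11011, 0b00011, 0b11111, 0b00011]))  # A-A
```

The same 5-bit pattern decodes differently depending on the last shift code received, which is exactly what makes the code stateful.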

Baudot alphabet

Codes were written to a paper tape; later devices could also use paper tape as input and reach baud rates much higher than humans could type. Typewriter-like devices could print the text from the paper tape onto paper. Later still, all of this functionality was combined in a single device called a Teletype.

Baudot tape

ASCII

With the introduction of the 7-bit ASCII standard, there was now room for lowercase letters and more special characters and control codes. ASCII is a major reason the 8-bit byte became the norm, not just for data communication but for computer architecture in general.

ASCII table

The ASCII control codes were used to instruct teletypes, such as the one shown here, as well as other kinds of hardware terminals. Control codes such as Start of Heading (SOH), Start of Text (STX), and Acknowledge (ACK) were used to structure messages sent over a network.
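As an illustration of how control codes structured a transmission, here is a toy framing scheme using STX and ETX. Real protocols of the era were more elaborate (headers, checksums, acknowledgements), so treat this as a sketch of the idea only.

```python
# Toy message framing with ASCII control characters.
STX, ETX = b"\x02", b"\x03"  # Start of Text, End of Text

def frame(payload: bytes) -> bytes:
    # Wrap the payload so the receiver can find its boundaries.
    return STX + payload + ETX

def unframe(data: bytes) -> bytes:
    assert data[:1] == STX and data[-1:] == ETX, "malformed frame"
    return data[1:-1]

msg = frame(b"HELLO")
print(msg)           # b'\x02HELLO\x03'
print(unframe(msg))  # b'HELLO'
```

The control characters never appear on paper; they exist purely to steer the receiving hardware.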

Teletype Model 33
Teletype Model 33

Unicode

With the globalization of the internet, a universal character set was needed to represent all possible languages, and cat-face emojis. 😻
The Unicode character set, with room for more than a million code points, is continually growing. This is exactly why encodings like UTF-8 are needed: a fixed-width encoding would make every character 4 bytes, 4x the size of plain ASCII text.
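The size difference is easy to demonstrate: encoding the same text as fixed-width UTF-32 versus variable-length UTF-8.

```python
# Byte counts for the same text under different encodings.
text = "Hello"                         # ASCII-range text
print(len(text.encode("utf-8")))       # 5 bytes: identical to ASCII
print(len(text.encode("utf-32-be")))   # 20 bytes: 4 bytes per character

emoji = "😻"
print(len(emoji.encode("utf-8")))      # 4 bytes: outside the ASCII range
```

UTF-8 spends extra bytes only on the characters that need them, so mostly-Latin text stays as compact as ASCII.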

UTF-8 encoding example

UTF-8 is a superset of ASCII (any valid ASCII text is also valid UTF-8), so code written for ASCII still works as expected. That is, except for counting characters, because a single character can now suddenly span multiple bytes.
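The character-counting pitfall in one example: the character count and the byte count of the same string diverge as soon as a multi-byte sequence appears.

```python
# Character count vs byte count diverge with multi-byte UTF-8 sequences.
s = "café"
print(len(s))                  # 4 characters
print(len(s.encode("utf-8")))  # 5 bytes: "é" takes two bytes in UTF-8
```

Code that counts bytes to measure text length (or slices strings by byte offset) can silently break on non-ASCII input.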

Resources