Numerical Systems
When we want to represent or communicate information in a computer system, we need to make decisions about how that information should be encoded.
One option that we will look at in detail is using binary. However, in principle we are not limited using binary to encode information. For instance, we encode data in a set of traffic lights or in the spins of subatomic particles. This is part of a branch of mathematics and computer science called information theory.
Our goal in this lecture is to be able to design simple encodings for arbitrary data. For example, in the lecture notes you will see the example of encoding binary files as twitter posts containing only emojis. On the way, we will learn about:
- How computers represent data
- Binary, octal, and hexadecimal number systems
Learning Outcomes
- Interpret a number in any base written using positional notation
- Understand binary can represent text and data
- Encode ASCII-encoded text as binary
- Apply logarithms to find the inverse of exponentiation
- Calculate how many digits are needed to write a given number
- Calculate how many numbers can be stored with a given number of digits
- Interpret binary numbers encoded using sign and magnitude
- Find the Ones complement of a binary number
- Find the Twos complement of a binary number
- Perform binary addition
- Perform binary subtraction
- Perform binary multiplication
- Convert from any base to decimal
- Convert from decimal to any base
- Convert directly between bases that are powers of two
Introduction
Binary is a way of writing numbers using only ones and zeroes. Everyone knows that computers store data in binary.
Of course, in reality computers store data in the physical properties of objects: hard drives, solid state drives, or even tape drives. In a hard drive, areas of a metal disk are magnetised or demagnetised to encode data, which gives two states that can be reliably differentiated. Binary is a way of abstracting this two-state physical system.
The smallest unit of information, a 0
or 1
is called a bit. Eight of these together is called a byte. The number of bits of information in a system provides an upper bound on the unique number of states that it can represent.
We can think of systems of all kinds as storing information, and they need not include only two states. However, the unit used to quantify information in any system is the bit. For instance, in physics one can calculate the information density of the universe to be 1 bit per plank area (\( A_p = 10^{-70}m^2 \)) of the boundary of the universe, leading to a theoretical maximum on the amount of information the universe can contain. To simulate the observable universe would require a computer to store approximately \( 10^{10^{123}} \) bits of information.
However, we will constrain ourselves to much more practical scales. How can we use binary to encode information in our computing systems? The most straightforward way to use binary is to represent numbers, and this is what we will look at first. The rules by which binary does this are the same as many number systems: positional notation.
Overview
To be able to encode binary or text data as emojis we will first address smaller problems which will let us build the necessary knowledge.
We will begin by learning about binary and the way binary numbers are represented using positional notation. We will learn how to encode text and other data in binary.
We will then extend this to representing data in other number systems such as octal and hexadecimal. As part of this we will learn to convert between different number systems.
Positional Notation
When we want to work with numbers we need to write them down somehow. The Romans did this using Roman numerals: I, II, III, IV, etc. However, we need a form of notation that also lets us efficiently express large numbers such as $24,674,349$. The system that we use was first invented by the ancient Babylonians, and is called positional notation.
To start with, we need a set of symbols (digits) to use to represent numbers. The decimal system has 10 symbols (called base 10) \( \{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 \} \). Binary uses two symbols (base 2) \( \{ 0, 1 \} \).
When we write a number, we write an ordered string of digits. Each digit contributes to the overall value of the number. The value each digit contributes is dependant upon three things:
- the digit used (e.g.
0
,1
,2
, etc.) - the position of that (e.g. rightmost, second from right, third from right, etc.)
- the base (here 10, as we are working in decimal)
We sum the values to find the value of the number overall. For example, the number 423 in decimal (base 10) is equivalent to $4 \times 100 + 2 \times 10 + 3$. We can also write this:
Which is four hundred and twenty three.
Each digit is multiplied by a positional value or 'weight' that depends upon its position in the number. This positional value is the base (here 10) raised to the power of the position of the digit, counting from the right. The right most digit has position 0, so has weight $10^0 = 1$. This position is also called the 'units'; position 1 (with weight $10^1$) is also called the 'tens'; and so on.
Decimal Number | 0 | 0 | 0 | 0 | |
---|---|---|---|---|---|
Positional Value | \( 10^3 \) | \( 10^2 \) | \( 10^1 \) | \( 10^0 \) | |
Positional Value | \( 1000 \) | \( 100 \) | \( 10 \) | \( 1 \) | |
Total | \( 0 \) | \( 0 \) | \( 0 \) | \( 0 \) | = 0 |
Binary in Positional Notation
Binary is also expressed using the rules of positional notation. However, in binary, rather than the positional values being increasing powers of 10 as in decimal, they are increasing powers of 2.
If it is not obvious why, consider that a one digit binary number can store only two values 0
and 1
. By the time we need another digit the next number to represent is 2
. Similarly, if we have two digits, they can represent $ 2 \times 2 $ values (0-3), so the next number we need to represent is $4$. Three digits can represent $2 \times 2 \times 2$ values, so the next number to represent is $8$, and so on in increasing powers of 2.
Binary Number | 0 | 1 | 0 | 1 | 0 | |
---|---|---|---|---|---|---|
Positional Value | \( 2^4 \) | \( 2^3 \) | \( 2^2 \) | \( 2^1 \) | \( 2^0 \) | |
Positional Value | \( 16 \) | \( 8 \) | \( 4 \) | \( 2 \) | \( 1 \) | |
Total | \( 0 \) | \( 8 \) | \( 0 \) | \( 2 \) | \( 0 \) | = 10 |
When writing numbers that are not in base 10, we will often notate them as $(n)_b$, where $n$ is the number and $b$ is the base.
It is helpful to have memorised the first several powers of 2 starting from $2^0: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, \dots$. These are the positional values in binary and knowing them off-by-heart will allow you to read binary numbers in your head.
Binary numbers
What are the values of the following binary numbers in decimal?
Encoding text and binary data in binary
How do we store data in binary? For instance, say I have a text file containing the text:
the quick brown fox jumps over the lazy dog
How would this be stored in binary?
For most data, such as text, it is common to group bits together into groups of fixed length. For instance, bits might be grouped into bytes (8 bits) or words (commonly 32 bits or 64 bits, depending on processor architecture). Primitive data types in most programming languages have a fixed length. For example, in C a char
is 8 bits and an int
is 32 bits.
If we group bits into groups of 7 bits, we can work with our data as an array or list of numbers between 0 and 127. This is because 0000000
, or 0 is the smallest number that can be stored and 1111111
, or 127 is the largest number.
Let us associate each of these 128 values with a character. There is a standard way of doing this called ASCII. ASCII includes all English alphabet letters (upper and lower case), numbers, and several non-printing characters such as line breaks. We can encode text in ASCII by storing each character as the associated ASCII value in binary.
Decimal | Character | Binary | Hexadecimal |
---|---|---|---|
0 |