Numerical Systems lecture

Numerical Systems

When we want to represent or communicate information in a computer system, we need to make decisions about how that information should be encoded.

One option that we will look at in detail is using binary. However, in principle we are not limited to using binary to encode information. For instance, we could encode data in the states of a set of traffic lights, or in the spins of subatomic particles. This is studied in a branch of mathematics and computer science called information theory.

Our goal in this lecture is to be able to design simple encodings for arbitrary data. For example, in the lecture notes you will see the example of encoding binary files as Twitter posts containing only emojis. On the way, we will learn about:

  • How computers represent data
  • Binary, octal, and hexadecimal number systems

Mathematics and Problem Solving

1. Numerical Systems

David Gundry

Learning Outcomes

    Encode numbers and data with binary and other number systems
    1. Interpret a number in any base written using positional notation
    2. Understand binary can represent text and data
    3. Encode ASCII-encoded text as binary
    4. Apply logarithms to find the inverse of exponentiation
    5. Calculate how many digits are needed to write a given number
    6. Calculate how many numbers can be stored with a given number of digits
    Manipulate positive and negative numbers written in binary
    1. Interpret binary numbers encoded using sign and magnitude
    2. Find the Ones complement of a binary number
    3. Find the Twos complement of a binary number
    4. Perform binary addition
    5. Perform binary subtraction
    6. Perform binary multiplication
    Convert numbers between different bases
    1. Convert from any base to decimal
    2. Convert from decimal to any base
    3. Convert directly between bases that are powers of two


Introduction

Binary is a way of writing numbers using only ones and zeroes. Everyone knows that computers store data in binary.

Of course, in reality computers store data in the physical properties of objects: hard drives, solid state drives, or even tape drives. In a hard drive, areas of a metal disk are magnetised or demagnetised to encode data, which gives two states that can be reliably differentiated. Binary is a way of abstracting this two-state physical system.

The smallest unit of information, a 0 or 1, is called a bit. Eight of these together is called a byte. The number of bits of information in a system provides an upper bound on the number of unique states that it can represent.

We can think of systems of all kinds as storing information, and they need not include only two states. However, the unit used to quantify information in any system is the bit. For instance, in physics one can calculate the information density of the universe to be 1 bit per Planck area (\( A_p = 10^{-70}m^2 \)) of the boundary of the universe, leading to a theoretical maximum on the amount of information the universe can contain. To simulate the observable universe would require a computer to store approximately \( 10^{10^{123}} \) bits of information.

However, we will constrain ourselves to much more practical scales. How can we use binary to encode information in our computing systems? The most straightforward way to use binary is to represent numbers, and this is what we will look at first. The rules by which binary does this are the same as many number systems: positional notation.

Encoding

Overview

To be able to encode binary or text data as emojis we will first address smaller problems which will let us build the necessary knowledge.

We will begin by learning about binary and the way binary numbers are represented using positional notation. We will learn how to encode text and other data in binary.

We will then extend this to representing data in other number systems such as octal and hexadecimal. As part of this we will learn to convert between different number systems.

Overview

  1. Positional Notation
  2. Negative numbers in binary
  3. Binary arithmetic
  4. Number systems beyond binary
  5. Conversion between bases

Positional Notation

When we want to work with numbers we need to write them down somehow. The Romans did this using Roman numerals: I, II, III, IV, etc. However, we need a form of notation that also lets us efficiently express large numbers such as $24,674,349$. The system that we use was first invented by the ancient Babylonians, and is called positional notation.

To start with, we need a set of symbols (digits) to use to represent numbers. The decimal system has 10 symbols (called base 10) \( \{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 \} \). Binary uses two symbols (base 2) \( \{ 0, 1 \} \).

When we write a number, we write an ordered string of digits. Each digit contributes to the overall value of the number. The value each digit contributes depends upon three things:

  • the digit used (e.g. 0, 1, 2, etc.)
  • the position of that digit (e.g. rightmost, second from right, third from right, etc.)
  • the base (here 10, as we are working in decimal)

We sum the values to find the value of the number overall. For example, the number 423 in decimal (base 10) is equivalent to $4 \times 100 + 2 \times 10 + 3$. We can also write this:

\[ 4 \times 10^2 + 2 \times 10^1 + 3 \times 10^0 = 423 \]

That is, four hundred and twenty-three.

Each digit is multiplied by a positional value or 'weight' that depends upon its position in the number. This positional value is the base (here 10) raised to the power of the position of the digit, counting from the right. The right most digit has position 0, so has weight $10^0 = 1$. This position is also called the 'units'; position 1 (with weight $10^1$) is also called the 'tens'; and so on.

Decimal Number    0           0           0           0
Positional Value  \( 10^3 \)  \( 10^2 \)  \( 10^1 \)  \( 10^0 \)
Positional Value  1000        100         10          1
Total             0           0           0           0           = 0

\[ (0000)_{10} = 0 \times 10^3 + 0 \times 10^2 + 0 \times 10^1 + 0 \times 10^0 = 0 \]
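This rule is easy to express in code. Below is a minimal Python sketch (the function name `positional_value` is ours, for illustration) that evaluates a string of digits under positional notation:

```python
# Each digit contributes digit * base ** position, counting positions
# from the right starting at 0; the number's value is the sum.
def positional_value(digits: str, base: int = 10) -> int:
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** position
    return total

print(positional_value("423"))   # 4*100 + 2*10 + 3*1 = 423
print(positional_value("0000"))  # 0
```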


Positional Notation

Decimal Number    4         2       3
Positional value  $10^2$    $10^1$  $10^0$
Value name        hundreds  tens    units
Positional Value  100       10      1
Total             400       20      3       = 423

Binary in Positional Notation

Binary is also expressed using the rules of positional notation. However, in binary, rather than the positional values being increasing powers of 10 as in decimal, they are increasing powers of 2.

If it is not obvious why, consider that a one digit binary number can store only two values 0 and 1. By the time we need another digit the next number to represent is 2. Similarly, if we have two digits, they can represent $ 2 \times 2 $ values (0-3), so the next number we need to represent is $4$. Three digits can represent $2 \times 2 \times 2$ values, so the next number to represent is $8$, and so on in increasing powers of 2.

Binary Number     0          1          0          1          0
Positional Value  \( 2^4 \)  \( 2^3 \)  \( 2^2 \)  \( 2^1 \)  \( 2^0 \)
Positional Value  16         8          4          2          1
Total             0          8          0          2          0          = 10

\[ (01010)_2 = 0 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 8 + 2 = 10 \]

When writing numbers that are not in base 10, we will often notate them as $(n)_b$, where $n$ is the number and $b$ is the base.

It is helpful to have memorised the first several powers of 2 starting from $2^0: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, \dots$. These are the positional values in binary and knowing them off-by-heart will allow you to read binary numbers in your head.
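As a quick check on this head-reading, the positional values can be summed directly in Python; the built-in int() also accepts a base argument, which is handy for verifying your answer:

```python
# Reading (01010)_2 by summing positional values (powers of 2).
weights = [16, 8, 4, 2, 1]                        # 2**4 down to 2**0
bits = [0, 1, 0, 1, 0]
value = sum(w * b for w, b in zip(weights, bits))
print(value)                                      # 8 + 2 = 10

# Python's built-in int() accepts a base, so it can check our reading:
print(int("01010", 2))                            # 10
```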


Binary numbers

    Question

    What are the values of the following binary numbers in decimal?

      Encoding text and binary data in binary

      How do we store data in binary? For instance, say I have a text file containing the text:

      the quick brown fox jumps over the lazy dog

      How would this be stored in binary?

      For most data, such as text, it is common to group bits together into groups of fixed length. For instance, bits might be grouped into bytes (8 bits) or words (commonly 32 or 64 bits, depending on processor architecture). Primitive data types in most programming languages have a fixed length. For example, in C a char is typically 8 bits and an int is commonly 32 bits.

      If we group bits into groups of 7 bits, we can work with our data as an array or list of numbers between 0 and 127. This is because 0000000, or 0 is the smallest number that can be stored and 1111111, or 127 is the largest number.

      Let us associate each of these 128 values with a character. There is a standard way of doing this called ASCII. ASCII includes all English alphabet letters (upper and lower case), numbers, and several non-printing characters such as line breaks. We can encode text in ASCII by storing each character as the associated ASCII value in binary.

Decimal  Character  Binary    Hexadecimal
0                   00000000  00
1                   00000001  01
2                   00000010  02
3                   00000011  03
4                   00000100  04
5                   00000101  05
6                   00000110  06
7                   00000111  07
8                   00001000  08
9                   00001001  09
10                  00001010  0A
11                  00001011  0B
12                  00001100  0C
13                  00001101  0D
14                  00001110  0E
15                  00001111  0F
16                  00010000  10
17                  00010001  11
18                  00010010  12
19                  00010011  13
20                  00010100  14
21                  00010101  15
22                  00010110  16
23                  00010111  17
24                  00011000  18
25                  00011001  19
26                  00011010  1A
27                  00011011  1B
28                  00011100  1C
29                  00011101  1D
30                  00011110  1E
31                  00011111  1F
32       (space)    00100000  20
33       !          00100001  21
34       "          00100010  22
35       #          00100011  23
36       $          00100100  24
37       %          00100101  25
38       &          00100110  26
39       '          00100111  27
40       (          00101000  28
41       )          00101001  29
42       *          00101010  2A
43       +          00101011  2B
44       ,          00101100  2C
45       -          00101101  2D
46       .          00101110  2E
47       /          00101111  2F
48       0          00110000  30
49       1          00110001  31
50       2          00110010  32
51       3          00110011  33
52       4          00110100  34
53       5          00110101  35
54       6          00110110  36
55       7          00110111  37
56       8          00111000  38
57       9          00111001  39
58       :          00111010  3A
59       ;          00111011  3B
60       <          00111100  3C
61       =          00111101  3D
62       >          00111110  3E
63       ?          00111111  3F
64       @          01000000  40
65       A          01000001  41
66       B          01000010  42
67       C          01000011  43
68       D          01000100  44
69       E          01000101  45
70       F          01000110  46
71       G          01000111  47
72       H          01001000  48
73       I          01001001  49
74       J          01001010  4A
75       K          01001011  4B
76       L          01001100  4C
77       M          01001101  4D
78       N          01001110  4E
79       O          01001111  4F
80       P          01010000  50
81       Q          01010001  51
82       R          01010010  52
83       S          01010011  53
84       T          01010100  54
85       U          01010101  55
86       V          01010110  56
87       W          01010111  57
88       X          01011000  58
89       Y          01011001  59
90       Z          01011010  5A
91       [          01011011  5B
92       \          01011100  5C
93       ]          01011101  5D
94       ^          01011110  5E
95       _          01011111  5F
96       `          01100000  60
97       a          01100001  61
98       b          01100010  62
99       c          01100011  63
100      d          01100100  64
101      e          01100101  65
102      f          01100110  66
103      g          01100111  67
104      h          01101000  68
105      i          01101001  69
106      j          01101010  6A
107      k          01101011  6B
108      l          01101100  6C
109      m          01101101  6D
110      n          01101110  6E
111      o          01101111  6F
112      p          01110000  70
113      q          01110001  71
114      r          01110010  72
115      s          01110011  73
116      t          01110100  74
117      u          01110101  75
118      v          01110110  76
119      w          01110111  77
120      x          01111000  78
121      y          01111001  79
122      z          01111010  7A
123      {          01111011  7B
124      |          01111100  7C
125      }          01111101  7D
126      ~          01111110  7E
127                 01111111  7F

      This 7-bit standard is extended by 8-bit extended ASCII, which uses 8 bits per character (256 possible symbols). Here, by adding one bit (the bit with positional value $2^7$) we gain an additional $2^7 = 128$ possible characters.

      Going beyond ASCII, the Unicode encoding UTF-16 uses 16 bits per code unit, and so a single unit can represent \( 2^{16} = 65536 \) symbols.

      The string "the quick brown fox jumps over the lazy dog" would be stored in ASCII as:

      01110100 01101000 01100101 00100000 01110001 01110101 01101001 01100011 01101011 00100000 01100010 01110010 01101111 01110111 01101110 00100000 01100110 01101111 01111000 00100000 01101010 01110101 01101101 01110000 01110011 00100000 01101111 01110110 01100101 01110010 00100000 01110100 01101000 01100101 00100000 01101100 01100001 01111010 01111001 00100000 01100100 01101111 01100111
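This encoding can be reproduced in Python with the built-in ord() function, which returns a character's ASCII value (here written out as 8 bits per character, as in the table above):

```python
# ord() gives a character's ASCII value; format(value, "08b") writes
# that value as 8 binary digits.
text = "the quick brown fox jumps over the lazy dog"
encoded = " ".join(format(ord(ch), "08b") for ch in text)
print(encoded[:26])   # '01110100 01101000 01100101' -- 't', 'h', 'e'
```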

      This lets us encode text into binary. However, say we want to encode text in a different way, such as using emojis. The next thing we need to know is how to represent numbers and data in other number systems, which we will be able to generalise to novel encoding systems.

      The same data written in hexadecimal (a number system introduced later in this lecture) is:

      74 68 65 20 71 75 69 63 6B 20 62 72 6F 77 6E 20 66 6F 78 20 6A 75 6D 70 73 20 6F 76 65 72 20 74 68 65 20 6C 61 7A 79 20 64 6F 67

      Binary Arithmetic

      Now that we have seen how to encode positive and negative numbers in binary, we will take a look at what we can do with these numbers.


      Padding with zeroes

      When performing arithmetic on unsigned numbers it is sometimes necessary to pad numbers to the left with 0s. So long as these 0s go to the left (the most significant bit), and the number is unsigned, they do not change the value of the number.

      Addition

      To be able to add binary numbers, we need to be able to add together three bits: one from each number, and a carry bit. We start with the least significant bit of each number to add and our carry bit is 0. The sum is a single bit; any overflow goes into the carry bit.

      1. Add lowest pair of bits
      2. Determine sum and carry
      3. From then on, add each pair of bits with previous carry
      4. Final carry is overflow

      This is how addition works in a digital circuit. A full adder is a circuit used to add two bits of data together. It takes in two bits to add and a carry bit and returns a sum bit and a carry bit. A half adder is the same, but does not take in a carry bit. To add two 8-bit numbers together we need 1 half adder and 7 full adders. The table below shows the truth table for a full adder. The truth table for a half adder is the same as the top half of the table, but it does not take a carry bit input.

A  B  C_in  Sum  C_out
0  0   0     0     0
1  0   0     1     0
0  1   0     1     0
1  1   0     0     1
0  0   1     1     0
1  0   1     0     1
0  1   1     0     1
1  1   1     1     1

      For example, to add $0001$ and $1011$, we first add $1$ and $1$, for a sum of $0$ and a carry of $1$. We then move to the next column. We add $0$ and $1$ and our carry bit ($1$), for a sum of $0$ and a carry of $1$. We then add $0$ and $0$ and the carry bit ($1$), for a sum of $1$ and a carry of $0$. We then add $0$ and $1$ and the carry $0$ for a sum of $1$ and a carry of $0$.

        0001
      + 1011
      ------
        1100
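The carry-based procedure above can be sketched in Python. This is a hand-rolled illustration (the function name is ours) that mirrors the full-adder rule, with sum bit = A XOR B XOR carry and carry out = majority(A, B, carry), rather than using built-in arithmetic:

```python
# Bit-by-bit binary addition following the full-adder rule.
def add_binary(x: str, y: str) -> str:
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)    # pad with zeroes on the left
    result, carry = [], 0
    for a, b in zip(reversed(x), reversed(y)):
        a, b = int(a), int(b)
        result.append(str(a ^ b ^ carry))               # sum bit
        carry = (a & b) | (a & carry) | (b & carry)     # carry out
    if carry:                                # final carry is overflow
        result.append("1")
    return "".join(reversed(result))

print(add_binary("0001", "1011"))  # '1100'
```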


Another example, adding $1110$ and $11011$:

  01110
+ 11011
-------
 101001

Half Adder (\( A + B \))

A  B  Sum  C_out
0  0   0     0
1  0   1     0
0  1   1     0
1  1   0     1


      Binary Addition

        Question

        Solve the following binary addition questions

          Subtraction

          To subtract a number $b$, we can add its additive inverse, $-b$. When we represent negative numbers using twos complement, we can perform subtraction using the same algorithm as addition. However, if we represent negative numbers in a different way, we need to use a different algorithm for subtraction.

          \[ a - b = a + (-b) \]

          When we are using twos complement, to compute $a - b$ we can flip all of the bits of $b$, add 1 (i.e. find its twos complement), and then add the result to $a$ using the method described above for binary addition.




  00001110
+ 11100101
----------
  11110011

Here $11100101$ is the twos complement of $00011011$ (i.e. $-27$), so this addition computes $14 - 27$; the result $11110011$ represents $-13$.
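Assuming an 8-bit width, the flip-the-bits-and-add-one procedure can be sketched in Python (the function names are ours, for illustration):

```python
# Subtraction via twos complement in a fixed 8-bit width:
# flip the bits of b, add 1, then add to a, discarding any overflow.
BITS = 8

def twos_complement(x: int) -> int:
    return ((~x) + 1) % (1 << BITS)        # flip bits, add 1, keep 8 bits

def subtract(a: int, b: int) -> int:
    return (a + twos_complement(b)) % (1 << BITS)

result = subtract(14, 27)                  # 14 - 27
print(format(result, "08b"))               # '11110011'
print(result - (1 << BITS))                # interpreted as signed: -13
```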

          Multiplication

          Binary multiplication is addition with a series of shifts. Imagine that we were performing it as a long-form multiplication. Multiply each digit in the bottom number by the top number, and write the answers in a column. Whenever the digit we are multiplying is not the rightmost (least significant bit), we must shift all bits in our answer left by the position of this digit. Once we've multiplied the top number by each digit in the bottom number, we add up the columns to get our answer.

             101
          x   10
          ------
             000
          + 1010
          ------
            1010

          In the example above, the line $000$ is obtained by multiplying the digit $0$ by $101$. The line $1010$ is obtained by multiplying the digit $1$ by $101$ and then shifting all bits left by $1$, because our digit $1$ is really a $10$.
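The shift-and-add procedure can be sketched in Python (the function name is ours; the left-shift operator << performs the "shift left by the position" step):

```python
# Long-form binary multiplication: for each 1 digit in the bottom
# number, add the top number shifted left by that digit's position.
def multiply_binary(top: str, bottom: str) -> str:
    total = 0
    for position, digit in enumerate(reversed(bottom)):
        if digit == "1":
            total += int(top, 2) << position   # shift left by the position
    return format(total, "b")

print(multiply_binary("101", "10"))   # '1010'
```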

          A larger example, \( 11011 \times 1110 = 101111010 \) (multiplying $11011$ by each digit of $1110$):

    11011
x    1110
---------
        0
   110110
  1101100
 11011000
---------
101111010

          Binary Multiplication

            Question

            Solve the following binary multiplication questions

              Number systems beyond binary

              If you have done any digital image editing or web development, you will be familiar with colours expressed as hexadecimal strings. For example, the colour with red 209, green 122, and blue 34 is written as the hex string D17A22.

              A hex colour can be read in three parts: the first two digits give the red component, the second two the green component, and the third two the blue component.

              Hexadecimal (also called base 16) uses 16 symbols. We count 0 to 9 and then continue counting on the letters A (representing 10) to F (representing 15). The 16 symbols used in hexadecimal are:

Hexadecimal  0  1  2  3  4  5  6  7  8  9  A   B   C   D   E   F
Decimal      0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15

Hexadecimal Number  0           0           0           0
Positional Value    \( 16^3 \)  \( 16^2 \)  \( 16^1 \)  \( 16^0 \)
Positional Value    4096        256         16          1
Total               0           0           0           0           = 0

\[ (0000)_{16} = 0 \times 16^3 + 0 \times 16^2 + 0 \times 16^1 + 0 \times 16^0 = 0 \]

              Positional notation can work with any base. Octal (base 8) is another commonly used base. Even base 1 (unary, in which a single symbol is simply repeated) can represent numbers, so long as numbers can be of varying length.

Octal    0  1  2  3  4  5  6  7
Decimal  0  1  2  3  4  5  6  7

              For any base $b$, the positional value of the $n$th digit from the right is \( b^{n-1} \).

              Any other base could be devised, though if the base is higher than 10 it would be necessary to decide on what symbols to use after 9, as is the case with hexadecimal.
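Interpreting a string of digits in any base follows the same positional rule. Here is a minimal Python sketch for bases up to 16 (the function name and digit table are ours, for illustration):

```python
# Interpret a string of digit symbols in any base up to 16, using the
# rule that each further digit multiplies the running total by the base.
DIGITS = "0123456789ABCDEF"

def from_base(number: str, base: int) -> int:
    total = 0
    for symbol in number:
        total = total * base + DIGITS.index(symbol)
    return total

print(from_base("FF", 16))     # 255
print(from_base("777", 8))     # 511
print(from_base("01010", 2))   # 10
```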


              Possible values stored

              Let's say we want to devise a way of storing numbers using a given set of symbols. How many numbers could we store with a given number of characters? How many characters would we need to store a given number?

              For example, say a URL shortening service like Bitly wants to encode a number in a URL. They want to use the fewest characters they can to improve efficiency. However, they are restricted with what characters URLs are allowed to contain. How do they decide on the optimal encoding?

              To start with, let's refresh our intuition about multiplication. If I have a grid of 5 rows and 5 columns, I have $5 \times 5 = 25$ squares. If I have two digits and each digit takes $5$ values, there are $5 \times 5 = 25$ pairs of values, or 25 unique 2-digit numbers.

              In a base $b$ number, if we have $2$ digits, there are $b^2$ unique strings of digits and thus $b^2$ numbers can be represented. If we had 3 digits, $b^3$ numbers can be represented, and so on.

              In a base $b$ number with $n$ digits, we can store $b^n$ values.

              We could assign these unique values to any 'state' in a system, be that a numerical value or not. With 2 bits of information we can distinguish 4 different states. If we roll a four-sided die and look at the number, we have learned 2 bits of information.

              If we decide to use these unique values to represent numbers starting from 0, the highest number we get to will be $b^n - 1$. Thus the highest number a base $b$ number of $n$ digits can store under positional notation is $b^n - 1$.
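A quick check of these formulas in Python (the function name is ours, for illustration):

```python
# n digits in base b give b**n unique strings of digits; numbering them
# from 0, the highest representable value is b**n - 1.
def values_storable(base: int, digits: int) -> int:
    return base ** digits

print(values_storable(2, 8))       # 256 states in one byte
print(values_storable(2, 8) - 1)   # 255, the highest 8-bit number
print(values_storable(16, 2))      # 256 values in two hex digits
```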


              Some examples of possible values stored

              $n$ bits can represent $2^n$ unique combinations or states. Thus if we have 8 bits, we can represent 256 unique states. A file that is 1 KB (1,000 bytes) stores 8,000 bits. Thus there are exactly \( 2^{8000} \) different 1 KB files (which, of course, is a very large number).

              To go back to earlier in the lecture, if a universe contains \( 10^{10^{123}} \) bits of information, there is a theoretical maximum of \( 2^{10^{10^{123}}} \) possible states the universe can be in, before we consider the laws of physics.

              Positional Values

              What is the positional value, or 'weight', of the highest digit in the following numbers? How many unique values can the following represent?

              1. 3 digit decimal number
              2. 2 digit binary number
              3. 3 digit octal number
              4. 2 digit hexadecimal number

              Bits Required

              We can now work out how many values a given number of bits can represent. But what if we want to reverse the question: how many bits do we need to store a given number? If we want to store $x$ unique values, how many digits do we need?

              We found the number of unique values for a given number of digits using exponentiation. The inverse of this is the logarithm. The logarithm is a mathematical function that, for a particular base, tells you the 'order of magnitude' of a number, which is roughly the number of digits it contains. Logarithms are commonly taken to base 2, base 10, or base $e$ (a constant called Euler's number; this log is also called the natural log). The logarithm of $a$ to the base $b$ is written:

              \[ \log_b a \]

              For example, \( \log_{10} 50 \approx 1.70 \). Notice how the value of \( \log_{10} a \) tracks the number of digits in $a$: a two-digit number like 50 has a base-10 logarithm between 1 and 2.

              Logarithm is the inverse of exponentiation. Observe how taking the base $b$ logarithm of both sides of the equation below cancels out the exponentiation:

              \[ a = b^n \]
              \[ \log_b a = n \]

              To store $a$ unique values, you need $ \lceil \log_b a \rceil $ digits.

              (The notation $\lceil x \rceil$ means round $x$ up: you can't have half a digit!)

              To store a value $x$ in base $b$, you need $\lceil \log_b (x+1) \rceil$ digits.
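These formulas translate directly into Python using math.log and math.ceil (the function names are ours, for illustration):

```python
import math

# To store `values` unique values in base `base`, we need
# ceil(log_base(values)) digits; to write the single number x we need
# ceil(log_base(x + 1)) digits.
def digits_for_values(values: int, base: int) -> int:
    return math.ceil(math.log(values, base))

def digits_for_value(x: int, base: int) -> int:
    return math.ceil(math.log(x + 1, base))

print(digits_for_values(1000, 10))  # 3 digits for 1000 values (0-999)
print(digits_for_value(255, 2))     # 8 bits to write 255
```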


              Logarithms

              \[ \#values = base^{\#digits} \]
              \[ \log_{base} \#values = \#digits \]

              Log Example

              If you have a 3 digit number in base 10, how many values can it take?

              \[ 10^3 = 1000 \]

              Log Example

              If you want to represent 1000 values in base 10, how many digits do you need?

              \[ 3 = \log_{10} 1000 \]

              Log Example 2

              With 8 bits, what is the largest number you can store?

              \[ 2^8 -1 = 255 \]

              Log Example 2

              To store numbers up to 255, how many bits do you need?

              \[ 8 = \log_{2} (255+1) \]


              How many digits are required?

              How many digits does it take to store the following unique values in the given bases?

              1. 9 $\rightarrow$ base 3
              2. 123 $\rightarrow$ base 7
              3. 2422 $\rightarrow$ base 11
              4. 14 $\rightarrow$ base 8
              5. 18 $\rightarrow$ base 7
              6. 242 $\rightarrow$ base 17

              Working with encodings

              Let us consider encoding numbers in emoji. Emojis are pictograms, such as 😂, 😃, 🧘🏻‍♂️, 🌍, 🌦️, 🥖, 🚗, 📱, 🎉, ❤️, ✅, and 🏁. According to Wikipedia, the Unicode standard contains 1,424 symbols that are considered emoji. Thus one emoji could be considered a one-digit number in base 1424.

              We know that an $n$ digit number can store $b^n$ unique values. For example, with 2 emoji we can store $1424^2 = 2027776$ unique values. So we can work out how large a number we can encode in emoji. However, what we really want to know is how much data an emoji can store.

              How many emoji do we need to store one bit of data? To store 2 unique values, we need

              \[ \log_{1424} 2 \approx 0.09545871 \text{ emoji/bit} \]

              We can't have a fraction of an emoji, but we can work out from this the number of bits each emoji communicates:

              \[ \frac{1}{\log_{1424} 2} \approx 10.4757334349 \text{ bits/emoji} \]

              We can see that one emoji can store slightly more than 10 bits of data (remember \( 2^{10} = 1024 \)). So to store 1 emoji by itself we need $11$ bits (because we can't store a fraction of a bit). However what if we have one tweet's worth of emoji? If one tweet can contain 280 emoji, we can communicate nearly 3000 bits of information.

              \[ 10.4757334349 * 280 = 2933.20536177 \text{ bits} \]

              This is about 366 bytes. Thus if we want to encode information in tweets using just emojis, we can send 366 bytes of data per tweet.
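This calculation is easy to reproduce in Python with math.log2 (the variable names are ours; 1424 and 280 are the figures quoted above):

```python
import math

# One emoji is one digit in base 1424, so each emoji carries
# log2(1424) bits of information.
SYMBOLS = 1424
bits_per_emoji = math.log2(SYMBOLS)
print(round(bits_per_emoji, 2))     # roughly 10.48 bits per emoji

tweet_bits = bits_per_emoji * 280   # a tweet of 280 emoji
print(int(tweet_bits // 8))         # roughly 366 bytes per tweet
```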

              However, imagine that we actually do want to store a number in emoji. For example, perhaps we have an internet of things (IoT) sensor that tweets its current reading as a 16-bit integer (for which we would need 2 emoji). How do we actually go about converting from binary to emoji? More generally, how do we convert from one base to another? We will see this in the next section.

              Minecraft is a game that has a 3D voxel world. Say we know there are 820 kinds of block in Minecraft; how many bits do we need to store per block? We can use the formula given in the section above.

              \[ \lceil \log_{2} 820 \rceil = 10 \]

              To store the value of a block requires 10 bits. If a Minecraft world is 320 blocks high, and 60 million ($6 \times 10^7$) blocks wide and long, we can calculate the theoretical maximum size of the world to be:

              \[ 10 \times 320 \times (6 \times 10^7) \times (6 \times 10^7) \]
              \[ \approx 1.2 \times 10^{19} \text{ bits} \]

              In practice not all of this is generated, and much of what has been generated can be significantly compressed. Only if the world were completely random would you actually need all \( \approx 1.2 \times 10^{19} \) bits to store it.

              A Minecraft world is not truly random, however. We know this because a world can be generated from a seed. Indeed, if the world has been unmodified, we must be able to compress the world down to this seed.

              There is a related class of numbers called computable numbers. A computable number is one whose digits, even if infinite, can be generated by a finite algorithm, in effect a description shorter than the number itself. $\pi$ is a good example of a computable number, as a short algorithm can generate the infinite digits of $\pi$. Interestingly, there must also be non-computable numbers: numbers whose digits cannot be generated by any finite algorithm.

              Assuming a Minecraft seed is a 19 digit signed decimal number, there are \( 2 \times 10^{19} -1 \) possible seeds. There can thus only be this many new Minecraft worlds (until they have been changed during play).

              Minecraft is clever in how it compresses its worlds. More detail about this here: https://gaming.stackexchange.com/a/263881

              Negative numbers in Binary

              Computers store data in binary. We can think of this as just storing a great many binary numbers. These numbers have no context. They do not themselves tell us how they should be interpreted. Each time we read a string of bits, it is up to us (or up to the state of the program) how to interpret it.

              In particular, when interpreting a number we need to know whether it is supposed to be a signed number (i.e. one that can represent negative values) or unsigned number (i.e. one that only stores non-negative numbers).

              There is no universal way of indicating that one of our numbers is negative, or that a number should be interpreted as a signed number. We need to know in advance how to interpret a particular string of bits. For signed numbers, this means we need to know which representation has been used to encode it.

              Negative Numbers

              Mapping negative numbers

              8 bits can represent 256 numbers. Typically we think of these as the numbers 0 to 255. But how do we encode negative numbers?

              We can treat some range of values as if they were negative. To do this, we map some range of positive numbers onto the negative numbers.

              In other words, we need a transformation that:

              1. takes a positive number e.g. 1, and
              2. returns the number that represents its additive inverse, i.e. -1

              For example, 1 might map to 255 (which represents -1).

              When we map some range of positive numbers onto the negative numbers, we want it to:
              • Represent as many numbers as possible with the bits available (be efficient)
              • Be able to use the same algorithm/hardware for addition for positive and negative numbers

              Signed and unsigned numbers

              $2^n$ numbers.
              What do they represent?

              Sign and magnitude

              We might reason that a number is either positive or negative. This information is boolean (true or false) and thus we can represent it with a single bit. One approach to encoding negative numbers is therefore to reserve one of our bits as a sign bit (usually the first). The sign bit encodes whether the number is positive (0) or negative (1). Thus we might have the numbers:

              \[ (0000)_2 = +0 \]
              \[ (0001)_2 = +1 \]
              \[ (0010)_2 = +2 \]
              \[ (1000)_2 = -0 \]
              \[ (1001)_2 = -1 \]
              \[ (1010)_2 = -2 \]

              This is equivalent to mapping the (unsigned) numbers in the range $128$ to $255$ onto the negative values.

              Stored as  |   0 |   1 |   2 |   3 | … |  126 |  127
              Represents |   0 |   1 |   2 |   3 | … |  126 |  127

              Stored as  | 128 | 129 | 130 | 131 | … |  254 |  255
              Represents |  -0 |  -1 |  -2 |  -3 | … | -126 | -127

              Notice a curious and undesirable consequence: we have two distinct values for $0$: $-0$ and $+0$. This makes the representation inefficient. With 1 bit set aside for the sign, the magnitude of the number is stored with 7 bits, so the largest magnitude we can store is $2^7 - 1 = 127$. Thus we can store the range $-127$ to $127$: only 255 distinct numbers. Yet we know that we should be able to store $2^8 = 256$ different values with 8 bits.

              Additionally, we require a different algorithm to do addition and subtraction on sign-and-magnitude numbers than we do on unsigned numbers. However, despite these drawbacks, sign and magnitude representation is one of the methods commonly used in microprocessors, mostly for embedded devices.
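As a sketch, interpreting an 8-bit value under sign and magnitude might look like the following in Python (the helper name `decode_sign_magnitude` is my own, not from the lecture):

```python
def decode_sign_magnitude(byte: int) -> int:
    """Interpret an 8-bit value as a sign-and-magnitude number.

    The top bit is the sign (0 = positive, 1 = negative);
    the remaining 7 bits are the magnitude.
    """
    sign = byte >> 7          # extract the sign bit
    magnitude = byte & 0x7F   # mask off the low 7 bits
    return -magnitude if sign else magnitude

print(decode_sign_magnitude(0b00000001))  # 1
print(decode_sign_magnitude(0b10000001))  # -1
print(decode_sign_magnitude(0b10000000))  # 0 (the "negative zero")
```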

              Sign and Magnitude

              Sign Bit | Magnitude
              0        | 0000

              Value: +0

              Stored as  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
              Represents | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7

              Stored as  |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15
              Represents | -0 | -1 | -2 | -3 | -4 | -5 | -6 | -7

              One's Complement

              What else can we do to a number except for flipping one of the bits to represent a signed or unsigned number? How about flipping all of the bits?

              One's complement is a transformation used to represent a negative binary number by flipping every bit (i.e. every 0 becomes 1, every 1 becomes 0). For example, the one's complement of $(10101)_2$ is $(01010)_2$.

              Binary | Decimal
              0000   | 0
              1111   | -0

              Notice that we can still check whether a number is negative by looking at the first bit. If the first bit is a 1, our number is negative.

              Unfortunately, with this method we still have two zeroes: $(0000)_2$ and $(1111)_2$. Thus, just as with sign and magnitude, this method is inefficient.
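Flipping every bit within a fixed width can be sketched with an XOR against an all-ones mask (a hypothetical `ones_complement` helper):

```python
def ones_complement(value: int, bits: int = 8) -> int:
    """Flip every bit of `value` within a fixed width of `bits`."""
    mask = (1 << bits) - 1  # e.g. 0b11111111 for 8 bits
    return value ^ mask     # XOR with all ones flips each bit

# The one's complement of (10101)2 in 5 bits is (01010)2
print(format(ones_complement(0b10101, bits=5), "05b"))  # 01010
```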

              One's Complement

              Binary | Decimal
              0000   | 0
              1111   | -0

              Flip every bit

              Two's complement

              Two's complement is a superior way of encoding negative numbers in binary. We can understand two's complement as an operation that we perform upon a binary number: "flip each bit, and add one".

              Binary | Decimal
              0000   | +0
              0000   | -0

              With two's complement we now only have a single value for zero: $(00000000)_2$. The representation is efficient as (in 8 bits) it makes use of all $2^8 = 256$ unique values. We store numbers from $-128$ to $127$. $-128$ is stored as $(10000000)_2$ (for which the two's complement is $(10000000)_2$, itself).
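"Flip each bit, and add one" translates directly into code. A sketch (the `& mask` at the end discards any carry out of the fixed width):

```python
def twos_complement(value: int, bits: int = 8) -> int:
    """Flip every bit of `value` and add one, within a width of `bits`."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask

print(format(twos_complement(0b00000001), "08b"))  # 11111111 (represents -1)
print(format(twos_complement(0b00000000), "08b"))  # 00000000: zero is its own complement
print(format(twos_complement(0b10000000), "08b"))  # 10000000: so is -128
```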

              Two's Complement

              Binary | Decimal
              0000   | +0
              0000   | -0

              Flip every bit and add 1

              Radix Complements

              It might be useful to understand two's complement as merely a special case of a radix complement. 'Radix' refers to the number of symbols in a base (which is almost always the same as the base). Thus base 2 (binary) has a radix of 2; base 10 (decimal) has a radix of 10. The radix complement of an $n$-digit number $y$ in radix $b$ is:

              \[ b^n - y \]

              First, $b^n$ represents the number of numbers that can be stored in $n$ radix $b$ digits. By finding the difference of this and $y$, we get the complement of $y$ with respect to that radix.

              For example, the radix complement of 54 in three decimal digits is $10^3 - 54 = 1000 - 54 = 946$. This is the same as if we complemented each digit with respect to 9 and then added 1: the complement of 0 is 9, the complement of 1 is 8, and so on. The radix complement in base 10 is called the ten's complement.

                  054  (complement with 9)
                  ---
                  945
              +     1
                  ---
                  946

              Similarly, in binary the radix complement of $(1010)_2 = 10$ in 4 digits is $2^4 - 10 = 16 - 10 = 6 = (0110)_2$. This is the same as if we had found the complement of each bit with 1 (flipped each bit) and added 1. The radix complement in base 2 is called the two's complement.

                   1010  (complement with 1 / flip each bit)
                   ----
                   0101
              +       1
                   ----
                   0110
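The formula $b^n - y$ works for any radix; a one-line sketch (the function name `radix_complement` is my own):

```python
def radix_complement(y: int, n_digits: int, base: int) -> int:
    """Compute the radix complement b**n - y of an n-digit number in base b."""
    return base ** n_digits - y

# Ten's complement of 54 in three decimal digits
print(radix_complement(54, 3, 10))  # 946
# Two's complement of (1010)2 = 10 in four bits
print(radix_complement(10, 4, 2))   # 6, i.e. (0110)2
```

Note that `radix_complement(10, 4, 2)` agrees with flipping each bit of $(1010)_2$ and adding 1, as in the worked example above.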

              Negative numbers

              What is the value of the following 4-bit signed numbers, stored using each of: sign-and-magnitude, one's complement, two's complement.

              1. 1001
              2. 0011
              3. 1111
              4. 1000

              Conversion between bases

              It's little use devising an encoding if we are unable to convert data into it. To start with the simplest case, we have already seen how to convert from any base to decimal. To convert from base $b$ to decimal, we multiply each digit by its positional value, which is $b^i$ for position $i$ (starting from 0 on the right).

              For example, we can convert $1110$ from binary into decimal as follows:

              \[ (1110)_2 = 2^3 + 2^2 + 2^1 = 8 + 4 + 2 = 14 \]
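The positional-notation rule can be sketched as a short Python function (a hypothetical helper, not from the lecture):

```python
def to_decimal(digits: str, base: int) -> int:
    """Convert a string of digits in the given base to a decimal integer,
    multiplying each digit by its positional value base**i."""
    value = 0
    for i, digit in enumerate(reversed(digits)):
        value += int(digit, base) * base ** i
    return value

print(to_decimal("1110", 2))  # 14
print(to_decimal("377", 8))   # 255
```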

              Conversion between bases

              Decimal to base n

              To convert from decimal to any base $b$, we repeatedly divide our number by $b$, finding the remainder at each step.

              Let us consider the binary case first, as this is the easiest to understand.

               
              128 ÷ 2 = 64, Remainder = 0
              64 ÷ 2 = 32, Remainder = 0
              32 ÷ 2 = 16, Remainder = 0
              16 ÷ 2 = 8, Remainder = 0
              8 ÷ 2 = 4, Remainder = 0
              4 ÷ 2 = 2, Remainder = 0
              2 ÷ 2 = 1, Remainder = 0
              1 ÷ 2 = 0, Remainder = 1
              Result: 10000000

              To convert into binary we repeatedly divide by 2. The first time we do this, the remainder tells us whether 2 divides our number. If there is a remainder, the number was odd and the least significant digit must be 1. There is no other digit that can express this 'oddness' because all other digits are positive powers of 2, which are even.

              Our division by 2 has now effectively shifted our binary number one place to the right, just like dividing by 10 shifts a decimal number one place to the right. So when we find the remainder after dividing by 2, we are asking if this new, shifted number is odd or even, and thus whether it must contain a 1. But because we have shifted one place to the right, we need to shift back so our remainder becomes our _second_ least significant digit.

              This logic extends to other bases as well. Rather than the remainder or modulus operation telling us whether the number is odd, it tells us what is left over that any larger power of $b$ cannot represent. This must be in our least significant digit (as our least significant digit has a positional value of 1).
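The repeated-division procedure above can be written as a short function (a sketch; `divmod` returns the quotient and remainder in one step):

```python
def from_decimal(value: int, base: int) -> str:
    """Convert a non-negative decimal integer to the given base by
    repeatedly dividing by the base and collecting the remainders."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, base)
        digits.append("0123456789ABCDEF"[remainder])
    # Remainders come out least significant first, so reverse them
    return "".join(reversed(digits))

print(from_decimal(128, 2))   # 10000000
print(from_decimal(255, 16))  # FF
```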

              Decimal to base $n$

              • Repeatedly divide by $n$.
                • Each time, the remainder becomes the next least significant digit.
               
              128 ÷ 2 = 64, Remainder = 0
              64 ÷ 2 = 32, Remainder = 0
              32 ÷ 2 = 16, Remainder = 0
              16 ÷ 2 = 8, Remainder = 0
              8 ÷ 2 = 4, Remainder = 0
              4 ÷ 2 = 2, Remainder = 0
              2 ÷ 2 = 1, Remainder = 0
              1 ÷ 2 = 0, Remainder = 1
              Result: 10000000

              Decimal to emoji-code

              Say we have an internet of things sensor that tweets its current value in emoji. We could apply the above approach to convert decimal numbers into strings of emoji. If we are working with a set of 1424 emoji, we are effectively converting into base 1424. It turns out, however, that working with Unicode characters is awkward, so the implementation below is still buggy (and works with a set of only 1421 emoji).

              Output in emoji: 👱
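As a simplified sketch of the idea, the following assumes a contiguous alphabet of 1424 emoji starting at the code point U+1F300 (the real emoji set is not contiguous, so a working implementation would need an explicit lookup table):

```python
EMOJI_BASE = 1424
FIRST_EMOJI = 0x1F300  # hypothetical start of a contiguous emoji alphabet

def to_emoji(value: int) -> str:
    """Encode a non-negative integer in base 1424, one emoji per digit."""
    if value == 0:
        return chr(FIRST_EMOJI)
    digits = []
    while value > 0:
        value, remainder = divmod(value, EMOJI_BASE)
        digits.append(chr(FIRST_EMOJI + remainder))
    return "".join(reversed(digits))

# A 16-bit sensor reading fits in two emoji, since 1424**2 > 2**16
print(len(to_emoji(65535)))  # 2
```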

              Between Power-of-2 Bases

              We can convert efficiently between bases that are powers of 2.

              To convert from binary to hexadecimal, we can group bits into 4s, called quartets. We use groups of four because representing 1 hexadecimal digit (16 possible values) in binary requires 4 bits, as $\log_2 16 = 4$.

              If we do not have enough binary bits to fill a quartet, we pad to the left with zeroes. Adding zeroes to the left of a number does not change its value.

              Having grouped the bits into quartets, we convert each quartet into the corresponding hexadecimal digit.

              Binary      | 1111 | 1111
              Decimal     |   15 |   15
              Hexadecimal |    F |    F

              \[ (11111111)_2 = (\text{FF})_{16} \]

              The first (lowest) four binary digits are encoding the same information as the first (lowest) hexadecimal digit, and so on for the second, third, and fourth, etc.

              To go the other way, (i.e. from hexadecimal to binary) follow the process described above in reverse. Convert each hexadecimal digit into binary, and then concatenate them together. To convert between two bases that are both powers of 2, use binary as an intermediary.

              To convert between binary and a base that is a power of 2, we can group bits. First consider how many bits are required to represent a single digit in the target base. For instance, in base 8 there are 8 different possible digits, so we need 3 binary digits, because $2^3 = 8$. Starting with our binary number, we group the bits starting from the least significant bits, e.g. the three least significant bits become the first group, and the next three least significant bits form the second group. If necessary, the binary number can be padded to the left with zeroes. Each group can then be converted into a single digit of the target base.
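The grouping procedure can be sketched as one function covering both octal and hexadecimal (group sizes 3 and 4); the helper name is my own:

```python
def binary_to_base(bits: str, group_size: int) -> str:
    """Convert a binary string by grouping bits from the right.

    group_size = 3 gives octal, group_size = 4 gives hexadecimal.
    """
    # Pad on the left so the length is a multiple of the group size
    padding = (-len(bits)) % group_size
    bits = "0" * padding + bits
    digits = []
    for i in range(0, len(bits), group_size):
        group = bits[i : i + group_size]
        digits.append("0123456789ABCDEF"[int(group, 2)])
    return "".join(digits)

print(binary_to_base("11111111", 4))  # FF
print(binary_to_base("11111111", 3))  # 377
```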

              Power-of-2 bases

              When converting between power of 2 bases, e.g.

              • Binary
              • Octal
              • Hexadecimal

              we can use the technique of grouping digits.

              Binary to Octal

              We need 3 bits to store one octal digit

              \[ 2^3 = 8 \qquad \log_2 8 = 3 \]

              To convert binary to octal:

              • group our bits into triplets (threes), starting on the right
              • convert each triplet to its value in octal.

              Binary  | 011 | 111 | 111
              Decimal |   3 |   7 |   7
              Octal   |   3 |   7 |   7

              \[ (11111111)_2 = (377)_8 \]

              Binary to Hexadecimal

              We need 4 bits to store one hexadecimal digit

              \[ 2^4 = 16 \qquad \log_2 16 = 4 \]

              To convert binary to hexadecimal:

              • group our bits into quartets (fours), starting on the right
              • convert each quartet to its value in hexadecimal.

              Binary      | 1111 | 1111
              Decimal     |   15 |   15
              Hexadecimal |    F |    F

              \[ (11111111)_2 = (\text{FF})_{16} \]

              Between Power-of-n Bases

              To convert directly between other bases, we can generalise this method. If $n$ is the smaller base, and $b$ is the larger base, the grouping method will work if you get an integer result for the group size:

              \[ g = \log_n b \]

              For example, conversion between base 8 and base 64 is possible as \( g = \log_8 64 = 2 \) and \( 2 \in \mathbb{Z} \). To convert $(212)_8$ to base 64 we form groups of size $g = 2$ (our groups are therefore $02$ and $12$). To convert from base 64 to base 8, we convert each base-64 digit into two base-8 digits.

              To convert between two bases that are powers of $n$, go via base $n$ as an intermediary.
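The base-8 to base-64 example can be worked through in Python (a sketch; digits are kept as lists of integers since base 64 has no standard digit symbols here):

```python
import math

def group_size(small_base: int, large_base: int) -> int:
    """The grouping method works when log_small(large) is an integer."""
    g = math.log(large_base, small_base)
    if abs(g - round(g)) > 1e-9:
        raise ValueError(f"{large_base} is not a power of {small_base}")
    return round(g)

# Converting (212)8 to base 64: group the octal digits in pairs
digits_base8 = [2, 1, 2]
g = group_size(8, 64)  # 2
# Pad on the left so the digits divide evenly into groups: [0, 2, 1, 2]
padded = [0] * ((-len(digits_base8)) % g) + digits_base8
digits_base64 = [
    sum(d * 8 ** (g - 1 - i) for i, d in enumerate(padded[j : j + g]))
    for j in range(0, len(padded), g)
]
print(digits_base64)  # [2, 10], since (212)8 = 138 = 2 * 64 + 10
```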

              Summary

              Summary

              Number Systems

              • Positional notation
              • Representing negative numbers

              Binary Arithmetic

              • Addition
              • Subtraction
              • Multiplication
              • Division

              Conversion between bases

              • Base n to decimal (positional notation)
              • Decimal to base n
              • Between power of 2 bases