Numerical Systems

"Steganography is the practice of representing information within another message or physical object, in such a manner that the presence of the information is not evident to human inspection" (Wikipedia)

Finding efficient ways of encoding information is essential when designing communications networks. It is also often essential when planning how applications (e.g. games, websites) communicate across the internet.

Structure of the Practical

You will complete this practical in groups. It is up to you how hard to make it. If you finish early, try to challenge yourself to improve your solution.

If you have finished, remember that you have your assessment problem questions to complete. These can be found on Moodle.

Mathematics and Problem Solving

1. Numerical Systems

David Gundry

Accessing the Practical

Via Moodle: click the link to today's practical

Direct at via the app: Select Practicals > Numerical Systems

Getting started

First, get into groups of 3-5.

Your goal is to encode information in a deck of cards. Here I will show a simple example to give you an idea of how to start. You can treat this as a jumping off point, as there is plenty of room for improvement.

You may need to go back to the lecture content to revise your understanding. This task is designed to help you deepen your understanding of the lecture.

Encoding

Information often needs to be encoded in a different format, e.g. text to binary

01101000 01100101 01101100 01101100 01101111

Playing cards

How can we encode information into a deck of playing cards?

Examples

Let's say we want to encode a decimal number using our deck of cards.

Each decimal digit can take one of 10 values. For simplicity, let's encode each digit as a card between Ace(1) and 9. We will represent 0 with 10. The top of the deck is the most significant digit. Each subsequent card is the next most significant digit. A joker will be interpreted as a decimal point (a joker at the start of the deck means the number is 0.something). A second joker means the end of the number. All other cards must come after the second joker. Finally, if the deck is face-down the number is negative, if the deck of cards is face-up the number is positive.

Thus if we have the deck: Ace, 2, 10, 9, Joker, Joker, ... we would decode this as the number 1209. We would decode Joker, 7, 2, 4, Joker, ... as 0.724.

To get you thinking about how this can be improved, consider these questions:

How many cards are not used to encode information?
What is the largest/smallest number that can be encoded?
Are there any numbers that cannot be encoded with these rules?
Are there any decks of cards that cannot be decoded to a number with these rules?

To get started, you might want to try one of these extensions to this system:

Use the unused cards to make it quicker to encode and decode certain types of number.
Use the unused cards in increase the number of numbers that can be encoded.
Develop a rule set based on this that can encode any decimal number up to a given number of digits.

Goal: encode a string of text in a deck of cards. Let us assume the string can be represented in 7-bit ASCII. This limits the symbols it can contain, so this assumption might cause us difficulties later, but let's go with it for now. Each symbol in the input string can be represented as a number between 0-125.

We have 54 cards. Each card can represent a number between 0-53. However, each card can be face up or face down. This gives us 108 possible states for a card to be in, encoding numbers between 0-107, which is base 108.

Card	Value
Ace of Clubs, face up	0
Ace of Clubs, face down	1
2 of Clubs, face up	2
...	...

A single 7-bit ASCII symbol has $ 2^7 = 126 $ possible values. Let's restrict ourselves to the alphabet letters without case, to which we'll add the punctuation symbols ., , , and a space. This gives 26 + 3 = 29 symbols. We can give rules to translate each symbol into a number.

a	b	c	...
0	1	2	...

Think of each symbol of the input message as a digit in base 29. The three letter sentence "go." would be a three-digit number in base 29. Think of each card (and orientation) as a number in base 108.

g	o	.
6	14	26

There are fewer symbols in base 29 than in base 108. We should therefore be able to store more than one symbol per card. How might we achieve this?

What is possible?

To start with, let's observe that a deck of 54 cards (52 cards plus jokers) has a very large number of possible orders. The first card can be any of 54. The second can be any of the remaining 53. This means the total number of arrangements is 54 factorial, written $ 54! = 54 \times 53 \times 52 \times ... $ which in decimal is

230843697339241380472092742683027581083278564571807941132288000000000000

or a bit over $ 2 \times 10^{71} $. That is a big number. Finding $ \log_2 $ of this number tells us how many binary digits, and therefore bits this is. The answer is a bit over 237. Dividing by 8 gives us approximately 29 bytes and 5 bits.

The question is then how to produce a set of rules for conveniently associating 237 bits of data (which might come in a human-readable format like JSON) with an ordering of a deck of cards.

There are

\[ 54! = 54 \times 53 \times 52 \times ... \approx 2 \times 10^{71} \]

possible arrangements of a deck of playing cards.

That's the equivalent of over 29 bytes of data.

Can you use them efficiently?

Encoding Cards

In this task, your group needs to design an efficient encoding of a chosen data type using a regular pack of playing cards. Your goal is to be able to encode any valid content of your chosen type, and to be able to communicate the largest input you can using only a deck of cards.

The instructions for the task are as follows:

Pick one of the content types listed below, or choose your own. The content should be a string (or something with a given representation as a string, such as image data). The suggested content types are:

IP Addresses
URLs
JSON objects
Images

Devise a way of encoding this data into a deck of cards. You should come up with a system of rules by which you can take any valid instance of your content type and produce an ordered deck of cards that encodes this content.

You may put a limit on the size of the content accepted.
Write down a complete description of your encoding system. This should include rules for encoding and decoding a message. It should allow another group to both encode an arbitrary string of your chosen content type into a deck of cards, and to take a deck of cards and recover the original string.
This may require you to specify restrictions on the size of content your rules can encode.

Generate a string of your chosen content type using one of the tools below (or if you have chosen your own content type, generate an instance of it somehow) and encode this in a pack of cards according to the system you have developed.

You must not adapt the rules of your system to the particular string to encode. Your task is to make rules that can encode any valid input of the correct type.

Pass the pack of cards and the description of your encoding system to another group. This group should now try to decode the content using your rules.
Try to answer the questions below.

You will have achieved your goal if your target groups successfully recovers the content you originally encoded using your rules.

Restrictions on rules are as follows, otherwise you are free to be creative in your solutions.

You cannot customise your rules to a particular message, the rules should handle any message of the content type
You cannot mark or damage the cards
The information must be encoded using the cards (you cannot, for example, write the message on paper)

This task is supposed to be a challenge. While your group might be able to dive straight in to to the problem posed above, it is more likely that you will need to first develop your understanding on simpler goals. Part of solving this task is learning how to solve it. You are in charge of how you choose to work towards this goal.

Encoding data with playing cards

Your task is to come up with a novel encoding for a particular type of information, e.g.

URLs
JSON objects

Write instructions to encode and decode this information into a deck of cards

Goal

Work in groups
Test that another group can follow your instructions to get the correct result
Submit your instructions to me before the end of the practical

Rules

Encode data using your system
Pass the instructions to another group to decode
Lost? Read the "Getting Started" section of the practical.

Further Questions

You can work on these questions at any time during the session, in groups or alone. Some require you to have designed your encoding system before answering them.

Why must you not change your rules when you know the particular content to encode? If you could change your rules in response to the content to send, how efficiently could you encode the content? (How many bits of information would you need to send?) Devise a trivial system that will only send a single particular content instance.

Content types

Pick from one of the content types below, or come up with your own.

In order to make the task easier, you can simplify the content if necessary. You may devise rules that can only encode a subset of possible content.

An IPv4 address is a sequence of four bytes, most commonly seen written as four dot-separated decimal numbers.

An example of a random IPv4 address is:

As IPv4 uses 32 bits to represent an address, there are $ 2^{32} = 4294967296 $ possible addresses. However, many of these addresses are reserved for special purposes.

More details of IPv4 can be found on Wikipedia and in RFC 791

A Uniform Resource Identifier (URI) is a way to uniquely identifying resources. For example, a document on the web is identified with a URI, but URIs can also be used for any other kind of resource or information. URIs follow a standardised syntax defined in RFC 3986

An example of a random URI is

There are 84 characters permitted in an URI, which are the following:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=

Other characters must be percent-encoded using the symbol % followed by two hexadecimal digits (case-insensitive). For example, a space is represented as %20. See section RFC 3986 section 2.1 for more details.

As the specification of URIs is quite complex, you might which to restrict what kinds of URI your rules will encode. For example, you may want to only consider a restricted subset of HTTP URIs, or you may wish to design rules that can encode a superset of URIs with some theoretical loss in efficiency.

JSON stands for JavaScript Object Notation. A JSON file contains either an object (encased in { }) or an array (encased in [ ]).

An array is a comma-separated list of values

[1, 2, 3, "hello", 1.4, [1, 2], {"name": "John"}, 4]

An object is a comma-separated list of key-value pairs

{ "name": "John", "age": 24, "scores": [1,2,3] }

Keys are strings encased in "". They are followed by a colon. Values can be either strings (encased in ""), numbers, objects, arrays, or true or false.

An example JSON file is below:

{
    "name": "John",
    "height": 1.23,
    "happy": true
}

You can find more details about the syntax of JSON on W3Schools.

Bitmap images can be thought of as two-dimensional grids of pixels. Each pixel has a colour. This colour is usually represented using three values: red, green and blue. Each value is stored using a certain number of bits, which determines the number of different colours that the image can represent.

There are many choices made in deciding how to encode images that affect the quality of the images stored, including how many colour channels to use and how many bits to use for each colour.

Given the limitations of encoding into a deck of cards, what image data can you store?

Advanced ideas

Because in the task above you knew a lot about the kind of data you were encoding, it is possible to make use of data compression techniques, such as Shannon coding. These rely on the data to transmit being non-random

When sending data (whether by cards or otherwise) it is easy to make mistakes. You can use error-correcting codes, such as a Hamming code or Parity bit