Set Theory

In this lecture, we are going to learn the fundamentals and notation of set theory. Set theory is of foundational importance in mathematics. However, it is also critical in understanding relational databases and other data structures. Thus through this lecture we are going to ground our discussion of set theory in its application to relational databases.

Our goal through this lecture will be to formalise a database-like data structure and introduce basic database operations. Along the way we will learn about some key concepts in mathematics, including:

Sets and set theory
Types
Tuples
Set theory operations

Mathematics and Problem Solving

7. Set Theory

David Gundry

Learning Outcomes

Formalise a relational database in set theory

Notate sets, tuples and types
Identify common types and classes of numbers

Evaluate formulas of set theory

Use common set theory relations
Recognise and use the names and symbols of common set theory operations
Apply the above operators to pairs of simple operands
Simplify arbitrarily complicated set theory expressions

Learning Outcomes

Formalise a relational database in set theory
Evaluate formulas of set theory

Introduction

Databases are everywhere. In almost every domain of life where there is a need to store and access data you will find that data stored in some kind of computer database. As a computer scientist, software engineer or games developer, you will need databases. Whether you are building a social network, collecting scientific data, managing assets, or tracking customer data, databases are an essential tool.

Relational databases store data in tables. Many other data structures exist: lists, arrays, graphs, and more. Set theory gives us tools to mathematically formalise many different kinds of data structure.

Why Set Theory?

Set Theory is the mathematical underpinning of relational databases. Understanding the basics of set theory will help you understand the fundamental operations on databases such as JOIN, UNION, IN, EXISTS, and more.

As computer scientists we want to be able to write algorithms for creating, searching, sorting, and manipulating databases. To write and program algorithms we need to formalise the data structures they use. We might want to:

Combine two different versions of a database table
Match data between different tables
Reason about whether records exist

This is what database management software allows us to do by applying concepts from set theory.

Last week we looked at propositional logic. This allows us to model human reasoning. However it is limited in only dealing with propositions. Propositional logic alone cannot tell us what is happening inside a proposition. For example, consider the following proposition:

Growltiger is a cat

If we just analysed this as a proposition $p$, we would not have learned anything. However, if we think about this statement, it seems to have two important parts: "Growltiger", being Growltiger, a thing. And "cat", which is a category or type of thing. If we represent Growltiger as $g$, and the set of cats as $C$, we can formalise the above statement as:

\[ g \in C \]

We can interpret this as meaning Growltiger is an element of the set of cats, or Growltiger is a cat. To do this, we are making use of a powerful branch of mathematics called set theory. Set theory is used whenever we want to reason about collections of things. As above, we can use it to construct propositions which we can reason about the truth value of.

Why set theory?

Set theory lets us describe collections of things

Databases (tables of data)
Functions (pairs of input and outputs)
Meanings (sets of exemplars)
...

Overview

Databases draw on mathematical concepts from set theory, and set theory will be our focus for this lecture. This will also be a necessary foundation for subsequent lectures where we will continue to build on set theory concepts and notation.

We will start off by learning how we can formalise a database using the concept of a set. We will also look into the concept of a type.

Once we have done this we will see how set theory allows us to reason about such sets. We will see how sets can be combined or intersected, how sets can contain other sets, and more.

Overview

The set
Comparing sets
Operations on sets

The Set

The set is a basic mathematical object, like the number. Arguably, sets are even more fundamental than numbers, as numbers can themselves be defined as particular sets.

A set is an unordered collection of elements without repetition. We write out sets by listing the elements it contains within curly braces. For example, this set of elements:

\[ \{ a, b, c\} \]

At the moment, the labels of our elements don't mean anything in particular. Set theory is not concerned with what these elements are but rather the structures in which they are contained. Set theory purists will say that these elements are just other sets. Sets containing sets, perhaps. To the set theorist, it's sets all the way down.

We might visualise the contents of this set as the following table. Each element of the set is a new record in our table, giving us the simplest possible database table. While this doesn't look particularly interesting right now, we will get to more interesting looking sets later.

set
a
b
c

The set with no elements in it is called the empty set and is written $\emptyset$ or $ \{ \} $. A set with exactly one element is called a singleton. Because the elements of a set can be other sets, thus it is quite possible to define a set such as:

\[ \{ \emptyset, \{ \emptyset \} \} \]

The Set

Sets

A set is an unordered collection, of unique elements e.g.

\[ \{ a, b, c, d, e \} \]

Empty set

The set with no elements is written:

\[ \emptyset \]

Properties of Sets

There are two important properties of sets to remember:

Unordered

Sets are unordered, meaning that the order in which we write the elements of a set makes no difference. We could write a set as $ \{a, b, c\} $ and we can write exactly the same set as $ \{c, b, a\} $. Imagine a database table: it doesn't matter which order we write the rows in the table, we still have the same data in the database.

No Repetition

Second, sets cannot contain repeated elements. All elements in a set are unique in that set. If I attempted to write one, such as $ \{a, a\} $, I still would not have a set, I would have a misleading bunch of symbols on a page. Similarly in databases, we do not have repeated records: you would never be able to tell them apart! Database records always differ in at least one variable (and we give each record a unique ID for this purpose).

Extensional and Intensional Set Definitions

The extension of a set is the elements it contains. You can define a set by giving its extension, such as:

\[ A = \{ a, b, c \} \]

This is also called an extensional definition.

We will see in a later lecture how you can also give an intensional definition of a set. This does not list its elements, but instead gives a rule for what counts as being in the set. For example, the following is a set of cats:

\[ \{ x \mid \text{x is a cat} \} \]

Defining Sets

Extensional Definition

Lists the contents of the set, e.g.

\[ A = \{ a, b, c \} \]

Intensional Definition

Defines the set by a rule, e.g.

\[ B = \{ x \mid \Phi(x) \} \]

Tuples

A tuple is an ordered collection that permits repetition, e.g.

\[ (a, b, b, c) \]

Numbers in Set Theory

If you were wondering how you could possibly found mathematics on simple rules about collections of things, consider the example of the von Neumann integers. This is a way of defining the natural numbers using just sets and the empty set:

\[ 0 = \emptyset \]

\[ 1 = \{ \emptyset \} = \{ 0 \} \]

\[ 2 = \{ \emptyset, \{ \emptyset \} \} = \{ 0, 1 \} \]

\[ 3 = \{ \emptyset, \{ \emptyset \}, \{ \emptyset, \{ \emptyset \} \} \} = \{ 0, 1, 2 \} \]

\[ N = \{ 0, 1, \dots, N-1 \} \]

Types

So what are these things in our sets? Yes, $a$, $b$, and $c$ above were labels, but what are they labels for?

In so-called naive set theory, you can put anything you like into a set. Unfortunately, naive set theory falls into paradoxes. One way out of these paradoxes is to introduce types. Types are extremely important in computer science. If you have done any programming, you will likely already be familiar with the concept of a type. You might have seen code in that looks like the following:

let myNumber: number = 4

Here a variable called myNumber is declared that has the type number. This means the variable can only store numbers. If you try to put anything else into the variable, you will get a compilation error; that is, doing so is invalid within the programming language.

Similarly, with our typed version of set theory, we will declare sets as being of a particular type. For now, we are going to assume the following types. We are not going to worry at this point about what these elements really are (e.g. whether the numbers just labels for special nested sets).

Types

Typed Sets

Let $S$ be a set of type $T$

\[ S : T \]

Natural numbers

\[ A : \mathbb{N} \]

\[ A = \{1, 2, 3\} \]

Truth values

\[ B : \{ 0, 1 \} \]

\[ B = \{ 1 \} \]

Common Types

Symbol	Elements
$ \mathbb{N} $	$ \{ 1, 2, 3, 4, 5, \dots \} $
$ \mathbb{R} $	$ \{ \dots, -0.001020\dots, e, \pi,\dots \} $
$ \mathbb{Q} $	$ \{ \dots, \frac{1}{1}, \frac{1}{2}, \frac{2}{3}, \dots \} $
$ \{ 0, 1 \} $	$ \{ 0, 1 \} $
$ \Sigma^* $	$ \{a, b, abc, aabcbdc, \dots \} $

Common Types

name	symbol	elements	description
natural numbers	$ \mathbb{N} $	$ \{ 1, 2, 3, 4, 5, \dots \} $	1 and every subsequent number to positive infinity
real numbers	$ \mathbb{R} $	$ \{ 0.001020\dots, 0, -24.242, \dots \} $	Any number on a number line between negative and positive infinity, to an infinite number of decimal places
boolean truth values	$ \{ 0, 1 \} $	$ \{ 0, 1 \} $	True or false
strings	$ \Sigma^* $	$ \{a, b, abc, aabcbdc, \dots \} $	Any string of characters from the alphabet $\Sigma$

Let's apply one of these types. For example, let's describe the set $A$, which will contain natural numbers. We will notate this as follows:

\[ A : \mathbb{N} \]

$ A : \mathbb{N} $ (read: $ A $ of type $ \mathbb{N} $) might be the set $ \{ 1, 2, 3\} $ as all the elements of this set are of the appropriate type. Note that $A$ could not be $ \{ 1, 2, false \} $ as would not satisfy this type because not all of its elements are natural numbers.

$ A : \mathbb{N} $
1
2
3

Note that because we have defined $A$ as containing only natural numbers we cannot later change our minds and permit something that is not a natural number into $A$. (Just as attempting something similar in a programming language would give us a compilation error). The power of this approach, which is fully realised in a branch of mathematics called Type Theory, is that we can infer all sorts of things from these types. For example, say we take two elements of $ \mathbb{N} $ and add them together. We know that the answer will also be an element of $ \mathbb{N} $. However, for now, we will not delve into Type Theory and will merely gerry-rig types into set theory.

Product Types

Let us take two types, the type of and the type of .

$ \mathbb{N} $ and $ \mathbb{N} $ are types. $ \mathbb{N} \times \mathbb{N} $ is also a type. This type consists of pairs of elements, where the first member of each pair is from $ \mathbb{N} $ and the second member of each pair is from $ \mathbb{N} $. All possible such pairs are members of the type $ \mathbb{N} \times \mathbb{N} $

\[ \mathbb{N} \times \mathbb{N} = \{ (1, 1), (2, 2), (3, 3), ...\} \]

$ \mathbb{N} $	$ \mathbb{N} $
1	1
2	2
3	3
...

Here we see a new structure. The structure $(a, b)$ is called a tuple. Tuples are ordered pairs of elements, though you can also have triples, quadruples, and more generally n-tuples containing $n$ elements in order. You can construct these purely out of sets if you want to; the bracket notation can be seen as a notational convenience.

We could construct the product type out of more than two types at a time, for example $ \mathbb{N} \times \mathbb{R} \times \{0, 1\} $. It would contain an infinite number of elements, including, for example:

\[ \{ (1, 0.01, 1), (1, 23.344, 0), \dots \} \]

Product Type

Product types contain tuples (pairs, triples, etc.), e.g.

\[ A : \mathbb{N} \times \mathbb{N} = \left\{ \begin{array}{l} (1, 1), \\ (2, 1), \\ (3, 2) \\ \end{array} \right\} \]

If $A$ is a table, each tuple is a record

$A$
$ \mathbb{N} $	$ \mathbb{N} $
1	1
2	1
3	2

Set Types

If $A$ is a type, then $ \mathbb{P}(A) $ is the type of sets of type $A$.

Let's see an example using the type of

Thus the type $ \mathbb{P}(\mathbb{N}) $ is the type of sets of natural numbers. $ \mathbb{P}(\mathbb{N}) $ contains, for instance $ \emptyset, \{1\}, \{1,2 \} $ and (infinitely) many more.

Set Types

If $A$ is a type, $ \mathbb{P}(A) $ is the type of sets of $A$, e.g.

\[ A : \mathbb{P} (\mathbb{N}) = \left\{ \begin{array}{l} \emptyset, \\ \{ 1 \}, \\ \{ 1, 2 \}, \\ \{ 2 \}, \\ \{ 2, 3 \}, \\ \{ 1, 2, 3 \} \\ \end{array} \right\} \]

Databases

Now we can start to define sets that look a bit more like databases. In these sets we can store different types of data.

We might imagine that we have a set of type $ \mathbb{N} \times \Sigma^* \times \{ 0, 1\} $ that looks like the following:

$ \mathbb{N} $	$ \Sigma^* $	$ \{ 0, 1 \} $
0	John	1
1	Mary	1
2	Bill	0
...	...	...

Written in set notation, this would be:

\[ \{ (0, John, 1), (1, Mary, 1), (2, Bill, 0), \dots \} \]

Now we can formalise a database table as a set using set theory. Next we will see what we can do with sets. We will see ways of reasoning about them. These will help us to understand the mathematical underpinnings of databases.

Database type example

To formalise a database with three fields

User ID
Username
Enrolled

We might declare the type of the database as follows

\[ \mathbb{N} \times \Sigma^* \times \{ 0, 1\} \]

$ \mathbb{N} $	$ \Sigma^* $	$ \{ 0, 1 \} $
0	John	1
1	Mary	1
2	Bill	0
...	...	...

Set Elements

Set elements

Element-of

Sets (with the exception of the empty set) contain elements. For example, the set $ \{ a, b, c \} $ has as $a$ as an element. I can write this statement formally using set theory notation (here $ \in $ is to be read "is an element of"):

\[ a \in A \]

The above statement is a proposition. It evaluates to either $\top$ or $\bot$. Because $a$ is indeed in $ \{a, b, c\} $, the above proposition is true.

Element of

Let $a$ be a member of set $S$, written:

\[ a \in S \]

Law 1.1

\[ \neg ( s \in S ) \iff (s \not \in S) \]

Law 1.2

There are no elements of the empty set

\[ ( x \in \emptyset ) \iff \bot \]

Applying Set Theory Propositions

We can formalise a table in a relational database as a set of tuples. The attributes stored in the table dictate the type of the set. Consider the following database table. It is defined by columns or attributes (the types of data stored about each record), and rows or records (the particular data stored)

$ \mathbb{N} $	$ \Sigma^* $	$ \mathbb{R} $
1	'John'	2.5

To formalise this table as a set, we would describe a set $D$ of type $ \mathbb{N} \times \Sigma^* \times \mathbb{R} $. Let $D$ describe the table above. Thus $D$ is a set with one element.

Each row of the table is called a record or a tuple. For example, we know that $ (1, 'John', 2.5) \in D $, as this tuple appears in the table above. However, $ (2, 'Bill', 1.2) \not \in D $ because this record does not appear in the table above.

Set theory lets us formalise propositions - true or false statements. We can then incorporate these into propositional logic. For instance, how might we formalise the following statement in set theory and propositional logic?

"If Growltiger is a cat then Growltiger is sleepy"

First, let us define the following non-logical symbols:

$g$ (Growltiger),
$C$ (the set of cats), and
$S$ (the set of sleepy things):

Now we can express the above proposition as a statement of propositional logic and set theory:

\[ ((g \in C) \implies (g \in S)) \]

Database example

Let the set $ D : \mathbb{N} \times \Sigma^* \times \mathbb{R} $ represent the table

$ \mathbb{N} $	$ \Sigma^* $	$ \mathbb{R} $
1	'John'	2.5

A tuple $r$ (a record) is stored in this database iff

\[ r \in D \]

\[ (1, 'John', 2.5) \in D \]

\[ (2, 'Bill', 1.2) \not \in D \]

Logic example

"If Growltiger is a cat then Growltiger is sleepy"

Let us define the following non-logical symbols:

$g$ (Growltiger),
$C$ (the set of cats), and
$S$ (the set of sleepy things):

\[ ((g \in C) \implies (g \in S)) \]

Comparing Sets

So far we have defined what a set is. To recap, a set is an unordered collection of unique elements. Now we are going to look at what we can do with sets. In particular, how can we reason about them?

We are going to see the following relations:

Name	Symbol	Usage	Description	Example
element of	$\in$	$a \in A$	$a$ is an element of $A$	$ a \in \{ a \} $
subset	$\subseteq$	$A \subseteq B$	$A$ only contains elements that are also in $B$	$ \{ a \} \subseteq \{ a, b \} $
superset	$\supseteq$	$A \supseteq B$	$A$ contains all the elements in $B$	$ \{ a, b \} \supseteq \{ a, b \} $
equals	$=$	$A = B$	$A$ and $B$ are the same set: they contain exactly the same elements	$ \{ a \} = \{ a \} $
not equals	$\neq$	$A \neq B$	$A$ and $B$ are not the same set: they contain different elements	$ \{ a \} \neq \{ a, b \} $
proper subset	$\subset$	$A \subset B$	$A$ only contains elements that are also in $B$, but not all of them	$ \{ a \} \subset \{ a, b \} $
proper superset	$\supset$	$A \supset B$	$A$ contains all the elements in $B$ and at least one more	$ \{ a, b \} \supset \{ a \} $

Comparing Sets

Name	Symbol	Example
element of	$\in$	$ a \in \{ a \} $
subset	$\subseteq$	$ \{ a \} \subseteq \{ a, b \} $
superset	$\supseteq$	$ \{ a, b \} \supseteq \{ a, b \} $
equality	$=$	$ \{ a \} = \{ a \} $
proper subset	$\subset$	$ \{ a \} \subset \{ a, b \} $
proper superset	$\supset$	$ \{ a, b \} \supset \{ a \} $

Subset and Superset

Subset and superset express whether one set contains another. A subset is a set contained by another set. A superset is a set that contains another set. Consider for instance the following diagram representing two sets labelled $T$ and $S$. Set $S$ is contained within the set $T$:

Here $S$ is a subset of $T$ (written $S \subseteq T$) because all the elements of $S$ must also be elements of $T$, give how it is drawn. Equivalently, $T$ is a superset of $S$ (written $T \supseteq S)$, because $T$ contains all the elements that $S$ does. This gives us our two most common relations between sets:

\[ \subseteq \qquad \supseteq \]

By putting a slash through the subset or superset symbol, we express its negation, e.g. 'not subset' ($\not \subseteq$), 'not superset' ($\not \supseteq$), etc.

Subset

Let all elements of $A$ be in $B$, we may write this:

\[ A \subseteq B \]

If two sets are equal, they are also subsets each other

Law 2.1

\[ ( ( S \subseteq T ) \wedge ( T \subseteq S ) ) \iff ( S = T ) \]

$ A $

$ B $

Subset

Law 2.2

The empty set is a subset of every set

\[ \emptyset \subseteq S \]

$ \emptyset $

$ A $

Law 2.3

Every set is a subset of itself

\[ S \subseteq S \]

$ B $

Proper Subset and Proper Superset

From here we can define two more relations. When a set is a subset of another set but is not equal to that set, it is called a proper subset. Similarly, when a set is a superset of another set but is not equal to it, it is called a proper superset.

To write these we use the proper subset and proper superset symbols:

\[ \subset \qquad \supset \]

Proper Subset

Let $A$ be a set smaller than $B$ whose elements are all in $B$, written:

\[ A \subset B \]

Law 2.5

A subset is either a proper subset or equal

\[ S ⊆ T \iff ( S ⊂ T ) \vee (S = T ) \]

$ A $

$ B $

Superset

Let $A$ be a set containing all elements in $B$, written:

\[ A \supseteq B \]

$ A $

$ B $

Proper Superset

Let $A$ be larger than $B$ and contain $B$:

\[ A \supset B \]

$ A $

$ B $

Proof example

Task

Prove the statement $ ( S ⊆ \emptyset ) \implies (S = \emptyset ) $ is true for all sets $S$

Approach

Apply axiom schema of set theory and propositional logic in a formal proof

Proof example

$ S ⊆ ∅ ⇒ S = ∅ $
(= PL Law 4.1 $\quad (p \implies q) \iff ((¬p) \vee q)$)
$ ¬(S \subseteq \emptyset ) \vee ( S = \emptyset ) $
(= PL Law 2.2 $\quad (\phi \wedge \top) \iff \phi$)
$ ¬(S \subseteq \emptyset \wedge \top) \vee ( S = \emptyset ) $
(= ST Law 2.2 $\quad \emptyset \subseteq S$)
$ ¬(S ⊆ ∅ ∧ ∅ ⊆ S) \vee (S = \emptyset) $
(= ST Law 2.1 $\quad ( S ⊆ T ) \wedge (T ⊆ S ) \iff S = T$ )
$ ¬(S = \emptyset) \vee (S = ∅) $
(= PL Law 3.7 $\quad (\neg \phi \vee \phi) \iff \top$)
$ \top $

Relating Sets

Imagine we are managing the database for a website when the hard drive of the server fails. We lose the data it contains. Luckily, we have kept regular backups of our database. Unfortunately, we do not know when these backups were taken. We have narrowed down the correct one to two versions that we will call $A$ and $B$.

uid	username	password_sha1_hash
1	john	5f4dcc3b5aa765d61d8327deb882cf99
2	mary	25f9e794323b453885f5181f1b624d0b
3	bill	e8375d7cd983efcbf956da5937050ffc

uid	username	password_sha1_hash
1	john	5f4dcc3b5aa765d61d8327deb882cf99
2	mary	25f9e794323b453885f5181f1b624d0b

We can express following true propositions about these two sets.

\[ A \not = B \quad A \subseteq B \quad A \subset B \]

\[ B \supseteq A \quad B \supset A \quad B \not \subseteq A \]

\[ B \not \subset A \quad A \not \supseteq B \quad A \not \supset B \]

We know that to avoid data loss we need to restore database $A$ as this is a proper superset of $B$, written $A \supset B$.

Set Operations

So far we can define sets and say whether one set contains another. We can say a lot more about sets when we combine sets using the operators of set theory. The following operators take two sets as arguments and return a set, usually with different elements.

Name	Symbol	Usage	Description	Example
union	$\cup$	$A \cup A$	Returns all elements in either $A$ or $B$	$ \{ a \} \cup \{ b \} = \{ a, b \} $
intersection	$\cap$	$A \cap B$	Returns only elements in both $A$ and $B$	$ \{ a, b \} \cap \{ b, c \} = \{ b \} $
set difference	$\setminus$	$A \setminus B$	Returns elements from $A$ except those in $B$	$ \{ a, b \} \setminus \{ b \} = \{ a \} $
cardinality	$\#$	$\#(A)$	Returns the number of elements in $A$	$ \#(\{ a, b \}) = 2 $
power set	$ \mathbb{P} $	$ \mathbb{P} $	Returns the set of all subsets of $A$	$ \mathbb{P}(\{ a, b \} = \{ \emptyset, \{a\}, \{b\}, \{a, b\} \} $
cartesian product	$\times$	$A \times B$	Returns all pairs $(a, b)$ where $a \in A$ and $b \in B$	$ \{ a \} \times \{ 1 \} = \{(a, 1), (a, 2)\} $

In general we will assume that we always apply these operations to sets of the same type.

Set Operations

Name	Symbol	Example
union	$\cup$	$ \{ a \} \cup \{ b \} = \{ a, b \} $
intersection	$\cap$	$ \{ a, b \} \cap \{ b, c \} = \{ b \} $
set difference	$\setminus$	$ \{ a, b \} \setminus \{ b \} = \{ a \} $
cardinality	$\#$	$ \#(\{ a, b \}) = 2 $
power set	$ \mathbb{P} $	$ \mathbb{P}(\{ a, b \} = \{ \emptyset, \{a\}, \{b\}, \{a, b\} \} $
cartesian product	$\times$	$ \{ a \} \times \{ 1 \} = \{(a, 1), (a, 2)\} $

Union

Let $A$ and $B$ be sets, we can write their union

\[ A \cup B \]

Law 4.1

An element of the union of $S$ and $T$ is an element of at least one of $S$ and $T$

\[ a ∈ (S \cup T) \iff (a ∈ S) \vee (a ∈ T) \]

Intersection

Let $A$ and $B$ be sets, we can write their intersection

\[ A \cap B \]

Law 5.1

An element of the intersection of $S$ and $T$ is an element of both $S$ and $T$

\[ a ∈ (S \cap T) \iff (a ∈ S) \wedge (a ∈ T) \]

Generalised Operations

Let $A$, $B$, and $C$, be sets. The union of $A$, $B$, and $C$, can be written

\[ \bigcup\{A, B, C\} \]

The intersection of $A$, $B$, and $C$, can be written

\[ \bigcap\{A, B, C\} \]

Set Difference

Let $A$ and $B$ be sets, we can write their set difference

\[ A \setminus B \]

Law 6.1

Elements of the set difference $S \setminus T$ are those in $S$ and not in $T$

\[ a ∈ (S \setminus T) ⟺ ( a ∈ S ∧ a ∉ T) \]

Cardinality

The number of elements in a set $A$ is its cardinality, written:

\[ \# A \]

For example $ \# \{a, b\} = 2 $

Power Set

The set of subsets of a set $A$ is its power set, written:

\[ \mathbb{P}(A) \]

For example

\[ \mathbb{P}(\{ a, b \}) = \{ \emptyset, \{a\}, \{b\}, \{a, b\} \} \]

$ A $

$ \mathbb{P}(A) $

Cartesian Product

Let $A$ and $B$ be sets. The set of all pairs $(a, b)$ where $a \in A$ and $b \in B$ is written

\[ A \times B \]

\[ \{ a, b \} \times \{ 1, 2 \} = \left\{ \begin{array}{l} (a, 1), \\ (a, 2), \\ (b, 1), \\ (b, 2) \\ \end{array} \right\} \]

$ A $

$ B $

Real-world examples

Imagine John and Mary have each seen many movies. They want to watch something together,

Let us model this situation in set theory. We will describe two sets, $J$ (the movies John has seen) and $M$ (the movies Mary has seen). Assume that we can ask John and Mary to write us out a list. Further, let $U$ be the universal set of all movies (that John and mary would consider watching). If we were building a movie recommendation app, we might get this list from IMDB.

John	Mary
Yes Man (2008)	The Paperboy (2012)
Underdog (2007)	Nowhere to Run (1993)
Where the Heart Is (2000)	I, Robot (2004)
Mean Streets (1973)	The Voices (2014)
I, Robot (2004)	Bangkok Dangerous (2008)
Control (2007)	Tequila Sunrise (1988)
...	...

Say John is willing to rewatch a movie he has already seen, but Mary does not want to rewatch a movie. If so, we need to pick a movie from the set $U \setminus M$, the set of movies Mary has not watched. If neither John nor Mary is willing to rewatch a movie, we would need to select a movie from the set $U \setminus (J \cup M)$. If John and Mary both are wanting to watch something familiar, they would need to pick from $J \cap M$.

Being able to formalise the logic behind picking a movie, we are better placed to, for example, write a recommendation algorithm.

Imagine we are developing a photo management app that stores photographs in the cloud. This app can be used on multiple devices simultaneously, e.g. on a mobile and a laptop. These devices might not always have an internet connection. We could easily find each device has made changes to its local database. New photographs may have been taken, or old photographs deleted. We want our app to behave appropriately when these two devices have been synced with the cloud storage server again.

Phone

image uuid	filename	date/time
69ba4909-2d27-4824-b7a8-0801b8103ba2	IMG_1203.png	22/6/23 14:23
58f4a45c-a054-4de9-8315-d8701a170d90	IMG_1204.png	23/6/23 10:25

Laptop

image uuid	filename	date/time
69ba4909-2d27-4824-b7a8-0801b8103ba2	IMG_1203.png	22/6/23 14:23
40ccddbd-2b7c-4324-8623-6d3b8c5f1e3a	IMG_1204.png	23/6/23 12:01

We have two tables of photographs, which we will label $Mobile$ and $Laptop$. These two tables have got out of sync and we want our app to be able to deal with this.

The obvious thing we might want to do in this situation is to find all the photos that exist on either the mobile or the laptop. In other words, we want to include everything from both databases. Perhaps we would want to upload all of these photos to our cloud storage provider.

To find the database table that contains all the elements of two database tables, we are performing a union operation.

\[ Laptop \cup Mobile \]

image uuid	filename	date/time
69ba4909-2d27-4824-b7a8-0801b8103ba2	IMG_1203.png	22/6/23 14:23
40ccddbd-2b7c-4324-8623-6d3b8c5f1e3a	IMG_1204.png	23/6/23 12:01
58f4a45c-a054-4de9-8315-d8701a170d90	IMG_1204.png	23/6/23 10:25

$A \cup B$ returns a set containing all the elements in either of the sets $A$ and $B$. For instance,

\[ \{ a, b \} \cup \{a, c\} = \{a, b, c\} \]

Let's say we want to minimise the amount of user data we are storing in the cloud. After all, cloud storage costs money. We might decide that it is safe to delete those photos that exist on all of a users devices.

We would be able to find these photos by using the intersection operation.

\[ Laptop \cap Mobile \]

image uuid	filename	date/time
69ba4909-2d27-4824-b7a8-0801b8103ba2	IMG_1203.png	22/6/23 14:23

$A \cap B$ returns a set containing only those elements that are both in set $A$ and in set $B$. For instance,

\[ \{ a, b \} \cap \{a, c\} = \{a\} \]

Let's say we want to copy the new photos from the mobile onto the laptop directly. For reasons of efficiency, we naturally don't want to copy any that already exist on the laptop. That is, we want to create a table of all of the photos on the mobile except those that exist on the laptop.

To do this, we need the set difference operation:

\[ Mobile \setminus Laptop \]

image uuid	filename	date/time
69ba4909-2d27-4824-b7a8-0801b8103ba2	IMG_1203.png	22/6/23 14:23

$A \setminus B$ returns a set containing the elements in set $A$ taking away any that also appear in set $B$. For instance,

\[ \{ a, b \} \setminus \{a, c\} = \{b\} \]

Our cloud photo management app is expensive to run, so we want to bill customers for the number of photographs they store. Here, of course, we need to count all of the photos on any device, so we first find the union.

\[ \#(Laptop \cup Mobile ) = 3 \]

The cardinality of a set is the number of elements it contains. For example, the following set has a cardinality of 3:

\[ \#(\{a, b, c\} ) = 3 \]

Summary

Sets

Unordered collection with no repetition

\[ A = \{ a, b, c \} \]

\[ A = \{ x : \Phi(x) \} \]

Tuples (ordered) are written $(a, b, c)$

Sets (can be) of a particular type

Set Relations

Name	Symbol	Example
element of	$\in$	$ a \in \{ a \} $
subset	$\subseteq$	$ \{ a \} \subseteq \{ a, b \} $
superset	$\supseteq$	$ \{ a, b \} \supseteq \{ a, b \} $
equality	$=$	$ \{ a \} = \{ a \} $
proper subset	$\subset$	$ \{ a \} \subset \{ a, b \} $
proper superset	$\supset$	$ \{ a, b \} \supset \{ a \} $

Set Operations

Name	Symbol	Example
union	$\cup$	$ \{ a \} \cup \{ b \} = \{ a, b \} $
intersection	$\cap$	$ \{ a, b \} \cap \{ b, c \} = \{ b \} $
set difference	$\setminus$	$ \{ a, b \} \setminus \{ b \} = \{ a \} $
cardinality	$\#$	$ \#(\{ a, b \}) = 2 $
power set	$ \mathbb{P} $	$ \mathbb{P}(\{ a, b \} = \{ \emptyset, \{a\}, \{b\}, \{a, b\} \} $
cartesian product	$\times$	$ \{ a \} \times \{ 1 \} = \{(a, 1), (a, 2)\} $

Symbol	Elements
\( \mathbb{N} \)	\( \{ 1, 2, 3, 4, 5, \dots \} \)
\( \mathbb{R} \)	\( \{ \dots, -0.001020\dots, e, \pi,\dots \} \)
\( \mathbb{Q} \)	\( \{ \dots, \frac{1}{1}, \frac{1}{2}, \frac{2}{3}, \dots \} \)
\( \{ 0, 1 \} \)	\( \{ 0, 1 \} \)
\( \Sigma^* \)	\( \{a, b, abc, aabcbdc, \dots \} \)

Name	Symbol	Example
element of	$\in$	\( a \in \{ a \} \)
subset	$\subseteq$	\( \{ a \} \subseteq \{ a, b \} \)
superset	$\supseteq$	\( \{ a, b \} \supseteq \{ a, b \} \)
equality	$=$	\( \{ a \} = \{ a \} \)
proper subset	$\subset$	\( \{ a \} \subset \{ a, b \} \)
proper superset	$\supset$	\( \{ a, b \} \supset \{ a \} \)

Name	Symbol	Example
union	$\cup$	\( \{ a \} \cup \{ b \} = \{ a, b \} \)
intersection	$\cap$	\( \{ a, b \} \cap \{ b, c \} = \{ b \} \)
set difference	$\setminus$	\( \{ a, b \} \setminus \{ b \} = \{ a \} \)
cardinality	$\#$	\( \#(\{ a, b \}) = 2 \)
power set	\( \mathbb{P} \)	\( \mathbb{P}(\{ a, b \} = \{ \emptyset, \{a\}, \{b\}, \{a, b\} \} \)
cartesian product	$\times$	\( \{ a \} \times \{ 1 \} = \{(a, 1), (a, 2)\} \)