diff --git a/year4/semester2/CS4423/materials/CS4423-W04-1.pdf b/year4/semester2/CS4423/materials/CS4423-W04-1.pdf new file mode 100644 index 00000000..d7d9a0e3 Binary files /dev/null and b/year4/semester2/CS4423/materials/CS4423-W04-1.pdf differ diff --git a/year4/semester2/CS4423/materials/CS4423-W04-2.pdf b/year4/semester2/CS4423/materials/CS4423-W04-2.pdf new file mode 100644 index 00000000..c297b681 Binary files /dev/null and b/year4/semester2/CS4423/materials/CS4423-W04-2.pdf differ diff --git a/year4/semester2/CS4423/materials/CS4423-W04-Jupyter.pdf b/year4/semester2/CS4423/materials/CS4423-W04-Jupyter.pdf new file mode 100644 index 00000000..d79d08de Binary files /dev/null and b/year4/semester2/CS4423/materials/CS4423-W04-Jupyter.pdf differ diff --git a/year4/semester2/CS4423/materials/CS4423-W05-1.pdf b/year4/semester2/CS4423/materials/CS4423-W05-1.pdf new file mode 100644 index 00000000..874da107 Binary files /dev/null and b/year4/semester2/CS4423/materials/CS4423-W05-1.pdf differ diff --git a/year4/semester2/CS4423/notes/CS4423.pdf b/year4/semester2/CS4423/notes/CS4423.pdf index 4a70f24b..2d1c6abc 100644 Binary files a/year4/semester2/CS4423/notes/CS4423.pdf and b/year4/semester2/CS4423/notes/CS4423.pdf differ diff --git a/year4/semester2/CS4423/notes/CS4423.tex b/year4/semester2/CS4423/notes/CS4423.tex index 05137444..30ee76f5 100644 --- a/year4/semester2/CS4423/notes/CS4423.tex +++ b/year4/semester2/CS4423/notes/CS4423.tex @@ -392,6 +392,7 @@ Obviously, $a_{i,j}$ is the number of walks of length 1 between node $i$ and nod We can extract that information for node $j$ by computing the product of $A$ and $e_j$ (column $j$of the identity matrix). \section{Connectivity \& Permutations} +\subsection{Notation} To start, let's decide on our notation: \begin{itemize} \item If we write $A = (a_{i,j})$, we mean that $A$ is a matrix and $a_{i,j}$ is its entry row $i$, column $j$. 
@@ -401,16 +402,332 @@ To start, let's decide on our notation: \item When we write $A > 0$, we mean that all entries of $A$ are positive. \end{itemize} +\subsection{Counting Walks} Recall that the \textbf{adjacency matrix} of a graph $G$ of order $N$ is a square $n \times n$ matrix $A = (a_{i,j})$ with rows and columns corresponding to the nodes of the graph. $a_{i,j}$ is set to be the number of edges between nodes $i$ and $j$. We learned previously that: \begin{itemize} - \item If $e_j$ is the $j^\text{th}$ column of the + \item If $e_j$ is the $j^\text{th}$ column of the identity matrix $I_n$, then $(Ae_j)_i$ is the number of walks of length 1 from node $i$ to node $j$. + Also, it is the same as $a_{i,j}$. + + \item Moreover, $(A(Ae_j))_i = (A^2e_j)_i$ is the number of walks of length 2 from node $i$ to node $j$. + We can conclude that, if $B=A^2$, then $b_{i,j}$ is the number of walks of length 2 between nodes $i$ and $j$. + Note that $b_{i,i}$ is the degree of node $i$. + + \item In fact, if $B=A^k$, then $b_{i,j}$ is the number of walks of length $k$ between nodes $i$ and $j$. \end{itemize} +\subsection{Paths} +A \textbf{trail} is a walk with no repeated edges. +A \textbf{cycle} is a trail in which the first and last nodes are the same, but no other node is repeated; +a \textbf{triangle} is a cycle of length 3. +A \textbf{path} is a walk in which no nodes (and so no edges) are repeated. +(The idea of a path is hugely important in network theory, and we will return to it often.) +\\\\ +The \textbf{length} of a path is the number of edges in that path. +A path from node $u$ to node $v$ is a \textbf{shortest path} if there is no path between them that is shorter (although there could be other paths of the same length). +Finding shortest paths in a network is a major topic that we will return to at another time. +\begin{itemize} + \item Every path is also a walk. 
+ \item If a particular walk is a shortest walk between two nodes, then it is also a shortest path between those two nodes. + \item If $k$ is the smallest natural number for which $(A^k)_{i,j} \neq 0$, then the shortest walk from node $i$ to node $j$ is of length $k$. + \item It follows that $k$ is also the length of the shortest path from node $i$ to node $j$. +\end{itemize} +For example, consider the following adjacency matrix and its powers: +\begin{align*} +A = +\begin{pmatrix} + 0 & 1 & 0 & 0 & 0 \\ + 1 & 0 & 1 & 0 & 0 \\ + 0 & 1 & 0 & 1 & 1 \\ + 0 & 0 & 1 & 0 & 1 \\ + 0 & 0 & 1 & 1 & 0 +\end{pmatrix} +\end{align*} +\begin{align*} +A^2 = +\begin{pmatrix} + 1 & 0 & 1 & 0 & 0 \\ + 0 & 2 & 0 & 1 & 1 \\ + 1 & 0 & 3 & 1 & 1 \\ + 0 & 1 & 1 & 2 & 1 \\ + 0 & 1 & 1 & 1 & 2 +\end{pmatrix} +\end{align*} +\begin{align*} +A^3 = +\begin{pmatrix} + 0 & 2 & 0 & 1 & 1 \\ + 2 & 0 & 4 & 1 & 1 \\ + 0 & 4 & 2 & 4 & 4 \\ + 1 & 1 & 4 & 2 & 3 \\ + 1 & 1 & 4 & 3 & 2 +\end{pmatrix} +\end{align*} + +We can observe that, where $A$ is the adjacency matrix of the graph $G$: +\begin{itemize} + \item $(A^2)_{i,i}$ is the degree of node $i$. + \item $\text{tr}(A^2)$ is the degree sum of the nodes in $G$. + \item $(A^3)_{i,i} \neq 0$ if and only if node $i$ is in a triangle. + \item $\frac{\text{tr}(A^3)}{6}$ is the number of triangles in $G$. + \item If $G$ is bipartite, then $(A^3)_{i,i} = 0$ for all $i$. +\end{itemize} + +\subsection{Connectivity} +Let $G$ be a graph and $A$ its adjacency matrix: +in $G$, node $i$ can be \textbf{reached} from node $j$ if there is a path between them. +If node $i$ is reachable from node $j$, then $(A^k)_{i,j} \neq 0$ for some $k$. +Also, note that $k \leq n$. +Consequently, since each power of $A$ is non-negative, if every node is reachable from every other node, then $(I + A + A^2 + A^3 + \cdots + A^k) > 0$ for some $k \leq n$. +\\\\ +A graph/network is \textbf{connected} if there is a path between every pair of nodes. +That is, every node is reachable from every other node. 
+If a graph is not connected, we say that it is \textbf{disconnected}. +Determining if a graph is connected or not is important; we'll see later that this is especially important with directed graphs. +A graph $G$ of order $n$ is connected if and only if, for each $i,j$, there is some $k \leq n$ for which $(A^k)_{i,j} \neq 0$. + +\subsection{Permutation Matrices} +We know that the structure of a network is not changed by re-labelling its nodes. +Sometimes, it is useful to re-label the nodes in order to expose certain properties, such as connectivity. +Since we think of the nodes as all being numbered from 1 to $n$, this is the same as \textbf{permuting} the numbers of some subset of the nodes. + +\begin{figure}[H] + \centering + \includegraphics[width=0.7\textwidth]{./images/permexampe.png} + \caption{ Example wherein nodes are re-labelled to expose certain properties of the graph } +\end{figure} + +When working with the adjacency matrix of a graph, such a permutation is expressed in terms of a \textbf{permutation matrix} $P$; +this is a $0$-$1$ matrix (also known as a Boolean or a binary matrix) where there is a single $1$ in every row \& column. +If the nodes of a graph $G$ (with adjacency matrix $A$) are listed as entries in a vector ${q}$, then: +\begin{itemize} + \item $Pq$ is a permutation of the nodes. + \item $PAP^T$ is the adjacency matrix of the graph with that node permutation applied. +\end{itemize} + +In many examples, we will have a symmetric $P$ for the sake of simplicity, but in general, $P \neq P^T$. +However, $P^T = P^{-1}$ always holds, so $PAP^T = PAP^{-1}$. +\\\\ +A graph with adjacency matrix $A$ is \textbf{disconnected} if and only if there is a permutation matrix $P$ such that +\begin{align*} +A &= P^T \begin{pmatrix} X & O \\ O^T & Y \end{pmatrix} P & +PAP^T &= \begin{pmatrix} X & O \\ O^T & Y \end{pmatrix} +\end{align*} +where $O$ represents the zero matrix with the same number of rows as $X$ and the same number of columns as $Y$. 
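+
+As a small worked illustration (a hypothetical four-node graph chosen for this sketch, not an example taken from the lectures), consider the graph with edges $1 \leftrightarrow 3$ and $2 \leftrightarrow 4$, and let $P$ be the permutation matrix that swaps nodes $2$ and $3$:
+\begin{align*}
+A =
+\begin{pmatrix}
+ 0 & 0 & 1 & 0 \\
+ 0 & 0 & 0 & 1 \\
+ 1 & 0 & 0 & 0 \\
+ 0 & 1 & 0 & 0
+\end{pmatrix},
+\qquad
+P =
+\begin{pmatrix}
+ 1 & 0 & 0 & 0 \\
+ 0 & 0 & 1 & 0 \\
+ 0 & 1 & 0 & 0 \\
+ 0 & 0 & 0 & 1
+\end{pmatrix},
+\qquad
+PAP^T =
+\begin{pmatrix}
+ 0 & 1 & 0 & 0 \\
+ 1 & 0 & 0 & 0 \\
+ 0 & 0 & 0 & 1 \\
+ 0 & 0 & 1 & 0
+\end{pmatrix}
+\end{align*}
+In this sketch $P = P^T$, and with the nodes listed in the order $1, 3, 2, 4$ the two $2 \times 2$ diagonal blocks (playing the roles of $X$ and $Y$) are separated by zero blocks, so the graph is disconnected with components $\{1,3\}$ and $\{2,4\}$.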
+ +\section{Permutations \& Bipartite Networks} +\subsection{Graph Connectivity} +Recall that a graph is \textbf{connected} if there is a path between every pair of nodes. +If the graph is not connected, we say that it is \textbf{disconnected}. +We now know how to check if a graph is connected by looking at powers of its adjacency matrix. +However, that is not very practical for large networks. +Instead, we can determine if a graph is connected by just looking at the adjacency matrix, provided that we have ordered the nodes properly. + +\subsection{Connected Components} +If a network is not connected, then we can divide it into \textbf{components} which \textit{are} connected. +The number of connected components is the number of blocks in the permuted adjacency matrix. + +\begin{figure}[H] + \centering + \includegraphics[width=0.7\textwidth]{./images/connectedcomponents.png} + \caption{ Connected components example } +\end{figure} + +\section{Bipartite Networks: Colours \& Computations} +\subsection{Class Survey Example} +\begin{figure}[H] + \centering + \includegraphics[width=0.7\textwidth]{./images/suverydata.png} + \caption{ Final survey data } +\end{figure} + +\begin{figure}[H] + \centering + \includegraphics[width=0.7\textwidth]{./images/suverygaaph.png} + \caption{ Final survey graph, with order 39 and size 87 } +\end{figure} + +\begin{figure}[H] + \centering + \includegraphics[width=0.7\textwidth]{./images/surveysubgrap.png} + \caption{ Subgraph of the survey network based on 7 randomly chosen people, with order 16 and size 24 } +\end{figure} + +\begin{figure}[H] + \centering + \includegraphics[width=0.7\textwidth]{./images/surveymatrix.png} + \caption{ Adjacency matrix where the nodes for people are listed first } +\end{figure} + +\begin{figure}[H] + \centering + \includegraphics[width=0.7\textwidth]{./images/surveymatrix2.png} + \caption{ $B = A^2$ } +\end{figure} + +Since we know from before that $(A^k)_{i,j}$ is the number of walks of length $k$ between 
nodes $i$ and $j$, we can see that in this context: +\begin{itemize} + \item For the first 7 rows \& columns, $b_{i,j}$ is the number of programmes in common between person $i$ and person $j$. + (This even works for $i=j$, but the number of programmes a person has in common with themselves is just the number they watch). + \item For the last 9 rows \& columns, $b_{i,j}$ is the number of people who watch both programmes $i$ and $j$. +\end{itemize} + +\subsection{Projections} +Given a bipartite graph $G$ whose node set $V$ has parts $V_1$ \& $V_2$, a \textbf{projection} of $G$ onto (for example) $V_1$ is the graph with: +\begin{itemize} + \item Node set $V_1$; + \item An edge between a pair of nodes in $V_1$ if they share a common neighbour in $G$. +\end{itemize} + +In the context of our survey example, a projection onto $V_1$ (people/actors) gives us the graph of people who share a common programme. +To make such a graph: +\begin{itemize} + \item Let $A$ be the adjacency matrix of $G$. + \item Let $B$ be the submatrix of $A^2$ associated with the nodes in $V_1$. + \item Let $C$ be the adjacency matrix with the property: + \begin{align*} + c_{i,j} = + \begin{cases} + 1 & b_{i,j} > 0 \text{ and } i \neq j \\ + 0 & \text{otherwise} + \end{cases} + \end{align*} + That is, $c_{i,j} = 0$ whenever $b_{i,j} = 0$ or $i=j$. + \item Let $G_{V_1}$ be the graph on $V_1$ with adjacency matrix $C$. + Then, $G_{V_1}$ is the \textbf{projection of $G$ onto $V_1$}. +\end{itemize} + +\begin{figure}[H] + \centering + \includegraphics[width=0.7\textwidth]{./images/surveygv1.png} + \caption{ $G_{V_1}$ computed for our survey data } +\end{figure} + +\subsection{Colouring} +\begin{figure}[H] + \centering + \includegraphics[width=0.7\textwidth]{./images/colouredsurvey.png} + \caption{ The original survey graph is more easily digestible if coloured } +\end{figure} + +For any bipartite graph, we can think of the nodes in the two sets as \textbf{coloured} with different colours. 
+For instance, we can think of nodes in $X_1$ as white nodes and those in $X_2$ as black nodes. +A \textbf{vertex-colouring} of a graph $G$ is an assignment of (finitely many) colours to the nodes of $G$ such that any two nodes which are connected by an edge have different colours. +A graph is called \textbf{$N$-colourable} if it has a vertex colouring with at most $N$ colours. +The \textbf{chromatic number} of a graph $G$ is the \textit{smallest $N$} for which $G$ is $N$-colourable. +The following statements about a graph $G$ are equivalent: +\begin{itemize} + \item $G$ is bipartite; + \item $G$ is 2-colourable; + \item Each cycle in $G$ has even length. +\end{itemize} + +\section{Trees} +A \textbf{cycle} in a simple graph provides, for any two nodes $a$ and $b$ on that cycle, at least two different paths from node $a$ to node $b$. +It can be useful to provide alternative routes for connectivity in case one of the edges should fail, e.g., in an electrical network. +\\\\ +A graph is called \textbf{acyclic} if it does not contain any cycles. +A \textbf{tree} is a simple graph that is \textit{connected} \& \textit{acyclic}. +In other words, between any two vertices in a tree there is exactly one simple path. +Trees can be characterised in many different ways. +\\\\ +\textbf{Theorem:} Let $G=(X,E)$ be a (simple) graph of order $n=|X|$ and size $m=|E|$. +Then, the following are equivalent: +\begin{itemize} + \item $G$ is a tree (i.e., acyclic \& connected); + \item $G$ is connected and $m=n-1$; + \item $G$ is a minimally connected graph (i.e., removing any edge will disconnect $G$); + \item $G$ is acyclic and $m=n-1$; + \item $G$ is a maximally acyclic graph (i.e., adding any edge will introduce a cycle in $G$); + \item There is a unique path between each pair of nodes in $G$. 
+\end{itemize} + +All trees are \textbf{bipartite}: +there are a few ways of thinking about this; +one is that a graph is bipartite if it has no cycles of odd length. Since a tree has no cycles, it must be bipartite. + +\subsection{Cayley's Formula} +\textbf{Theorem:} There are exactly $n^{n-2}$ distinct (labelled) trees on the $n$-element vertex set $X=\{0,1,2, \dots, n-1\}$ if $n>1$. + +\subsubsection{Pr\"ufer Codes} +The \textbf{Pr\"ufer code} of a tree can be determined (destructively) as follows: +\begin{enumerate} + \item Start with a tree $T$ with nodes labelled $0, 1, \dots, n-1$ and an empty list $a$. + \item Find the \textbf{leaf node} $x$ with the smallest label (a ``leaf node'' is a node of degree 1; every tree with at least two nodes has at least two leaf nodes). + \item Append the label of its unique neighbour $y$ to the list $a$. + \item Remove $x$ (and the edge $x \leftrightarrow y$) from $T$. + \item Repeat steps 2--4 until $T$ has only two nodes left. + We now have the code as a list of length $n-2$. +\end{enumerate} + +A tree can be re-constructed from its Pr\"ufer code because the degree of a node $x$ is $1$ plus the number of entries $x$ in the Pr\"ufer code of $T$. +A tree can be computed from a Pr\"ufer code $a$ (where the list $a$ is a list of length $n-2$ with all entries numbered $0$ to $n-1$) as follows: +\begin{enumerate} + \item Set $G$ to be a graph with node list $[0,1,2, \dots, n-1]$ and no edges yet. + \item Compute the list of node degrees $d$ from the code. + \item For $k=0,1,\dots, n-3$: + \begin{enumerate}[label=\arabic*.] + \item Set $y = a[k]$. + \item Set $x$ to be the node with the smallest label among those with $d[x] = 1$. + \item Add the edge $(x,y)$ to $G$. + \item Set $d[x]=d[x]-1$ and $d[y]=d[y]-1$ (that is, decrease the degrees of both $x$ and $y$ by one). + \end{enumerate} + \item Finally, connect the remaining two nodes of degree 1 by an edge. 
+\end{enumerate} + +Since we know now that there is a bijection between labelled trees and Pr\"ufer codes, we can prove Cayley's theorem easily: +\begin{enumerate} + \item A tree with $n$ nodes has a Pr\"ufer code of length $n-2$. + \item There are $n$ choices for each entry in the code. + \item So, there are $n^{n-2}$ possible codes for a tree with $n$ nodes. + \item So, there are $n^{n-2}$ possible trees with $n$ nodes. +\end{enumerate} + +\subsection{Graph \& Tree Traversal} +Often, one has to search through a network to check properties of its nodes, for example to find the node with the largest degree. +For large unstructured networks, this can be challenging; +fortunately, there are simple \& efficient algorithms to achieve this: +\begin{itemize} + \item DFS. + \item BFS. +\end{itemize} + +\subsubsection{Depth-First Search} +\textbf{Depth-first search (DFS)} works by starting at a root node and travelling as far along one of its branches as it can, then returning to the most recent unexplored branch. +The main data structure needed to implement DFS is a \textbf{stack}, also known as a Last-In-First-Out (LIFO) queue. +Given a rooted tree $T$ with root $x$, to visit all nodes in the tree: +\begin{enumerate} + \item Start with an empty stack $S$. + \item Push $x$ onto $S$. + \item While $S \neq \emptyset$: + \begin{enumerate}[label=\arabic*.] + \item Pop node $y$ from the stack. + \item Visit $y$. + \item Push $y$'s children onto the stack. + \end{enumerate} +\end{enumerate} + +\subsubsection{Breadth-First Search} +\textbf{Breadth-first search (BFS)} works by starting at a root node and exploring all the neighbouring nodes (on the same level) first. +Next, it searches their neighbours (level 2), etc. +The main data structure needed to implement BFS is a \textbf{queue}, also known as a First-In-First-Out (FIFO) queue. +Given a rooted tree $T$ with root $x$, to visit all nodes in the tree: +\begin{itemize} + \item Start with an empty queue $Q$. + \item Push $x$ onto $Q$. 
+ \item While $Q \neq \emptyset$: + \begin{enumerate}[label=\arabic*.] + \item Pop node $y$ from $Q$. + \item Visit node $y$. + \item Push $y$'s children onto $Q$. + \end{enumerate} +\end{itemize} + +Many questions on networks regarding distance \& connectivity can be answered by a versatile strategy: find a subgraph which is a tree containing every node, and then search that; such a tree is called a \textbf{spanning tree} of the underlying graph. diff --git a/year4/semester2/CS4423/notes/images/colouredsurvey.png b/year4/semester2/CS4423/notes/images/colouredsurvey.png new file mode 100644 index 00000000..191928b8 Binary files /dev/null and b/year4/semester2/CS4423/notes/images/colouredsurvey.png differ diff --git a/year4/semester2/CS4423/notes/images/connectedcomponents.png b/year4/semester2/CS4423/notes/images/connectedcomponents.png new file mode 100644 index 00000000..070d61a8 Binary files /dev/null and b/year4/semester2/CS4423/notes/images/connectedcomponents.png differ diff --git a/year4/semester2/CS4423/notes/images/permexampe.png b/year4/semester2/CS4423/notes/images/permexampe.png new file mode 100644 index 00000000..73684ad3 Binary files /dev/null and b/year4/semester2/CS4423/notes/images/permexampe.png differ diff --git a/year4/semester2/CS4423/notes/images/surveygv1.png b/year4/semester2/CS4423/notes/images/surveygv1.png new file mode 100644 index 00000000..a11eaad0 Binary files /dev/null and b/year4/semester2/CS4423/notes/images/surveygv1.png differ diff --git a/year4/semester2/CS4423/notes/images/surveymatrix.png b/year4/semester2/CS4423/notes/images/surveymatrix.png new file mode 100644 index 00000000..0b423d10 Binary files /dev/null and b/year4/semester2/CS4423/notes/images/surveymatrix.png differ diff --git a/year4/semester2/CS4423/notes/images/surveymatrix2.png b/year4/semester2/CS4423/notes/images/surveymatrix2.png new file mode 100644 index 00000000..8fd0c23c Binary files /dev/null and b/year4/semester2/CS4423/notes/images/surveymatrix2.png differ diff --git 
a/year4/semester2/CS4423/notes/images/surveysubgrap.png b/year4/semester2/CS4423/notes/images/surveysubgrap.png new file mode 100644 index 00000000..855a5fe1 Binary files /dev/null and b/year4/semester2/CS4423/notes/images/surveysubgrap.png differ diff --git a/year4/semester2/CS4423/notes/images/suverydata.png b/year4/semester2/CS4423/notes/images/suverydata.png new file mode 100644 index 00000000..1a6abc19 Binary files /dev/null and b/year4/semester2/CS4423/notes/images/suverydata.png differ diff --git a/year4/semester2/CS4423/notes/images/suverygaaph.png b/year4/semester2/CS4423/notes/images/suverygaaph.png new file mode 100644 index 00000000..1033e249 Binary files /dev/null and b/year4/semester2/CS4423/notes/images/suverygaaph.png differ