uni/year4/semester2/CS4423/notes/CS4423.tex

% ! TeX program = lualatex
\documentclass[a4paper,11pt]{article}
% packages
\usepackage{censor}
\StopCensoring
\usepackage{fontspec}
\setmainfont{EB Garamond}
% for tironian et fallback
% % \directlua{luaotfload.add_fallback
% % ("emojifallback",
% %      {"Noto Serif:mode=harf"}
% % )}
% % \setmainfont{EB Garamond}[RawFeature={fallback=emojifallback}]

\setmonofont[Scale=MatchLowercase]{Deja Vu Sans Mono}
\usepackage[a4paper,left=2cm,right=2cm,top=\dimexpr15mm+1.5\baselineskip,bottom=2cm]{geometry}
\setlength{\parindent}{0pt}

\usepackage{fancyhdr}       % Headers and footers
\fancyhead[R]{\normalfont \leftmark}
\fancyhead[L]{}
\pagestyle{fancy}

\usepackage{microtype}      % Slightly tweak font spacing for aesthetics
\usepackage[english]{babel} % Language hyphenation and typographical rules
\usepackage{xcolor}
\definecolor{linkblue}{RGB}{0, 64, 128}
\usepackage[final, colorlinks = false, urlcolor = linkblue]{hyperref}
% \newcommand{\secref}[1]{\textbf{§~\nameref{#1}}}
\newcommand{\secref}[1]{\textbf{§\ref{#1}~\nameref{#1}}}

\usepackage{amsmath}
\usepackage[most]{tcolorbox}
\usepackage{changepage}     % adjust margins on the fly
\usepackage{amsmath,amssymb}

\usepackage{minted}
\usemintedstyle{algol_nu}

\usepackage{pgfplots}
\pgfplotsset{width=\textwidth,compat=1.9}

\usepackage{caption}
\newenvironment{code}{\captionsetup{type=listing}}{}
\captionsetup[listing]{skip=0pt}
\setlength{\abovecaptionskip}{5pt}
\setlength{\belowcaptionskip}{5pt}

\usepackage[yyyymmdd]{datetime}
\renewcommand{\dateseparator}{--}

\usepackage{enumitem}

\usepackage{titlesec}

\author{Andrew Hayes}

\begin{document}
\begin{titlepage}
    \begin{center}
        \hrule
        \vspace*{0.6cm}
        \Huge \textsc{cs4423}
        \vspace*{0.6cm}
        \hrule
        \LARGE
       \vspace{0.5cm}
       Networks
       \vspace{0.5cm}
       \hrule

       \vfill

       \hrule
        \begin{minipage}{0.495\textwidth}
            \vspace{0.4em}
            \raggedright
            \normalsize
            \begin{tabular}{@{}l l}
                Name: & Andrew Hayes \\
                Student ID: & 21321503 \\
                E-mail: & \href{mailto://a.hayes18@universityofgalway.ie}{a.hayes18@universityofgalway.ie} \\
            \end{tabular}
        \end{minipage}
        \begin{minipage}{0.495\textwidth}
            \raggedleft
            \vspace*{0.8cm}
            \Large
            \today
            \vspace*{0.6cm}
        \end{minipage}
        \medskip\hrule
    \end{center}
\end{titlepage}

\pagenumbering{roman}
\newpage
\tableofcontents
\newpage
\setcounter{page}{1}
\pagenumbering{arabic}

\section{Introduction}
\textbf{CS4423 Networks} is a Semester 2 module on \textbf{Network Science}.
Modern societies are in many ways highly connected.
Certain aspects of this phenomenon are frequently described as \textbf{networks}.
CS4423 is an introduction to this emerging interdisciplinary subject.
We'll cover several major topics in this module, including:
\begin{itemize}
    \item   Graphs \& Graph Theory,  and how they relate to networks;
    \item   Representations of networks, including as matrices;
    \item   Computing with networks, using \mintinline{python}{networkx} in Python;
    \item   Centrality measures;
    \item   Random graphs;
    \item   Small worlds;
    \item   Models of growing graphs;
\end{itemize}

Lecture notes \& assignments will come in the form of Jupyter notebooks, which allows us to include interactive Python code with the text.

\subsection{Lecturer Contact Information}
\begin{itemize}
    \item   Name: Dr Niall Madden.
    \item   School of Mathematical \& Statistical Sciences, University of Galway.
    \item   Office: Room ADB-1013, Arás de Brún.
    \item   E-mail: \href{mailto://niall.madden@universityofgalway.ie}{niall.madden@universityofgalway.ie}.
    \item   Website: \url{https://www.niallmadden.ie}
\end{itemize}

\subsection{Exam Information}
First year lecturing, should be similar to old exam papers.
Only looked at the past 2 years or so.

\subsection{Schedule}
Tentative schedule for labs / tutorials:
\begin{itemize}
    \item   Tuesday at 16:00 in AC215;
    \item   Wednesday at 10:00 in CA116a.
\end{itemize}

There will be some practicals during the semester: Week 3 ``Introduction to Python \& Jupyter'' sessions, later weeks help with assignments, preparations for exam, etc.

\subsection{Assessment}
\begin{itemize}
    \item   Two homework assignments.
            Tentative deadlines: Weeks 5 \& 10.
            Each contribute 10\% each to the final grade.

    \item   One in-class test.
            Probably Week 7 (depending on FYP deadlines).
            Contributes 10\% to the final grade.

    \item   Final exam: 70\%.
\end{itemize}

\subsection{Introduction to Networks}
Newman (for example) broadly divides the most commonly studied real-world networks into four classes:
\begin{enumerate}
    \item   \textbf{Technological networks:} rely on physical infrastructure.
            In many cases, this infrastructure has been built over many decades and forms part of the backbone of modern societies, including roads \& other transportation networks, power grids, and communications networks.

    \item   \textbf{Social networks:} the vertices of a social network are people (or, at leasts, User IDs), with edges representing some sort of \textbf{social interaction}.
            In sociology, the vertices are often called \textbf{actors}, and the edges are called \textbf{ties}.
            Social networks are not just online: sociologists have studied social networks long before people started exhibiting their relations to others online.
            Traditionally, data about the structure of social networks have been compiled by interviewing the people involved.

    \item   \textbf{Information networks:} consist of \textbf{data items} which are linked to each other in some way.
            Examples include relational databases.
            Sets of information (like scientific publications) have been linking to each other (e.g., through citations) long before computers were invented, although links in digital form are easier to follow.
            \\\\
            The \textbf{WWW} is probably the most widespread \& best-known example of an information network.
            Its nodes are \textbf{web pages} containing information in form of text \& pictures, and its edges are the \textbf{hyperlinks}, allowing us to surf or navigate from page to page.
            Hyperlinks run in one direction only, from the page that contains the hyperlink to the page that is referenced.
            Therefore, the WWW is a \textbf{directed network}, a graph where each edge has a direction.

    \item   \textbf{Biological networks:}
            \begin{itemize}
                \item   \textbf{Biochemical networks} represent molecular-level patterns of interaction \& control mechanisms in the biological cell, including metabolic networks, protein-protein interaction networks, \& genetic regulatory networks.

                \item   A \textbf{neural network} can be represented as a set of vertices, the neurons, connected by two types of directed edges, one for excitatory inputs and one for inhibitory inputs.
                        (Not to be confused with an artificial neural network).

                \item   \textbf{Ecological networks} are networks of ecological interactions between species.
            \end{itemize}
\end{enumerate}

In each case, a network connects parts of a system (\textbf{nodes}) by some means (\textbf{links}).
Different techniques are used to display, discover, \& measure the structure in each example.
\\\\
In its simplest form, a \textbf{network} is just a collection of points (called \textbf{vertices} or \textbf{nodes}), some of which are joined in pairs (called \textbf{edges}  or \textbf{links}).
Many systems of interest are composed of individual parts that are in some way linked together: such systems can be regarded as networks, and thinking about them in this way can often lead to new \& useful insights.
\\\\
\textbf{Network science} studies the patterns of connections between the components of a system.
Naturally, the structure of the networks can have a big impact on the behaviour of the system.
A \textbf{network} is a simplified representation of a complex system by vertices \& edges.
The scientific study of networks is an interdisciplinary undertaking that combines ideas from mathematics, computer science, physics, the social sciences, \& biology.
Between these scientific fields, many tools have been developed for analysing, modeling, \& understanding networks.

\subsubsection{Network Measures}
\textbf{Centrality} is an example of a useful \& important type of network measure; it is concerned with the question of how important a particular vertex or edge is in a networked system.
Different concepts have been proposed to capture mathematically what it means to be central.
For example, a simple measure of the centrality of a vertex is its \textbf{degree}, that is, the number of edges it is part of (or, equivalently, the number of vertices it is adjacent to).
Applications of centrality include determining which entities in a social network have the most influence, or which links in a power grid are most vulnerable.
\\\\
Which measurements \& calculations give meaningful answers for a particular system depends of course on the specific nature of the system and the questions one wants to ask.

\subsubsection{Network Concepts}
Another interesting network concept is the \textbf{small-world effect}, which is concerned with the question of how far apart two randomly chosen points in a network typically are.
Here, \textbf{distance} is usually measured by the number of edges one would need to cross over when travelling along a \textbf{path} from one vertex to another.
In real-world social networks, the distance between people tends to be rather small.

\section{Graphs}
A \textbf{graph} can serve as a mathematical model of a network.
Later, we will use the \mintinline{python}{networkx} package to work with examples of graphs \& networks.

\subsection{Example: The Internet (circa 1970)}
\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/f7dec1970.jpg}
    \caption{
        The Internet (more precisely, ARPANET) in December 1970.
        Nodes are computers, connected by a link if they can directly communicate with each other.
        At the time, only 13 computers participated in that network.
    }
\end{figure}

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{text}
UCSB SRI UCLA
SRI UCLA STAN UTAH
UCLA STAN RAND
UTAH SDC MIT
RAND SDC BBN
MIT BBN LINC
BBN HARV
LINC CASE
HARV CARN
CASE CARN
\end{minted}
\caption{\texttt{arpa.adj}}
\end{code}

The following \textbf{diagram}, built from the adjacencies in \verb|arpa.adj|, contains the same information as in the above figure, without the distracting details of US geography;
this is actually an important point, as networks only reflect the \textbf{topology} of the object being studied.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
H = nx.read_adjlist("../data/arpa.adj")
opts = { "with_labels": True, "node_color": 'y' }
nx.draw(H, **opts)
\end{minted}
\caption{\texttt{arpa.adj}}
\end{code}

\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/qwe_download.png}
    \caption{ The ARPA Network as a Graph }
\end{figure}

\subsection{Simple Graphs}
A \textbf{simple graph} is a pair $G = (X,E)$ consisting of a finite set $X$ of objects called \textit{nodes}, \textit{vertices}, or \textit{points} and a set of \textit{links} or \textit{edges} $E$ which are each a set of two different vertices.
\begin{itemize}
    \item   We can also write $E \subseteq \binom{X}{2}$, where $\binom{X}{2}$ ($X$ \textit{choose} 2) is the set of all $2$-element subsets of $X$.
    \item   The \textbf{order} of the graph $G$ is denoted as $n = |X|$, where $n$ is the number of vertices in the graph.
    \item   The \textbf{size} of the graph is denoted as $m = |E|$, where $m$ is the number of edges in the graph.
            Naturally, $m \leq \binom{n}{2}$.
\end{itemize}

\subsection{Subgraphs \& Induced Subgraphs}
Given $G = (X,E)$, a \textbf{subgraph} of $G$ is $H=(Y, E_H)$ with $Y \subseteq X$ and $E_H \subseteq  E \cap \binom{Y}{s}$;
therefore, all the nodes in $H$ are also in $G$ and any edge in $H$ was also in $G$, and is incident only to vertices in $Y$.
\\\\
One of the most important subgraphs of $G$ is the \textbf{induced subgraph} on $Y \subseteq X$: $H = (Y, E \cap \binom{Y}{2})$;
that is, given a subset $Y$ of $X$, we include all possible edges from the original graph $G$ too.
Each node has a list of \textbf{neighbours} which are the nodes it is directly connected to by an edge of the graph.

\subsection{Important Graphs}
The \textbf{complete graph} on a vertex set $X$ is the graph with edge set $\binom{X}{2}$.
For example, if $X = \{0,1,2,3 \}$, then $E = \{01,02,03,12,13,23\}$
\\\\
The \textbf{Petersen graph} is a graph on 10 vertices with 15 edges.
It can be constructed as the complement of the line graph of the complete graph $K_5$, that is, as the graph with the vertex set $X = \binom{ \{0,1,2,3,4\} }{2}$ (the edge set of $K_5$) and with an edge between $x,y \in X$ whenever $x \cap y = \emptyset$.
\\\\
A graph is \textbf{bipartite} if we can divide the node set $X$ into two subsets $X_1$ and $X_2$ such that:
\begin{itemize}
    \item   $X_1 \cap X_2 = \emptyset$  (the sets have no edge in common);
    \item   $X_1 \cup X_2 = X$.
\end{itemize}

For any edge $(u_1, u_2)$, we have $u_1 \in X_1$ and $u_2 \in X_2$; that is, we only ever have edges between nodes from different sets.
Such graphs are very common in Network Science, where nodes in the network represent two different types of entities; for example, we might have a graph wherein nodes represent students and modules, with edges between students and modules they were enrolled in, often called an \textbf{affiliation network}.
\\\\
A \textbf{complete bipartite graph} is a particular bipartite graph wherein there is an edge between every node in $X_1$ and every node in $X_2$.
Such graphs are denoted $K_{m,n}$, where $|X_1| = m$ and $|X_2|=n$.
\\\\
The \textbf{path graph} with $n$ nodes, denoted $P_n$, is a graph where two nodes have degree 1, and the other $n-2$ have degree 2.
\\\\
The \textbf{cycle graph} on $n \geq 3$ nodes, denoted $C_n$ (slightly informally) is formed by adding an edge between the two nodes of degree 1 in a path graph.

\subsection{New Graphs from Old}
The \textbf{complement} of a graph $G$ is a graph $H$ with the same nodes as $G$ but each pair of nodes in $H$ are adjacent if and only if they are \textit{not adjacent} in $G$.
The complement of a complete graph is an empty graph.
\\\\
A graph $G$ can be thought of as being made from ``things'' that have connection to each other: the ``things'' are nodes, and their connections are represented by an edge.
However, we can also think of edges as ``things'' that are connected to any other edge with which they share a vertex in common.
This leads to the idea of a line graph:
the \textbf{line graph} of a graph $G$, denoted $L(G)$ is the graph where every node in $L(G)$ corresponds to an edge in $G$, and for every pair of edges in $G$ that share a node, $L(G)$ has an edge between their corresponding nodes.

\section{Matrices of Graphs}
There are various was to represent a graph, including the node set, the edge set, or a drawing of the graph;
one of the most useful representations of a graph for computational purposes is as a \textbf{matrix}; the three most important matrix representations are:
\begin{itemize}
    \item   The \textbf{adjacency matrix} (most important);
    \item   The \textbf{incidence matrix} (has its uses);
    \item   The \textbf{graph Laplacian} (the coolest).
\end{itemize}

\subsection{Adjacency Matrices}
The \textbf{adjacency matrix} of a graph $G$ of order $n$ is a square $n \times n$ matrix $A = (a_{i,j})$ with rows \& columns corresponding to the nodes of the graph, that is, we number the nodes $1, 2, \dots, n$.
Then, $A$ is given by:
\begin{align*}
    a_{i,j} =
    \begin{cases}
        1 & \text{if nodes } i \text{ and } j \text{ are joined by an edge,} \\
        0 & \text{otherwise}
    \end{cases}
\end{align*}

Put another way, $a_{i,j}$ is the number of edges between node $i$ and node $j$.
Properties of adjacency matrices include:
\begin{itemize}
    \item   $\sum^N_{i=1} \sum^N_{j=1} a_{i,j} = \sum_{u \in X}\text{deg}(u)$ where $\text{deg}(u)$ is the degree of $u$.
    \item   All graphs that we've seen hitherto are \textit{undirected}: for all such graphs, $A$ is symmetric.
            $A = A^T$ and, equivalently, $a_{i,j} = a{j,i}$.
    \item   $a_{i,i} = 0$ for all $i$.
    \item   In real-world examples, $A$ is usually \textbf{sparse} which means that $\sum^N_{i=1} \sum^N_{j=1} a_{i,j} \ll n^2$, that is, the vast majority of the entries are zero.
            Sparse matrices have huge importance in computational linear algebra: an important idea is that is much more efficient to just store the location of the non-zero entities in a sparse matrix.
\end{itemize}

Any matrix $M = (m_{i,j})$ with the properties that all entries are zero or one and that the diagonal entries are zero (i.e., $m_{i,j}=0$) is an adjacency matrix of \textit{some} graph (as long as we don't mind too much about node labels).
In a sense, every square matrix defines a graph if:
\begin{itemize}
    \item   We allow loops (an edge between a node and itself).
    \item   Every edge has a weight: this is equivalent to the case for our more typical graphs that every potential edge is weighted 0 (is not in the edge set)  or 1 (is in the edge set).
    \item    There are two edges between each node (one in each direction) and they can have different weights.
\end{itemize}

\subsubsection{Examples of Adjacency Matrices}
Let $G = G(X,E)$ be the graph with $X = \{a,b,c,d,e\}$ nodes and edges $\{a \leftrightarrow b, b \leftrightarrow c, b \leftrightarrow d, c \leftrightarrow d, d \leftrightarrow e \}$.
Then:
\begin{align*}
    A =
    \begin{pmatrix}
        0 & 1 & 0 & 0 & 0 \\
        1 & 0 & 1 & 1 & 0 \\
        0 & 1 & 0 & 1 & 0 \\
        0 & 1 & 1 & 0 & 1 \\
        0 & 0 & 0 & 1 & 0 \\
    \end{pmatrix}
\end{align*}

The adjacency matrix of $K_4$ is:
\begin{align*}
    A =
    \begin{pmatrix}
        0 & 1 & 1 & 1 \\
        1 & 0 & 1 & 1 \\
        1 & 1 & 0 & 1 \\
        1 & 1 & 1 & 0 \\
    \end{pmatrix}
\end{align*}

\subsection{Degree}
The \textbf{degree} of a node in a simple graph is the number of nodes to which it is adjacent, i.e., its number of neighbours.
For a node $v$ we denote this number $\text{deg}(v)$.
The degree of a node can serve as a (simple) measure of the importance of a node in a network.
Recall that one of the basic properties of an adjacency matrix is $\sum^n_{i=1} \sum^n_{j=1} a_{i,j} = \sum_{u \in X} \text{deg}(u)$, where $\text{deg}(u)$ is the degree of $u$ and $n$ is the order of the graph;
this relates to a (crude) measure of how connected a network is: the \textbf{average degree}:
\begin{align*}
    \text{Average degree} = \frac{1}{n} \sum_{u \in X} \text{deg}(u) = \frac{1}{n}\sum^n_{i,j} a_{i,j}
\end{align*}
However, if the size of the network (the number of edges) is $m$, then the total sum of degrees is $2m$ (since each edge contributes to the degree count of two nodes), meaning that the average degree is $\frac{2m}{n}$.

\subsection{Walks}
A \textbf{walk} in a graph is a series of edges (perhaps with some repeated) $\{ u_1 \leftrightarrow v_1, u_2 \leftrightarrow u_2, \dots, u_p \leftrightarrow v_p\}$ with the property that $v_i = u_{i+1}$.
If $v_p = u_1$, then it is a \textbf{closed walk}.
The \textbf{length} of a walk is the number of edges in it.
\\\\
Adjacency matrices can be used to enumerate the number of walks of a given length between a pair of vertices.
Obviously, $a_{i,j}$ is the number of walks of length 1 between node $i$ and node $j$.
We can extract that information for node $j$ by computing the product of $A$ and $e_j$ (column $j$of the identity matrix).

\section{Connectivity \& Permutations}
\subsection{Notation}
To start, let's decide on our notation:
\begin{itemize}
    \item   If we write $A = (a_{i,j})$, we mean that $A$ is a matrix and $a_{i,j}$ is its entry row $i$, column $j$.
    \item   We also write such entries as $(A)_{i,j}$;
            the reason for this slightly different notation is to allow us to write, for example, $(A^2)_{i,j}$ is the entry in row $i$, column $j$ of $B = A^2$.
    \item   The \textbf{trace} of a matrix is the sum of its diagonal entries, that is, $\text{tr}(A) = \sum^n_{i=1}a_{i,i}$. (Very standard).
    \item   When we write $A > 0$, we mean that all entries of $A$ are positive.
\end{itemize}

\subsection{Counting Walks}
Recall that the \textbf{adjacency matrix} of a graph $G$ of order $N$ is a square $n \times n$ matrix $A = (a_{i,j})$ with rows and columns corresponding to the nodes of the graph.
$a_{i,j}$ is set to be the number of edges between nodes $i$ and $j$.
We learned previously that:
\begin{itemize}
    \item   If $e_j$ is the $j^\text{th}$ column of the identity matrix $I_n$, then $(Ae_j)_i$ is the number of walks of length 1 from node $i$ to node $j$.
            Also, it is the same as $a_{i,j}$.

    \item   Moreover, $(A(Ae_j))_i = (A^2e_j)$ is the number of walks of length 2 from node $i$ to node $j$.
            We can conclude that, if $B=A^2$, then $b_{i,j}$ is the number of walks of length 2 between nodes $i$ and $j$.
            Note that $b_{i,i}$ is the degree of node $i$.

    \item   In fact, if $B=A^k$, then $b_{i,j}$ is the number of walks of length $k$ between nodes $i$ and $j$.
\end{itemize}

\subsection{Paths}
A \textbf{trail} is  walk with no repeated edges.
A \textbf{cycle} is a trail in which the first and last nodes are the same, but no other node is repeated;
a \textbf{triangle} is a cycle of length 3.
A \textbf{path} is a walk in which no nodes (and so no edges) are repeated.
(The idea of a path is hugely important in network theory, and we will return to it often).
\\\\
The \textbf{length} of a path is the number of edges in that path.
A path from node $u$ to node $v$ is a \textbf{shortest path} if there is no path between them that is shorter (although there could be other paths of the same length).
Finding shortest paths in a network is a major topic that we will return to at another time.
\begin{itemize}
    \item   Every path is also a walk.
    \item   If a particular walk is the shortest walk between two nodes then it is also the shortest path between two nodes.
    \item   If $k$ is the smallest natural number of which $(A^k)_{i,j} \neq 0$, then the shortest walk from node $i$ to node $j$ is of length $k$.
    \item   It follows that $k$ is also the length of the shortest path from node $i$ to node $j$.
\end{itemize}

For example, consider the following adjacency matrix and its powers:
\begin{align*}
A =
\begin{pmatrix}
    0 & 1 & 0 & 0 & 0 \\
    1 & 0 & 1 & 0 & 0 \\
    0 & 1 & 0 & 1 & 1 \\
    0 & 0 & 1 & 0 & 1 \\
    0 & 0 & 1 & 1 & 0
\end{pmatrix}
\end{align*}

\begin{align*}
A^2 =
\begin{pmatrix}
    1 & 0 & 1 & 0 & 0 \\
    0 & 2 & 0 & 1 & 1 \\
    1 & 0 & 3 & 0 & 0 \\
    0 & 1 & 1 & 2 & 1 \\
    0 & 1 & 1 & 1 & 2
\end{pmatrix}
\end{align*}

\begin{align*}
A^3 =
\begin{pmatrix}
    0 & 2 & 0 & 1 & 1 \\
    2 & 0 & 4 & 1 & 1 \\
    0 & 4 & 2 & 4 & 4 \\
    1 & 1 & 4 & 2 & 3 \\
    1 & 1 & 4 & 3 & 2
\end{pmatrix}
\end{align*}

We can observe that, where $A$ is the adjacency matrix of the graph $G$:
\begin{itemize}
    \item   $(A^2)_{i,i}$ is the degree of node $i$.
    \item   $\text{tr}(A^2)$ is the degree sum of the nodes in $G$.
    \item   $(A^3)_{i,i} \neq 0 $ if node $i$ is in a triangle.
    \item   $\frac{\text{tr}(A^3)}{6}$ is the number of triangles in $G$.
    \item   If $G$ is bipartite, then $(A^3)_{i,j} = 0$ for all $i,j$.
\end{itemize}

\subsection{Connectivity}
Let $G$ be a graph and $A$ its adjacency matrix:
in $G$, node $i$ can be \textbf{reached} from node $j$ if there is a path between them.
If node $i$ is reachable from node $j$, then $(A^k)_{i,j} \neq 0$ for some $k$.
Also, note that $k \leq n$.
Equivalently, since each power of $A$ is non-negative, we can say that $(I + A + A^2 + A^3 + \cdots + A^k) > 0$.
\\\\
A graph/network is \textbf{connected} if there is a path between every pair of nodes.
That is, every node is reachable from every other node.
If a graph is not connected, we say that it is \textbf{disconnected}.
Determining if a graph is connected or not is important; we'll see later that this is especially important with directed graphs.
A graph $G$ of order $n$ is connected if and only if, for each $i,j$, there is some $K \leq n$ for which $(A^k)_{i,j} \neq 0$.

\subsection{Permutation Matrices}
We know that the structure of a network is not changed by labelling its nodes.
Sometimes, it is useful to re-label the nodes in order to expose certain properties, such as connectivity.
Since we think of the nodes as all being numbered from 1 to $n$, this is the same as \textbf{permuting} the numbers of some subset of the nodes.

\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/permexampe.png}
    \caption{ Example wherein nodes are re-labelled to expose certain properties of the graph }
\end{figure}

When working with the adjacency matrix of a graph, such a permutation is expressed in terms of a \textbf{permutation matrix} $P$;
this is a $0$-$1$ matrix (also known as a Boolean or a binary matrix) where this is a single $1$ in every row \& column.
If the nodes of a graph $G$ (with adjacency matrix $A$) are listed as entries in a vector ${q}$, then:
\begin{itemize}
    \item   $Pq$ is a permutation of the nodes.
    \item   $PAP^T$ is the adjacency matrix of the graph with that node permutation applied.
\end{itemize}

In many examples, we will have a symmetric $P$ for the sake of simplicity, but in general, $P \ neq P^T$.
However, $P^TAP = PAP^T$and $P^T = P^{-1}$ so $PAP^T = PAP^{-1}$.
\\\\
A graph with adjacency matrix $A$ is \textbf{disconnected} if and only if there is a permutation matrix $P$ such that
\begin{align*}
A &= P \begin{pmatrix} X & O \\ O^T & Y \end{pmatrix} &
PAP^T &= P \begin{pmatrix} X & O \\ O & Y \end{pmatrix}
\end{align*}
where $O$ represents the zero matrix with the same number of rows as $X$ and the same number of columns as $Y$.

\section{Permutations \& Bipartite Networks}
\subsection{Graph Connectivity}
Recall that a graph is \textbf{connected} if there is a path between every pair of nodes.
If the graph is not connected, we say that it is \textbf{disconnected}.
We now know how to check if a graph is connected by looking at powers of its adjacency matrix.
However, that is not very practical for large networks.
Instead, we can determine if a graph is connected by just looking at the adjacency matrix, provided that we have ordered the nodes properly.

\subsection{Connected Components}
If a network is not connected, then we can divide it into \textbf{components} which \textit{are} connected.
The number of connected components is the number of blocks in the permuted adjacency matrix.

\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/connectedcomponents.png}
    \caption{ Connected components example }
\end{figure}

\section{Bipartite Networks: Colours \& Computations}
\subsection{Class Survey Example}
\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/suverydata.png}
    \caption{ Final survey data }
\end{figure}

\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/suverygaaph.png}
    \caption{ Final survey graph, with order 39 and size 87 }
\end{figure}

\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/surveysubgrap.png}
    \caption{ Subgraph of the survey network based on 7 randomly chosen people, with order 16 and size 24 }
\end{figure}

\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/surveymatrix.png}
    \caption{ Adjacency matrix where the nodes for people are listed first }
\end{figure}

\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/surveymatrix2.png}
    \caption{ $B = A^2$ }
\end{figure}

Since we know from before that $(A^k)_{i,j}$ is the number of walks of length $k$between nodes $i$ and $j$, we can see that in this context:
\begin{itemize}
    \item   For the first 7 rows \& columns, $b_{i,j}$ is the number of programmes in common between person $i$ and person $j$.
            (This even works for $i=j$, but the number of programmes a person has in common with themselves is just the number they watch).
    \item   For the last 9 rows \& columns, $b_{i,j}$ is the number of people who watch both programmes $i$ and $j$.
\end{itemize}

\subsection{Projections}
Given a bipartite graph $G$ whose node set $V$ has parts $V_1$ \& $V_2$, and \textbf{projection} of $G$ onto (for example) $V_1$ is the graph with:
\begin{itemize}
    \item   Node set $V_1$;
    \item   An edge between a pair of nodes in $V_1$ if they share a common neighbour in $G$.
\end{itemize}

In the context of our survey example, a projection onto $V_1$ (people/actors) gives us the graph of people who share a common programme.
To make such a graph:
\begin{itemize}
    \item   Let $A$ be the adjacency matrix of $G$.
    \item   Let $B$ be the submatrix of $A^2$ associated with the nodes in $V_1$.
    \item   Let $C$ be the adjacency matrix with the property:
            \begin{align*}
                c_{i,j} =
                \begin{cases}
                    1 & b_{i,j} > 0 \text{ and } i \neq j \\
                    0 & \text{otherwise}
                \end{cases}
            \end{align*}
            That is, $b_{i,j} = 0 $ or $i=j$.
    \item   Let $G_{V_1}$ be the graph on $V_1$ with adjacency matrix $C$.
            Then, $G_{V_1}$ is the \textbf{projection of $G$ onto $V_1$}.
\end{itemize}

\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/surveygv1.png}
    \caption{ $G_{V_1}$ computed for our survey data }
\end{figure}

\subsection{Colouring}
\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/colouredsurvey.png}
    \caption{ The original survey graph is more easily digestible if coloured }
\end{figure}

For any bipartite graph, we can think of the nodes in the two sets as \textbf{coloured} with different colours.
For instance, we can think of nodes in $X_1$ as white nodes and those in $X_2$ as black nodes.
A \textbf{vertex-colouring} of a graph $G$is an assignment of (finitely many) colours to the nodes of $G$ such that any two nodes which are connected by an edge have different colours.
A graph is called \textbf{$N$-colourable} if it has a vertex colouring with at most $N$ colours.
The \textbf{chromatic number} of a graph $G$ is the \textit{smallest $N$} for which a graph $G$ is $N$-colourable.
The following statements about a graph $G$ are equivalent:
\begin{itemize}
    \item   $G$ is bipartite;
    \item   $G$ is 2-colourable;
    \item   Each cycle in $G$ has even length.
\end{itemize}

\section{Trees}
A \textbf{cycle} in a simple graph provides, for any two nodes on that cycle, at least two different paths from node $a$ to node $b$.
It can be useful to provide alternative routes for connectivity in case one of the edges should fail, e.g., in an electrical network.
\\\\
A graph is called \textbf{acyclic} if it does not contain any cycles.
A \textbf{tree} is a simple graph that is \textit{connected} \& \textit{acyclic}.
In other words, between any two vertices in a tree there is exactly one simple path.
Trees can be characterised in many different ways.
\\\\
\textbf{Theorem:} Let $G=(X,E)$ be a (simple) graph of order $n=|X|$ and size $m=|E|$.
Then, the following are equivalent:
\begin{itemize}
    \item   $G$ is a tree (i.e., acyclic \& connected);
    \item   $G$ is connected and $m=n-1$.
    \item   $G$ is a minimally connected graph (i.e., removing any edge will disconnect $G$).
    \item   $G$ is acyclic and $m=n-1$.
    \item   $G$ is a maximally acyclic graph (i.e., adding any edge will introduce a cycle in $G$);
    \item   There is a unique path between each pair of nodes in $G$.
\end{itemize}

All trees are \textbf{bipartite}:
there are a few ways of thinking about this;
one is that a graph is bipartite if it has no cycles of odd length -- since a tree has no cycles, it must be bipartite.

\subsection{Cayley's Formula}
\textbf{Theorem:} there are exactly $n^{n-2}$ distinct (labelled) trees on the $n$-element vertex set $X=\{0,1,2, \dots, n-1\}$ if $n>1$.

\subsubsection{Pr\"ufer Codes}
The \textbf{Pr\"fer code} of a tree can be determined (destructively) as follows:
\begin{enumerate}
    \item   Start with a tree $T$ with nodes labelled $0, 1, \dots, n-1$ and an empty list $a$.
    \item   Find the \textbf{leaf node} $x$ with the smallest label (with a ``leaf node'' being a node of degree 1. Every tree must have at least two leaf nodes).
    \item   Append the label of its unique neighbour $y$to the list $a$.
    \item   Remove $x$ (and the edge $x \leftrightarrow y$) from $T$.
    \item   Repeat steps 2-3 until $T$ has only two ndoes left.
            We now have the code as a list of length $n-2$.
\end{enumerate}

A tree can be re-constructed from its Pr\"ufer code as the degree of a node $x$ is $1$ plus the number of entries $x$ in the Pr\"ufer code of $T$.
A tree can be computed from a Pr\"ufer code $a$ (where the list $a$ is a list of length $n-2$ with all entries numbered $0$ to $n-1$) as follows:
\begin{enumerate}
    \item   Set $G$ to be a graph with node list $[0,1,2, \dots, n-1]$ and no edges yet.
    \item   Compute the list of node degrees $d$ from the code.
    \item   For $k=0,1,\dots, n-2$:
            \begin{enumerate}[label=\arabic*.]
                \item   Set $y = a[k]$.
                \item   Set $x$to be the node with the smallest degree in $d$.
                \item   Add the edge $(x,y)$ to $G$.
                \item   Set $d[x]=d[x]-1$and $d[y]=d[y]-1$ (that is, decrease the degrees of both $x$ and $y$ by one).
            \end{enumerate}
    \item   Finally, connect the remaining two nodes of degrees 1 by an edge.
\end{enumerate}

Since we know now that there is a bijection between labelled trees and Pr\"ufer codes, we can prove Cayley's theorem easily:
\begin{enumerate}
    \item   A tree with $n$ nodes has a Pr\"ufer code of length $n-2$.
    \item   There are $n$ choices for each entry in the code.
    \item   So, there are $n^{n-2}$ possible codes for a tree with $n$ nodes.
    \item   So, there are $n^{n-2}$ possible trees with $n$ nodes.
\end{enumerate}

\subsection{Graph \& Tree Traversal}
Often, one has to search through a network to check properties of nodes such as to find the node with the largest degree.
For large unstructured networks, this can be challenging;
fortunately, there are simple \& efficient algorithms to achieve this:
\begin{itemize}
    \item   DFS.
    \item   BFS.
\end{itemize}

\subsubsection{Depth-First Search}
\textbf{Depth-first search (DFS)} works by starting at a root node and travelling as far along one of its branches as it can, then returning to the last unexplored branch.
The main data structure needed to implement DFS is a \textbf{stack}, also known as a Last-In-First-Out (LIFO) queue.
Given a rooted tree $T$ with root $x$, to visit all nodes in the tree:
\begin{enumerate}
    \item   Start with an empty stack $S$.
    \item   Push $x$ onto $S$.
    \item   While $S \neq \emptyset$:
            \begin{enumerate}[label=\arabic*.]
                \item   Pop node $y$ from the stack.
                \item   Visit $y$.
                \item   Push $y$'s children onto the stack.
            \end{enumerate}
\end{enumerate}

\subsubsection{Breadth-First Search}
\textbf{Breadth-first search (BFS)} works by starting at a root node and exploring all the neighbouring nodes (on the same level) first.
Next, it searches their neighbours (level 2), etc.
The main data structure needed to implement BFS is a \textbf{queue}, also known as a First-In-First-Out (FIFO) queue.
Given a rooted tree $T$ with root $x$, to visit all nodes in the tree:
\begin{itemize}
    \item   Start with an empty queue $Q$.
    \item   Push $x$ onto $Q$.
    \item   While $Q \neq \emptyset$:
            \begin{enumerate}[label=\arabic*.]
                \item   Pop node $y$ from $Q$.
                \item   Visit node $y$.
                \item   Push $y$'s children onto $Q$.
            \end{enumerate}
\end{itemize}

Many questions on networks regarding distance \& connectivity can be answered by a versatile strategy involving a subgraph which is a tree and then searching that; such a tree is called \textbf{spanning tree} of the underlying graph.

\subsubsection{Graph Diameter}
A natural problem arising in many practical applications is the following: given a pair of nodes $x,y$, find one or all the paths from $x$ to $y$ with the fewest number of edges possible.
This is a somewhat complex measure on a network (compared to, say, statistics on node degrees) and we will therefore need a more complex procedure, that is, an algorithm, in order to solve such problems systematically.
\\\\
\textbf{Definition:} let $G=(X,E)$ be a simple graph and let $x,y \in X$.
Let $P(x,y)$ be the set of all paths from $x$ to $y$.
Then:
\begin{itemize}
    \item   The \textbf{distance} $d(x,y)$ from $x$ to $y$ is
            \begin{align*}
                d(x,y) = \text{min}\{ l(p) : p \in P(x,y) \},
            \end{align*}
            the shortest possible length of a path from $x$ to $y$, and a \textbf{shortest path} from $x$ to $y$ is a path $p \in P(x,y)$ of length $l(p) = d(x,y)$.

    \item   The \textbf{diameter} $\text{diam}(G)$ of the network $G$ is the length of the longest shortest path between any two nodes:
            \begin{align*}
                \text{diam}(G) = \text{max}\{ l(p) : p \in P(x,y) \}
            \end{align*}
\end{itemize}

% MISSING WEEK06-01 shit

\section{Centrality Measures}
Key nodes in a network can be identified through \textbf{centrality measures}: a way of assigning ``scores'' to nodes that represents their ``importance''.
However, what it means to be central depends on the context;
accordingly, in the context of network analysis, a variety of different centrality measures have been developed.
Measures of centrality include:
\begin{itemize}
    \item   \textbf{Degree centrality:} just the degree of the nodes, important in transport networks for example.
    \item   \textbf{Eigenvector centrality:} defined in terms of properties of the network's adjacency matrix.
    \item   \textbf{Closeness centrality:} defined in terms of a node's \textbf{distance} to other nodes in the network.
    \item   \textbf{Betweenness centrality:} defined in terms of \textbf{shortest paths}.
\end{itemize}
\subsection{Degree Centrality}
In a (simple) graph $G=(X,E)$ with $X=\{ 0, 1, \dots, n-1 \}$ and adjacency matrix $A=(a_{i,j})$, the \textbf{degree centrality} $c_i^D$ of node $i \in X$ is defined as:
\[
    c_i^D = k_i = \sum_j a_{i,j}
\]

where $k_i$ is the degree of node $i$.
\\\\
In some cases, this measure can be misleading since it depends (among other things) on the order of the graph.
A better measure is the \textbf{normalised degree centrality}: the normalised degree centrality $C_I^D$ of node $i \in X$ is defined as:
\[
    C_i^D = \frac{k_i}{n-1} = \frac{c_i^D}{n-1} \left( = \frac{\text{degree of centrality of node } i}{\text{number of potential neighbours of } i} \right)
\]

Note that in a directed graph, one distinguishes between the \textbf{in-degree} and the \textbf{out-degree}o f a node and defines the in-degree centrality and the out-degree centrality accordingly.

\subsection{Eigenvector Centrality}
Let $A$ be a square $n \times n$ matrix.
An $n$-dimensional vector, $v$, is called an \textbf{eigenvector} of $A$ if :
\[
    Av = \lambda v
\]

for some scalar $\lambda$ which is called an \textbf{eigenvalue} of $A$.
\\\\
When $A$ is a real-valued matrix, one usually finds that $\lambda$ and $v$ are \textit{complex-valued}.
However, if $A$ is \textbf{symmetric}, then they are \textit{real-valued}.
$A$ may have up to $n$ eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$.
The \textbf{spectral radius} of $A$ is $\rho(A) := \text{max}(|\lambda_1|, \lambda_2|, \dots, |\lambda_n|)$.
If $v$ is an eigenvector associated with the eigenvalue $\lambda$, so too is any non-zero multiple of $v$.
\\\\
The basic idea of eigenvector centrality is that a node's ranking in a network should relate to the rankings of the nodes it is connected to.
More specifically,  up to to some scalar $\lambda$, the centrality $c_i^E$ of node $i$ should be equal to the sum of the centralities $c_j^E$ of its neighbouring nodes $j$.
In terms of the adjacency matrix $A = (a_{i,j})$, this relationship is expressed as:
\[
    \lambda c_i^E = \sum_j a_{i,j} c_j^E
\]

which, in turn, in matrix language is:
\[
    \lambda c^E = Ac^E
\]

for the vector $c^E = (c_i^E)$ which then is an eigenvector of $A$.
So $c^E$ is an eigenvector of $A$ (but which one?).

\subsubsection{How to find $c^E$ and/or $\lambda$}
If the network is small, one could use the usual method (although it is almost never a good idea).
\begin{enumerate}
    \item   Find the \textit{characteristic polynomial} $p_A(x)$ of $A$ as \textit{determinant} of the matrix $xI -A$, where $I$ is the $n \times n$ identity matrix.
    \item   Find the roots $\lambda$ of $p_A(x)$ (i.e., scalars $\lambda$ such that $p_A(\lambda) = 0$).
    \item   Find a \textit{non-trivial solution} of the linear system $(\lambda I - A) c = 0$ (where $0$ stands for an all-$0$ column vector and $c = (c_1, \dots, c_n)$ is a column of \textit{unknowns}).
\end{enumerate}

For large networks, there is a much better algorithm, such as the \textbf{Power method}, which we will look at later.

\subsubsection{Perron-Frobenius Theory}
Presently, we'll lean that the adjacency matrix always has one eigenvalue which is greater than all the others.
\\\\
A matrix $A$ is called \textbf{reducible} if, for some simultaneous permutations of its rows and columns, it has the block form:
\[
    A =
    \begin{pmatrix}
        A_{1,1} & A_{1,2} \\
        o & A_{2,2}
    \end{pmatrix}
\]

If $A$ is not reducible, we say that it is \textbf{irreducible}.
The adjacency matrix of a simple graph $G$ is \textbf{irreducible} if and only if $G$ is connected.
\\\\
A matrix $A=(a_{i,j})$ is \textbf{non-negative} is $a_{i,j \geq 0}$ for all $i$, $j$.
For simplicity, we usually write $A \geq 0$.
It is important to node that adjacency matrices are examples of non-negative matrices.
There are similar concepts of, say, positive matrices, negative matrices, etc.
Of particular importance are \textbf{positive vectors}:  $v = (v_i)$ is positive for if $v_i > 0$ for all $i$.
We write $v \geq 0$.
\\\\
\textbf{Theorem:} suppose that $A$ is a square, non-negative, \textbf{irreducible} matrix.
Then:
\begin{itemize}
    \item   $A$ has a real eigenvalue $\lambda = \rho(A)$ and $\lambda > |\lambda'|$ for any other eigenvalue $\lambda'$ of $A$.
            $\lambda$ is called the \textbf{Perron root} of $A$.

    \item   $\lambda$ is a simple root of the characteristic polynomial of $A$ (so has just one corresponding eigenvector).

    \item   There is an eigenvector, $v$, associated with $\lambda$ such that $v >0$.
\end{itemize}

For us, this means:
\begin{itemize}
    \item   The adjacency matrix of a connected graph has an eigenvalue that is positive and greater in magnitude than any other.
    \item   It has an eigenvector $v$ that is positive.
    \item   $v_i$ is the eigenvector centrality of the node $i$.
\end{itemize}


\subsection{Closeness Centrality}
A node $x$ in a network can be regarded as being central if it is \textbf{close} to (many) other nodes, as it can quickly interact with them.
Recalling that $d(i,j)$ is the distance between nodes $i$ and $j$ (i.e., the length of the shortest path between them).
Then, we can use $\frac{1}{d(i,j)}$ as a measure of ``closeness'';
in a simple, \textit{connected} graph $G=(X,E)$ of order $n$, the \textbf{closeness centrality}, $c^C_i$ of node $i$ is defined as:
\[
    c_i^C = \frac{1}{\sum_{j \in X} d(i,j)} = \frac{1}{s(i)}
\]

where $s(i)$ is the \textbf{distance sum} for node $i$.
As is usually the case, there is a \textbf{normalised} version of this measure;
the \textbf{normalised closeness centrality} is defined as:
\[
    C_i^C = (n-1)c_i^C = \frac{n-1}{\sum_{j \in X} d(i,j)} = \frac{n-1}{s(i)}
\]

Note that $0 \leq C_i^C \leq 1$.
\\\\
The \textbf{distance matrix} of a graph, $G$, of order $n$ is the $n \times n$ matrix $D=(d_{i,j})$ such that:
\[
    d_{i,j} = d(i,j)
\]

We'll return to how to compute $D$ later, but for now we note:
\begin{itemize}
    \item   $s(i)$ is the sum of row $i$ of $D$.
    \item   If $s$ is the vector of distance sums, then $s = De$ where $e = (1,1, \dots, 1)^T$.
\end{itemize}

\subsection{Betweenness Centrality}
In a simple, connected graph $G$, the \textbf{betweenness centrality} $c_i^B$ of node $i$ is defined as:
\[
    c_i^B = \sum_j \sum_k \frac{n_i(j,k)}{n(j,k)}, \quad j \neq k \neq 1
\]

where $n(j,k)$ denotes the \textit{number} of shortest paths from node $j$ to node $k$, where $n_i(j,k)$ denotes the number of those shortest paths \textit{passing through} node $i$.
\\\\
In a simple, connected graph $G$, the \textbf{normalised betweenness centrality} $c_i^B$ of node $i$ is defined as:
\[
    C_i^B = \frac{c_i^B}{(n-1)(n-2)}
\]

\section{Random Graphs}
A \textbf{random graph} is mathematical model of a family of networks, where certain parameters (like the number of nodes \& edges) have fixed values, but other aspects (like the actual edges) are randomly assigned.
Although a random graph is not a specific object, many of its properties can be described precisely in the form of expected values or probability distributions.

\subsection{Random Samples}
Suppose our network $G = (X,E)$ has $|X| = n$ nodes.
Then, we know that the greatest number of edges it can have is:
\begin{align*}
    \binom{n}{2} = \frac{n!}{(n-2)! 2!} = \frac{n(n-1)}{2}
\end{align*}

Our goal is to randomly select edges on the vertex set $X$, that is, pick random elements from the set $\binom{X}{2}$ of pairs of nodes.
So, we need a procedure for selecting $m$ from $N$ objects randomly, in such a way that each of the $\binom{N}{m}$ subsets of the $N$ objects is an equally likely outcome.
We first discuss sampling $m$ values in the range $\{0,1, \dots, N-1 \}$.
\begin{enumerate}
    \item   Suppose that we choose a natural number $N$ and a real number $p \in [0,1]$.
    \item   Then, iterate over each element of the set $\{0,1 \dots, N-1\}$.
    \item   For each, we pick a random number $x \in [0,1]$.
    \item   If $x < p$, we keep that number.
            Otherwise, remove it from the set.
\end{enumerate}

When we are done, how many elements do we expect in the set if $p = \frac{m}{N}$ for some chosen $m$?
And what is the likelihood of there being, say, $K$ elements in the set?
Since we are creating random samples, where the size of each is a random number, $k$, we expect that $E[k] = Np = m$; this is a \textbf{binomial distribution}:
\begin{itemize}
    \item   The probability of a specific subset of size $K$ to be chosen is $p^k(1-p)^{N-k}$.
    \item   There are $\binom{N}{k}$ subsets of size $k$, so the probability $P(k)$ of the sample to have size $k$ is $P(k) = \binom{N}{k}p^k (1-p)^{N-k}$.
\end{itemize}

We use the following facts:
\begin{itemize}
    \item   $j\binom{N}{j}p^i = Np \binom{N-1}{j-1}p^{i-1}$.
    \item   $(1-p)^{N-j} = (1-p)^{(N-1) - (j-1)}$.
    \item   $(p + (1 -p))^r = 1$ for all $r$.
\end{itemize}

The expected value is:

\begin{align*}
    E[k]    =& \sum^N_{j=1}jP(j) \quad &\text{ weighted average of } j \\
            =& \sum^N_{j=0} j \binom{N}{j} p^j (1-p)^{N-j} \quad \text{ formula for } P(j) \\
            = & Np \sum^{N-1}_{l=0} \binom{N-1}{l} p^l (1-p)^{(N-1) - l} = Np \quad \text{ substituting } l=k-1
\end{align*}

\subsection{Erd\"os-Rényi Models}
\subsubsection{Model A: $G_{ER}(n,m)$ --- Uniformly Selected Edges}
Let $n \geq 1$, let $N = \binom{n}{2}$ and let $0 \leq m \leq N$.
The model $G_{ER}(n,m)$ consists of the ensemble of graphs $G$ on the $n$ nodes $X = \{0,1, \dots, n-1\}$, and $M$ randomly selected edges, chosen uniformly from the $N = \binom{n}{2}$  possible edges.
Equivalently, one can choose uniformly at random one network in the set $G(n,m)$ of \textit{all} networks on a given set of $n$ nodes with exactly $m$ edges.
\\\\
Equivalently, one can choose uniformly at random one network in the \textbf{set} $G(n,m)$ of \textit{all} networks on a given set of $n$ nodes with \textit{exactly} $m$ edges.
One could think of $G(n,m)$ as a probability distribution $P: G(n,m) \rightarrow \mathbb{R}$ that assigns to each network $G \in G(n,m)$ the same probability
\[
    P(G) = \binom{N}{m}^-1
\]
where $N = \binom{n}{2}$.

\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/gnm.png}
    \caption{ Some networks drawn from $G_{ER}(20,15)$ }
\end{figure}


\subsubsection{Model B: $G_{ER}(n,p)$ --- Randomly Selected Edges}
Let $n \geq 1$, let $N = \binom{n}{2}$ and let $0 \leq p \leq 1$.
The model $G_{ER}(n,p)$ consists of the ensemble of graphs $G$ on the $n$ nodes $X=\{0,1, \dots, n-1\}$ with each of the possible $N=\binom{n}{2}$ edges chosen with probability $p$.
\\\\
The probability $P(G)$ of a particular graph $G=(X,E)$ with $X=\{0,1, \dots, n-1\}$ and $m = |E|$ edges in the $G_{ER}(n,p)$ model is
\[
    P(G) = p^m(1-p)^{N-m}
\]

\begin{figure}[H]
    \centering
    \includegraphics[width=0.7\textwidth]{./images/gnm2005.png}
    \caption{ Some networks drawn from $G_{ER}(20,0.5)$ }
\end{figure}

Of the two models, $G_{ER}(n,p)$ is the more studied.
There are many similarities, but they do differ.
For example:
\begin{itemize}
    \item   $G_{ER}(n,m)$ will have $m$ edges with probability 1.
    \item   A graph in $G_{ER}(n,p)$ will have $m$ edges with probability $\binom{N}{m}p^m(p-1)^{N-m}$.
\end{itemize}

\subsubsection{Properties}
We'd like to investigate (theoretically \& computationally) the properties of such graphs.
For example:
\begin{itemize}
    \item   When might it be a tree?
    \item   Does it contain a tree, or other cycles? If so, how many?
    \item   When does it contain a small complete graph?
    \item   When does it contain a \textbf{large component}, larger than all other components?
    \item   When does the network form a single \textbf{connected component}?
    \item   How do these properties depend on $n$ and $m$ (or $p$)?
\end{itemize}

Denote by $G_n$ the set of \textit{all} graphs the $n$ nodes $X=\{0, \dots, n-1\}$.
Set $N=\binom{n}{2}$ the maximal number of edges of a graph $G \in \textsl{G}$.
Regard the ER models A \& B as \textbf{probability distributions} $P : \mathcal{G}_n \rightarrow \mathbb{R}$
\\\\
Denote $m(G)$ as the number of edges of a graph $G$.
As we have seen, the probability of a specific graph $G_{ER}$ to be sampled from the model $G(n,m)$ is:
\begin{align*}
    P(G) =
    \begin{cases}
        \binom{N}{m}^{-1} & \text{if } m(G)= m, \\
        0 & \text{otherwise}
    \end{cases}
\end{align*}

And the probability of a specific graph $G$ to be sampled from the model $G(n,p)$ is
\begin{align*}
    P(G) = n^m(1-n)&{N-m}
\end{align*}

\subsubsection{Expected Size \& Average Degree}
Let's use the following notation:
\begin{itemize}
    \item   $\bar{a}$ is the expected value of property $a$ (that is, as the graphs vary across the ensemble produced by the model).
    \item   $<a>$ is the average of property $a$ over all the nodes of a graph.
\end{itemize}

In $G(n,m)$ the expected \textbf{size} is
\begin{align*}
    \bar{m} = m
\end{align*}
as every graph $G$ in $G(n,m)$ has exactly $m$ edges.
The expected \textbf{average degree} is
\begin{align*}
    \langle k \rangle = \frac{2m}{n}
\end{align*}

as every graph has average degree $\frac{2m}{n}$.
Other properties of $G(n,m)$ are less straightforward, and it is easier to work with the $G(n,p)$.
\\\\
In $G(n,m)$, the \textbf{expected size} (i.e.,  expected number of edges) is
\begin{align*}
    \bar{m} = pN
\end{align*}

Also, variance is $\sigma^2_m = Np(1-p)$.
\\\\
The expected \textbf{average degree} is
\begin{align*}
    \langle k\rangle = p(n-1)
\end{align*}
with standard deviation $\sigma_k = \sqrt{p(1-p) (n-1)}$.


\subsubsection{Degree Distribution}
The \textbf{degree distribution} $p: \mathbb{N}_0 \to \mathbb{R}, k \mapsto p_k$ of a graph $G$ is defined as
\begin{align*}
    p_k = \frac{n_k}{n}
\end{align*}
where, for $k \geq 0$, $n_k$ is the number of nodes of degree $k$ in $G$.
This definition can be extended to ensembles of graphs with $n$ nodes (like the random graphs $G(n,m)$ and $G(n,p)$) by setting
\begin{align*}
    p_k \frac{\bar{n}_k}{n}
\end{align*}

where $\bar{n}_k$ denotes the expected value of the random graph $n_k$ over the ensemble of graphs.
\\\\
The degree distribution in a random graph $G(n,p)$ is a \textbf{binomial distribution}:
\begin{align*}
    p_k = \binom{n-1}{k}p^k (1-p)^{n-1-k} = \text{bin}(n-1,p,k)
\end{align*}

That is, in the $G(n,p)$ model, the probability that a nodes has degree $k$ is $p_k$.
Also, the \textbf{average degree} of a randomly chosen node is
\begin{align*}
    \langle k \rangle = \sum^{n-1}_{k=0} kp_k = p(n-1)
\end{align*}

(with standard deviation $\sigma_k = \sqrt{p(1-p)(n-1))}$.
\\\\
In general, it is not so easy to compute
\[
    \binom(n-1)(k) p^k (1-p)^{n-1-k}
\]

However, in the limit $n \to \infty$ with $\langle k \rangle k = p(n-1)$ kept constant, the binomial distribution $\text{bin}(n-1,p,k)$ is well-approximated by the \textbf{Poisson distribution}:
\[
    p_k = e^{-\lambda} \frac{\lambda^k}{k!} = \text{Pois}(\lambda, k)
\]

where $\lambda = p(n-1)$.

\section{Giant Components \& Small Worlds}
Recall that a network may be made up of several \textbf{connected components}, and any connected network has a single connected component.
It is common in large networks to observe a \textbf{giant component}: a connected component which has a large proportion of the network's nodes.
This is particularly the case with graphs in $G_{ER}(n,p)$ with large enough $p$.
More formally, a connected component of a graph $G$ is called a \textbf{giant component} if its number of nodes increases with the order $n$ of $G$ as some positive power of $n$.
Suppose that $p(n) = cn^{-1}$ for some positive constant $c$;
then, the average degree $\langle k \rangle = pn = c$ remains fixed as $n \to \infty$.
For graphs $G_{ER}(n,p)$:
\begin{itemize}
    \item   If $c < 1$, the graph contains many small components with orders bounded by $O(\ln(n))$.
    \item   If $c=1$ the graph has large components of order $S = O(n^\frac{2}{3})$.
    \item   If $c > 1$, there is a unique \textbf{giant component} of order $S = O(n)$.
\end{itemize}

\subsection{Small World Network}
Many real-world networks are \textbf{small world networks}, wherein most pairs of nodes are only a few steps away from each other, and where nodes to form \textit{cliques}, i.e., subgraphs in which all nodes are connected to each other.
Three network attributes that measure these small-world effects are:
\begin{itemize}
    \item   \textbf{Characteristic path length}, $L$: the average length of all shortest paths in the network.
    \item   \textbf{Transitivity}, $T$: the proportion of \textit{triads} that form triangles.
    \item   \textbf{Clustering coefficient}, $C$: the average node clustering coefficient.
\end{itemize}

A network is called a \textbf{small world network} if it has:
\begin{itemize}
    \item   A small \textbf{average shortest path length} $L$ (scaling with $\log(n)$, where $n$ is the number of nodes) and
    \item   A high \textbf{clustering coefficient} $C$.
\end{itemize}

It turns out that ER random networks do have a small average shortest path length, but not a high clustering coefficient.
This observation justifies the need for a different model of random networks, if they are to be used to model the clustering behaviour of real-world networks.

\subsubsection{Distance}
We have seen how BFS can determine the length of a shortest path from a given node $x$ to any node $y$ in a \textit{connected network}.
An application to all nodes $x$ yields the shortest distances between all pairs of nodes.
Recall that the \textbf{distance matrix} of a connected graph $G = (X,E)$ is $\mathcal{D} = (d_{i,j})$ where entry $d_{i,j}$ is the length of the shortest path from node $i \in X$ to node $j \in X$.
(Note that $d_{i,i} = 0$ for all $i$).
There are a number of graph (and node) attributes that can be defined in terms of this matrix:
\begin{itemize}
    \item   The \textbf{eccentricity} $e_i$ of a node $i \in X$ is the maximum distance between $i$ and any other vertex in $G$, so $e_i = \text{max}_j(d_{i,j})$.
    \item   The \textbf{graph radius} $R$ is the minimum eccentricity, $R = \text{min}_i(e_i)$.
    \item   The \textbf{graph diameter} $D$ is the maximum eccentricity: $D = \text{max}_i(e_i) = - \text{max}_{i,j} (d_{i,j})$.
\end{itemize}

Note that one shouldn't think that the ``diameter is twice the radius'', but rather diameter is the distance between the points furthest from each other and radius is the distance from the ``centre'' to the furthest point from it.
It can be helpful to think about $P_n$.

\subsubsection{Characteristic Path Length}
The \textbf{characteristic path length} (i.e., the average shortest path length) $L$ of a graph $G$ is the average distance between pairs of nodes:
\[
    L = \frac{1}{n(n-1)} \sum_{i \neq j} d_{i,j}
\]

For graphs drawn from $G_{ER}(n,m)$ and $G_{ER}(n,p)$, $L = \frac{\ln(n)}{\ln( \langle k \rangle)}$, where $\langle k \rangle$ is the average degree of the network.

\subsubsection{Clustering}
In contrast to random graphs, real-world networks also contain \textbf{many triangles}: it is not uncommon that a friend of one of my friends is also my friend.
This \textbf{degree of transitivity} can be measured in several different ways.
For the first, we need two concepts:
\begin{itemize}
    \item   The number of \textbf{triangles} in $G$, denoted $n_\Delta$, is the number of subgraphs of $G$ that are isomorphic to $C_3$.
    \item   The number of \textbf{triads} in $G$, denoted $n_\land$, is the number of pairs of edges with a shared node.
\end{itemize}

There is an easy way to count the number of \textbf{triads} in a network:
if node $i$ has degree $k_i = \text{deg}(i)$, then it is involved in $\binom{k_i}{2}$ triads,
so the total number of triads is $n_\land = \sum_i \binom{k_i}{2}$.
\\\\
The \textbf{transitivity} $T$ of a graph $G = (X,E)$ is the proportion of \textbf{transitive} triads, i.e., triads which are subgraphs of \textbf{triangles}.
This proportion can be computed as follows:
\[
    T = 3 \frac{n_\Delta}{n_\land}
\]

where $n_\Delta$ is the number of triangles in $G$ and $n_\land$ is the number of triads.

\subsubsection{Small World Behaviour}
A network $G = (X,E)$ is said to exhibit \textbf{small world behaviour} if its characteristic path length $L$ grows proportionally to the logarithm of the number of nodes of $G$:
\[
    L \sim \ln(n)
\]

In this sense, the ensembles $G(n,m)$ \& $G(n,p)$ of random graphs do exhibit small world behaviour (as $n \to \infty$).

\subsubsection{Transitivity}
The \textbf{transitivity} $T$ of a graph $G=(X,E)$ is the proportion of \textbf{transitive} triads, i.e., triads which are subgraphs of \textbf{triangles}.
This proportion can be computed as:
\[
    T = \frac{3n_\Delta}{n_\land}
\]

where $n_\Delta$ is the number of triangles in $G$ and $n_\land$ is the number of triads.
\\\\
The transitivity of a graph in $G_{ER}(n,p)$ is easy to estimate:
for every triad, the ``third'' edge is present with probability $p$, so:
\[
    T = p
\]

Or, compute $\frac{3n_\Delta}{n_\land}$ using the explicit formulas from the previous lecture:
$n_\Delta = \binom{n}{3} p^3$ and $n_\land = 3 \binom{n}{3}p^2$.

\subsection{Clustering}
The concept of \textbf{clustering} measures the transitivity of a node, or of an entire graph in a different way.
To define it, we need the concept of an \textbf{induced subgraph}.

\subsubsection{Induced Subgraph}
Given $G = (X,E)$ and $Y \subset X$, the \textbf{induced subgraph} of $G$ on $Y$ is the graph $H = \left( Y, E \cap \binom{Y}{2} \right)$.
That is:
\begin{itemize}
    \item   $H$ is a subgraph of $G$ with node set $Y$.
    \item   $H$ has all possible edges in $G$ for which both nodes are in $Y$.
\end{itemize}

\subsubsection{Clustering Coefficient}
For a node $i \in X$ of a graph $G = (X,E)$, denote by $G_i$ the subgraph induced on the neighbours of $i$ in $G$, and by $m(G_i)$ its number of edges;
the \textbf{node clustering coefficient} $c_i$ of node $i$ is defined as:
\[
    c_i =
    \begin{cases}
        \binom{k_i}{2}^{-1} m(G_i) & k_i \geq 2 \\
        0 & \text{otherwise}
    \end{cases}
\]

That is, the node clustering coefficient measures the proportion of existing edges in its \textbf{social graph} among the possible edges.
\\\\
The \textbf{graph clustering coefficient} $C$ of $G$ is the average node clustering coefficient:
\[
    C = \langle c \rangle c = \frac{1}{n} \sum^n_{i=1} c_i
\]

By definition,  $0 \leq c_i \leq 1$ for all nodes $i \in X$, and $0 \leq C \leq 1$.
\\\\\
The \textbf{node clustering coefficient} of any node $i$ in a $G_{ER}(n,p)$ \textbf{random graph} is $c_i = p$, i.e., in any selection of potential edges, by construction a proportion $p$ of them is present in the random graph;
this is true in particular for the $\binom{k}{2}$ potential edges between the $k$ neighbours of a node of degree $k$.
The \textbf{graph clustering coefficient} of a $G_{ER}(n,p)$ \textbf{random graph} is:
\[
    C = p
\]

Note that when $p(n) = \langle k \rangle n^{-1}$ for a fixed expected average degree $\langle k \rangle$, then $C = \frac{\langle k \rangle}{n} \to 0$ for $n \to \infty$;
that is, in large $G_{ER}$ random graphs, the number of triangles is negligible.
In real-world networks, one often observers that $\frac{C}{\langle k \rangle}$ does not depend on $n$ (as $n \to \infty$).

\subsubsection{Clustering versus Transitivity}
For a node $i \in X$, denote by $n^\land_i = \binom{k_i}{2}$ the number of triads containing $i$ as their central node, and by $n_i^\Delta$ the actual number of triangles containing $i$;
then, the node clustering coefficient is:
\begin{align*}
    c_i = \frac{n_i^\Delta}{n_i^\land} \quad \text{ or,} \\
    n_i^\Delta = n_i^\land c_i
\end{align*}

Moreover, $3n_\Delta = \sum_i n_i^\Delta$ and $n_\land = \sum_i n_i^\land$;
it follows that $T = \frac{3n_\Delta}{n_\land} = \frac{1}{n_\land} \sum_i n_i^\land c_i$, in contrast to $C = \frac{1}{n} \sum_i c_i$.
That is, $C$ is the (plain) \textbf{average} of the node clustering coefficients, whereas $T$ is a \textbf{weighted average} of node clustering coefficients, giving higher weight to high-degree nodes.
\\\\
The fact that ER random networks tend to have low transitivity \& clustering shows the need for a new kind of (random) network construction that is better at modelling real-world networks.
One idea is to start with some \textbf{regular network} that naturally has \textit{high clustering}, and then to randomly distort its edges to introduce some \textbf{short paths}.


\end{document}