% !TeX program = lualatex
\documentclass[a4paper,11pt]{article}
% packages
\usepackage{censor}
\StopCensoring
\usepackage{fontspec}
\setmainfont{EB Garamond}
% for tironian et fallback
% % \directlua{luaotfload.add_fallback
% % ("emojifallback",
% % {"Noto Serif:mode=harf"}
% % )}
% % \setmainfont{EB Garamond}[RawFeature={fallback=emojifallback}]

\setmonofont[Scale=MatchLowercase]{DejaVu Sans Mono}
\usepackage[a4paper,left=2cm,right=2cm,top=\dimexpr15mm+1.5\baselineskip,bottom=2cm]{geometry}
\setlength{\parindent}{0pt}

\usepackage{fancyhdr} % Headers and footers
\fancyhead[R]{\normalfont \leftmark}
\fancyhead[L]{}
\pagestyle{fancy}

\usepackage{microtype} % Slightly tweak font spacing for aesthetics
\usepackage[english]{babel} % Language hyphenation and typographical rules
\usepackage{xcolor}
\definecolor{linkblue}{RGB}{0, 64, 128}
\usepackage[final, colorlinks = false, urlcolor = linkblue]{hyperref}
% \newcommand{\secref}[1]{\textbf{§~\nameref{#1}}}
\newcommand{\secref}[1]{\textbf{§\ref{#1}~\nameref{#1}}}

\usepackage{amsmath}
\usepackage[most]{tcolorbox}
\usepackage{changepage} % adjust margins on the fly
\usepackage{amsmath,amssymb}

\usepackage{minted}
\usemintedstyle{algol_nu}

\usepackage{pgfplots}
\pgfplotsset{width=\textwidth,compat=1.9}

\usepackage{caption}
\newenvironment{code}{\captionsetup{type=listing}}{}
\captionsetup[listing]{skip=0pt}
\setlength{\abovecaptionskip}{5pt}
\setlength{\belowcaptionskip}{5pt}

\usepackage[yyyymmdd]{datetime}
\renewcommand{\dateseparator}{--}

\usepackage{enumitem}

\usepackage{titlesec}

\author{Andrew Hayes}
\begin{document}
\begin{titlepage}
\begin{center}
\hrule
\vspace*{0.6cm}
\Huge \textsc{cs4423}
\vspace*{0.6cm}
\hrule
\LARGE
\vspace{0.5cm}
Networks
\vspace{0.5cm}
\hrule

\vfill

\hrule
\begin{minipage}{0.495\textwidth}
\vspace{0.4em}
\raggedright
\normalsize
\begin{tabular}{@{}l l}
Name: & Andrew Hayes \\
Student ID: & 21321503 \\
E-mail: & \href{mailto:a.hayes18@universityofgalway.ie}{a.hayes18@universityofgalway.ie} \\
\end{tabular}
\end{minipage}
\begin{minipage}{0.495\textwidth}
\raggedleft
\vspace*{0.8cm}
\Large
\today
\vspace*{0.6cm}
\end{minipage}
\medskip\hrule
\end{center}
\end{titlepage}

\pagenumbering{roman}
\newpage
\tableofcontents
\newpage
\setcounter{page}{1}
\pagenumbering{arabic}
\section{Introduction}
\textbf{CS4423 Networks} is a Semester 2 module on \textbf{Network Science}.
Modern societies are in many ways highly connected.
Certain aspects of this phenomenon are frequently described as \textbf{networks}.
CS4423 is an introduction to this emerging interdisciplinary subject.
We'll cover several major topics in this module, including:
\begin{itemize}
\item Graphs \& Graph Theory, and how they relate to networks;
\item Representations of networks, including as matrices;
\item Computing with networks, using \mintinline{python}{networkx} in Python;
\item Centrality measures;
\item Random graphs;
\item Small worlds;
\item Models of growing graphs.
\end{itemize}

Lecture notes \& assignments will come in the form of Jupyter notebooks, which allow us to include interactive Python code alongside the text.

\subsection{Lecturer Contact Information}
\begin{itemize}
\item Name: Dr Niall Madden.
\item School of Mathematical \& Statistical Sciences, University of Galway.
\item Office: Room ADB-1013, Áras de Brún.
\item E-mail: \href{mailto:niall.madden@universityofgalway.ie}{niall.madden@universityofgalway.ie}.
\item Website: \url{https://www.niallmadden.ie}.
\end{itemize}

\subsection{Exam Information}
It is the lecturer's first year lecturing this module; the exam should be similar to old exam papers.
Only the past 2 years or so of papers have been looked at.

\subsection{Schedule}
Tentative schedule for labs / tutorials:
\begin{itemize}
\item Tuesday at 16:00 in AC215;
\item Wednesday at 10:00 in CA116a.
\end{itemize}

There will be some practicals during the semester: ``Introduction to Python \& Jupyter'' sessions in Week 3 and, in later weeks, help with assignments, preparation for the exam, etc.

\subsection{Assessment}
\begin{itemize}
\item Two homework assignments.
Tentative deadlines: Weeks 5 \& 10.
Each contributes 10\% to the final grade.

\item One in-class test.
Probably Week 7 (depending on FYP deadlines).
Contributes 10\% to the final grade.

\item Final exam: 70\%.
\end{itemize}
\subsection{Introduction to Networks}
Newman (for example) broadly divides the most commonly studied real-world networks into four classes:
\begin{enumerate}
\item \textbf{Technological networks:} rely on physical infrastructure.
In many cases, this infrastructure has been built over many decades and forms part of the backbone of modern societies, including roads \& other transportation networks, power grids, and communications networks.

\item \textbf{Social networks:} the vertices of a social network are people (or, at least, user IDs), with edges representing some sort of \textbf{social interaction}.
In sociology, the vertices are often called \textbf{actors}, and the edges are called \textbf{ties}.
Social networks are not just online: sociologists were studying social networks long before people started exhibiting their relations to others online.
Traditionally, data about the structure of social networks have been compiled by interviewing the people involved.

\item \textbf{Information networks:} consist of \textbf{data items} which are linked to each other in some way.
Examples include relational databases.
Sets of information (like scientific publications) were linked to each other (e.g., through citations) long before computers were invented, although links in digital form are easier to follow.
\\\\
The \textbf{WWW} is probably the most widespread \& best-known example of an information network.
Its nodes are \textbf{web pages} containing information in the form of text \& pictures, and its edges are the \textbf{hyperlinks}, allowing us to surf or navigate from page to page.
Hyperlinks run in one direction only, from the page that contains the hyperlink to the page that is referenced.
Therefore, the WWW is a \textbf{directed network}: a graph in which each edge has a direction.

\item \textbf{Biological networks:}
\begin{itemize}
\item \textbf{Biochemical networks} represent molecular-level patterns of interaction \& control mechanisms in the biological cell, including metabolic networks, protein-protein interaction networks, \& genetic regulatory networks.

\item A \textbf{neural network} can be represented as a set of vertices, the neurons, connected by two types of directed edges, one for excitatory inputs and one for inhibitory inputs.
(Not to be confused with an artificial neural network.)

\item \textbf{Ecological networks} are networks of ecological interactions between species.
\end{itemize}
\end{enumerate}

In each case, a network connects parts of a system (\textbf{nodes}) by some means (\textbf{links}).
Different techniques are used to display, discover, \& measure the structure in each example.
\\\\
In its simplest form, a \textbf{network} is just a collection of points (called \textbf{vertices} or \textbf{nodes}), some of which are joined in pairs (called \textbf{edges} or \textbf{links}).
Many systems of interest are composed of individual parts that are in some way linked together: such systems can be regarded as networks, and thinking about them in this way can often lead to new \& useful insights.
\\\\
\textbf{Network science} studies the patterns of connections between the components of a system.
Naturally, the structure of a network can have a big impact on the behaviour of the system.
A \textbf{network} is a simplified representation of a complex system by vertices \& edges.
The scientific study of networks is an interdisciplinary undertaking that combines ideas from mathematics, computer science, physics, the social sciences, \& biology.
Between these scientific fields, many tools have been developed for analysing, modelling, \& understanding networks.

\subsubsection{Network Measures}
\textbf{Centrality} is an example of a useful \& important type of network measure; it is concerned with the question of how important a particular vertex or edge is in a networked system.
Different concepts have been proposed to capture mathematically what it means to be central.
For example, a simple measure of the centrality of a vertex is its \textbf{degree}, that is, the number of edges it is part of (or, equivalently, the number of vertices it is adjacent to).
Applications of centrality include determining which entities in a social network have the most influence, or which links in a power grid are most vulnerable.
\\\\
Which measurements \& calculations give meaningful answers for a particular system depends, of course, on the specific nature of the system and the questions one wants to ask.
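\\\\
As a minimal sketch of degree as a centrality measure, the following code uses \mintinline{python}{networkx} to compute the degree of every node in a small, made-up graph (the graph and the variable names are illustrative assumptions, not real data), and then picks out the nodes of highest degree.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
import networkx as nx

# A small, made-up graph used only for illustration
G = nx.Graph([("a", "b"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")])

# Degree of each node: the number of edges it is part of
degrees = dict(G.degree())
print(degrees)  # {'a': 1, 'b': 3, 'c': 2, 'd': 3, 'e': 1}

# By this simple measure, the most central nodes are those of highest degree
max_degree = max(degrees.values())
most_central = [v for v, d in degrees.items() if d == max_degree]
print(most_central)  # ['b', 'd']
\end{minted}
\caption{Degree of each node in a small example graph}
\end{code}

\mintinline{python}{networkx} also provides \mintinline{python}{nx.degree_centrality}, which normalises each degree by dividing it by $n-1$, the largest degree possible in a simple graph on $n$ nodes.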
\subsubsection{Network Concepts}
Another interesting network concept is the \textbf{small-world effect}, which is concerned with the question of how far apart two randomly chosen points in a network typically are.
Here, the \textbf{distance} between two vertices is usually measured as the smallest number of edges one would need to cross when travelling along a \textbf{path} from one vertex to the other.
In real-world social networks, the distance between people tends to be rather small.
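\\\\
The idea of distance can be sketched with \mintinline{python}{networkx}: \mintinline{python}{nx.shortest_path_length} counts the edges on a shortest path between two nodes, and \mintinline{python}{nx.average_shortest_path_length} averages this over all pairs of nodes.
The graph below is a small, made-up example rather than a real social network, so the numbers only illustrate the definitions.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
import networkx as nx

# A small, made-up graph used only for illustration
G = nx.Graph([("a", "b"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")])

# Distance = number of edges on a shortest path (here a-b-d-e)
dist_ae = nx.shortest_path_length(G, "a", "e")
print(dist_ae)  # 3

# Average distance over all pairs of nodes
avg_dist = nx.average_shortest_path_length(G)
print(avg_dist)  # 1.6
\end{minted}
\caption{Distances in a small example graph}
\end{code}

In small-world studies, the average shortest path length is the usual way of summarising how far apart nodes typically are.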
\section{Graphs}
A \textbf{graph} can serve as a mathematical model of a network.
Later, we will use the \mintinline{python}{networkx} package to work with examples of graphs \& networks.

\subsection{Example: The Internet (circa 1970)}
\begin{figure}[H]
\centering
\includegraphics[width=0.7\textwidth]{./images/f7dec1970.jpg}
\caption{
The Internet (more precisely, ARPANET) in December 1970.
Nodes are computers, connected by a link if they can directly communicate with each other.
At the time, only 13 computers participated in that network.
}
\end{figure}

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{text}
UCSB SRI UCLA
SRI UCLA STAN UTAH
UCLA STAN RAND
UTAH SDC MIT
RAND SDC BBN
MIT BBN LINC
BBN HARV
LINC CASE
HARV CARN
CASE CARN
\end{minted}
\caption{\texttt{arpa.adj}}
\end{code}

The following \textbf{diagram}, built from the adjacencies in \verb|arpa.adj|, contains the same information as the figure above, but without the distracting details of US geography;
this is actually an important point, as networks only reflect the \textbf{topology} of the object being studied.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
import networkx as nx

# Each line of arpa.adj lists a node followed by its neighbours
H = nx.read_adjlist("../data/arpa.adj")
opts = {"with_labels": True, "node_color": "y"}
nx.draw(H, **opts)
\end{minted}
\caption{Reading \& drawing \texttt{arpa.adj}}
\end{code}

\begin{figure}[H]
\centering
\includegraphics[width=0.7\textwidth]{./images/qwe_download.png}
\caption{ The ARPA Network as a Graph }
\end{figure}
\subsection{Simple Graphs}
A \textbf{simple graph} is a pair $G = (X,E)$ consisting of a finite set $X$ of objects called \textit{nodes}, \textit{vertices}, or \textit{points}, and a set $E$ of \textit{links} or \textit{edges}, each of which is a set of two distinct vertices.
\begin{itemize}
\item We can also write $E \subseteq \binom{X}{2}$, where $\binom{X}{2}$ ($X$ \textit{choose} 2) is the set of all $2$-element subsets of $X$.
\item The \textbf{order} of the graph $G$ is its number of vertices, denoted $n = |X|$.
\item The \textbf{size} of the graph is its number of edges, denoted $m = |E|$.
Naturally, $m \leq \binom{n}{2}$, since each edge is one of the $\binom{n}{2}$ possible pairs of vertices.
\end{itemize}
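The definition translates almost directly into Python.
The following is a minimal sketch, assuming each edge is stored as a \mintinline{python}{frozenset} of two vertices (an illustrative choice, not part of the definition); it checks that $E \subseteq \binom{X}{2}$ and that $m \leq \binom{n}{2}$ for a small, made-up graph.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
from itertools import combinations
from math import comb

# A small, made-up simple graph G = (X, E)
X = {0, 1, 2, 3}
E = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}

# binom(X, 2): the set of all 2-element subsets of X
all_pairs = {frozenset(p) for p in combinations(X, 2)}
assert E <= all_pairs  # every edge is a set of two distinct vertices of X

n, m = len(X), len(E)  # order and size of G
print(n, m, comb(n, 2))  # 4 3 6
assert m <= comb(n, 2)
\end{minted}
\caption{A simple graph as a pair of Python sets}
\end{code}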
\subsection{Subgraphs \& Induced Subgraphs}
Given $G = (X,E)$, a \textbf{subgraph} of $G$ is a graph $H=(Y, E_H)$ with $Y \subseteq X$ and $E_H \subseteq E \cap \binom{Y}{2}$;
therefore, every node of $H$ is also a node of $G$, and every edge of $H$ is also an edge of $G$ whose endpoints both lie in $Y$.
\\\\
One of the most important subgraphs of $G$ is the \textbf{induced subgraph} on $Y \subseteq X$: $H = (Y, E \cap \binom{Y}{2})$;
that is, given a subset $Y$ of $X$, we keep \textit{every} edge of the original graph $G$ whose endpoints both lie in $Y$.
Each node has a list of \textbf{neighbours}, which are the nodes it is directly connected to by an edge of the graph.
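\\\\
The difference between a subgraph and an induced subgraph can be sketched with \mintinline{python}{networkx}, where \mintinline{python}{G.subgraph(Y)} returns the induced subgraph on the nodes in \mintinline{python}{Y}.
The graph and node set below are made-up examples.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
import networkx as nx

# A small, made-up graph used only for illustration
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)])
Y = {0, 1, 2, 3}

# Induced subgraph on Y: keeps every edge of G with both endpoints in Y
H = G.subgraph(Y)
induced_edges = sorted(tuple(sorted(e)) for e in H.edges())
print(induced_edges)  # [(0, 1), (0, 2), (1, 2), (2, 3)]

# A (non-induced) subgraph on the same nodes may keep only some of those edges
K = nx.Graph()
K.add_nodes_from(Y)
K.add_edges_from([(0, 1), (2, 3)])
assert all(H.has_edge(u, v) for u, v in K.edges())

# The neighbours of node 2 in G
print(sorted(G.neighbors(2)))  # [0, 1, 3]
\end{minted}
\caption{Subgraphs \& induced subgraphs}
\end{code}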
\subsection{Important Graphs}
The \textbf{complete graph} on a vertex set $X$ is the graph with edge set $\binom{X}{2}$; when $|X| = n$, it is denoted $K_n$.
For example, if $X = \{0,1,2,3 \}$, then $E = \{01,02,03,12,13,23\}$, where we write $ij$ as shorthand for the edge $\{i,j\}$.
\\\\
The \textbf{Petersen graph} is a graph on 10 vertices with 15 edges.
It can be constructed as the complement of the line graph of the complete graph $K_5$, that is, as the graph with the vertex set $X = \binom{ \{0,1,2,3,4\} }{2}$ (the edge set of $K_5$) and with an edge between $x,y \in X$ whenever $x \cap y = \emptyset$.
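\\\\
This construction can be checked with \mintinline{python}{networkx}.
The sketch below first lists the edges of the complete graph on $\{0,1,2,3\}$, then builds the complement of the line graph of $K_5$ and compares it with the library's built-in \mintinline{python}{nx.petersen_graph()} using an isomorphism test.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
import networkx as nx

# Edges of the complete graph on {0, 1, 2, 3}
print(list(nx.complete_graph(4).edges()))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# Nodes of L(K5) are the 10 edges of K5; in the complement, two of them are
# adjacent exactly when they share no vertex (i.e. they are disjoint)
P = nx.complement(nx.line_graph(nx.complete_graph(5)))
print(P.number_of_nodes(), P.number_of_edges())  # 10 15

# Compare with the built-in Petersen graph, ignoring node labels
print(nx.is_isomorphic(P, nx.petersen_graph()))  # True
\end{minted}
\caption{Building the Petersen graph}
\end{code}

The isomorphism test is needed because the two graphs label their nodes differently: \mintinline{python}{nx.petersen_graph()} uses the integers $0, \dots, 9$, whereas the nodes of the line graph are the edges of $K_5$, stored as pairs.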
\\\\
A graph is \textbf{bipartite} if we can divide the node set $X$ into two subsets $X_1$ and $X_2$ such that:
\begin{itemize}
\item $X_1 \cap X_2 = \emptyset$ (the sets have no node in common);
\item $X_1 \cup X_2 = X$;
\item every edge $\{u_1, u_2\}$ has one endpoint in $X_1$ and the other in $X_2$.
\end{itemize}

That is, we only ever have edges between nodes from different sets.
Such graphs are very common in Network Science, where the nodes in the network represent two different types of entities; for example, we might have a graph wherein nodes represent students and modules, with an edge between each student and each module in which they are enrolled.
Such a graph is often called an \textbf{affiliation network}.
\\\\
A \textbf{complete bipartite graph} is a particular bipartite graph wherein there is an edge between every node in $X_1$ and every node in $X_2$.
Such graphs are denoted $K_{m,n}$, where $|X_1| = m$ and $|X_2|=n$.
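\\\\
Whether a graph is bipartite can be tested by trying to colour its nodes with two colours so that every edge joins nodes of different colours; the two colour classes then give $X_1$ and $X_2$.
The following is a minimal sketch of this standard breadth-first-search approach; \mintinline{python}{two_colouring} is our own, hypothetical helper, which we check against \mintinline{python}{networkx}'s \mintinline{python}{complete_bipartite_graph} and \mintinline{python}{is_bipartite}.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
from collections import deque

import networkx as nx


def two_colouring(G):
    """Hypothetical helper: return (X1, X2) if G is bipartite, else None."""
    colour = {}
    for start in G.nodes():
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in G.neighbors(u):
                if v not in colour:
                    colour[v] = 1 - colour[u]  # neighbours get the other colour
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None  # an edge inside one class: not bipartite
    X1 = {v for v, c in colour.items() if c == 0}
    X2 = {v for v, c in colour.items() if c == 1}
    return X1, X2


K23 = nx.complete_bipartite_graph(2, 3)  # K_{2,3}: nodes {0, 1} and {2, 3, 4}
print(two_colouring(K23))  # ({0, 1}, {2, 3, 4})
print(K23.number_of_edges())  # 6, i.e. 2 * 3

C5 = nx.cycle_graph(5)
print(two_colouring(C5), nx.is_bipartite(C5))  # None False
\end{minted}
\caption{Testing for bipartiteness by two-colouring}
\end{code}

An odd cycle such as $C_5$ can never be two-coloured, which is why the last call reports that it is not bipartite.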
\\\\
The \textbf{path graph} with $n \geq 2$ nodes, denoted $P_n$, is a connected graph in which two nodes have degree 1 and the other $n-2$ have degree 2.
\\\\
The \textbf{cycle graph} on $n \geq 3$ nodes, denoted $C_n$, is (slightly informally) formed by adding an edge between the two nodes of degree 1 in the path graph $P_n$.
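\\\\
The degree descriptions of $P_n$ and $C_n$ can be checked with the \mintinline{python}{nx.path_graph} and \mintinline{python}{nx.cycle_graph} generators in \mintinline{python}{networkx}; in this sketch, the choice $n = 6$ is arbitrary.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
from collections import Counter

import networkx as nx

n = 6  # an arbitrary choice of order

P = nx.path_graph(n)   # edges 0-1, 1-2, ..., (n-2)-(n-1)
C = nx.cycle_graph(n)  # the same edges plus (n-1)-0

# Count how many nodes have each degree
print(Counter(d for _, d in P.degree()))  # Counter({2: 4, 1: 2})
print(Counter(d for _, d in C.degree()))  # Counter({2: 6})

# Adding an edge between the two degree-1 ends of P_n gives a copy of C_n
ends = [v for v, d in P.degree() if d == 1]
P.add_edge(*ends)
print(nx.is_isomorphic(P, C))  # True
\end{minted}
\caption{Path \& cycle graphs}
\end{code}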
\subsection{New Graphs from Old}
The \textbf{complement} of a graph $G$ is the graph $H$ with the same nodes as $G$, in which two distinct nodes are adjacent if and only if they are \textit{not adjacent} in $G$.
The complement of a complete graph is an empty graph (a graph with no edges).
\\\\
A graph $G$ can be thought of as being made from ``things'' that have connections to each other: the ``things'' are nodes, and their connections are represented by edges.
However, we can also think of edges as ``things'' that are connected to any other edge with which they share a vertex.
This leads to the idea of a line graph:
the \textbf{line graph} of a graph $G$, denoted $L(G)$, is the graph where every node in $L(G)$ corresponds to an edge in $G$, and for every pair of edges in $G$ that share a node, $L(G)$ has an edge between their corresponding nodes.
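\\\\
Both constructions are available in \mintinline{python}{networkx} as \mintinline{python}{nx.complement} and \mintinline{python}{nx.line_graph}.
The short sketch below (on arbitrarily chosen small graphs) checks that the complement of $K_4$ has no edges, and that the line graph of the path $P_4$ is the path $P_3$, since consecutive edges of a path share exactly one node.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
import networkx as nx

# The complement of a complete graph has the same nodes but no edges
comp = nx.complement(nx.complete_graph(4))
print(comp.number_of_nodes(), comp.number_of_edges())  # 4 0

# Line graph of the path 0-1-2-3: its nodes are the edges of the path
L = nx.line_graph(nx.path_graph(4))
print(sorted(tuple(sorted(e)) for e in L.nodes()))  # [(0, 1), (1, 2), (2, 3)]

# Consecutive edges of the path share a node, so L(P_4) is the path P_3
print(nx.is_isomorphic(L, nx.path_graph(3)))  # True
\end{minted}
\caption{Complements \& line graphs}
\end{code}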
\section{Matrices of Graphs}
There are various ways to represent a graph, including the node set, the edge set, or a drawing of the graph;
one of the most useful representations of a graph for computational purposes is as a \textbf{matrix}; the three most important matrix representations are:
\begin{itemize}
\item The \textbf{adjacency matrix} (most important);
\item The \textbf{incidence matrix} (has its uses);
\item The \textbf{graph Laplacian} (the coolest).
\end{itemize}

\subsection{Adjacency Matrices}
The \textbf{adjacency matrix} of a graph $G$ of order $n$ is a square $n \times n$ matrix $A = (a_{i,j})$ with rows \& columns corresponding to the nodes of the graph, that is, we number the nodes $1, 2, \dots, n$.
Then, $A$ is given by:
\begin{align*}
a_{i,j} =
\begin{cases}
1 & \text{if nodes } i \text{ and } j \text{ are joined by an edge,} \\
0 & \text{otherwise.}
\end{cases}
\end{align*}

Put another way, $a_{i,j}$ is the number of edges between node $i$ and node $j$.
Properties of adjacency matrices include:
\begin{itemize}
\item $\sum^n_{i=1} \sum^n_{j=1} a_{i,j} = \sum_{u \in X}\text{deg}(u)$, where $\text{deg}(u)$ is the degree of $u$.
\item All graphs that we've seen hitherto are \textit{undirected}: for all such graphs, $A$ is symmetric.
That is, $A = A^T$ or, equivalently, $a_{i,j} = a_{j,i}$ for all $i,j$.
\item $a_{i,i} = 0$ for all $i$, since a simple graph has no loops.
\item In real-world examples, $A$ is usually \textbf{sparse}, which means that $\sum^n_{i=1} \sum^n_{j=1} a_{i,j} \ll n^2$, that is, the vast majority of the entries are zero.
Sparse matrices have huge importance in computational linear algebra: an important idea is that it is much more efficient to store only the locations of the non-zero entries of a sparse matrix.
\end{itemize}
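The properties listed above can be checked numerically.
The sketch below assumes \mintinline{python}{numpy} is installed and uses \mintinline{python}{nx.to_numpy_array} to build $A$ for a small, made-up graph; it verifies symmetry, the zero diagonal, and that the sum of all entries equals the sum of the degrees, and finally stores only the positions of the non-zero entries, which is the basic idea behind sparse matrix formats.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
import networkx as nx
import numpy as np

# A small, made-up graph; nodelist fixes the row/column order
G = nx.Graph([(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)])
nodes = [1, 2, 3, 4, 5]
A = nx.to_numpy_array(G, nodelist=nodes, dtype=int)

assert (A == A.T).all()         # undirected graph: A is symmetric
assert (np.diag(A) == 0).all()  # no loops: zero diagonal
degree_sum = sum(d for _, d in G.degree())
print(A.sum(), degree_sum)  # 10 10

# Sparse idea: store only the (row, column) positions of the non-zero entries
nonzero_positions = np.argwhere(A != 0)
print(len(nonzero_positions), "non-zero entries out of", A.size)  # 10 non-zero entries out of 25
\end{minted}
\caption{Checking properties of an adjacency matrix}
\end{code}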

Any square matrix $M = (m_{i,j})$ whose entries are all zero or one, whose diagonal entries are zero (i.e., $m_{i,i}=0$ for all $i$), and which is symmetric ($M = M^T$) is the adjacency matrix of \textit{some} simple graph (as long as we don't mind too much about node labels).
In a sense, every square matrix defines a graph if:
\begin{itemize}
\item We allow loops (an edge between a node and itself).
\item Every edge has a weight: our usual graphs correspond to the case where every potential edge has weight 0 (it is not in the edge set) or 1 (it is in the edge set).
\item There are two edges between each pair of nodes (one in each direction), and they can have different weights.
\end{itemize}
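As a sketch of the first claim above, the helper \mintinline{python}{is_simple_adjacency_matrix} below (our own, hypothetical name, assuming \mintinline{python}{numpy} is installed) checks the conditions for a matrix to be the adjacency matrix of a simple graph, and \mintinline{python}{nx.from_numpy_array} then builds such a graph, labelling its nodes $0, 1, \dots, n-1$.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
import networkx as nx
import numpy as np


def is_simple_adjacency_matrix(M):
    """Hypothetical helper: True if M is square, 0/1, zero-diagonal and symmetric."""
    M = np.asarray(M)
    return (
        M.ndim == 2
        and M.shape[0] == M.shape[1]
        and bool(np.isin(M, (0, 1)).all())
        and bool((np.diag(M) == 0).all())
        and bool((M == M.T).all())
    )


M = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
print(is_simple_adjacency_matrix(M))  # True
G = nx.from_numpy_array(M)
print(sorted(G.edges()))  # [(0, 1), (0, 2)]

# A non-symmetric matrix needs a directed graph (one edge per direction)
N = np.array([[0, 1], [0, 0]])
print(is_simple_adjacency_matrix(N))  # False
D = nx.from_numpy_array(N, create_using=nx.DiGraph)
print(list(D.edges()))  # [(0, 1)]
\end{minted}
\caption{From a matrix to a graph}
\end{code}

For a general square matrix, \mintinline{python}{nx.from_numpy_array} stores each non-zero entry as the \mintinline{python}{weight} attribute of the corresponding edge.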
\subsubsection{Examples of Adjacency Matrices}
Let $G = (X,E)$ be the graph with node set $X = \{a,b,c,d,e\}$ and edge set $E = \{a \leftrightarrow b, b \leftrightarrow c, b \leftrightarrow d, c \leftrightarrow d, d \leftrightarrow e \}$.
Then, numbering the nodes in alphabetical order:
\begin{align*}
A =
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 \\
\end{pmatrix}
\end{align*}

The adjacency matrix of $K_4$ is:
\begin{align*}
A =
\begin{pmatrix}
0 & 1 & 1 & 1 \\
1 & 0 & 1 & 1 \\
1 & 1 & 0 & 1 \\
1 & 1 & 1 & 0 \\
\end{pmatrix}
\end{align*}
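Both matrices can be reproduced with \mintinline{python}{networkx}; in this sketch (assuming \mintinline{python}{numpy} is installed), \mintinline{python}{nodelist} fixes the alphabetical order of the rows \& columns, and the adjacency matrix of $K_4$ is compared with $J - I$, where $J$ is the all-ones matrix and $I$ is the identity.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
import networkx as nx
import numpy as np

# The five-node example graph, with rows/columns in alphabetical order
G = nx.Graph([("a", "b"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")])
A = nx.to_numpy_array(G, nodelist=["a", "b", "c", "d", "e"], dtype=int)
print(A)  # the same matrix as above

# K_4: every off-diagonal entry is 1, i.e. A = J - I
A_K4 = nx.to_numpy_array(nx.complete_graph(4), dtype=int)
J, I = np.ones((4, 4), dtype=int), np.eye(4, dtype=int)
print((A_K4 == J - I).all())  # True
\end{minted}
\caption{Reproducing the example adjacency matrices}
\end{code}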
\subsection{Degree}
The \textbf{degree} of a node in a simple graph is the number of nodes to which it is adjacent, i.e., its number of neighbours.
For a node $v$ we denote this number $\text{deg}(v)$.
The degree of a node can serve as a (simple) measure of the importance of a node in a network.
Recall that one of the basic properties of an adjacency matrix is $\sum^n_{i=1} \sum^n_{j=1} a_{i,j} = \sum_{u \in X} \text{deg}(u)$, where $\text{deg}(u)$ is the degree of $u$ and $n$ is the order of the graph;
this relates to a (crude) measure of how connected a network is: the \textbf{average degree}:
\begin{align*}
\text{Average degree} = \frac{1}{n} \sum_{u \in X} \text{deg}(u) = \frac{1}{n}\sum^n_{i=1} \sum^n_{j=1} a_{i,j}
\end{align*}
Moreover, if the size of the network (the number of edges) is $m$, then the total sum of degrees is $2m$ (since each edge contributes to the degree count of two nodes), meaning that the average degree is $\frac{2m}{n}$.
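\\\\
As a quick numerical check of these formulas, the following sketch (assuming \mintinline{python}{networkx} and \mintinline{python}{numpy} are installed) computes the average degree of the five-node example graph in three ways: from the degrees, from the entries of $A$, and as $2m/n$.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
import networkx as nx

# The five-node example graph: n = 5 nodes, m = 5 edges
G = nx.Graph([("a", "b"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")])
n, m = G.number_of_nodes(), G.number_of_edges()
A = nx.to_numpy_array(G, dtype=int)

avg_from_degrees = sum(d for _, d in G.degree()) / n
avg_from_matrix = A.sum() / n
avg_from_size = 2 * m / n
print(avg_from_degrees, avg_from_matrix, avg_from_size)  # 2.0 2.0 2.0
\end{minted}
\caption{Three ways to compute the average degree}
\end{code}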
\subsection{Walks}
A \textbf{walk} in a graph is a sequence of edges (possibly with repeats) $(u_1 \leftrightarrow v_1, u_2 \leftrightarrow v_2, \dots, u_p \leftrightarrow v_p)$ with the property that $v_i = u_{i+1}$ for $i = 1, \dots, p-1$.
If $v_p = u_1$, then it is a \textbf{closed walk}.
The \textbf{length} of a walk is the number of edges in it.
\\\\
Adjacency matrices can be used to enumerate the number of walks of a given length between a pair of vertices.
Obviously, $a_{i,j}$ is the number of walks of length 1 between node $i$ and node $j$.
We can extract that information for node $j$ by computing the product of $A$ and $e_j$ (column $j$ of the identity matrix): $Ae_j$ is column $j$ of $A$, whose $i^\text{th}$ entry is the number of walks of length 1 between node $i$ and node $j$.
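\\\\
The product $Ae_j$ can be sketched in \mintinline{python}{numpy} as follows, for the five-node example graph with $a, b, c, d, e$ numbered $0, \dots, 4$ (Python indices start at 0).
The last lines go slightly further and use the standard fact that $(A^k)_{i,j}$ counts the walks of length $k$ from node $i$ to node $j$, comparing $A^2$ with a brute-force count of walks of length 2; the helper \mintinline{python}{adjacent} is our own illustrative function.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
from itertools import product

import numpy as np

# The five-node example graph, with a, b, c, d, e numbered 0, 1, 2, 3, 4
edges = {(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)}
n = 5


def adjacent(u, v):
    """True if u and v are joined by an edge (edges are undirected)."""
    return (u, v) in edges or (v, u) in edges


A = np.array([[int(adjacent(i, k)) for k in range(n)] for i in range(n)])

j = 3  # node d
e_j = np.eye(n, dtype=int)[:, j]  # column j of the identity matrix
print(A @ e_j)  # column j of A: [0 1 1 0 1]

# Brute force: count sequences u -> w -> v in which both steps are edges
walks2 = np.zeros((n, n), dtype=int)
for u, w, v in product(range(n), repeat=3):
    if adjacent(u, w) and adjacent(w, v):
        walks2[u, v] += 1
print((walks2 == A @ A).all())  # True
\end{minted}
\caption{Counting walks with the adjacency matrix}
\end{code}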
\section{Connectivity \& Permutations}
To start, let's decide on our notation:
\begin{itemize}
\item If we write $A = (a_{i,j})$, we mean that $A$ is a matrix and $a_{i,j}$ is its entry in row $i$, column $j$.
\item We also write such entries as $(A)_{i,j}$;
the reason for this slightly different notation is to allow us to write, for example, $(A^2)_{i,j}$ for the entry in row $i$, column $j$ of $B = A^2$.
\item The \textbf{trace} of a matrix is the sum of its diagonal entries, that is, $\text{tr}(A) = \sum^n_{i=1}a_{i,i}$. (Very standard.)
\item When we write $A > 0$, we mean that all entries of $A$ are positive.
\end{itemize}
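In \mintinline{python}{numpy}, this notation maps onto simple operations.
The following sketch (with a made-up matrix, assuming \mintinline{python}{numpy} is installed) reads off an entry of $A^2$, computes traces, and tests whether $A > 0$ in the entrywise sense defined above; note that Python indices start at 0 rather than 1.

\begin{code}
\begin{minted}[linenos, breaklines, frame=single]{python}
import numpy as np

# Adjacency matrix of the triangle K_3 (a made-up example)
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

A2 = np.linalg.matrix_power(A, 2)  # B = A^2
print(A2[0, 1])  # (A^2)_{1,2} in the 1-based notation above; prints 1
print(np.trace(A), np.trace(A2))  # tr(A) = 0, tr(A^2) = 6
print((A > 0).all(), (A2 > 0).all())  # False True
\end{minted}
\caption{Matrix notation in \texttt{numpy}}
\end{code}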

Recall that the \textbf{adjacency matrix} of a graph $G$ of order $n$ is a square $n \times n$ matrix $A = (a_{i,j})$ with rows and columns corresponding to the nodes of the graph.
The entry $a_{i,j}$ is the number of edges between nodes $i$ and $j$.
We learned previously that:
\begin{itemize}
\item If $e_j$ is the $j^\text{th}$ column of the identity matrix, then $Ae_j$ is the $j^\text{th}$ column of $A$.
\end{itemize}

\end{document}