[CS4423]: WK10-1 lecture materials & notes

2025-03-19 09:29:46 +00:00
parent cc23f66fa0
commit 9b9ec8b956
3 changed files with 74 additions and 0 deletions
--- a/year4/semester2/CS4423/materials/CS4423-W10-Part-1.pdf
+++ b/year4/semester2/CS4423/materials/CS4423-W10-Part-1.pdf
--- a/year4/semester2/CS4423/notes/CS4423.pdf
+++ b/year4/semester2/CS4423/notes/CS4423.pdf
--- a/year4/semester2/CS4423/notes/CS4423.tex
+++ b/year4/semester2/CS4423/notes/CS4423.tex
@ -1092,7 +1092,81 @@ However, in the limit $n \to \infty$ with $\langle k \rangle k = p(n-1)$ kept co

 where $\lambda = p(n-1)$.

+\section{Giant Components \& Small Worlds}
+Recall that a network may be made up of several \textbf{connected components}, and any connected network has a single connected component.
+It is common in large networks to observe a \textbf{giant component}: a connected component which has a large proportion of the network's nodes.
+This is particularly the case with graphs in $G_{ER}(n,p)$ with large enough $p$.
+More formally, a connected component of a graph $G$ is called a \textbf{giant component} if its number of nodes increases with the order $n$ of $G$ as some positive power of $n$.
+Suppose that $p(n) = cn^{-1}$ for some positive constant $c$;
+then, the average degree $\langle k \rangle = p(n-1) \approx pn = c$ tends to the constant $c$ as $n \to \infty$.
+For graphs $G_{ER}(n,p)$:
+\begin{itemize}
+    \item   If $c < 1$, the graph contains many small components with orders bounded by $O(\ln(n))$.
+    \item   If $c = 1$, the graph has large components of order $S = O(n^{2/3})$.
+    \item   If $c > 1$, there is a unique \textbf{giant component} of order $S = O(n)$.
+\end{itemize}
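+
+To get a feel for how different these three regimes are, here is a rough numerical illustration for $n = 10^6$ (a value assumed purely for this example), ignoring the constant factors hidden by the $O$-notation:
+\[
+    \ln(n) \approx 13.8, \qquad n^{2/3} = 10^4, \qquad n = 10^6.
+\]
+So, roughly speaking, components have at most tens of nodes when $c < 1$, on the order of ten thousand nodes when $c = 1$, and a constant fraction of all one million nodes when $c > 1$.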

+\subsection{Small World Network}
+Many real-world networks are \textbf{small world networks}, wherein most pairs of nodes are only a few steps away from each other, and where nodes tend to form \textit{cliques}, i.e., subgraphs in which all nodes are connected to each other.
+Three network attributes that measure these small-world effects are:
+\begin{itemize}
+    \item   \textbf{Characteristic path length}, $L$: the average length of all shortest paths in the network.
+    \item   \textbf{Transitivity}, $T$: the proportion of \textit{triads} that form triangles.
+    \item   \textbf{Clustering coefficient}, $C$: the average node clustering coefficient.
+\end{itemize}
+
+A network is called a \textbf{small world network} if it has:
+\begin{itemize}
+    \item   A small \textbf{average shortest path length} $L$ (scaling with $\log(n)$, where $n$ is the number of nodes) and
+    \item   A high \textbf{clustering coefficient} $C$.
+\end{itemize}
+
+It turns out that ER random networks do have a small average shortest path length, but not a high clustering coefficient.
+This observation justifies the need for a different model of random networks, if they are to be used to model the clustering behaviour of real-world networks.
+
+\subsubsection{Distance}
+We have seen how BFS can determine the length of a shortest path from a given node $x$ to any node $y$ in a \textit{connected network}.
+Running BFS from every node $x$ in turn yields the shortest distances between all pairs of nodes.
+Recall that the \textbf{distance matrix} of a connected graph $G = (X,E)$ is $\mathcal{D} = (d_{i,j})$ where entry $d_{i,j}$ is the length of the shortest path from node $i \in X$ to node $j \in X$.
+(Note that $d_{i,i} = 0$ for all $i$).
+There are a number of graph (and node) attributes that can be defined in terms of this matrix:
+\begin{itemize}
+    \item   The \textbf{eccentricity} $e_i$ of a node $i \in X$ is the maximum distance between $i$ and any other vertex in $G$, so $e_i = \text{max}_j(d_{i,j})$.
+    \item   The \textbf{graph radius} $R$ is the minimum eccentricity, $R = \text{min}_i(e_i)$.
+    \item   The \textbf{graph diameter} $D$ is the maximum eccentricity: $D = \text{max}_i(e_i) = \text{max}_{i,j} (d_{i,j})$.
+\end{itemize}
+
+Note that the diameter is not, in general, twice the radius: the diameter is the distance between the two nodes furthest from each other, whereas the radius is the distance from a ``centre'' node to the node furthest from it.
+It can be helpful to think about the path graph $P_n$.
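+
+As an illustrative example (the labelling of the nodes is an assumption made here), take the path graph $P_4$ with nodes $1, 2, 3, 4$ in order along the path.
+Its distance matrix is
+\[
+    \mathcal{D} = \left( \begin{array}{cccc}
+        0 & 1 & 2 & 3 \\
+        1 & 0 & 1 & 2 \\
+        2 & 1 & 0 & 1 \\
+        3 & 2 & 1 & 0
+    \end{array} \right)
+\]
+Taking the maximum of each row gives the eccentricities $e_1 = 3$, $e_2 = 2$, $e_3 = 2$, $e_4 = 3$, so the radius is $R = 2$ and the diameter is $D = 3$.
+In particular, $D \neq 2R$ here.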
+
+\subsubsection{Characteristic Path Length}
+The \textbf{characteristic path length} (i.e., the average shortest path length) $L$ of a graph $G$ is the average distance between pairs of nodes:
+\[
+    L = \frac{1}{n(n-1)} \sum_i \sum_j d_{i,j}
+\]
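+
+As a small worked example (chosen here for illustration), consider the path graph $P_3$ with nodes $1, 2, 3$ in order along the path.
+The distances are $d_{1,2} = d_{2,3} = 1$ and $d_{1,3} = 2$, and each unordered pair is counted twice in the double sum, so
+\[
+    L = \frac{1}{3 \cdot 2} \cdot 2 \, (1 + 1 + 2) = \frac{8}{6} = \frac{4}{3}
+\]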
+
+For graphs drawn from $G_{ER}(n,m)$ and $G_{ER}(n,p)$, the characteristic path length is approximately $L \approx \frac{\ln(n)}{\ln( \langle k \rangle)}$.
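+
+To illustrate how slowly this quantity grows (the values of $n$ and $\langle k \rangle$ below are assumed purely for the example), take $n = 10^6$ and $\langle k \rangle = 10$:
+\[
+    L \approx \frac{\ln(10^6)}{\ln(10)} = 6
+\]
+Squaring the number of nodes to $n = 10^{12}$, with the same average degree, only doubles this estimate to $12$.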
+
+\subsubsection{Clustering}
+In contrast to random graphs, real-world networks also contain \textbf{many triangles}: it is not uncommon that a friend of one of my friends is also my friend.
+This \textbf{degree of transitivity} can be measured in several different ways.
+For the first, we need two concepts:
+\begin{itemize}
+    \item   The number of \textbf{triangles} in $G$, denoted $n_\Delta$, is the number of subgraphs of $G$ that are isomorphic to $C_3$.
+    \item   The number of \textbf{triads} in $G$, denoted $n_\land$, is the number of pairs of edges with a shared node.
+\end{itemize}
+
+There is an easy way to count the number of \textbf{triads} in a network:
+if node $i$ has degree $k_i = \text{deg}(i)$, then it is involved in $\binom{k_i}{2}$ triads,
+so the total number of triads is $n_\land = \sum_i \binom{k_i}{2}$.
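+
+As a quick check of this formula (a hypothetical example, not taken from the lecture), consider the star graph with one central node joined to three leaves: the centre has degree $3$ and each leaf has degree $1$, so
+\[
+    n_\land = \binom{3}{2} + 3 \binom{1}{2} = 3 + 0 = 3
+\]
+which matches the three pairs of edges that meet at the centre.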
+\\\\
+The \textbf{transitivity} $T$ of a graph $G = (X,E)$ is the proportion of \textbf{transitive} triads, i.e., triads which are subgraphs of \textbf{triangles}.
+This proportion can be computed as follows:
+\[
+    T = 3 \frac{n_\Delta}{n_\land}
+\]
+
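+where $n_\Delta$ is the number of triangles in $G$ and $n_\land$ is the number of triads.
+The factor $3$ appears because each triangle contains three triads, one centred at each of its nodes.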
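+
+As a small worked example (the graph is assumed here for illustration), let $G$ have nodes $1, 2, 3, 4$ and edges $\{1,2\}$, $\{2,3\}$, $\{1,3\}$, $\{3,4\}$, i.e., a triangle with one extra node attached to node $3$.
+The degrees are $k_1 = 2$, $k_2 = 2$, $k_3 = 3$, $k_4 = 1$, so
+\[
+    n_\land = \binom{2}{2} + \binom{2}{2} + \binom{3}{2} + \binom{1}{2} = 1 + 1 + 3 + 0 = 5,
+    \qquad
+    n_\Delta = 1,
+    \qquad
+    T = 3 \cdot \frac{1}{5} = \frac{3}{5}
+\]
+The two triads that are not transitive are both centred at node $3$ and involve the edge $\{3,4\}$.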