[CT4101]: Add Week 6 lecture notes
Binary file not shown.
@@ -3,6 +3,8 @@
% packages
\usepackage{censor}
\usepackage{multicol}
\usepackage{algorithm}
\usepackage{algpseudocode}
\StopCensoring
\usepackage{fontspec}
\setmainfont{EB Garamond}
@@ -917,7 +919,7 @@ $\left| S \right|$ \& $\left| S_v \right|$ refer to the cardinality or size of t
When selecting an attribute for a node in a decision tree, we use whichever attribute $A$ gives the greatest information gain.

\begin{tcolorbox}[colback=gray!10, colframe=black, title=\textbf{Worked Information Gain Example}]
Given $\left| S \right| = 14$, $\left| S_{\text{windy} = \text{true}} \right| = 6$, \& $\left| S_{\text{windy} = \text{false}} \right| = 8$, calculate the information gain of the attribute ``windy''.
\begin{align*}
\text{Gain}(S, \text{windy}) =& \text{Ent}(S) - \frac{\left| S_{\text{windy} = \text{true}} \right|}{\left| S \right|} \text{Ent}(S_{\text{windy} = \text{true}})
@@ -928,5 +930,68 @@ When selecting an attribute for a node in a decision tree, we use whichever attr
\end{align*}
\end{tcolorbox}

The best partitioning is the one that results in the highest information gain.
Once the best split for the root node is found, the procedure is repeated with each subset of examples.
$S$ will then refer to the subset in the partition being considered instead of the entire dataset.
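
A minimal Python sketch of this calculation, assuming an illustrative class distribution that is consistent with the counts in the worked example above (9 positive and 5 negative examples overall, split 3/3 under windy = true and 6/2 under windy = false; these class counts are assumed for illustration, not given in the notes):

\begin{verbatim}
from math import log2

def entropy(class_counts):
    # Ent(S) = -sum_i p_i * log2(p_i), computed from raw class counts
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

def information_gain(parent_counts, subset_counts):
    # Gain(S, A) = Ent(S) - sum_v |S_v| / |S| * Ent(S_v)
    total = sum(parent_counts)
    remainder = sum(sum(sub) / total * entropy(sub) for sub in subset_counts)
    return entropy(parent_counts) - remainder

# Assumed (positive, negative) counts: |S| = 14, |S_true| = 6, |S_false| = 8
S, S_true, S_false = (9, 5), (3, 3), (6, 2)
print(information_gain(S, [S_true, S_false]))   # roughly 0.048 bits
\end{verbatim}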
\subsection{Computing the Gini Index}
An alternative to using entropy as the measure of the impurity of a set is to use the \textbf{Gini Index}:
\[
\text{Gini}(S) = 1 - \sum^n_{i=1} p_i^2
\]

This is the default measure of impurity in scikit-learn.
The gain for a feature can then be calculated based on the reduction in the Gini Index (rather than the reduction in entropy):
\[
\text{GiniGain}(S,A) = \text{Gini}(S) - \sum_{v \in \text{Values}(A)} \frac{\left| S_v \right|}{\left|S\right|}\text{Gini}(S_v)
\]
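
The same style of sketch works for the Gini versions, reusing the illustrative ``windy'' class counts assumed in the earlier Python example:

\begin{verbatim}
def gini(class_counts):
    # Gini(S) = 1 - sum_i p_i^2, computed from raw class counts
    total = sum(class_counts)
    return 1 - sum((c / total) ** 2 for c in class_counts)

def gini_gain(parent_counts, subset_counts):
    # GiniGain(S, A) = Gini(S) - sum_v |S_v| / |S| * Gini(S_v)
    total = sum(parent_counts)
    weighted = sum(sum(sub) / total * gini(sub) for sub in subset_counts)
    return gini(parent_counts) - weighted

# Same assumed (positive, negative) counts as before
print(gini_gain((9, 5), [(3, 3), (6, 2)]))   # roughly 0.031
\end{verbatim}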
\subsection{The ID3 Algorithm}
\begin{algorithm}[H]
\caption{ID3 Algorithm}
\begin{algorithmic}[1]
\Procedure{ID3}{Examples, Attributes, Target}
\State \textbf{Input:}
\State \quad Examples: set of classified examples
\State \quad Attributes: set of attributes in the examples
\State \quad Target: classification to be predicted
\If{Examples is empty}
\State \Return Default class
\ElsIf{all Examples have the same class}
\State \Return this class
\ElsIf{all Attributes are tested}
\State \Return majority class
\Else
\State Let Best = attribute that best separates Examples relative to Target
\State Let Tree = new decision tree with Best as root node
\ForAll{value $v_i$ of Best}
\State Let Examples$_i$ = subset of Examples where Best = $v_i$
\State Let Subtree = ID3(Examples$_i$, Attributes - Best, Target)
\State Add branch from Tree to Subtree with label $v_i$
\EndFor
\State \Return Tree
\EndIf
\EndProcedure
\end{algorithmic}
\end{algorithm}
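
One possible direct Python translation of the pseudocode above; representing the examples as dictionaries and the tree as nested dictionaries is an assumption made for illustration, not something prescribed by the lecture:

\begin{verbatim}
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_attribute(examples, attributes, target):
    # attribute with the highest information gain on these examples
    base = entropy([e[target] for e in examples])
    def gain(a):
        remainder = 0.0
        for v in {e[a] for e in examples}:
            subset = [e[target] for e in examples if e[a] == v]
            remainder += len(subset) / len(examples) * entropy(subset)
        return base - remainder
    return max(attributes, key=gain)

def id3(examples, attributes, target, default=None):
    # returns a class label (leaf) or a dict {attribute: {value: subtree}}
    if not examples:
        return default                       # empty: default class
    classes = [e[target] for e in examples]
    if len(set(classes)) == 1:
        return classes[0]                    # all examples share one class
    majority = Counter(classes).most_common(1)[0][0]
    if not attributes:
        return majority                      # all attributes tested: majority class
    best = best_attribute(examples, attributes, target)
    tree = {best: {}}
    for v in {e[best] for e in examples}:    # one branch per value of Best
        subset = [e for e in examples if e[best] == v]
        tree[best][v] = id3(subset, [a for a in attributes if a != best],
                            target, majority)
    return tree
\end{verbatim}

In this sketch, leaves hold class labels and internal nodes are nested dictionaries keyed first by the chosen attribute and then by its values.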
\subsection{Decision Tree Summary}
Decision trees are popular because:
\begin{itemize}
\item It's a relatively easy algorithm to implement.
\item It's fast: greedy search without backtracking.
\item It has comprehensible output, which is important in decision-making (medical, financial, etc.).
\item It's practical.
\item It's \textbf{expressive:} a decision tree can represent any Boolean function, although some functions (such as parity) require exponentially large trees.
\end{itemize}
\subsubsection{Dealing with Noisy or Missing Data}
If the data is inconsistent or \textit{noisy}, we can use the majority class (as in line 11 of the ID3 algorithm above), interpret the values as probabilities, or return the average target feature value.
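
These three strategies could be sketched in Python as follows; the helper name and its interface are purely illustrative:

\begin{verbatim}
from collections import Counter
from statistics import mean

def leaf_value(labels, mode="majority"):
    # possible leaf values for a subset whose examples disagree on the class
    if mode == "majority":        # most common class, as in line 11 of ID3
        return Counter(labels).most_common(1)[0][0]
    if mode == "probability":     # class -> relative frequency
        return {c: n / len(labels) for c, n in Counter(labels).items()}
    if mode == "average":         # mean of a numeric target (regression-style)
        return mean(labels)
    raise ValueError(mode)
\end{verbatim}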
\end{document}