[CT4101]: Add Week 6 lecture notes
Binary file not shown.
@ -3,6 +3,8 @@
% packages
\usepackage{censor}
\usepackage{multicol}
\usepackage{algorithm}
\usepackage{algpseudocode}
\StopCensoring
\usepackage{fontspec}
\setmainfont{EB Garamond}
@ -917,7 +919,7 @@ $\left| S \right|$ \& $\left| S_v \right|$ refer to the cardinality or size of t
When selecting an attribute for a node in a decision tree, we use whichever attribute $A$ gives the greatest information gain.

\begin{tcolorbox}[colback=gray!10, colframe=black, title=\textbf{Worked Information Gain Example}]
Given $\left| S \right| = 14$, $\left| S_{\text{windy} = \text{true}} \right| = 6$, \& $\left| S_{\text{windy} = \text{false}} \right| = 8$, calculate the information gain of the attribute ``windy''.

\begin{align*}
\text{Gain}(S, \text{windy}) =& \text{Ent}(S) - \frac{\left| S_{\text{windy} = \text{true}} \right|}{\left| S \right|} \text{Ent}(S_{\text{windy} = \text{true}})
@ -928,5 +930,68 @@ When selecting an attribute for a node in a decision tree, we use whichever attr
\end{align*}
\end{tcolorbox}

The best partitioning is the one that results in the highest information gain.
Once the best split for the root node is found, the procedure is repeated with each subset of examples.
$S$ will then refer to the subset in the partition being considered instead of the entire dataset.
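For example, continuing the ``windy'' split above: the procedure is applied once to the $6$ examples with windy $=$ true and once to the $8$ examples with windy $=$ false, with $S$ referring to whichever of these subsets is currently being partitioned.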

\subsection{Computing the Gini Index}
An alternative to using entropy as the measure of the impurity of a set is to use the \textbf{Gini Index}:
\[
\text{Gini}(S) = 1 - \sum^n_{i=1} p_i^2
\]
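Here each $p_i$ is the proportion of examples in $S$ belonging to class $i$. As a quick illustration (the class counts are assumed for the sake of example, not taken from the notes): if a set of $14$ examples contains $9$ positive and $5$ negative examples, then
\begin{align*}
	\text{Gini}(S) &= 1 - \left(\frac{9}{14}\right)^2 - \left(\frac{5}{14}\right)^2 \\
	&\approx 1 - 0.413 - 0.128 = 0.459
\end{align*}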

This is the default measure of impurity in scikit-learn.
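As a brief sketch of how the criterion is chosen in practice (the dataset here is an arbitrary choice, used only for illustration):
\begin{verbatim}
# Illustrative sketch: selecting the impurity criterion in scikit-learn.
# The iris dataset is an arbitrary choice, not part of these notes.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion="gini" is the default; criterion="entropy" uses information gain.
gini_tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
entropy_tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

print(gini_tree.score(X, y), entropy_tree.score(X, y))
\end{verbatim}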
The gain for a feature can then be calculated as the reduction in the Gini Index (rather than as a reduction in entropy):
\[
\text{GiniGain}(S,A) = \text{Gini}(S) - \sum_{v \in \text{Values}(A)} \frac{\left| S_v \right|}{\left| S \right|}\text{Gini}(S_v)
\]
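Continuing the ``windy'' example with assumed class counts (illustrative only: $3$ positive / $3$ negative among the $6$ windy $=$ true examples and $6$ positive / $2$ negative among the $8$ windy $=$ false examples, i.e.\ $9$ positive / $5$ negative overall):
\begin{align*}
	\text{Gini}(S_{\text{windy}=\text{true}}) &= 1 - \left(\tfrac{3}{6}\right)^2 - \left(\tfrac{3}{6}\right)^2 = 0.5 \\
	\text{Gini}(S_{\text{windy}=\text{false}}) &= 1 - \left(\tfrac{6}{8}\right)^2 - \left(\tfrac{2}{8}\right)^2 = 0.375 \\
	\text{GiniGain}(S, \text{windy}) &\approx 0.459 - \tfrac{6}{14}(0.5) - \tfrac{8}{14}(0.375) \approx 0.03
\end{align*}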

\subsection{The ID3 Algorithm}
\begin{algorithm}[H]
\caption{ID3 Algorithm}
\begin{algorithmic}[1]
\Procedure{ID3}{Examples, Attributes, Target}
	\State \textbf{Input:}
	\State \quad Examples: set of classified examples
	\State \quad Attributes: set of attributes in the examples
	\State \quad Target: classification to be predicted
	\If{Examples is empty}
		\State \Return Default class
	\ElsIf{all Examples have the same class}
		\State \Return this class
	\ElsIf{all Attributes are tested}
		\State \Return majority class
	\Else
		\State Let Best = attribute that best separates Examples relative to Target
		\State Let Tree = new decision tree with Best as root node
		\ForAll{value $v_i$ of Best}
			\State Let Examples$_i$ = subset of Examples where Best = $v_i$
			\State Let Subtree = ID3(Examples$_i$, Attributes - Best, Target)
			\State Add branch from Tree to Subtree with label $v_i$
		\EndFor
		\State \Return Tree
	\EndIf
\EndProcedure
\end{algorithmic}
\end{algorithm}
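Below is a minimal Python sketch of the procedure above, included for illustration only; the function and variable names are made up here, and information gain (entropy reduction) is used to choose the best attribute, as in the notes.
\begin{verbatim}
# Minimal ID3 sketch (illustrative, not from the notes).
# Examples are dicts mapping attribute -> value plus a target key;
# the returned tree is a nested dict keyed by attribute, then by value.
import math
from collections import Counter

def entropy(examples, target):
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder

def majority_class(examples, target):
    return Counter(e[target] for e in examples).most_common(1)[0][0]

def id3(examples, attributes, target, default=None):
    if not examples:                        # Examples is empty
        return default
    classes = {e[target] for e in examples}
    if len(classes) == 1:                   # all Examples have the same class
        return classes.pop()
    if not attributes:                      # all Attributes are tested
        return majority_class(examples, target)
    # Best = attribute that best separates Examples relative to Target
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}                       # new decision tree with Best as root
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        subtree = id3(subset, [a for a in attributes if a != best], target,
                      default=majority_class(examples, target))
        tree[best][value] = subtree         # branch labelled with the value
    return tree
\end{verbatim}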

\subsection{Decision Tree Summary}
Decision trees are popular because:
\begin{itemize}
	\item They're relatively easy to implement.
	\item They're fast: greedy search without backtracking.
	\item They produce comprehensible output, which is important in decision-making (medical, financial, etc.).
	\item They're practical.
	\item They're \textbf{expressive}: a decision tree can technically represent any Boolean function, although some functions, such as the parity function, require exponentially large trees.
\end{itemize}

\subsubsection{Dealing with Noisy or Missing Data}
If the data is inconsistent or \textit{noisy}, we can use the majority class (as in line 11 of the ID3 algorithm above), interpret the values as probabilities, or return the average target feature value.

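As a small illustration of the majority-class fallback, using the hypothetical \texttt{id3} sketch above (the data is invented):
\begin{verbatim}
# Inconsistent examples: identical attribute values but conflicting classes,
# so the tree cannot separate them and falls back to the majority class.
noisy = [
    {"windy": "true", "play": "yes"},
    {"windy": "true", "play": "no"},
    {"windy": "true", "play": "no"},
]
print(id3(noisy, ["windy"], "play"))  # -> {'windy': {'true': 'no'}}
\end{verbatim}
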
\end{document}