[CT4101]: Week 9 lecture notes

2024-11-06 23:29:35 +00:00
parent 960cb8eddb
commit 9397827eb0
2 changed files with 39 additions and 0 deletions


@ -1575,10 +1575,49 @@ To make such judgements without deep domain knowledge, a normalise \textbf{domai
\\\\
The \textbf{$R^2$ coefficient} is a domain-independent measure that compares the performance of a model on a test set with the performance of an imaginary model that always predicts the average value of the target feature from the test set.
$R^2$ values may be interpreted as the amount of variation in the target feature that is explained by the descriptive features in the model.
\begin{align*}
\text{sum of squared errors} =& \frac{1}{2} \sum^n_{i=1} \left( t_i - \mathbb{M} \left( d_i \right) \right)^2 \\
\text{total sum of squares} =& \frac{1}{2} \sum^n_{i=1} \left( t_i - \overline{t} \right)^2 \\
R^2 =& 1 - \frac{\text{sum of squared errors}}{\text{total sum of squares}}
\end{align*}
where $\overline{t}$ is the average value of the target variable.
\\\\
$R^2$ values are usually in the range $[0,1]$, with larger values indicating better performance.
However, $R^2$ values can be $< 0$ in certain rare cases (although 1 is always the maximum $R^2$ value).
Negative $R^2$ values indicate very poor model performance, i.e. that the model performs worse than the horizontal straight-line hypothesis that always predicts the average value of the target feature.
For example, a negative $R^2$ on the test set with a positive $R^2$ value on the training set likely indicates that the model is overfit to the training data.
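\\\\
As a concrete illustration (not from the lecture notes; the target and prediction values are made up), the following short Python sketch computes $R^2$ directly from the definitions above and checks it against \texttt{sklearn.metrics.r2\_score}:
\begin{verbatim}
import numpy as np
from sklearn.metrics import r2_score  # reference implementation

# Hypothetical test-set targets and model predictions (made-up values).
t      = np.array([12.0, 15.0, 11.0, 18.0, 14.0])
t_pred = np.array([13.0, 14.5, 10.0, 17.0, 15.0])

sse = 0.5 * np.sum((t - t_pred) ** 2)    # sum of squared errors
tss = 0.5 * np.sum((t - t.mean()) ** 2)  # total sum of squares
r_squared = 1 - sse / tss                # the 1/2 factors cancel

print(r_squared, r2_score(t, t_pred))    # both print ~0.858
\end{verbatim}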
\subsection{Applying $k$-NN to Regression Tasks}
Previously, we have seen that the $k$-nearest neighbours algorithm bases its prediction on the $k$ nearest neighbours of the query case, found by computing the distance from the query case to all stored cases and picking the $k$ closest ones.
When $k$-NN is used for classification tasks, the neighbours vote on the classification of the test case.
In \textbf{regression} tasks, the average value of the neighbours is taken as the label for the query case.
\subsubsection{Uniform Weighting}
Assuming that each neighbour is given an equal weighting:
\begin{align*}
\text{prediction}(q) = \frac{1}{k} \sum^k_{i=1} t_i
\end{align*}
where $q$ is a vector containing the attribute values for the query instance, $k$ is the number of neighbours, and $t_i$ is the target value of neighbour $i$.
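\\\\
A minimal NumPy sketch of this uniform-weighting rule (the helper name \texttt{knn\_predict\_uniform} and the toy data are illustrative, not from the notes); scikit-learn's \texttt{KNeighborsRegressor} with \texttt{weights='uniform'} implements the same rule:
\begin{verbatim}
import numpy as np

def knn_predict_uniform(X, t, q, k):
    # Predict the target for query q as the mean target of its k nearest neighbours.
    dists = np.linalg.norm(X - q, axis=1)  # Euclidean distance to every stored case
    nearest = np.argsort(dists)[:k]        # indices of the k nearest neighbours
    return t[nearest].mean()               # uniform weighting: simple average

# Hypothetical training data: 2 descriptive features, 1 continuous target.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [6.0, 5.0]])
t = np.array([10.0, 12.0, 14.0, 30.0])
print(knn_predict_uniform(X, t, q=np.array([2.0, 2.0]), k=3))  # 12.0
\end{verbatim}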
\subsubsection{Distance Weighting}
Assuming that each neighbour is given a weight based on the inverse square of its distance from the query instance:
\begin{align*}
\text{prediction}(q) = \frac{ \sum^k_{i=1} \left( \frac{1}{ \text{dist}(q, d_i)^2 } \times t_i \right) }{ \sum^k_{i=1} \left( \frac{1}{\text{dist}(q, d_i)^2 } \right) }
\end{align*}
where $q$ is a vector containing the attribute values for the query instance and $\text{dist}(q,d_i)$ returns the distance between the query and neighbour $d_i$.
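\\\\
The same sketch can be adapted to inverse-squared-distance weighting (again, the function name and data are illustrative only). Note that scikit-learn's \texttt{KNeighborsRegressor} with \texttt{weights='distance'} uses plain inverse distance, so a callable weight function would be needed to reproduce the inverse-square weighting used here:
\begin{verbatim}
import numpy as np

def knn_predict_weighted(X, t, q, k, eps=1e-12):
    # Weighted k-NN regression: weight each neighbour by 1 / dist(q, d_i)^2.
    dists = np.linalg.norm(X - q, axis=1)      # distance to every stored case
    nearest = np.argsort(dists)[:k]            # k nearest neighbours
    w = 1.0 / (dists[nearest] ** 2 + eps)      # eps guards against zero distances
    return np.sum(w * t[nearest]) / np.sum(w)  # weighted average of the targets

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [6.0, 5.0]])
t = np.array([10.0, 12.0, 14.0, 30.0])
print(knn_predict_weighted(X, t, q=np.array([2.0, 2.0]), k=3))  # ~11.6
\end{verbatim}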
\subsection{Applying Decision Trees to Regression}
\textbf{Regression trees} are constructed similarly to those for classification;
the main change is that the function used to measure the quality of a split is replaced with a measure relevant to regression, e.g. variance, MSE, or MAE.
This adaptation is easily made to the ID3/C4.5 algorithms.
\\\\
The aim in regression trees is to group similar target values together at a leaf node.
Typically, a regression tree returns the mean target value at a leaf node.
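\\\\
As an illustration (assuming scikit-learn; the toy data are invented), a \texttt{DecisionTreeRegressor} with \texttt{criterion='squared\_error'} (named \texttt{'mse'} in older releases) measures split quality with MSE and predicts the mean target value at each leaf:
\begin{verbatim}
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: one descriptive feature with a step-shaped target.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
t = np.array([5.0, 6.0, 7.0, 20.0, 21.0, 22.0])

# Split quality is measured with MSE; each leaf predicts the mean of its targets.
tree = DecisionTreeRegressor(criterion="squared_error", max_depth=1)
tree.fit(X, t)

print(tree.predict([[2.5], [11.5]]))  # ~[6.0, 21.0]: the two leaf means
\end{verbatim}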