%! TeX program = lualatex
\documentclass[a4paper,11pt]{article}
% packages
\usepackage{censor}
\StopCensoring
\usepackage{fontspec}
\usepackage{tcolorbox}
\setmainfont{EB Garamond}
% for tironian et fallback
% % \directlua{luaotfload.add_fallback
% % ("emojifallback",
% % {"Noto Serif:mode=harf"}
% % )}
% % \setmainfont{EB Garamond}[RawFeature={fallback=emojifallback}]

\setmonofont[Scale=MatchLowercase]{DejaVu Sans Mono}
\usepackage[a4paper,left=2cm,right=2cm,top=\dimexpr15mm+1.5\baselineskip,bottom=2cm]{geometry}
\setlength{\parindent}{0pt}

\usepackage{fancyhdr} % Headers and footers
\fancyhead[R]{\normalfont \leftmark}
\fancyhead[L]{}
\pagestyle{fancy}

\usepackage{microtype} % Slightly tweak font spacing for aesthetics
\usepackage{amsmath}
\usepackage[english]{babel} % Language hyphenation and typographical rules
\usepackage{xcolor}
\definecolor{linkblue}{RGB}{0, 64, 128}
\usepackage[final, colorlinks = false, urlcolor = linkblue]{hyperref}
% \newcommand{\secref}[1]{\textbf{§~\nameref{#1}}}
\newcommand{\secref}[1]{\textbf{§\ref{#1}~\nameref{#1}}}

\usepackage{changepage} % adjust margins on the fly

\usepackage{minted}
\usemintedstyle{algol_nu}

\usepackage{pgfplots}
\pgfplotsset{width=\textwidth,compat=1.9}

\usepackage{caption}
\newenvironment{code}{\captionsetup{type=listing}}{}
\captionsetup[listing]{skip=0pt}
\setlength{\abovecaptionskip}{5pt}
\setlength{\belowcaptionskip}{5pt}

\usepackage[yyyymmdd]{datetime}
\renewcommand{\dateseparator}{--}

\usepackage{enumitem}

\usepackage{titlesec}

\author{Andrew Hayes}

\begin{document}
\begin{titlepage}
\begin{center}
\hrule
\vspace*{0.6cm}
\censor{\huge \textbf{CT4100}}
\vspace*{0.6cm}
\hrule
\LARGE
\vspace{0.5cm}
Information Retrieval
\vspace{0.5cm}
\hrule

\vfill
\vfill

\hrule
\begin{minipage}{0.495\textwidth}
\vspace{0.4em}
\raggedright
\normalsize
Name: Andrew Hayes \\
E-mail: \href{mailto://a.hayes18@universityofgalway.ie}{\texttt{a.hayes18@universityofgalway.ie}} \hfill\\
Student ID: 21321503 \hfill
\end{minipage}
\begin{minipage}{0.495\textwidth}
\raggedleft
\vspace*{0.8cm}
\Large
\today
\vspace*{0.6cm}
\end{minipage}
\medskip\hrule
\end{center}
\end{titlepage}

\pagenumbering{roman}
\newpage
\tableofcontents
\newpage
\setcounter{page}{1}
\pagenumbering{arabic}

\section{Introduction}
\subsection{Lecturer Contact Details}
\begin{itemize}
\item Colm O'Riordan.
\item \href{mailto://colm.oriordan@universityofgalway.ie}{\texttt{colm.oriordan@universityofgalway.ie}}.
\end{itemize}

\subsection{Motivations}
\begin{itemize}
\item To study/analyse techniques to deal suitably with large amounts (\& types) of information.
\item Emphasis on research \& practice in Information Retrieval.
\end{itemize}

\subsection{Related Fields}
\begin{itemize}
\item Artificial Intelligence.
\item Database \& Information Systems.
\item Algorithms.
\item Human-Computer Interaction.
\end{itemize}

\subsection{Recommended Texts}
\begin{itemize}
\item \textit{Modern Information Retrieval} -- Ribeiro-Neto \& Baeza-Yates (several copies in library).
\item \textit{Information Retrieval} -- Grossman.
\item \textit{Introduction to Information Retrieval} -- Christopher Manning.
\item Extra resources such as research papers will be recommended as extra reading.
\end{itemize}

\subsection{Grading}
\begin{itemize}
\item Exam: 70\%.
\item Assignment 1: 30\%.
\item Assignment 2: 30\%.
\end{itemize}

There will be exercise sheets posted for most lectures; these are not mandatory and are intended as a study aid.

\subsection{Introduction to Information Retrieval}
\textbf{Information Retrieval (IR)} deals with identifying relevant information based on users' information needs, e.g.
web search engines, digital libraries, \& recommender systems.
It involves finding material (usually documents) of an unstructured nature that satisfies an information need within large
collections (usually stored on computers).

\section{Information Retrieval Models}
\subsection{Introduction to Information Retrieval Models}
\textbf{Data collections} are well-structured collections of related items; items are usually atomic with a
well-defined interpretation.
Data retrieval involves the selection of a fixed set of data based on a well-defined query (e.g., SQL, OQL).
\\\\
\textbf{Information collections} are usually semi-structured or unstructured.
Information Retrieval (IR) involves the retrieval of natural-language documents, which are typically
unstructured and may be semantically ambiguous.

\subsubsection{Information Retrieval vs Information Filtering}
The main differences between information retrieval \& information filtering are:
\begin{itemize}
\item The nature of the information need.
\item The nature of the document set.
\end{itemize}

Other than these two differences, the same models are used.
Documents \& queries are represented using the same set of techniques and similar comparison algorithms are also
used.

\subsubsection{User Role}
In traditional IR, the user role was reasonably well-defined in that a user:
\begin{itemize}
\item Formulated a query.
\item Viewed the results.
\item Potentially offered feedback.
\item Potentially reformulated their query and repeated these steps.
\end{itemize}

In more recent systems, with the increasing popularity of the hypertext paradigm, users usually intersperse
browsing with the traditional querying.
This raises many new difficulties \& challenges.

\subsection{Pre-Processing}
\textbf{Document pre-processing} is the application of a set of well-known techniques to the documents \& queries
prior to any comparison.
This includes, among others (a small illustrative sketch follows this list):
\begin{itemize}
\item \textbf{Stemming:} the reduction of words to a potentially common root.
The most common stemming algorithms are Lovins' \& Porter's algorithms.
E.g. \textit{computerisation},
\textit{computing}, \textit{computers} could all be stemmed to the common form \textit{comput}.
\item \textbf{Stop-word removal:} the removal of very frequent terms from documents, which add little to the
meaning of the document.
\item \textbf{Thesaurus construction:} the manual or automatic creation of thesauri used to try to identify
synonyms within the documents.
\end{itemize}

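The following is a minimal, illustrative sketch of pre-processing (not taken from the lectures): it removes a
small hypothetical stop-word list and applies a crude suffix-stripping rule in place of a real Lovins or Porter
stemmer, just to show where each step fits in a pipeline.

\begin{minted}[linenos, breaklines, frame=single]{python}
# Toy pre-processing sketch: stop-word removal + naive suffix stripping.
# The stop-word list and suffix rules are illustrative, not a real stemmer.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}
SUFFIXES = ["isation", "isations", "ers", "ing", "s"]  # checked in this order

def naive_stem(word):
    """Strip the first matching suffix; a stand-in for Lovins/Porter stemming."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenise on whitespace, drop stop-words, then stem."""
    tokens = text.lower().split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The computerisation of computing computers"))
# ['computer', 'comput', 'comput'] -- crude, but it shows the idea
\end{minted}

In practice a library implementation of Porter's algorithm would be used rather than ad-hoc suffix rules.
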
The choice of \textbf{representation} \& comparison technique depends on the information retrieval model chosen.
The choice of feedback techniques is also dependent on the model chosen.

\subsection{Models}
Retrieval models can be broadly categorised as:
\begin{itemize}
\item Boolean:
\begin{itemize}
\item Classical Boolean.
\item Fuzzy Set approach.
\item Extended Boolean.
\end{itemize}

\item Vector:
\begin{itemize}
\item Vector Space approach.
\item Latent Semantic Indexing.
\item Neural Networks.
\end{itemize}

\item Probabilistic:
\begin{itemize}
\item Inference Network.
\item Belief Network.
\end{itemize}
\end{itemize}

We can view any IR model as being composed of:
\begin{itemize}
\item $D$ is the set of logical representations within the documents.
\item $Q$ is the set of logical representations of the user information needs (queries).
\item $F$ is a framework for modelling representations ($D$ \& $Q$) and the relationship between $D$ \& $Q$.
\item $R$ is a ranking function which defines an ordering among the documents with regard to any query $q$.
\end{itemize}

We have a set of index terms:
$$
t_1, \dots , t_n
$$

A \textbf{weight} $w_{i,j}$ is assigned to each term $t_i$ occurring in document $d_j$.
We can view a document or query as a vector of weights:
$$
\vec{d_j} = (w_1, w_2, w_3, \dots)
$$

\subsection{Boolean Model}
The \textbf{Boolean model} of information retrieval is based on set theory \& Boolean algebra.
A query is viewed as a Boolean expression.
The model also assumes terms are present or absent, hence term weights $w_{i,j}$ are binary \& discrete, i.e.,
$w_{i,j}$ is an element of $\{0, 1\}$.
\\\\
Advantages of the Boolean model include:
\begin{itemize}
\item Clean formalism.
\item Widespread \& popular.
\item Relatively simple.
\end{itemize}

Disadvantages of the Boolean model include:
\begin{itemize}
\item People often have difficulty formulating Boolean expressions, which makes the model somewhat difficult to use.
\item Documents are considered either relevant or irrelevant; no partial matching is allowed.
\item Poor performance.
\item Suffers badly from natural language effects such as synonymy.
\item No ranking of results.
\item Terms in a document are considered independent of each other.
\end{itemize}

\subsubsection{Example}
$$
q = t_1 \land (t_2 \lor (\neg t_3))
$$

\begin{minted}[linenos, breaklines, frame=single]{sql}
q = t1 AND (t2 OR (NOT t3))
\end{minted}

This can be mapped to what is termed \textbf{disjunctive normal form} (DNF), where we have a series of disjunctions
(logical ORs) of conjunctions.

$$
q = 100 \lor 110 \lor 111
$$

Here each binary triple gives the values of $(t_1, t_2, t_3)$ for which the query is satisfied.
If a document satisfies any of the components, the document is deemed relevant and returned.
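As a concrete (hypothetical) illustration of the above, the sketch below represents each document as a binary
vector over $(t_1, t_2, t_3)$ and marks it relevant if it matches any conjunctive component of the query's
disjunctive normal form; the document vectors are made up purely for the example.

\begin{minted}[linenos, breaklines, frame=single]{python}
# Binary term vectors (t1, t2, t3) for some made-up documents.
docs = {
    "d1": (1, 0, 0),  # matches component 100
    "d2": (1, 0, 1),  # satisfies no component
    "d3": (1, 1, 1),  # matches component 111
    "d4": (0, 1, 0),  # t1 absent, so cannot match
}

# DNF of q = t1 AND (t2 OR (NOT t3)): the satisfying assignments.
dnf_components = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]

def is_relevant(doc_vector):
    """A document is returned iff its vector equals some DNF component."""
    return doc_vector in dnf_components

retrieved = [name for name, vec in docs.items() if is_relevant(vec)]
print(retrieved)  # ['d1', 'd3']
\end{minted}

Here the vectors are restricted to the three query terms; terms outside the query are simply ignored when the
expression is evaluated.
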
\subsection{Vector Space Model}
The \textbf{vector space model} attempts to improve upon the Boolean model by removing the limitation of binary
weights for index terms.
Terms can have non-binary weights in both queries \& documents.
Hence, we can represent the documents \& the query as $n$-dimensional vectors.

$$
\vec{d_j} = (w_{1,j}, w_{2,j}, \dots, w_{n,j})
$$
$$
\vec{q} = (w_{1,q}, w_{2,q}, \dots, w_{n,q})
$$

We can calculate the similarity between a document \& a query by measuring the cosine of the angle between their
vector representations.
$$
\vec{a} \cdot \vec{b} = \mid \vec{a} \mid \mid \vec{b} \mid \cos (\vec{a}, \vec{b})
$$
$$
\Rightarrow \cos (\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\mid \vec{a} \mid \mid \vec{b} \mid}
$$

We can therefore calculate the similarity between a document and a query as:
$$
\text{sim}(q,d) = \cos (\vec{q}, \vec{d}) = \frac{\vec{q} \cdot \vec{d}}{\mid \vec{q} \mid \mid \vec{d} \mid}
$$

Considering term weights on the query and documents, we can calculate the similarity between the document \& query as:
$$
\text{sim}(q,d) =
\frac
{\sum^N_{i=1} (w_{i,q} \times w_{i,d})}
{\sqrt{\sum^N_{i=1} (w_{i,q})^2} \times \sqrt{\sum^N_{i=1} (w_{i,d})^2} }
$$

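The following is a small sketch of the cosine similarity computation above, using made-up weight vectors
(a real system would first compute the weights with a scheme such as tf-idf, discussed below).

\begin{minted}[linenos, breaklines, frame=single]{python}
import math

def cosine_similarity(q, d):
    """sim(q, d) = (q . d) / (|q| * |d|), returning 0 for zero vectors."""
    dot = sum(wq * wd for wq, wd in zip(q, d))
    norm_q = math.sqrt(sum(w * w for w in q))
    norm_d = math.sqrt(sum(w * w for w in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)

# Hypothetical weight vectors over the same five index terms.
query = [0.0, 1.2, 0.0, 0.8, 0.0]
doc_a = [0.5, 0.9, 0.0, 0.4, 0.1]
doc_b = [0.9, 0.0, 0.7, 0.0, 0.3]

print(round(cosine_similarity(query, doc_a), 3))  # higher: shares weighted terms with the query
print(round(cosine_similarity(query, doc_b), 3))  # 0.0: no terms in common with the query
\end{minted}
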
Advantages of the vector space model over the Boolean model include:
\begin{itemize}
\item Improved performance due to weighting schemes.
\item Partial matching is allowed, which gives a natural ranking.
\end{itemize}

The primary disadvantage of the vector space model is that terms are considered to be mutually independent.

\subsubsection{Weighting Schemes}
We need a means to calculate the term weights in the document and query vector representations.
A term's frequency within a document quantifies how well a term describes a document;
the more frequently a term occurs in a document, the better it is at describing that document and vice-versa.
This frequency is known as the \textbf{term frequency} or \textbf{tf factor}.
\\\\
If a term occurs frequently across all the documents, that term does little to distinguish one document from another.
This factor is known as the \textbf{inverse document frequency} or \textbf{idf factor}.
Traditionally, the most commonly-used weighting schemes are known as \textbf{tf-idf} weighting schemes.
\\\\
For all terms in a document, the weight assigned can be calculated as:
$$
w_{i,j} = f_{i,j} \times \log \left( \frac{N}{N_i} \right)
$$
where
\begin{itemize}
\item $f_{i,j}$ is the (possibly normalised) frequency of term $t_i$ in document $d_j$.
\item $N$ is the number of documents in the collection.
\item $N_i$ is the number of documents that contain term $t_i$.
\end{itemize}

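A minimal sketch of this weighting scheme is given below for a tiny made-up collection; it uses the raw term
count as $f_{i,j}$ (no normalisation) and the $\log(N / N_i)$ idf factor exactly as in the formula above.

\begin{minted}[linenos, breaklines, frame=single]{python}
import math
from collections import Counter

# A tiny hypothetical collection of already pre-processed documents.
docs = {
    "d1": ["inform", "retriev", "model", "retriev"],
    "d2": ["boolean", "model", "queri"],
    "d3": ["vector", "model", "weight", "retriev"],
}

N = len(docs)  # number of documents in the collection

# N_i: number of documents containing each term.
doc_freq = Counter()
for terms in docs.values():
    doc_freq.update(set(terms))

def tf_idf(doc_id):
    """w_ij = f_ij * log(N / N_i) for every term in the given document."""
    counts = Counter(docs[doc_id])
    return {t: f * math.log(N / doc_freq[t]) for t, f in counts.items()}

print(tf_idf("d1"))
# 'model' appears in every document, so log(N / N_i) = log(1) = 0 and it gets weight 0.
\end{minted}
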
\section{Evaluation of IR Systems}
When evaluating an IR system, we need to consider:
\begin{itemize}
\item The \textbf{functional requirements:} whether or not the system works as intended.
This is done with standard testing techniques.
\item The \textbf{performance:}
\begin{itemize}
\item Response time.
\item Space requirements.
\item Measured by empirical analysis and the efficiency of the algorithms \& data structures used for compression,
indexing, etc.
\end{itemize}
\item The \textbf{retrieval performance:} how useful is the system?
IR is a highly empirical discipline and there is a long history of the evaluation of retrieval performance.
This is less of an issue in data retrieval systems wherein perfect matching is possible as there exists
a correct answer.
\end{itemize}

\subsection{Test Collections}
Evaluation of IR systems is usually based on a reference \textbf{test collection} involving human evaluations.
The test collection usually comprises:
\begin{itemize}
\item A collection of documents $D$.
\item A set of information needs that can be represented as queries.
\item A list of relevance judgements for each query-document pair.
\end{itemize}

Issues with using test collections include:
\begin{itemize}
\item It can be very costly to obtain relevance judgements.
\item Crowd-sourcing can be used to obtain judgements at scale.
\item Pooling approaches: only the top-ranked documents returned by the participating systems are judged.
\item Relevance judgements don't have to be binary.
\item Agreement among judges may vary.
\end{itemize}

\textbf{TREC (Text REtrieval Conference)} provides a means to empirically test the performance of systems in
different domains by providing \textit{tracks} consisting of a data set \& test problems.
These tracks include:
\begin{itemize}
\item \textbf{Ad-hoc retrieval:} different tracks have been proposed to test ad-hoc retrieval, including the
Web track (retrieval on web corpora) and the Million Query track (large number of queries).
\item \textbf{Interactive Track:} users interact with the system for relevance feedback.
\item \textbf{Contextual Search:} multiple queries over time.
\item \textbf{Entity Retrieval:} the task is to retrieve entities (people, places, organisations).
\item \textbf{Spam Filtering:} identifying \& filtering out non-relevant or harmful content such as email
spam.
\item \textbf{Question Answering (QA):} the goal is to retrieve precise answers to user questions rather than
returning entire documents.
\item \textbf{Cross-Language Retrieval:} the goal is to retrieve relevant documents in a different language
from the query.
Requires machine translation.
\item \textbf{Conversational IR:} retrieving information through an interactive, conversational dialogue with the user.
\item \textbf{Sentiment Retrieval:} emphasis on identifying opinions \& sentiments.
\item \textbf{Fact Checking:} e.g., the misinformation track.
\item \textbf{Domain-Specific Retrieval:} e.g., genomic data.
\item Summarisation Tasks.
\end{itemize}

Relevance is assessed for the information need and not the query.
Because tuning \& optimisation can occur for many IR systems, it is considered good practice to tune on one
collection and then test on another.
\\\\
Interaction with an IR system may be a one-off query or an interactive session.
For the former, \textit{quality} of the returned set is the important metric, while for interactive systems other
issues have to be considered: duration of the session, user effort required, etc.
These issues make evaluation of interactive sessions more difficult.

\subsection{Precision \& Recall}
The most commonly used metrics are \textbf{precision} \& \textbf{recall}.
\subsubsection{Unranked Sets}
Given a set $D$ and a query $Q$, let $R$ be the set of documents relevant to $Q$.
Let $A$ be the set actually returned by the system.
\begin{itemize}
\item \textbf{Precision} is defined as $\frac{|R \cap A|}{|A|} = \frac{\text{relevant retrieved documents}}{\text{all retrieved documents}}$, i.e. what fraction of the retrieved documents are relevant.
\item \textbf{Recall} is defined as $\frac{|R \cap A|}{|R|} = \frac{\text{relevant retrieved documents}}{\text{all relevant documents}}$, i.e. what fraction of the relevant documents were returned (a short computational sketch of both measures follows this list).
\end{itemize}

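The sketch below applies these set-based definitions directly, using made-up document identifiers.

\begin{minted}[linenos, breaklines, frame=single]{python}
# R: relevant documents for the query; A: documents actually returned.
# Both sets are hypothetical, purely to illustrate the definitions.
relevant = {"d1", "d2", "d4", "d7", "d9"}
returned = {"d1", "d2", "d3", "d7"}

true_positives = relevant & returned               # relevant retrieved documents

precision = len(true_positives) / len(returned)    # |R intersect A| / |A|
recall = len(true_positives) / len(relevant)       # |R intersect A| / |R|

print(f"Precision = {precision:.2f}")  # 3/4 = 0.75
print(f"Recall    = {recall:.2f}")     # 3/5 = 0.60
\end{minted}
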
Having two separate measures is useful as different IR systems may have different user requirements.
For example, in web search precision is of the greatest importance, but in the legal domain recall is of the greatest
importance.
\\\\
There is a trade-off between the two measures; for example, by returning every document in the set, recall is
maximised (because all relevant documents will be returned) but precision will be poor (because many irrelevant documents will be returned).
Recall is non-decreasing as the number of documents returned increases, while precision usually decreases as the
number of documents returned increases.

\begin{table}[h!]
\centering
\begin{tabular}{|p{0.3\textwidth}|p{0.3\textwidth}|p{0.3\textwidth}|}
\hline
& \textbf{Retrieved} & \textbf{Not Retrieved} \\
\hline
\textbf{Relevant} & True Positive (TP) & False Negative (FN) \\
\hline
\textbf{Non-Relevant} & False Positive (FP) & True Negative (TN) \\
\hline
\end{tabular}
\caption{Confusion Matrix of True/False Positives \& Negatives}
\end{table}

$$
\text{Precision } P = \frac{tp}{tp + fp} = \frac{\text{true positives}}{\text{true positives + false positives}}
$$
$$
\text{Recall } R = \frac{tp}{tp + fn} = \frac{\text{true positives}}{\text{true positives + false negatives}}
$$

The \textbf{accuracy} of a system is the fraction of these classifications that are correct:
$$
\text{Accuracy} = \frac{tp + tn}{tp + fp + fn + tn}
$$

Accuracy is a commonly used evaluation measure in machine learning classification work, but is not a very useful
measure in IR; for example, when searching for relevant documents in a very large set, the number of irrelevant
documents is usually much higher than the number of relevant documents, meaning that a high accuracy score is
attainable by getting true negatives by discarding most documents, even if there aren't many true positives.
\\\\
There are also many single-value measures that combine precision \& recall into one value (standard definitions
are given after this list):
\begin{itemize}
\item F-measure.
\item Balanced F-measure.
\end{itemize}

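These measures are only named above; for reference, the standard definitions (an addition, not part of the
original notes) are:
$$
F_\beta = \frac{(1 + \beta^2) \cdot P \cdot R}{\beta^2 \cdot P + R}, \qquad F_1 = \frac{2 P R}{P + R}
$$
where $P$ is precision and $R$ is recall; the balanced F-measure $F_1$ (i.e., $\beta = 1$) is the harmonic mean
of precision \& recall.
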
\subsubsection{Evaluation of Ranked Results}
In IR, returned documents are usually ranked.
One way of evaluating ranked results is to use \textbf{Precision-Recall plots}, wherein precision is typically
plotted against recall.
In an ideal system, we would have a precision value of 1 for a recall value of 1, i.e., all relevant documents
have been returned and no irrelevant documents have been returned.

\begin{tcolorbox}[colback=gray!10, colframe=black, title=Example]
Given $|D| = 20$ \& $|R| = 10$ and a ranked list of length 10, let the returned ranked list be:
$$
\mathbf{d_1}, \mathbf{d_2}, d_3, \mathbf{d_4}, d_5, d_6, \mathbf{d_7}, d_8, d_9, d_{10}
$$

where the items in bold are those that are relevant.
\begin{itemize}
\item Considering the list as far as the first document: Precision = 1, Recall = 0.1.
\item As far as the first two documents: Precision = 1, Recall = 0.2.
\item As far as the first three documents: Precision = 0.67, Recall = 0.2.
\end{itemize}

We usually plot for recall values of 10\%, 20\%, \dots, 90\%.
\end{tcolorbox}

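The boxed example can be reproduced with the short sketch below, which walks down the ranked list and reports
precision \& recall at each cut-off; the relevance labels are exactly those of the example above.

\begin{minted}[linenos, breaklines, frame=single]{python}
# Ranked list from the example: True marks a relevant document.
ranked_relevance = [True, True, False, True, False, False, True, False, False, False]
total_relevant = 10  # |R| from the example

relevant_so_far = 0
for k, is_relevant in enumerate(ranked_relevance, start=1):
    if is_relevant:
        relevant_so_far += 1
    precision_at_k = relevant_so_far / k
    recall_at_k = relevant_so_far / total_relevant
    print(f"top-{k}: P = {precision_at_k:.2f}, R = {recall_at_k:.2f}")

# top-1: P = 1.00, R = 0.10; top-2: P = 1.00, R = 0.20; top-3: P = 0.67, R = 0.20; ...
\end{minted}
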
We typically calculate precision for these recall values over a set of queries to get a truer measure of a system's
performance:
$$
P(r) = \frac{1}{N} \sum^N_{i=1}P_i(r)
$$
where $N$ is the number of queries and $P_i(r)$ is the precision at recall level $r$ for the $i^{\text{th}}$ query.

Advantages of Precision-Recall include:
\begin{itemize}
\item Widespread use.
\item It gives a definable measure.
\item It summarises the behaviour of an IR system.
\end{itemize}

Disadvantages of Precision-Recall include:
\begin{itemize}
\item It is not always possible to calculate the recall measure effectively for queries in batch mode.
\item Precision \& recall graphs can only be generated when we have a ranking.
\item They're not necessarily of interest to the user.
\end{itemize}

Single-value measures for evaluating ranked results include:
\begin{itemize}
\item Evaluating precision when every new relevant document is retrieved and averaging the precision values.
\item Evaluating precision when the first relevant document is retrieved.
\item $R$-precision: calculate precision at position $R$ in the ranking, where $R$ is the total number of relevant documents for the query.
\item Precision at $k$ (P@k).
\item Mean Average Precision (MAP); a sketch of average precision is given after this list.
\end{itemize}

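As a sketch of how one such single-value measure can be computed, the code below calculates average precision
(precision evaluated at each relevant retrieved document, averaged over the total number of relevant documents);
the mean of this value over a set of queries gives MAP. The relevance labels reuse the ranked list from the
earlier boxed example.

\begin{minted}[linenos, breaklines, frame=single]{python}
def average_precision(ranked_relevance, total_relevant):
    """Average of the precision values taken at each relevant retrieved document."""
    relevant_so_far = 0
    precision_sum = 0.0
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            relevant_so_far += 1
            precision_sum += relevant_so_far / k
    return precision_sum / total_relevant if total_relevant else 0.0

ranked = [True, True, False, True, False, False, True, False, False, False]
print(round(average_precision(ranked, total_relevant=10), 3))

# Mean Average Precision (MAP) is then the mean of this value over all queries:
def mean_average_precision(queries):
    """queries: list of (ranked_relevance, total_relevant) pairs, one per query."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)
\end{minted}

Dividing by the total number of relevant documents (rather than only those retrieved) penalises relevant
documents that are never returned.
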
Precision histograms are used to compare two algorithms over a set of queries.
We calculate the $R$-precision (or possibly another single summary statistic) of two systems over all queries.
The difference between the two is plotted for each of the queries.

\subsection{User-Oriented Measures}
Let $D$ be the document set, $R$ be the set of relevant documents, $A$ be the answer set returned to the user,
and $U$ be the set of relevant documents previously known to the user.
Let $AU$ be the set of returned documents previously known to the user.
$$
\text{Coverage} = \frac{|AU|}{|U|}
$$
Let \textit{New} refer to the set of relevant documents returned to the user that were previously unknown to the user.
We can define \textbf{novelty} as:
$$
\text{Novelty} = \frac{|\text{New}|}{|\text{New}| + |AU|}
$$

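A small sketch of these two measures, using made-up sets, is shown below.

\begin{minted}[linenos, breaklines, frame=single]{python}
# Hypothetical sets: U = relevant documents already known to the user,
# A = answer set returned, R = all relevant documents.
known_relevant = {"d1", "d2", "d3", "d4"}            # U
returned = {"d1", "d2", "d5", "d6", "d8"}            # A
relevant = {"d1", "d2", "d3", "d4", "d5", "d6"}      # R

known_and_returned = returned & known_relevant             # AU
new_relevant = (returned & relevant) - known_relevant      # New

coverage = len(known_and_returned) / len(known_relevant)
novelty = len(new_relevant) / (len(new_relevant) + len(known_and_returned))

print(f"Coverage = {coverage:.2f}")  # 2/4 = 0.50
print(f"Novelty  = {novelty:.2f}")   # 2/(2+2) = 0.50
\end{minted}
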
The issues surrounding interactive sessions are much more difficult to assess.
Much of the work in measuring user satisfaction comes from the field of HCI.
The usability of these systems is usually measured by monitoring user behaviour or via surveys of the user's
experience.
Another closely related area is that of information visualisation: how best to represent the retrieved data for a
user, etc.

\end{document}