uni/year2/semester1/logseq-stuff/pages/Normalisation.md

- #[[CT230 - Database Systems I]]
- **Previous Topic:** [[Joins & Union Queries]]
- **Next Topic:** [[Query Processing: Relational Algebra]]
- **Relevant Slides:** ![normalisation_2022_part1.pdf](../assets/normalisation_2022_part1_1666177004532_0.pdf) ![normalisation_2022_part2.pdf](../assets/normalisation_2022_part2_1666776016494_0.pdf)
-
- # Normalisation
	- What is **normalisation**? #card
	  card-last-interval:: 0.95
	  card-repeats:: 2
	  card-ease-factor:: 2.36
	  card-next-schedule:: 2022-11-18T17:36:23.333Z
	  card-last-reviewed:: 2022-11-17T19:36:23.333Z
	  card-last-score:: 3
		- **Normalisation** takes each table through a series of tests to "verify" whether or not it belongs to a certain **normal form**.
		- Normal forms to check:
			- 1^{st}, 2^{nd}, & 3^{rd} normal forms (**NF**).
			- Boyce-Codd normal form - strong 3NF.
			- 4^{th} & 5^{th} Normal Forms.
		- ### Normalisation Provides:
			- 1. A formal framework for analysing relation schemas based on **keys** & **functional dependencies** among attributes.
			  2. A series of **tests** so that a database can be normalised to any degree (e.g., from 1NF to 5NF).
			- However, normalisation does not necessarily provide a good design if considered in isolation to everything else.
	- What are **normalisation rules**? #card
	  card-last-interval:: -1
	  card-repeats:: 1
	  card-ease-factor:: 2.5
	  card-next-schedule:: 2022-11-24T00:00:00.000Z
	  card-last-reviewed:: 2022-11-23T12:19:46.203Z
	  card-last-score:: 1
		- **Normalisation** rules gives us a *formal measure* of why one grouping of attributes in a relation schema may be better than the other.
	- Why normalise? #card
	  card-last-interval:: 0.95
	  card-repeats:: 2
	  card-ease-factor:: 2.36
	  card-next-schedule:: 2022-11-22T11:06:25.159Z
	  card-last-reviewed:: 2022-11-21T13:06:25.160Z
	  card-last-score:: 3
		- 1. Redundancy will be reduced or eliminated, reducing storage space as a result.
		  2. The task of maintaining data integrity is made easier.
		- However, with normalisation, tables are usually added to the schema and are linked with foreign keys, which causes queries to become more complex as the often require data from multiple tables (requiring joins or subqueries).
	- ## Normalised & Un-Normalised Databases
		- Both normalised & un-normalised databases have advantages & disadvantages.
		- If a data base is **normalised**:
			- No (or very little) redundancy.
			- No anomalies when inserting, deleting, or modifying data.
			- More tables.
			- More foreign & primary keys to link tables.
			- More complex queries.
		- ### Redundancy
			- What is **redundancy**? #card
			  card-last-interval:: 4
			  card-repeats:: 2
			  card-ease-factor:: 2.7
			  card-next-schedule:: 2022-11-18T20:19:53.452Z
			  card-last-reviewed:: 2022-11-14T20:19:53.453Z
			  card-last-score:: 5
				- **Redundancy** is the unnecessary duplication of data in a database.
				- Consequences of redundancy:
					- Space is wasted.
					- Data can become inconsistent due to potential problems with update, insert, & delete operations.
			- What is **duplication**? #card
			  card-last-interval:: 2.97
			  card-repeats:: 2
			  card-ease-factor:: 2.6
			  card-next-schedule:: 2022-11-17T19:17:11.015Z
			  card-last-reviewed:: 2022-11-14T20:17:11.015Z
			  card-last-score:: 5
				- Duplicated data can naturally be present in a database and is not necessarily redundant.
				- For example, an attribute can have two identical values.
					- In the company database `ESSN` in `works_on` may be duplicated across many projects.
				- Data is **duplicated** rather than **redundant** if information is lost when deleting data
				-
	- ## Alternatives to Normalisation #card
	  card-last-interval:: -1
	  card-repeats:: 1
	  card-ease-factor:: 2.6
	  card-next-schedule:: 2022-11-15T00:00:00.000Z
	  card-last-reviewed:: 2022-11-14T20:19:50.503Z
	  card-last-score:: 1
		- The alternative to normalisation is to retain redundant data and maintain data integrity by means of code consistency checks.
		- In some applications, the number of insertions may be very small or non-existent and in such cases, the overhead of normalised tables is generally not required.
	- ## De-Normalisation
		- What is **de-normalisation**? #card
		  card-last-interval:: -1
		  card-repeats:: 1
		  card-ease-factor:: 2.5
		  card-next-schedule:: 2022-11-19T00:00:00.000Z
		  card-last-reviewed:: 2022-11-18T18:34:49.533Z
		  card-last-score:: 1
			- **De-normalisation** is a process of making compromises to the normalised tables by ^^introducing intentional redundancy^^ for performance reasons (specifically, querying performance).
			- Typically, de-normalisation will improve query times at the expense of data updates (insert, delete, update).
- # Functional Dependencies
	- What is **Functional Dependency**? #card
	  card-last-interval:: -1
	  card-repeats:: 1
	  card-ease-factor:: 2.5
	  card-next-schedule:: 2022-11-22T00:00:00.000Z
	  card-last-reviewed:: 2022-11-21T13:06:48.090Z
	  card-last-score:: 1
		- If $A$ & $B$ are attributes of a relation $R$, then $B$ is **functionally dependent (FD)** on $A$ if each value of $A$ is associated with exactly one value of $B$.
			- i.e., values in $B$ are uniquely determined by values of $A$.
		- Functional Dependency is one of the main concepts associated with normalisation.
		- It describes the ^^relationship between attributes.^^
		- $A$ -> $B$:
			- FD from $A$ to $B$.
			- $B$ is FD on $A$.
			- ![image.png](../assets/image_1666178440078_0.png)
		- $A$ -> $B$ does not necessarily imply $B$ -> $A$.
		- $A$ <-> $B$ denotes $A$ -> $B$ & $B$ -> $A$.
		- $A$ -> $\{B,C\}$ denotes $A$ -> $B$ & $A$ -> $C$.
		- $\{A,B\}$ -> $C$ denotes that it is the **combination** of $A$ & $B$ that uniquely determines $C$.
	- ### Note on FDs
		- A functional dependency is a property of a relation schema $R$ and cannot be inferred automatically. Instead, it must be defined explicitly by someone who knows the **semantics** of $R$.
		- You will either be explicitly given all FDs, or given enough information about the attributes & the domain to *reasonably* infer the FDs (perhaps having to make assumptions).
	- ## Types of FDs #card
	  card-last-interval:: -1
	  card-repeats:: 1
	  card-ease-factor:: 2.5
	  card-next-schedule:: 2022-11-18T00:00:00.000Z
	  card-last-reviewed:: 2022-11-17T20:19:30.192Z
	  card-last-score:: 1
		- ### Full Functional Dependency
			- What is a **Full Functional Dependency**? #card
			  card-last-interval:: -1
			  card-repeats:: 1
			  card-ease-factor:: 2.5
			  card-next-schedule:: 2022-11-18T00:00:00.000Z
			  card-last-reviewed:: 2022-11-17T20:15:37.929Z
			  card-last-score:: 1
				- A functional dependency $\{X, Y\}$ -> $Z$ is a **full functional dependency** if when some attribute (either $X$ or $Y$) is removed from the left-hand side, the dependency ^^does not hold.^^
					- There may be any number of attributes on the LHS.
		- ### Partial Functional Dependency
			- What is a **Partial Functional Dependency**? #card
			  card-last-interval:: -1
			  card-repeats:: 1
			  card-ease-factor:: 2.5
			  card-next-schedule:: 2022-11-15T00:00:00.000Z
			  card-last-reviewed:: 2022-11-14T20:15:25.672Z
			  card-last-score:: 1
				- A functional dependency $\{X, Y\}$ -> $Z$ is a **partial functional dependency** if some attribute (either $X$ or $Y$) can be removed from the LHS and the dependency ^^still holds.^^
					- There may be any number of attributes on the LHS.
		- ### Transitive Functional Dependency
			- What is a **Transitive Functional Dependency**? #card
			  card-last-interval:: -1
			  card-repeats:: 1
			  card-ease-factor:: 2.5
			  card-next-schedule:: 2022-11-23T00:00:00.000Z
			  card-last-reviewed:: 2022-11-22T13:40:05.155Z
			  card-last-score:: 1
				- A functional dependency $X$ -> $Y$ is a **transitive functional dependency** in the relation $R$ if there is a set of attributes $Z$ that is neither a candidate key nor a subset of any key of $R$, and both $X$ -> $Z$ & $Z$ -> $Y$ hold
	- What is a **Candidate Key (CK)**? #card
	  card-last-interval:: 3.45
	  card-repeats:: 2
	  card-ease-factor:: 2.46
	  card-next-schedule:: 2022-11-18T06:20:02.759Z
	  card-last-reviewed:: 2022-11-14T20:20:02.759Z
	  card-last-score:: 3
		- A **candidate key (CK)** is one or more attribute(s) in a relation with which you can determine all the attributes in the relation.
		- Every relation has one or more candidate keys.
		- We pick one such candidate key as the primary key of a relation.
	- # Inference Rules for FDs
		- Typically, the main obvious functional dependencies $F$ are specified for a schema.
			- However, many others can be inferred from $F$.
				- We call these the **closure** of $F$: $F^+$.
		- **1. Reflexive:** Trivially, an attribute, or a set of attributes, always determines itself.
		- **2. Augmentation:** If $X$ - $Y$, we can infer $XZ$ -> $YZ$.
		- **3. Transitive:** If $X$ -> $Y$ & $Y$ -> $Z$, we can infer $X$ -> $Z$.
		- **4. Decomposition:** If $X$ -> $YZ$, we can infer $X$ -> $Y$.
		- **5. Union (additive):** If $X$ -> $Y$ and $X$ -> $Z$, we can infer if $X$ -> $YZ$.
		- **6. Pseudotransitive:** If $X$ -> $Y$ and $WY$ -> $Z$, we can infer $WX$ -> $Z$.
			- Note: Rules 1,2, & 3 are collectively called **Armstrong's Axioms**.
- # Normal Forms
	- ## First Normal Form (1NF) #card
	  card-last-interval:: -1
	  card-repeats:: 1
	  card-ease-factor:: 2.5
	  card-next-schedule:: 2022-11-15T00:00:00.000Z
	  card-last-reviewed:: 2022-11-14T20:12:58.023Z
	  card-last-score:: 1
		- A table is in in **1NF** if the table ==does not have any repeating groups== (a group of attributes that occur a variable number of times in each record (non-atomic)).
		- To ensure first normal form, choose an appropriate primary key (if one is not already specified) and if required, split the table into two or more tables to remove repeating groups.
	- ## Second Normal Form (2NF) #card
	  card-last-interval:: -1
	  card-repeats:: 1
	  card-ease-factor:: 2.5
	  card-next-schedule:: 2022-11-15T00:00:00.000Z
	  card-last-reviewed:: 2022-11-14T20:15:09.806Z
	  card-last-score:: 1
		- A relation in **2NF** must be in 1NF and be such that where there is a composite primary key, all non-key attributes must be dependent on the *entire* primary key.
		- If partial dependencies exist, create new relations to split the attributes such that the partial dependency no longer holds.
	- ## Third Normal Form (3NF) #card
	  card-last-interval:: -1
	  card-repeats:: 1
	  card-ease-factor:: 2.5
	  card-next-schedule:: 2022-11-15T00:00:00.000Z
	  card-last-reviewed:: 2022-11-14T20:15:39.277Z
	  card-last-score:: 1
		- A relation is in **3NF** if it is in 3NF and there are no dependencies between attributes that are not primary keys.
			- That is, no transitive dependencies exist in the table.
		- ### Steps to Normalise to 3NF #card
		  card-last-interval:: -1
		  card-repeats:: 1
		  card-ease-factor:: 2.5
		  card-next-schedule:: 2022-11-15T00:00:00.000Z
		  card-last-reviewed:: 2022-11-14T20:17:36.919Z
		  card-last-score:: 1
			- 1. Identify an appropriate **Primary Key** if not already given.
				- This puts the table into **1NF**.
			- 2. Draw a diagram of **Functional Dependencies** from the primary key.
			  3. Identify if the dependencies are Full, Partial, or Transitive.
			  4. Using the diagram of the functional dependencies from the previous steps:
				- 5. Normalise to **2NF** by ^^removing **partial dependencies**^^ - creating new tables as a result. <ins>Ensure that all new tables have Primary Keys.</ins>
				  6. Normalise to **3NF** by ^^removing **transitive dependencies**^^ (if they exist), creating new tables as a result. <ins>Ensure that any new tables have Primary Keys and are in 2NF</ins>.
				  7. Check that all resulting tables are themselves in 1NF, 2NF, and 3NF (in particular, make sure that they all have PKs of their own).
	- ## Boyce-Codd Normal Form (BCNF) #card
	  card-last-interval:: -1
	  card-repeats:: 1
	  card-ease-factor:: 2.5
	  card-next-schedule:: 2022-11-24T00:00:00.000Z
	  card-last-reviewed:: 2022-11-23T12:16:35.097Z
	  card-last-score:: 1
		- Only in rare cases does a 3NF table not meet the requirements off **BCNF**.
			- These cases are when a table has more than one candidate key.
				- Depending on the functional dependencies, a 3NF table with two or more overlapping candidate keys may or may not be in BCNF.
		- If a table in 3NF **does not** have multiple overlapping candidate keys, then it is guaranteed to be in **BCNF**.
		-
-