- #CT230 - Database Systems I
- Previous Topic: Joins & Union Queries
- Next Topic: Query Processing: Relational Algebra
- Relevant Slides:
-
Normalisation
- What is normalisation? #card
card-last-interval:: 0.95
card-repeats:: 2
card-ease-factor:: 2.36
card-next-schedule:: 2022-11-18T17:36:23.333Z
card-last-reviewed:: 2022-11-17T19:36:23.333Z
card-last-score:: 3
- Normalisation takes each table through a series of tests to "verify" whether or not it belongs to a certain normal form.
- Normal forms to check:
- $1^{st}$, $2^{nd}$, & $3^{rd}$ normal forms (NF).
- Boyce-Codd normal form - strong 3NF.
- $4^{th}$ & $5^{th}$ Normal Forms.
-
Normalisation Provides:
-
- A formal framework for analysing relation schemas based on keys & functional dependencies among attributes.
- A series of tests so that a database can be normalised to any degree (e.g., from 1NF to 5NF).
- However, normalisation does not necessarily provide a good design if considered in isolation from everything else.
-
- What are normalisation rules? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-24T00:00:00.000Z
card-last-reviewed:: 2022-11-23T12:19:46.203Z
card-last-score:: 1
- Normalisation rules give us a formal measure of why one grouping of attributes in a relation schema may be better than another.
- Why normalise? #card
card-last-interval:: 0.95
card-repeats:: 2
card-ease-factor:: 2.36
card-next-schedule:: 2022-11-22T11:06:25.159Z
card-last-reviewed:: 2022-11-21T13:06:25.160Z
card-last-score:: 3
-
- Redundancy will be reduced or eliminated, reducing storage space as a result.
- The task of maintaining data integrity is made easier.
- However, with normalisation, tables are usually added to the schema and linked with foreign keys, which makes queries more complex as they often require data from multiple tables (requiring joins or subqueries).
-
-
Normalised & Un-Normalised Databases
- Both normalised & un-normalised databases have advantages & disadvantages.
- If a database is normalised:
- No (or very little) redundancy.
- No anomalies when inserting, deleting, or modifying data.
- More tables.
- More foreign & primary keys to link tables.
- More complex queries.
-
Redundancy
- What is redundancy? #card
card-last-interval:: 4
card-repeats:: 2
card-ease-factor:: 2.7
card-next-schedule:: 2022-11-18T20:19:53.452Z
card-last-reviewed:: 2022-11-14T20:19:53.453Z
card-last-score:: 5
- Redundancy is the unnecessary duplication of data in a database.
- Consequences of redundancy:
- Space is wasted.
- Data can become inconsistent due to potential problems with update, insert, & delete operations.
- What is duplication? #card
card-last-interval:: 2.97
card-repeats:: 2
card-ease-factor:: 2.6
card-next-schedule:: 2022-11-17T19:17:11.015Z
card-last-reviewed:: 2022-11-14T20:17:11.015Z
card-last-score:: 5
- Duplicated data can naturally be present in a database and is not necessarily redundant.
- For example, an attribute can have two identical values.
- In the company database, `ESSN` in `works_on` may be duplicated across many projects.
- Data is duplicated rather than redundant if information is lost when deleting data.
-
Alternatives to Normalisation #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.6
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:19:50.503Z
card-last-score:: 1
- The alternative to normalisation is to retain redundant data and maintain data integrity by means of code consistency checks.
- In some applications, the number of insertions may be very small or non-existent and in such cases, the overhead of normalised tables is generally not required.
-
De-Normalisation
- What is de-normalisation? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-19T00:00:00.000Z
card-last-reviewed:: 2022-11-18T18:34:49.533Z
card-last-score:: 1
- De-normalisation is a process of making compromises to the normalised tables by ^^introducing intentional redundancy^^ for performance reasons (specifically, querying performance).
- Typically, de-normalisation will improve query times at the expense of data updates (insert, delete, update).
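- The trade-off can be sketched with Python's built-in `sqlite3` module. The tables `department`, `employee`, and `employee_denorm` and their rows are hypothetical, invented only for illustration: the de-normalised table answers the query without a join, but renaming a department has to touch every copy of the name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE employee (ssn TEXT PRIMARY KEY, ename TEXT,
                       dno INTEGER REFERENCES department(dno));
-- De-normalised copy: dname is stored redundantly on every employee row
CREATE TABLE employee_denorm (ssn TEXT PRIMARY KEY, ename TEXT,
                              dno INTEGER, dname TEXT);
INSERT INTO department VALUES (5, 'Research');
INSERT INTO employee VALUES ('111', 'Ann', 5), ('222', 'Bob', 5);
INSERT INTO employee_denorm VALUES ('111', 'Ann', 5, 'Research'),
                                   ('222', 'Bob', 5, 'Research');
""")

# Normalised: reading the department name requires a join
joined = conn.execute(
    "SELECT e.ename, d.dname FROM employee e "
    "JOIN department d ON e.dno = d.dno ORDER BY e.ssn"
).fetchall()

# De-normalised: the same answer comes from a single table
flat = conn.execute(
    "SELECT ename, dname FROM employee_denorm ORDER BY ssn"
).fetchall()

# Renaming the department: one row changes in the normalised schema,
# but every matching employee row changes in the de-normalised one
n_norm = conn.execute(
    "UPDATE department SET dname = 'R&D' WHERE dno = 5").rowcount
n_denorm = conn.execute(
    "UPDATE employee_denorm SET dname = 'R&D' WHERE dno = 5").rowcount

print(joined == flat)    # True
print(n_norm, n_denorm)  # 1 2
```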
-
Functional Dependencies
- What is Functional Dependency? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-22T00:00:00.000Z
card-last-reviewed:: 2022-11-21T13:06:48.090Z
card-last-score:: 1
- If $A$ & $B$ are attributes of a relation $R$, then $B$ is functionally dependent (FD) on $A$ if each value of $A$ is associated with exactly one value of $B$.
- i.e., values in $B$ are uniquely determined by values of $A$.
- Functional Dependency is one of the main concepts associated with normalisation.
- It describes the ^^relationship between attributes.^^
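- Notation ($A \rightarrow B$):
- $A \rightarrow B$ does not necessarily imply $B \rightarrow A$.
- $A \leftrightarrow B$ denotes $A \rightarrow B$ & $B \rightarrow A$.
- $A \rightarrow \{B,C\}$ denotes $A \rightarrow B$ & $A \rightarrow C$.
- $\{A,B\} \rightarrow C$ denotes that it is the combination of $A$ & $B$ that uniquely determines $C$.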
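- A minimal sketch of the definition, assuming rows are stored as Python dictionaries. The helper `fd_holds` and the sample `employees` rows are hypothetical, not part of the course material. Note that a sample of data can only disprove a dependency; it cannot prove that it holds for every valid state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, each combination of lhs values
    is associated with exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores the first value seen for this key and
        # returns the stored value on every later occurrence
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows
employees = [
    {"ssn": "111", "name": "Ann", "dept": "Research"},
    {"ssn": "222", "name": "Bob", "dept": "Research"},
    {"ssn": "333", "name": "Ann", "dept": "Admin"},
]

print(fd_holds(employees, ["ssn"], ["name"]))  # True
print(fd_holds(employees, ["name"], ["ssn"]))  # False: "Ann" has two SSNs
```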
-
Note on FDs
- A functional dependency is a property of a relation schema $R$ and cannot be inferred automatically. Instead, it must be defined explicitly by someone who knows the semantics of $R$.
- You will either be explicitly given all FDs, or given enough information about the attributes & the domain to reasonably infer the FDs (perhaps having to make assumptions).
-
Types of FDs #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-18T00:00:00.000Z
card-last-reviewed:: 2022-11-17T20:19:30.192Z
card-last-score:: 1
-
Full Functional Dependency
- What is a Full Functional Dependency? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-18T00:00:00.000Z
card-last-reviewed:: 2022-11-17T20:15:37.929Z
card-last-score:: 1
- A functional dependency $\{X, Y\} \rightarrow Z$ is a full functional dependency if, when any attribute (either $X$ or $Y$) is removed from the left-hand side, the dependency ^^no longer holds.^^
- There may be any number of attributes on the LHS.
-
Partial Functional Dependency
- What is a Partial Functional Dependency? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:15:25.672Z
card-last-score:: 1
- A functional dependency $\{X, Y\} \rightarrow Z$ is a partial functional dependency if some attribute (either $X$ or $Y$) can be removed from the LHS and the dependency ^^still holds.^^
- There may be any number of attributes on the LHS.
-
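- The difference between a full and a partial dependency can be sketched by dropping one LHS attribute at a time and re-testing. The helpers `fd_holds` and `classify_fd` and the `works_on` rows are hypothetical and checked against sample data only. Removing one attribute at a time is enough, because if any smaller subset of the LHS determines the RHS, so does every superset of it.

```python
def fd_holds(rows, lhs, rhs):
    """True if each lhs value combination maps to one rhs combination."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def classify_fd(rows, lhs, rhs):
    """Label lhs -> rhs as 'full', 'partial', or 'not an FD' for these rows."""
    if not fd_holds(rows, lhs, rhs):
        return "not an FD"
    for attr in lhs:
        reduced = [a for a in lhs if a != attr]
        # The dependency survives without attr, so it is only partial
        if reduced and fd_holds(rows, reduced, rhs):
            return "partial"
    return "full"


# Hypothetical rows: hours depends on the (essn, pno) pair,
# while ename depends on essn alone
works_on = [
    {"essn": "111", "pno": 1, "hours": 10, "ename": "Ann"},
    {"essn": "111", "pno": 2, "hours": 20, "ename": "Ann"},
    {"essn": "222", "pno": 1, "hours": 20, "ename": "Bob"},
]

print(classify_fd(works_on, ["essn", "pno"], ["hours"]))  # full
print(classify_fd(works_on, ["essn", "pno"], ["ename"]))  # partial
```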
Transitive Functional Dependency
- What is a Transitive Functional Dependency? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-23T00:00:00.000Z
card-last-reviewed:: 2022-11-22T13:40:05.155Z
card-last-score:: 1
- A functional dependency $X \rightarrow Y$ is a transitive functional dependency in the relation $R$ if there is a set of attributes $Z$ that is neither a candidate key nor a subset of any key of $R$, and both $X \rightarrow Z$ & $Z \rightarrow Y$ hold.
-
- What is a Candidate Key (CK)? #card
card-last-interval:: 3.45
card-repeats:: 2
card-ease-factor:: 2.46
card-next-schedule:: 2022-11-18T06:20:02.759Z
card-last-reviewed:: 2022-11-14T20:20:02.759Z
card-last-score:: 3
- A candidate key (CK) is a minimal set of one or more attribute(s) in a relation with which you can determine all the attributes in the relation.
- Every relation has one or more candidate keys.
- We pick one such candidate key as the primary key of a relation.
-
Inference Rules for FDs
- Typically, the main obvious functional dependencies $F$ are specified for a schema.
- However, many others can be inferred from $F$.
- We call these the closure of $F$: $F^+$.
- 1. Reflexive: Trivially, an attribute, or a set of attributes, always determines itself (more generally, if $Y \subseteq X$, then $X \rightarrow Y$).
- 2. Augmentation: If $X \rightarrow Y$, we can infer $XZ \rightarrow YZ$.
- 3. Transitive: If $X \rightarrow Y$ & $Y \rightarrow Z$, we can infer $X \rightarrow Z$.
- 4. Decomposition: If $X \rightarrow YZ$, we can infer $X \rightarrow Y$ (and $X \rightarrow Z$).
- 5. Union (additive): If $X \rightarrow Y$ and $X \rightarrow Z$, we can infer $X \rightarrow YZ$.
- 6. Pseudotransitive: If $X \rightarrow Y$ and $WY \rightarrow Z$, we can infer $WX \rightarrow Z$.
- Note: Rules 1, 2, & 3 are collectively called Armstrong's Axioms.
-
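- Listing all of $F^+$ is rarely practical. A common shortcut is the attribute closure $X^+$, the set of attributes determined by $X$ under $F$: $X \rightarrow Y$ is in $F^+$ exactly when $Y \subseteq X^+$. The sketch below assumes FDs are given as `(lhs, rhs)` pairs of attribute names. The functions `closure` and `candidate_keys` and the schema `R(A, B, C, D)` are hypothetical, and the brute-force key search is only suitable for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole LHS is known, the RHS follows
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # A superset of a key found earlier is not minimal
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


# Hypothetical schema R(A, B, C, D) with F = {A -> B, B -> C}
attributes = {"A", "B", "C", "D"}
fds = [(("A",), ("B",)), (("B",), ("C",))]

print(closure({"A"}, fds) == {"A", "B", "C"})            # True: A -> C is inferred
print(candidate_keys(attributes, fds) == [{"A", "D"}])   # True
```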
Normal Forms
-
First Normal Form (1NF) #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:12:58.023Z
card-last-score:: 1
- A table is in 1NF if the table ==does not have any repeating groups== (a group of attributes that occurs a variable number of times in each record, i.e., non-atomic values).
- To ensure first normal form, choose an appropriate primary key (if one is not already specified) and if required, split the table into two or more tables to remove repeating groups.
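- Removing a repeating group can be sketched as follows. The records and the names `employee` and `works_on` are hypothetical: the variable-length `projects` list is moved into its own table, keyed by the owner's `ssn` together with the project number.

```python
# Hypothetical un-normalised records: "projects" is a repeating group
unnormalised = [
    {"ssn": "111", "name": "Ann", "projects": [1, 2]},
    {"ssn": "222", "name": "Bob", "projects": [1]},
]

# Table 1: one row per employee, atomic values only (PK: ssn)
employee = [{"ssn": r["ssn"], "name": r["name"]} for r in unnormalised]

# Table 2: one row per (employee, project) pair (PK: ssn + pno)
works_on = [
    {"ssn": r["ssn"], "pno": p}
    for r in unnormalised
    for p in r["projects"]
]

print(len(employee), len(works_on))  # 2 3
```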
-
Second Normal Form (2NF) #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:15:09.806Z
card-last-score:: 1
- A relation is in 2NF if it is in 1NF and, where there is a composite primary key, every non-key attribute is dependent on the entire primary key.
- If partial dependencies exist, create new relations to split the attributes such that the partial dependency no longer holds.
-
Third Normal Form (3NF) #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:15:39.277Z
card-last-score:: 1
- A relation is in 3NF if it is in 2NF and there are no dependencies between non-key attributes.
- That is, no transitive dependencies exist in the table.
-
Steps to Normalise to 3NF #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:17:36.919Z
card-last-score:: 1
-
- Identify an appropriate Primary Key if not already given.
- This puts the table into 1NF.
-
- Draw a diagram of Functional Dependencies from the primary key.
- Identify if the dependencies are Full, Partial, or Transitive.
- Using the diagram of the functional dependencies from the previous steps:
-
- Normalise to 2NF by ^^removing partial dependencies^^ - creating new tables as a result. Ensure that all new tables have Primary Keys.
- Normalise to 3NF by ^^removing transitive dependencies^^ (if they exist), creating new tables as a result. Ensure that any new tables have Primary Keys and are in 2NF.
- Check that all resulting tables are themselves in 1NF, 2NF, and 3NF (in particular, make sure that they all have PKs of their own).
-
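- The steps above can be sketched on a hypothetical enrolment relation with primary key `(student_id, module_code)`. The attribute names, the FD list, and the functions `classify` and `decompose` are all assumptions made for illustration. The sketch also assumes the FDs are already a minimal set and that one of them has the whole primary key as its LHS, in which case giving each determinant its own table yields the same result as removing partial and then transitive dependencies by hand.

```python
def classify(pk, lhs):
    """Classify an FD by comparing its determinant (LHS) with the PK."""
    pk, lhs = set(pk), set(lhs)
    if lhs == pk:
        return "full"
    if lhs < pk:
        return "partial"     # depends on part of the key: breaks 2NF
    if not (lhs & pk):
        return "transitive"  # non-key determinant: breaks 3NF
    return "other"


def decompose(fds):
    """One table per determinant; the determinant is that table's PK."""
    tables = {}
    for lhs, rhs in fds:
        # Start each table with its key columns, then add dependents
        tables.setdefault(tuple(lhs), list(lhs)).extend(rhs)
    return tables


pk = ("student_id", "module_code")
fds = [
    (("student_id", "module_code"), ("grade",)),
    (("student_id",), ("student_name",)),
    (("module_code",), ("lecturer",)),
    (("lecturer",), ("lecturer_office",)),
]

labels = {lhs: classify(pk, lhs) for lhs, _ in fds}
tables = decompose(fds)

for key, columns in tables.items():
    print(labels[key], "| PK:", key, "| columns:", columns)
```

- Each printed line corresponds to one resulting table; the final check in the steps (every table has a primary key of its own) is satisfied by construction, since each table is keyed by its determinant.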
-
Boyce-Codd Normal Form (BCNF) #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-24T00:00:00.000Z
card-last-reviewed:: 2022-11-23T12:16:35.097Z
card-last-score:: 1
- Only in rare cases does a 3NF table not meet the requirements of BCNF.
- These cases are when a table has more than one candidate key.
- Depending on the functional dependencies, a 3NF table with two or more overlapping candidate keys may or may not be in BCNF.
- If a table in 3NF does not have multiple overlapping candidate keys, then it is guaranteed to be in BCNF.
-
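- A minimal sketch of a BCNF check, assuming the usual definition (not spelled out in these notes) that every non-trivial FD must have a superkey on its left-hand side. The functions `closure` and `bcnf_violations` and the teaching relation are hypothetical. The relation has two overlapping candidate keys, `{student, course}` and `{student, instructor}`, so it is in 3NF but fails BCNF.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose LHS is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                # skip trivial FDs
        and closure(lhs, fds) != set(attributes)   # LHS is not a superkey
    ]


# Hypothetical relation: each instructor teaches exactly one course
attributes = {"student", "course", "instructor"}
fds = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]

print(bcnf_violations(attributes, fds))
# [(('instructor',), ('course',))]
```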