Depending on the design of a database, the database
may contain redundant data. Such a database becomes:
1. Inefficient - the database engine will need to
process more data for each query or update
2. Bloated - storage requirements increase due to
the redundant data
3. Error-prone - someone needs to input the redundant
data into the database. The more times the same
data is input into the database,
the more chances there are for errors to occur.
In order to remove redundancy in database, normalization
is applied. Normalization is the process of eliminating
redundant data from database tables. There are 5 levels
of normalization - also termed as the 5 normal forms.
Most database designers stop at either levels 2 or
3. This is because although normalization reduces
data redundancy, it also results in increased complexity
which will cause a decrease in performance. This decrease
in performance is due to the requirement to join the
normalized tables in queries. Levels 4 and 5 of normalization
remains largely an academic field of study and is
not applied in industry.
The normal forms are progressive - e.g. a table that
has achieved the 3rd normal form is by definition
already in the 2nd and 1st normal forms.
The 1st Normal Form (1NF)
The 1st normal form
deals with data redundancy within each table record.
The 1st normal form attempts to rid each table row
from repeating data.
The 2nd Normal Form (2NF)
The 2nd normal form
deals with data redundancy that occurs vertically
across the different rows in a table. The 2nd normal
form attempts to rid each table column from repeating
data.
The 3rd Normal Form (3NF)
The 3rd normal form
requires that each column in a database table must
be:
1. dependent on the table's primary key
2. independent on all other non-key columns in the
table