The Concept of Database Normalization

Imagine retrieving data from a huge database and all you find are errors, this is a sign to show that the data is messed up, to avoid that, let’s grasp a simple concept of Database Normalization.

We shall discuss Normalization, the problems that result without Normalization, how to resolve this issue using the first 3 forms of Normalization.

First, let’s get to know what normalization means.

What is Normalization?

Normalization is a technique of organizing data into multiple related tables to minimize data redundancy.

What is Data redundancy?

Repetition of similar data at multiple places.

Why would you reduce it?

  • Reduce the size of the database.
  • To avoid the 3 main problems i .e Insertion problems, Deletion problems and Updation problems.

Now that we know what normalization is, let’s look at the main 3 problems that result without Normalization.

The 3 main problems caused by Data Redundancy

  1. Insertion Anomaly
  2. Deletion Anomaly
  3. Updation Anomaly


To clarify these 3 problems, let’s take a look at this table.

Student NoStudent NameCourseOffice_telHOD
1SharonCSC039112440Eng. Bainomugisha
2FaithCSC039112440Eng. Bainomugisha
3PriscaCSC039112440Eng. Bainomugisha

Insertion Anomaly


To insert redundant data for every new row(Student Name from the above table) leads to insertion problems/anomalies.

Reason for repetition

  • Two different but related data are stored in the same table.
  • Deleting student information leads to deleting course information
  • Course information is deleted along with Student data which leads to Deletion problems/Anomaly.

Deletion Anomaly

Loss of related data-set when some other data-set is deleted.

Updation Anomaly

Assuming Eng. Bainomuisha leaves and Mr X joins as the new HOD for CSC, as a result, the student administrator has to update all the student records with the new name of HOD(Mr. X). This leads to Updation Anomaly because the system administrator has to update a lot of data.

Imagine only one row is left out while updating the data, this will lead to data inconsistency. Hence Data Redundancy leads to Data Inconsistency.

Finally, here is how Normalization will resolve these problems.

How will Normalization solve these problems?


Normalization will break the existing student tables into 2 tables. Student table and Course table.

The student table will save the data of the student and the Course Table will save course data

database normalization

Qn.Isn’t Course Name still repeating in the tables?

Note that Normalization is not about Eliminating data redundancy but Minimizing data redundancy.

Our new tables now look like this.

Student Table
Student NoStudent NameCourse
1SharonCSC
2FaithCSC
3PriscaCSC
Course Table
CourseOffice_telHOD
CSC039112440Eng. Bainomugisha
CSC039112440Eng. Bainomugisha
CSC039112440Eng. Bainomugisha
  • To insert new data, we only enter the Student No, Student Name and Course Name and . Course information is already stored in the Course table, we don’t have to update or insert it again.
  • If you want to store new student information, you can delete all the student’s records and you still have course information not deleted.
  • To update the HOD and Office tel, we just do it in one place.

Types of Normalization

1st Normal Form 1NF-Step 1 of the Normalization

Design your table in such a way that can be extended.

Every Database should be at least in 1NF or else it is taken to be a bad Database Design.

How to achieve 1NF, 4 basic rules.

  • Firstly, each column in the table should contain single values, not multiple values.
  • Secondly, each column should contain a value of the same type.
  • Thirdly, each column should have a unique name- to avoid confusion in the time of retrieving data.
  • In addition, the order in which data is saved doesn’t matter.
Example1. Rule No.1 is being violated in column 3
Student No.Student NameSubject
101SharonDBD, PHP
102FaithDBD
103PriscaPHP, JS
INF-Resolve the issue
Student No.Student NameSubject
101SharonPHP
102FaithDBD
103PriscaPHP
101SharonPHP
103PriscaJS

2nd Normal Form, 2NF-Step2 of Normalizaion

2 Conditions must be fulfilled

  1. Firstly, the table must be in 1NF
  2. Secondly, it should not have any Partial Dependencies
What is Partial Dependency?

Let’s have a look at this table.

Student No.Student Namereg_noCourseAddress
1Sharon101CSCKLA
2Sharon221CSCKLA
3Faith 214SEMUK
4Prisca169ITJJ
5Cynthia441EEKR

Assuming you have the same student names, you can easily fetch data using the student_id which could be the primary key.

A primary key is a column/group of columns that can uniquely identify each row in a table.

For example Course Name of a student with id 5- EE

Name of the student with id 5- Cynthia

Imagine we have 2 more tables, a subject table to store subjects and a score table to store marks for the students. This will result to;

score_idsubject_idstudent_idmarksteacher_name
1110195Mr. Js
2210182Mr. Js
3321478Mr.PHP
4421480Mr.PHP
5521476Mr. DBD

Qn. Get marks of student with student id 101

You don’t know for which subject and if given a subject id, you don’t know for which student hence we need student_id and subject_id together to uniquely identify any row of data in the SCORE table because we have a many to many relationships meaning:

  • One student can opt for more than one subject.
  • One subject can be opted for by more than one student.

The teacher’s name only depends on the subject. It has nothing to do with student_id. This is partial dependency. To eliminate partial dependency, we have to remove teacher_name from the score table.

You can move the teacher_name to the subject table.

You can also create another table called teacher table with teacher_id and teacher_name and use the teacher_id whenever u want.

subject_idstudent_nameTeacher
1JSMr. JS
2PHPMr. PHP
3DBDM. DBD
4JSMr. JS
teacher_idteacher_name
1 Mr. JS
2 Mr. PHP
3 M. DBD

3rd Normal Form 3NF

Conditions for 3NF

  • Firstly, the table should be in 2nd Normal form
  • Secondly, the table should have a transitive dependency.
Adding exam name and total marks to the score table
score_idsubject_idstudent_idmarksexam_nametotal_marks
1110190Baino95
2210182Baino85
3321478Grace88
4421480Grace78
5521468Eng76


Subject_id and student_id are primary keys so other columns depend on it except Total marks which depend on exam name but exam name is not part of our primary keys. This is a transitive dependency

Solution.

Get exam_name and total_marks, put them in an exam table and use exam_id whenever required.

exam_nametotal_marks
DBD19O
DBD285
PHP188
PHP278
JS68

Score table in 3NF

score_idsubject_idstudent_idmarksexam_name
1110195DBD1
2210182DBD2
3321478PHP1
4421480PHP2
5521476JS

We’ve made it!!!

In conclusion, 1st Normal Form, 2nd Normal Form and 3rd Normal Form are the three ways that you can prevent issues that come along with data redundancy.

Share this article

Ready to level up? Tell Us About Your Project.

Contact Us