Normalization – NF1, NF2, NF3. How to normalize your database?

分類:IT技術 時間:2017-03-26

What is normalization?

database normalization is the process of organizing the columns and tables of a relational database to minimize data redundancy.

And another definition:

Normalization splits a large table into smaller tables and define relationships between them to increases the clarity in organizing data.

In general I like to see tables that have one purpose.  Having the table serve many purposes introduces many of the challenges like: data duplication, data update issues, and increased effort to query data.

database table purpose

Single purpose and relations instead of mixed up table

Why normalize?

If you have big database that is not normalized, people working with it most likely will face lots of problems. Performance will be bad, there are high chances of having multiple rows for one object, then, some day one row will change, and then you have a conflict.

Long story short – successfully normalized databases will increase performance and reduce storage.

So this is our table, let’s say we need some kind of database for an agency.

 

EmployeeId Person Phone
1 John Jones 27716602, 26153215,
2 Peter Janes 47716602, 36153215, 52321262
3 Michael Juines 17716602

 

We are storing our employee id, the name and their phone numbers.

 

Common misconception: Our database structure heavily depends on the required functionality. For example. If we always use a full name of the person, it is not necessary to split the “Person” column into first name and last name columns.

About phone numbers, if we know, that there is a limit, for example: work phone, personal mobile phone and fax these three columns wouldn’t count as duplicative columns and would still pass first normal form requirements.

 

To clarify, let’s assume that we need this functionality for the finished product:

  • We are going to send personalized letters. (e.g. Hello John, how are you doing?)
  • People can only add their mobile phone number.
  • There is no limit of the mobile phone numbers they can add.

 

1NF

Firstnormal formsets the very basic rules for an organized database:

  • Each column is unique
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Requirements for 1NF

 

  • Table must have at least one candidate key (that all attributes depend on)
  • Every column stores atomic values (example, if you use full name all the time it makes sense to store full name, if you will use first name and last name separately – you should store name and last name separately)
  • Not a repeating group (multiple phone numbers in one row) – row that stores multiple similar values instead of one

 

EmployeeId Person Phone
1 John Jones 27716602
1 John Jones 26153215
2 Peter Janes 47716602
2 Peter Janes 36153215
2 Peter Janes 52321262
3 Michael Juines 17716602

 

Now first normal form is satisfied, as the columns in this table hold just one value.

 

2 Normal Form

Second normal form (2NF) further addresses the concept of removing duplicative data:

  • Meet all the requirements of the first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys

Requirements for 2NF

  • Must meet all 1NF requirements
  • Non-key attributes must not depend on subset of any candidate key.

sql table example

As we can see in this example, we have a new table to store all the phones, and another table to determine who owns which phone number.

 

Employees

EmployeeId Name Last
1 John Jones
2 Peter Janes
3 Michael Juines

 

Phones

PhoneId Phone
1 27716602
2 26153215
3 47716602
4 36153215
5 52321262
6 17716602

 

This means EmployeeId and PhoneId are both Primary keys of the table? Not exactly, they are composite primary keys referring to entries in other two tables.

SQL code to create that table would look like this.

Transact-SQL

Create table EmployeesPhones (    CustomerID int references Employees(EmployeeId) not null,    ProductID int references Phones(PhoneId) not null    Primary key (CustomerID, ProductID) )

1

2

3

4

5

6

Create table EmployeesPhones

(

   CustomerID int references Employees(EmployeeId) not null,

   ProductID int references Phones(PhoneId) not null

   Primary key (CustomerID, ProductID)

)

 

 

EmployeesPhones

EmployeeId PhoneId
1 1
1 2
2 3
2 4
2 5
3 6

 

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:

 

  • Meet all the requirements of the second normal form.
  • Remove columns that are not dependent upon the primary key.

 

For third normal form, I adjusted the table a little bit. I added Zip code and city columns.

 

We can still save a little bit of space, ZIP code determines a city, meaning that many people can live in the same city and have the same ZIP code.

 

We find that in the above Employees table, EmployeeId is the key and only prime key attribute. We find that City can be identified by EmployeeId as well as Zip itself. Neither Zip is a superkey nor is City a prime attribute. In this case –   transitive dependency exists.

 

Employees

EmployeeId Name Last ZIP City
1 John Jones 8000 Aarhus C
2 Peter Janes 8200 Aarhus N
3 Michael Juines 8520 Lystrup

 

To bring this table into third normal form, we break the table into two tables.

Employees

EmployeeId Name Last ZIP
1 John Jones 8000
2 Peter Janes 8200
3 Michael Juines 8520

Zip

Name? City? Zip?
John Aarhus C 8000
Rick Aarhus N 8200
Coral Lystrup 8520
Carl Aarhus C 8000
Maggie Lystrup 8520
Daryl Aarhus N 8200
Michonne Aarhus N 8200

Tags: definition databases multiple increase purposes

文章來源:


ads
ads

相關文章
ads

相關文章

ad