Concept: Normalization
Normalization is the process of reducing a complex data structure into its simplest, most stable structure to minimize redundancy.
Relationships
Related Elements
Main Description

Introduction

This concept document provides a brief discussion of the topic of data normalization as it applies to the development of the Artifact: Data Model. It does not provide a full treatment of normalization, because the subject is quite broad and has been documented in many texts on database design. In [NBG01], normalization is defined as "an analytic technique used to produce a correct relational database design." In practice, normalization is a procedure for eliminating redundancy in the Data Model by means of applying restrictive rules. Elimination of data redundancy in the tables of the Data Model helps enforce referential integrity of the data in the database. 

Normalization is usually performed on the Data Model after an initial version of the tables and their relationships has been developed in the model. The exact timing of when to apply normalization depends on the specific project situation and is up to the Database Designer. The normalization process is applied to the tables in the Data Model in series of steps in which each step applies rules that are stricter than the last. 

Levels of Normalization

Normalization is hierarchically classified into numeric forms, with the most common being first, second, and third normal form. Each level of normalization is more restrictive than the previous. The first three hierarchical levels of normalization are:

  • First Normal Form-Repeating groups of data columns in tables have been eliminated such that data is organized into atomic units.
  • Second Normal Form-Data is in first normal form, and redundancy on primary key fields has been eliminated such that column values are wholly dependent on the primary key field.
  • Third Normal Form-Data is in second normal form, and each column is not dependent on any other non-key column.

Other levels of normalization are possible but are not covered in this discussion.  Information on additional levels of normalization can be found in [DAT99]. The exact level of normalization to apply to the Data Model is a decision that the database designer must make based on the specifics of the project situation.