1. Tables Overview
2. Data Integrity Overview
2.1. Entity Integrity
2.2. Domain Integrity
2.3. Referential Integrity
3. Constraints Overview
4. Normalization
1. Tables Overview
Tables are database objects that represent real world entities
and contain all the data related to the entities they represent.
A table is made up of columns and rows. In tables, data is organized
in a row-and-column format similar to a spreadsheet. For example,
a table containing customer data can contain a row for each
customer and columns representing the customers detailed information
such as his name, address, country, and phone number.
2. Data Integrity
Enforcing Data Integrity ensures the quality of a table. A table
is said to have low integrity if any given row in a table is
not uniquely identified. In other words, if a table contains
a customer named ABCD, the system should not accept another
customer with the same name, ABCD. Data Integrity is divided
into three categories:
1- Entity Integrity: Entity Integrity
or Row Integrity defines a row as a unique entity for a particular
table. For example, a table that contains customer data will
have one and only one row for every customer. This is done by
giving every row a unique identifier column(s) called a primary
key.
2- Domain Integrity: Domain Integrity
or Column Integrity maintains the validity of entries for a
given column. For example, if a table contains a country column
for each customer you can enforce domain integrity by restricting
the type of the column through data types (i.e. whether the
column should allow names or numbers), the format through CHECK
constraints and rules, or the range of possible values (through
FOREIGN KEY constraints, CHECK constraints, DEFAULT definitions,
NOT NULL definitions, and rules). Therefore, by using domain
integrity, you can rest assured that an age column (that is
supposed to contain a number) will never contain a countries
name.
3- Referential Integrity: When more
than one type of information is included in a table, i.e. employees
data and departments data, it is recommended that the integrity
be maintained for all “foreign data” (i.e. data
belonging to another table). This is accomplished by using referential
integrity which preserves the relationship between the two tables.
To put it in easier terms, foreign data, or a copy of a subset
of another tables data that does not belong to a table but has
to be included because of a necessity should always be the same
as it’s original. For example, an employees table contains
employee information and a departments table contains department
information. Every employee belongs to a department and every
department has its own set of attributes (columns) that store
information about each department such as the department name,
department head, the department type, and so on. Instead of
having all the columns in the departments table copied into
the employees table in order to know what department a given
employee belongs to, it is recommended that one includes only
one column called a “foreign key” column that references
the “primary key” column of the referenced table
or a unique column. The following diagram is a visual example
of what I’ve just explained:
3. Constraints Overview
In order to able to efficiently design a table complete with
its entity, domain, and referential integrity, one has to have
some knowledge about constraints. Constraints are the programming
elements used to enforce such integrity. There are six types
of constraints in Microsoft SQL Server and they are as follows:
1- Primary Key constraints: a table usually has one
or more columns whose values uniquely identify each row in a
table. This column or columns is referred to as the “primary
key” of a table and enforces the entity integrity of the
table. A table can have only one primary key constraint, and
a column or columns that participates in the primary key cannot
accept null values. If a primary key constraint is defined on
more than one column (referred to as a composite key), values
may be duplicated within one column, but each combination of
values from all the columns in the primary key constraint definition
must be unique. Figure 1.1. shows an example of a primary key.
2- Foreign Key constraints: a foreign key is a column
or combination of columns used to establish and enforce a link
between the data in two tables. This is accomplished by adding
the column or columns that hold one table's primary key values
to the other table. This column in the second table is then
referred to as a “foreign key”. Although the primary
purpose of a foreign key is to control the values stored in
the foreign key table, it also controls changes to data in the
primary key table. For example, all employees that work in a
company are assigned to departments and no employee can be assigned
to a non-existent department. On the other hand, no department
that has an employee assigned to it can be deleted from the
departments table. If the sole purpose of the foreign key would
be to control values stored in the foreign key table, some one
would be able to delete a referenced row and hence break the
link and the referential integrity.
3- Unique constraints: if every employee in the employees
table has a Social Security Number, one would want to ensure
that no two Social Security Numbers are the same throughout
the rows. Since it is not appropriate to use a primary key in
order to enforce such uniqueness, a unique constraint can be
defined on the Social Security Number column and hence prevent
duplicate values from being entered into the Social Security
Number column.
4- Check constraints: a check constraint is used to
enforce domain integrity (i.e. column integrity) by limiting
the values that are accepted by a column. For example, a table
that stores membership information for Junior Members in a sports
club might contain an age field. A member is not accepted as
a “junior” if his age is younger than eight or older
than sixteen. In order to restrict the age values being entered
in the age column, you would want to specify a check constraint
to enforce that rule. You can create a check constraint with
any logical expression that returns TRUE or FALSE. The logical
expression for the previous example would be:
Age > = 8 and Age <= 16
5- Default constraints: every column in a table should
contain a value. There are certain situations where a column’s
value can be automatically discerned from the values entered
in another column. For example, if a table contains a birth
date column and an age column, the date entered in the birth
date column subtracted from the current date would be the age
of the person. Hence, a default constraint is defined on the
age column and programmed with the logical expression that would
produce the person’s age.
6- NULL or NOT NULL constraints: people typically confuse
NULL with zero, blank, or a zero length character (i.e. “”);
NULL means no entry has been made. The presence of NULL often
defines that there either is no value or that the value is not
known. For example, is the price column of the products table
contains null, that doesn’t mean the product doesn’t
have a price, it means the price hasn’t been entered.
In order to ensure that no product will be added to the database
unless its price is entered, you can specify a NOT NULL constraint
on the price column. A column defined with the NULL constraint
will allow a NULL entry from the user whereas a column defined
with a NOT NULL constraint will reject NULL entries from the
user. By carefully using NULL and NOT NULL constraints, you
can maintain a very high level of data integrity.
4. Normalization
In the previous sections we looked at entities and data integrity.
The following section is about putting the picture together.
As we discussed in the previous sections, a database usually
consists of one or more tables representing different entities.
A database is said to be normalized if it complies with a set
of formal rules, referred to as normal forms. These formal rules
define how data that represents different entities should be
organized in a database.
The figure above shows a table (the only table in the database)
that is not normalized. In order to normalize this table and
hence the database, you have to go through three steps or rules.
1- First Normal Form: the first step is to make the
table compliant with the first normal form. A table
complies with the first normal form if no columns define similar
attributes and if no column contains multiple values. Let’s
take each condition one at a time.
a. No columns define similar attributes:
in this scenario, the Customer Purchases Table includes
three Address columns (i.e. Address1, Address2, and Address3)
which define similar attributes (i.e. the customers address).
By simply altering the three column’s names to Address,
City, and Country respectively, would make the table compliant
with the first normal form’s first rule.
b. No column contains multiple values: in this
scenario, the Customer Name column in the Customer Purchases
Table contains multiple values. Multiple values do not mean
text separated by spaces. For example, with this current
design, if the company’s regional sales officer where
to ask you to produce a year-to-date report of all customer
purchases sorted out by the customer’s last name,
one would have to query the column then separate the values
and finally sort out the results by the last name. This
process not only requires a few more lines of unnecessary
code, it also causes a substantial deterioration to the
speed of the query and hence blocks up resources and increases
the network traffic. So, by splitting up the whole customer’s
full name into three separate columns (i.e. First Name,
Middle Name, and Last Name), you reduce the amount of effort
needed to produce your report and the amount of traffic
and blocked resources on the network. Hence, by doing that,
we suffice the second rule in the first normal form
and are ready to move on to the second normal form.
The following diagram shows the Customer Purchases Table
normalized to the first normal form.

2- Second Normal Form: the second
step in normalization is to make the table(s) compliant with
the second normal form. A table complies with the
second normal form if it complies with the first normal
form, and if each column that is not part of the primary
key depends on all of the columns that are covered by the
primary key in the table and not on a subset of the columns
that are covered by the primary key. Let’s look at each
one at a time.
a. Primary Key: the first sub
rule in the second normal form is that every table
includes a column that uniquely identifies each row within
the table. It is highly recommended that this column be
an identity column. An identity column is an auto-incremented
number that is assigned automatically to every new row inserted
into the table, hence preventing any user typo and increasing
the efficiency and speed of an indexed table. It is recommended
that the primary key column be named after the entity’s
name followed by “ID” or “Number”,
thus maintaining consistency throughout the database design.
b. Each column that is not part of the primary key depends
on all columns that are covered by the primary key in a
table and not a subset of it: this second sub rule
is one of the most important rules in normalization and
is the rule responsible for splitting up a big confusing
chuck of data so that it becomes more but smaller tables
that represent specific things. In this scenario, the table
contains three different pieces of data (i.e. customer data,
product data, and purchase data). Since the purchase details
(i.e. quantity, discount) depend on the purchase, the product
details (i.e. product name and price) depend on the product,
and the customer details (i.e. address, city, and country)
depend on the customer, we have columns that depend on other
columns other than the primary key. In order to meet the
rule’s requirement, we are forced to break up the
big chunk into several tables that each represents only
one entity. By doing so, we are left with three tables;
the Customers table, the Products table, and the Purchases
table similar to the following table diagram:

Since a purchase combines both a customer
and a product being purchased, the purchases table must
include a reference (i.e. foreign key, as we discussed in
previous sections) to the customers table and the products
table. This referential link is shown above in Figure 1.4
3- Third Normal Form: the last rule
in normalization, the third normal form, puts in
the final touches to the design. A database complies with
the third normal form if it complies with the second
normal form and if, in each table, columns that are not
covered by the primary key do not depend on each other. In
this scenario, the quantity and discount columns in the Purchases
table depend on the ProductID rather than the PurchaseID.
In order to meet the requirement, we have to split up the
purchases table’s information so that we have a separate
table for purchase information only, and another for the purchase
details. The new table PurchaseDetails is usually referred
to as a junction table. A junction table is a table
that has 2 one-to-many relationships with the two tables it
junctions, hence creating a many-to-many relationship. The
junction table’s primary key is a combination of the
two foreign columns combines in order to make the junction.
This combined primary key is referred to as a composite
key. In this scenario, a purchase can contain more than
one product, and a product can be included in more than one
purchase. By defining a composite key on the ProductID and
PurchaseID column, we have a many-to-many relationship compliant
with the third normal form. Since we haven’t
yet reached relationships, this will be covered in later TechNotes.
The following diagram shows the table structure of what the
database would look like after we normalized it to the third
normal form.

|