Courses/Computer Science/CPSC 203/CPSC 203 Template/Lecture Template/Lecture 13

Required Reading
as per Required Reading for Lecture 10

Introduction
Last class we examined "Relational Thinking" and the process of creating analyses via sequences of queries.

We end the database section with an exploration of Relational Database Design. Relational Database Design could be said to integrate the various components of "Relational Thinking" from last lecture

At the end of today's lecture you will have:
 * Been Introduced to some Principles for Relational Database Design
 * Worked through the Design of a few simple databases
 * Been Introduced to the idea of Database Normalization

Glossary

 * Functional dependency -- A functional dependency (FD) exits between two columns  when values in one column are associated with values in another column. The kinds of functional dependencies we "want" in a database, is for all columns in a table to depend directly on the primary key. We try and avoid all other kinds of functional dependencies amongs columns.

Database Design Goals
Database Design is an iterative process. rarely do we get to design a database 'from scratch'. Often some data already exists -- often in spreadsheets and text files. Our job is to figure out how to turn it into a well structured relational database that meets the following Design Goals :
 * 1) The "Entities" -- (things in the real world) are clear.
 * 2) The Relationships between entities are maintained.
 * 3) Storage is efficient (i.e. we want to take up the least space in the computer).
 * 4) Redundancy is minimized (i.e. the same information is not showing up in several places).
 * 5) No data is lost (i.e. from the relational database -- we can re-create the original data set).

Database Design Principles
Fabian Pascal in "Practical Issues in Database Management" has aptly summarized the essence of database design with respect to normalization as, "The Key, the Whole Key and Nothing but the Key".

Recognizing "base sets" in a database, and tying them to unique keys takes one far along the path of database design. We will not cover "Normalization" proper in this course, but will sample from Normalization theory to offer the following general guidelines in developing a database as a series of related tables:


 * 1) Identify Base Sets (the distinct entities you are dealing with). Call these base sets your entities.
 * 2) Identify the unique "Primary"  key (stands in for the members of the base set)of each entity.
 * 3) Identify Primary-Key/ForeignKey relationships amongst entities. These relatinships allow you to join entities together.
 * 4) Make sure all non-primary-key attributes in an entity depend on the unique Primary key, and only the primary key. Or in other words: "The Key, the Whole Key and Nothing but the Key".

Finally -- database design is a human activity. It has elements of science (see normalization below). It also has elements of art. Two competent designers could design databases for a system with similar functionality, but where one may be seen by users as more elegant than the other.

Database Normalization
Often, our starting point for database design is some pre-existing data, either in a spreadsheet, an existing database, or often in some legacy file format.

In this case, we often go through an iterative process of what could be called "table therapy". Moving the existing data closer to a form that matches our key principles above. This process is called Normalization. Normalization is a way of systematically breaking an existing table of data down into smaller tables to bring the dataset into line with our database design goals. To eliminate duplication and inconsistencies in the data structure.

There are 5 levels of Normalization, called the Normal Forms. Most functional databases are brought to 3rd Normal Form (3NF), so we will only focus on the first 3 Normal forms.

The Examples below are taken from "Practical Issues in Database Management" by Fabian Pascal.

1NF:
Remove repeating groups (they are essentially parts of the same base set). These may take the form of several columns which represent the same kind of data ... or multivalued columns where rows are repeated with essentially the same data.

Repeating Groups





1NF -- Removal of Repeating Groups 



2NF :
Ensure all columns in an entity are functionally dependant on the WHOLE KEY. This requires you to first define the primary key of an entity, and second determine that columns are dependant on that key. All non-key columns should depend on the primary key.

 Functional Dependencies : DEPT functionally dependant on PMGR



2NF -- All non-key columns depend on the primary key 



3NF:
Ensure all columns in an entity are functionally dependant DIRECTLY on the whole key. This requires you to further ensure from 2NF that all columns are dependant on only the primary key, and no other non-key colummns. If further dependancies are found, a table in 2NF is further broken to tables in 3NF.

 A Non-key dependancy
 * PMGR# is functionally dependant on PROJ#
 * BUT, DEPT# is directly dependant on PMGR# -- a non-key column (and thus only indirectly dependant on the primary key).



3NF -- All non-key columns depend ONLY on the primary key



The above examples, with a bit more detail are in the following file:

[[Media:DBDesignExamples.xls]]

The end result of a process of normalization is that a few (or single) large tables get broken down into smaller tables which can be more compactly stored, and in which entities are separated, but which also allow the recreation of the original dataset.

The normalization process solves three common problems that often occurs during the design and creation of databases. These three problems are as follows:
 * If a database is not properly normalized, then it cannot represent real-world information.
 * The database contains duplicates entries. This wastes both time and storage.
 * Important information has been excluded during the design process of the database.

Summarizing Normal Forms as a Series of Questions

 * 1st Normal Form: Are there any repeating groups?


 * 2nd Normal Form: How many entities are there?
 * What are the primary keys for each entity?
 * Is every column in the entity dependant on the primary key?


 * 3rd Normal Form: Is every column in the entity ONLY dependent on the primary key.

Summary
You can consider the process of Database Design as:
 * 1) Following the Database Design Principles to form an initial database design
 * 2) Checking the design via the Database Normalization Rules, and refining the design.

Resources

 * "Practical Issues in Database Management. Chapter 5. By Fabian Pascal.
 * The principles of Database Design are very practically set forth in this book.

Homework
No homework today.

Questions

 * /Lecture 13 Questions