Courses/Computer Science/CPSC 203/CPSC 203 2007Fall L04/CPSC 203 2007Fall L04 Lectures/Lecture 12

Lecture 12
We revisit the Query Trace Mechanism. Once you understand how to Trace Queries -- you should be able to  predict  what a particular query is going to return, and are in a good position to begin developing more complex queries, and analyzing data using queries. Remember -- queries are a way of asking questions of a database.

The objectives of today's class are:


 * House Keeping
 * Text Reading -- Chapter 11 -- Pay particular attention to the sections: Data Storage, 'Managing Data , and Data Mining.
 * Reminders:
 * I'll put up a small practice test on BB tonight (about a half dozen questions).
 * Half of next Tuesday's session will be Question/Answer session to help you prepare for the first mid-term.
 * Midterm is next Thursday (you have the whole class).


 * Topics
 * Review of the Query Trace Mechanism
 * Database Design Principles
 * Database Design Exercise

Review of Query Definition and Glossary (from last class)

 * Query – A way of asking questions of a database. In general a query of a table(s) returns a subset of the table(s) for which the query parameters are true. For example, which people in the class listed Google as their most frequently visited website. More formally, a query is a proposition about data extracting a subset of information based on specific requirements using the primary and foreign keys. A query language is used to set up these requests and  the query can be repeated at different times for updated information


 * SQL -- "Structured Query Language" -- the language underlying most relational database
 * SQL Keywords (also called 'clauses')
 * SELECT -- Identifies columns that will be selected
 * FROM -- Identifies tables from which data will originate
 * WHERE -- Identifies constraints on which ROWS of data will be returned
 * GROUP BY -- In an Aggregate Query (one with groups) identifes the columns for which data will be grouped.
 * HAVING -- In an Aggregate Query, identifes constraints on which GROUPS will be returned -- each row is data on a group.

It is important to understand the role of each of these key words to understand the Query Trace Mechanism -- in general
 * SELECT (which columns?)
 * FROM (which tables?)
 * WHERE (which rows do you want returned?? -- Constrain on rows that match your criteria)
 * GROUP BY (which groups do you wish to summarize? -- group on columns)
 * HAVING ( which groups do you want returned? -- Constrain on groups that match your criteria)

The Query Trace Mechanism With a Few Extra Notes
... modified from "The Essence of SQL" by David Rozenshtein


 * 1) Take the Cross-Product of all tables in the FROM clause
 * 2) Result -- A 'temporary table' from which you will delete rows and columns according to the rules below
 * 3) Consider every row from Step 1, exactly once and evaluate the WHERE clause condition for it
 * 4) Result -- Some rows will meet the condition, "TRUE", and others will not, "FALSE".
 * 5) If the condition from Step 2 returns "TRUE", formulate a row according to the SELECT clause and retrieve it.
 * 6) Result -- Cross out all lines that are false. Cross out columns that are not in the SELECT clause.
 * 7) Group (re-arrange the rows in the result of step 3 according to the GROUP BY clause.
 * 8) Result -- You now have one row for each group.
 * 9) Evaluate the HAVING clause condition for each group, and eliminate those groups for which this condition fails.
 * 10) Result -- Rows representing groups for which HAVING is "FALSE" are crossed out.
 * 11) Formulate one resulting row for each remaining group (i.e. the ones for which HAVING was true according to the SELECT clause and retrieve it.
 * 12) Result -- Returns summary data for all 'TRUE' rows in HAVING, based on the (summary" variables chosen in the SELECT clause.

Steps 1-3 are called the  "Type 1"  query evaluation mechanism. When extended to step 6, they represent the query evaluation method for Aggregation  (data summarization).

Database Design Guidelines
Database Design is an iterative process. rarely do we get to design a database 'from scratch'. Often some data already exists -- often in spreadsheets and text files. Our job is to figure out how to turn it into a well structured relational database that meets the following Design Goals :
 * 1) The "Entities" -- (things in the real world) are clear.
 * 2) The Relationships between entities are maintained.
 * 3) Storage is efficient (i.e. we want to take up the least space in the computer).
 * 4) Redundancy is minimized (i.e. the same information is not showing up in several places).
 * 5) No data is lost (i.e. from the relational database -- we can re-create the original data set).

Fabian Pascal in "Practical Issues in Database Management" has aptly summarized the essence of database design with respect to normalization as, "The Key, the Whole Key and Nothing but the Key".

Recognizing "base sets" in a database, and tying them to unique keys takes one far along the path of database design. We will not cover "Normalization" proper in this course, but will sample from Normalization theory to offer the following general guidelines in developing a database as a series of related tables:


 * 1) Identify the unique "Primary"  key (stands in for the members of the base set).
 * 2) Remove repeating groups (they are essentially parts of the same base set).
 * 3) Break out columns that are dependant on each other into another table.
 * 4) Make sure all the other columns in a table depend on the unique Primary key.

Iterative Database Design Exercise
We will use the Survey Data Set.
 * We will work in "Teams" for 20 minutes -- then present our results to the group.
 * Each Team is a long table.
 * There are 2 Sub-Teams:
 * White Sub-Team are the Designers
 * Black Sub-Team are the 'Bug Finders' -- Critique design in terms of the Design Goals  above.
 * Rapidly work your way through at least 2 iterations.
 * Iteration 1: White Team initial design. Black Team Critique.
 * Iteration 2: White Team re-design. Black Team Re-Critique.
 * Last Part of Class we will have a couple of Teams present their Design.
 * Tuesday I will present a 'Normalized' Design.

TEXT READINGS
TIA 3rd Edn: Chapter 11. 462 -- 503

TIA 4th Edn: Chapter 11. pp 484 -- 525

Resources
The Query Trace mechanism covered in this lecure comes from this book.
 * The Essence of SQL. A Guide to Learning the Most SQL in the Least Amount of Time. 1996. By David Rozenshtein

A practical introduction to developing queries.
 * SQL Visual Quickstart Guide. 2005. By Chris Fehily.

The principles of Database Design are very practically set forth in this book.
 * "Practical Issues in Database Management. Chapter 5. By Fabian Pascal.