Courses/Computer Science/CPSC 203/CPSC 203 Template/Lecture Template/Lecture 10

Housekeeping
(none)

Required Reading

 * Text Readings: as per Lecture 9.


 * Article Reading: E.F. Codd. 1970.  A Relational Model of Data for Large Shared Data Banks: [Original Relational Model Paper]
 * Please Read section 2.1 (pp. 383-385, top). Suggestion: Focus on the examples in Fig. 4-12. You are not required to understand everything in these pages, but to develop an appreciation for the thinking style involved.

Introduction
Relational Databases are nominally systems for organizing and storing data. Their real power, is that once data is organized and stored in a relational database, one may ask questions of the data, i.e. query it. Relational Databases share their focus on defining and then querying relations with at least one programming language, (Prolog: see -- http://en.wikipedia.org/wiki/Prolog). But, to understand how to intelligently query a relational database -- we require a model as to how the database deals with queries.

In this lecture, our focus will be to move from set-theoretic thinking, to what is often called, "Relational Thinking", building up a mental model of how a relational database processes queries. Queries in relational databases are "Declarative", in that you focus on stating "what" you want, and the system has a general process for providing you with the data that matches your query. This is contrasted with the "Imperative" style common to many programming languages where you must explicitly define the algorithm to be run.

At the end of today's lecture you will have:
 * Been introduced to SQL -- the most common language for database querying
 * Been introduced to the "Rules for Relational Thinking" in the context of SQL
 * Seen examples of how those rules are applied in several simple queries.

Glossary

 * Query – A way of asking questions of a database. In general a query of a table(s) returns a subset of the table(s) for which the query parameters are true. For example, which people in the class listed Google as their most frequently visited website. More formally, a query is a proposition about data extracting a subset of information based on specific requirements using the primary and foreign keys. A query language is used to set up these requests and  the query can be repeated at different times for updated information, The Query Language we will be using to describe the Query Trace Mechanism is called SQL and is the standard query language used in most relational databases.


 * Join – the relations between entities or Parent Table (on primary key) and Child Table (on foreign key). Effectively a JOIN on two tables is a subset of the cartesian product of those two tables. Join Conditions are defined by setting relationships between the Primary Key of the Parent table and the Foreign Key of the Child Table. E.g: PKeyparentTable = FkeyChildTable.
 * One-to-One: Ex: a student can only borrow one book
 * One-to-Many: Ex a student can take many courses
 * Many-to-Many: Ex: many students can take many courses. Joins can break down into multiple One-to-Many joins


 * SQL -- "Structured Query Language" -- the language underlying most relational database. SQL is often called a "data sublanguage" in that it is not meant to be a full programming language. However, in the context of allowing complex questions to be asked of data, it is extremely powerful.
 * SQL Keywords (also called 'clauses')
 * SELECT -- Identifies columns that will be selected
 * FROM -- Identifies tables from which data will originate
 * WHERE -- Identifies constraints on which ROWS of data will be returned. The WHERE clause is also used to set Join conditions between multiple tables.
 * GROUP BY -- In an Aggregate Query (one with groups) identifes the columns for which data will be grouped.
 * HAVING -- In an Aggregate Query, identifes constraints on which GROUPS will be returned -- each row is data on a group.

Concepts
Recall our Meta Model from last class.



Rules for Relational Thinking: The Query Trace Mechanism
... modified from "The Essence of SQL" by David Rozenshtein

The Query Trace Mechanism is an abstract model of how a relational database processes a query. In particular it is a Logical Model, which asks which set theoretic operations a database needs to conduct. It does not worry about the 'hows' of implementation details. However queries are implemented in a relational database, they must provide the same answer as given by the query trace mechanism. As such, you can imagine tracing a query by hand by drawing out a set of tables and manipulating them on graph paper. The relational database should give you exactly the same answer (and hopefully much more quickly).

Query Trace Mechanism

 * 1) Take the Cartesian-Product of all tables in the FROM clause. Result -- A 'temporary table' from which you will delete rows and columns according to the rules below
 * 2) Consider every row from Step 1, exactly once and evaluate the WHERE clause condition for it. Result -- Some rows will meet the condition, "TRUE", and others will not, "FALSE".
 * 3) If the condition from Step 2 returns "TRUE", formulate a row according to the SELECT clause and retrieve it. Result -- Cross out all lines that are false. Cross out columns that are not in the SELECT clause.
 * 4) Group (re-arrange the rows in the result of step 3 according to the GROUP BY clause. Result -- You now have one row for each group.
 * 5) Evaluate the HAVING clause condition for each group, and eliminate those groups for which this condition fails. Result -- Rows representing groups for which HAVING is "FALSE" are crossed out.
 * 6) Formulate one resulting row for each remaining group (i.e. the ones for which HAVING was true according to the SELECT clause and retrieve it. Result -- Returns summary data for all 'TRUE' rows in HAVING, based on the "summary" variables chosen in the SELECT clause.

Steps 1-3 are called the  "Type 1"  query evaluation mechanism. When extended to step 6, they represent the query evaluation method for Aggregation  (data summarization). There are more complicated query types which are outside the scope of this course, but which can be traced in similar manner. The Type 1 query mechanism deals with the most common types of queries encountered.

It is important to understand the role of each of the SQL key words to understand the Query Trace Mechanism -- in general
 * SELECT (which columns?)
 * FROM (which tables?)
 * WHERE (which rows do you want returned?? -- Constrain on rows that match your criteria)
 * GROUP BY (which groups do you wish to summarize? -- group on columns)
 * HAVING ( which groups do you want returned? -- Constrain on groups that match your criteria)

Query Trace Mechanism Examples
We put the query trace mechanism to work via a small dataset and several examples in the document below.

[[Media: RulesForRelationalThinking.doc]]

Summary

 * There is a small set of Keywords in SQL --- SELECT, FROM, WHERE, GROUP BY, and HAVING that allow us to express a large range of queries.
 * Essential to Relational Thinking you must understand the Query Trace Mechanism which is an abstract model of how relational databases process queries set-theoretically.

Text Readings
as per "Required Readings" for lecture 9.

Resources

 * The Essence of SQL. A Guide to Learning the Most SQL in the Least Amount of Time. 1996. By David Rozenshtein
 * The Query Trace mechanism covered in this lecure comes from this book.


 * SQL Visual Quickstart Guide. 2005. By Chris Fehily.
 * A practical introduction to developing queries.

Homework

 * 1) Using Entity Relationship Diagrams (E-RD) and the notion of set cardinality, develop a graphical argument that cycles are not possible for relations amongst tables within a relational database.
 * 2) Investigate Relational Thinking using the following Pre-Built File:[[Media: RelationalThinkingExamples.mdb]]
 * 3) Queries that begin with "qry" are examples of common types of queries you might encounter (for example in a TBA).
 * 4) Queries that begin with "trace" link back to our Query Trace Examples above.
 * 5) To investigate each query:
 * 6) Copy it and give it a new name
 * 7) Make Changes to the Query Window (Design View) or SQL Window (SQL View)
 * 8) Do you get the results you predict

Questions

 * /Lecture 10 Questions