Courses/Computer Science/CPSC 203/CPSC 203 Template/Lecture Template/Lecture 11
Contents
Housekeeping
Required Reading
see Required Reading for lecture 10.
Introduction
Last class we introduced some conventions for drawing simplified Entity Relationship Diagrams (E-RDs) and introduced the Query Trace Mechanism.
Our focus today is to practice Relational Thinking -- i.e. building up our mental model of how relational databases process queries. Today we will continue to refine our use of E-RD's and the Query Trace Mechanism. Finally, we will look at eight basic query patterns that occur frequently.
At the end of today's lecture you will have:
- Looked at several E-RDs and refined our E-RD conventions
- Looked at several more examples of the Query Trace Mechanism
- Been introduced to 8 basic query patterns.
Glossary
Parts of glossaries for Lectures 9/10 repeated for ease of reference.
From Lecture 9
- Tuple -- A sequence (ordered list) of values. In a database, the values for every row in a table or query can be called a tuple.
- Key – a unique identifier: data value that makes the row of information completely different from any other row
- Primary: primary unique id of an instance (case)
- Foreign: key linked to Primary key of parent table
- Index - A look up table for a column with repeating values that makes queries on this column more efficient.
- Referential Integrity -- Put simply, the restriction that no entry in a child table can be made unless the foreign key for that entry matches an existing primary key in the parent table(s). See Wikipedia: http://en.wikipedia.org/wiki/Referential_Integrity
Wow, this wasn't a simple definition at all Mishtu, what were you thinking? And then linking to Wikipedia,where anybody could edit the information.
- Entity Relationship Diagram (ER-D) -- A diagramatic summary of the tables in a database and their relationships. We use a simple version of ER-D in this course -- for more information see http://en.wikipedia.org/wiki/Entity-relationship_diagram
From Lecture 10
- Query – A way of asking questions of a database. In general a query of a table(s) returns a subset of the table(s) for which the query parameters are true. For example, which people in the class listed Google as their most frequently visited website. More formally, a query is a proposition about data extracting a subset of information based on specific requirements using the primary and foreign keys. A query language is used to set up these requests and the query can be repeated at different times for updated information, The Query Language we will be using to describe the Query Trace Mechanism is called SQL and is the standard query language used in most relational databases.
- Join – the relations between entities or Parent Table (on primary key) and Child Table (on foreign key). Effectively a JOIN on two tables is a subset of the cartesian product of those two tables. Join Conditions are defined by setting relationships between the Primary Key of the Parent table and the Foreign Key of the Child Table. E.g: PKeyparentTable = FkeyChildTable.
- One-to-One: Ex: a student can only borrow one book
- One-to-Many: Ex a student can take many courses
- Many-to-Many: Ex: many students can take many courses. Joins can break down into multiple One-to-Many joins
- SQL -- "Structured Query Language" -- the language underlying most relational database. SQL is often called a "data sublanguage" in that it is not meant to be a full programming language. However, in the context of allowing complex questions to be asked of data, it is extremely powerful.
- SQL Keywords (also called 'clauses')
- SELECT -- Identifies columns that will be selected
- FROM -- Identifies tables from which data will originate
- WHERE -- Identifies constraints on which ROWS of data will be returned. The WHERE clause is also used to set Join conditions between multiple tables.
- GROUP BY -- In an Aggregate Query (one with groups) identifes the columns for which data will be grouped.
- HAVING -- In an Aggregate Query, identifes constraints on which GROUPS will be returned -- each row is data on a group.
Concepts
ER-D Practice
Entity Relationship Diagram (ER-D) are A diagramatic summary of the tables in a database and their relationships. We use a simple version of ER-D in this course, focussed on primary-key foreign key relationships amongst tables.
In class we will:
- extend our conventions for ER-Ds
- go through a few more ER-D examples
Query Trace Mechanism Practice
In essence, the Query Trace Mechanism can be considered to be a series of steps, beginning with the Cartesian Product of a set of Tables, that gradually restricts a subset of data. We express our Query Trace Mechanism in terms of SQL keywords.
In class, we will:
- go through a few more examples of the query trace mechanism, in terms of answering simple questions.
Basic Query Patterns
In addition to being able to trace a query via the Query Trace Mechanism, it is useful to know a small set of commmon patterns.
In class we will go through 8 commonly encountered query patterns.
These patterns sequentially build from simple to more complex.
- All the data in a table
- All the data across several tables
- Selected fields from several tables
- Selected fields from several tables with constraints
- AND constraints
- OR constraints
- Selected fields from several tables with custom calculations
- Formulae (custom calculations for new variables)
- If ... Then (often for building classifications of existing data)
- Aggregate queries using selected fields from several tables
- Which fields are the "Dimensions" for GROUPBY? ("Dimensions" are the fields we do the grouping by).
- Which fields are we going to summarize for a GROUP? (All other fields have a single calculated value for a group)
- Aggregate queries using selected fields from sevreral tables with constraints
- The Crosstab Query Pattern
Basic Query Pattern Examples
Examples of these 8 query patterns are in the file below:
Media:Basic_Query_Patterns.doc
Applying these query patterns sequentially allows us to move forward from considering individual queries, to developing analyses through sequences of queries.
As a teaser for lecture 12, consider the following graph:(which puts in sequence several of the query patterns we have seen today to develop an analysis).
Summary
We have explored:
- Several ER-D examples
- Several examples of the Query Trace Mechanism
- Eight basic query patterns (from which more complex analyses can be built up).
Text Readings
Resources
E.F. Codd's original proposal for the Relational Data Model can be found online at: http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf "A Relational Model of Data For Large Shared Data Banks" created the theory behind the modern databases used in large organizations today. While technical -- the introductory parts of the article are accessible to the general lay reader and provide a good introduction to the thinking style needed to "Grok" databases.
- The Database Relational Model. A Retrospective Review and Analysis. 2001. By C.J. Date
- Practical Issue in Database Management -- A Reference for the thinking Practioner.2000. By Fabian Pascal
- The Essence of SQL. A Guide to Learning the Most SQL in the Least Amount of Time. 1996. By David Rozenshtein
- SQL Visual Quickstart Guide. 2005. By Chris Fehily.
Homework
Here is the dataset used for todays examples of basic query patterns (slightly modified from the homework database from last lecture.
Media:BasicQryPatternExamples.mdb
Again, explore this database, and the basic query patterns by copying and modifying queries.