Courses/Computer Science/CPSC 203/CPSC 203 2008Winter L03/CPSC 203 2008Winter L03 Lectures/Lecture 13

Lecture 13
Last Class we introduced the Query Trace Mechanism, and illustrated it through several example queries. This lecture we continue that theme, looking at several common query patterns that are useful for analyzing data in a database. Additionally we describe a visual method of thinking about data analysis from a database using the dots-and-edges approach.

OBJECTIVES:
 * You will be exposed to several common query patterns that further illustrate the Query Trace Mechanism
 * You will learn to use a visual dots-and-edges method of sketching out the queries you may need to solve a particular problem in a database.


 * House Keeping
 * Text Reading -- Chapter 11 -- Pay particular attention to the sections: Data Storage, Managing Data , and Data Mining.
 * Midterm Results: Class Average: 16.4/20 (82.2 %). An Exam Key is available in CPSC Office: ICT 602. Ask for Susan Lucas (Undergraduate Administrator)
 * Assignment 1 Extension -- Now Due: Midnight March 24th.
 * NOTE: Tutorials the week of March 17th will be devoted to TA's working with you on Assignment 1
 * Additional Analysis Examples on BB: Under "Course Documents", "AnalysisExamples for Assignment 1".

Basic Query Patterns
These patterns sequentially build from simple to more complex.


 * 1) All the data in a table
 * 2) All the data across several tables
 * 3) Selected fields from several tables
 * 4) Selected fields from several tables with constraints
 * 5) AND constraints
 * 6) OR constraints
 * 7) Selected fields from several tables with custom calculations
 * 8) Formulae (custom calculations for new variables)
 * 9) If ... Then (often for building classifications of existing data)
 * 10) Aggregate queries using selected fields from several tables
 * 11) Which fields are the "Dimensions" for GROUPBY? ("Dimensions").
 * 12) Which fields are we going to summarize for a GROUP?
 * 13) Aggregate queries using selected fields from sevreral tables with constraints
 * 14) The Crosstab Query Pattern

Mini-Tutorial: Data Analysis Design
The following notes offer a disciplined approach to developing a data analysis, and is particularly suitable to the case where much of the data organization and intermediate data processing is done via queries in a relational database.


 * 1) Begin with the End in mind and work Backwards. What is the final goal of your analysis?
 * 2) What are your data sources and how do they need to be organized to achieve your analytical goal.
 * 3) What statistics and calculated variables do you need to use, including interim calculations.
 * 4) Draw a "path" from source data to final analysis and use it to,
 * 5) Break pieces of the analysis down into small steps (in a relational DB each of these steps could be a query, so one ends up with a series of queries that correspond to each step in the analysis).
 * 6) Check the accuracy of each step of the analysis by
 * 7) Confirming the data coming into that stage of the analysis
 * 8) Confirmin the data exitting that stage of the analysis
 * 9) in particular the entering and exiting data in each stage must be in the form, and range that you expect.

A Few Example Questions

 * From a Database on Publishers, Titles, Authors, can we identify the profitability of different book Publishers?
 * From a Database on Student Website preferences, can we identify the most commonly preferred websites?
 * From a Database on Employee Pay and Performance -- can we some standards for pay and performance? (next Tuesday)

Dots and Edges Approach to Sketching An Analysis

 * Every Table and Query Result is a node.
 * The Actual Queries are Edges.
 * Name Queries Sequentially (easier to keep track of inputs) and with meaningful descriptions (what is this query doing).

TEXT READINGS
TIA 3rd Edn: Chapter 11. 462 -- 503

TIA 4th Edn: Chapter 11. pp 484 -- 525

Resources
The Query Trace mechanism covered in this lecure comes from this book.
 * The Essence of SQL. A Guide to Learning the Most SQL in the Least Amount of Time. 1996. By David Rozenshtein

A practical introduction to developing queries.
 * SQL Visual Quickstart Guide. 2005. By Chris Fehily.

The principles of Database Design are very practically set forth in this book.
 * "Practical Issues in Database Management. Chapter 5. By Fabian Pascal.