
Lecture 10
Today we introduce some basic concepts in set theory needed to understand the basics of Relational Databases, and introduce the core model for a relational database as a dots-and-edges diagram. We also (a) continue with our in-class information-dashboard design exercise from last class, and (b) review the format for the mid-term.


 * House Keeping
 *  MIDTERM: Tuesday February 26th  -- I have posted practice questions on BB as well as an answer key.
 * READING: Chapter 11: Databases and Information Systems (see TEXT READINGS section at bottom of today's lecture)
 * We'll devote the last 15 minutes to discussing the Midterm format and providing some study hints.


 * Today's Topics
 * Introducing the meta model for a relational database, as well as the origins of relational databases.
 * Some basic set theory concepts via Venn Diagrams
 * Dashboard Exercise

Lecture Glossary

 * Venn Diagrams -- a visual method of representing sets invented by the Reverend John Venn. A rectangular box represents the "Universe" of discourse, and circles within that box represent sets. The way the circles overlap shows the results of the different operations one can do on sets.
 * Set - A set is a collection of unique objects. In databases, everything is in sets and subsets of information. Sets, Subsets, Supersets, Intersection sets and Union sets were introduced in class via "Venn Diagrams": http://en.wikipedia.org/wiki/Venn_diagram
 * Null Set -- a set with no elements (also called the empty set).
 * Intersection Set -- Given two sets, A and B, the intersection set is the set of those objects that exist BOTH in SetA AND SetB
 * Union Set -- Given two sets, A and B, the union set is the set of those objects that exist in SetA, in SetB, or in both.
 * Set Complement -- All the items in the Universe that are NOT IN a set A.
 * Subset -- Set B is a subset of A, IF all members of B are also in A.
 * Superset -- Set A is a superset of B, if all members of B are also in A.
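
The set terms above can be tried out directly with Python's built-in `set` type. This is only a minimal sketch: the names in `universe`, `set_a` and `set_b` are made-up example data, not anything from the lecture.

```python
# Hypothetical example data: the "Universe" of discourse and two sets within it.
universe = {"Ann", "Bob", "Cy", "Dee", "Eve"}
set_a = {"Ann", "Bob", "Cy"}
set_b = {"Bob", "Cy", "Dee"}

intersection = set_a & set_b      # objects in BOTH SetA and SetB
union = set_a | set_b             # objects in SetA, in SetB, or in both
complement_a = universe - set_a   # objects in the Universe that are NOT in SetA
null_set = set()                  # the set with no elements

# Subset and superset are two views of the same fact.
is_subset = {"Bob", "Cy"} <= set_a     # every member of {"Bob", "Cy"} is in SetA
is_superset = set_a >= {"Bob", "Cy"}   # SetA contains every member of {"Bob", "Cy"}

# A set is a collection of unique objects, so repeated values collapse.
unique = set(["Ann", "Ann", "Bob"])

print(intersection, union, complement_a, null_set, is_subset, is_superset, unique)
```

Each variable corresponds to one region of a two-circle Venn diagram drawn inside the universe box.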


 * Table - A row (case) by column (variable) display. An entity that has a group of related records. In Relational Databases, a table is often called a "Relation". It is also called an "Entity".
 * Domain – “Data Type” = accepted values and operations. A set of values of a specific type with allowable operations that can apply to many attributes. Every field or column must be assigned a data type which is a domain (with specific rules) such as:
 * Text
 * Numeric
 * Integer
 * Date
 * Hyperlink
 * Attribute – a feature of an entity (a "variable")
 * Entity – an object in the world, which can have many relationships with other entities
 * Relationship – A link between two entities. In set terms, it is the intersection set of the keys of 2 tables.
 * Join – an operation that combines a Parent Table and a Child Table by matching the parent's primary key to the child's foreign key.
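
To make the table, relationship and join entries concrete, here is a rough sketch in plain Python. Everything in it is an assumption made for illustration: the `students` and `enrolments` tables, the `student_id` key, the course names, and the helper function `join` are all hypothetical, and a real database would do this work for you.

```python
# Parent table: each row is identified by its primary key, student_id.
students = [
    {"student_id": 1, "name": "Ann"},
    {"student_id": 2, "name": "Bob"},
    {"student_id": 3, "name": "Cy"},
]

# Child table: student_id is a foreign key that points back at students.
enrolments = [
    {"student_id": 1, "course": "CPSC 203"},
    {"student_id": 2, "course": "CPSC 203"},
    {"student_id": 2, "course": "MATH 211"},
]

# The relationship seen as sets: the key values the two tables share.
parent_keys = {row["student_id"] for row in students}
child_keys = {row["student_id"] for row in enrolments}
shared_keys = parent_keys & child_keys


def join(parent, child, key):
    """Pair each child row with the parent row holding the same key value."""
    by_key = {row[key]: row for row in parent}
    return [
        {**by_key[row[key]], **row}
        for row in child
        if row[key] in by_key
    ]


joined = join(students, enrolments, "student_id")
for row in joined:
    print(row)
```

Student 3 has no matching row in the child table, so that key is not in `shared_keys` and that student does not appear in `joined`.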

How did the Relational Database Model come about?
The relational database model originated in E.F. Codd's 1970 paper, "A Relational Model of Data for Large Shared Data Banks". This paper is still available online at: http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

Codd noted that users of a database should not have to know the details of how a database is implemented in a particular computer. He introduced the relational model as a logical framework for representing data that is independent of the details of the actual computer implementation. In his model, information is represented in tables where rows are cases and columns are variables. He showed how large amounts of information could be organized into tables, and defined basic operations that can be done on those tables (which we will cover in future lectures). In essence, his notion of a table corresponded to the idea of a set, and so relational databases are essentially set oriented in their operations.
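
One way to see what "a table corresponds to a set" means is to model a table as a Python set of rows. This is a simplified sketch with made-up data, not Codd's formal definition: each row is a tuple of (name, city).

```python
# Two versions of the same hypothetical table, written in a different
# row order and with one row repeated.
table_1 = {("Ann", "Calgary"), ("Bob", "Edmonton")}
table_2 = {("Bob", "Edmonton"), ("Ann", "Calgary"), ("Ann", "Calgary")}

# As sets they are equal: row order and repeated rows make no difference.
same_table = table_1 == table_2

# Because tables are sets, set operations apply to whole tables at once.
more_rows = {("Cy", "Calgary"), ("Bob", "Edmonton")}
combined = table_1 | more_rows   # rows found in either table
in_both = table_1 & more_rows    # rows found in both tables

print(same_table, sorted(combined), in_both)
```

The operations work on entire tables rather than one row at a time, which is the sense in which relational databases are set oriented.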

Why did the Relational Database Model eventually "win" over other models?
Today, most corporate databases follow the relational model. However, that was not always so. From the mid-70's to the mid-80's the relational database model began to dominate for several reasons:
 * 1) A Logical Model. As a logical model -- it allowed for multiple implementations. As long as an implementation followed the logical model it should (in theory) give the same results. This allowed it to be implemented in a large number of different types of computers.
 * 2) A Data Sublanguage. Fairly soon after the introduction of the Relational Data Model, the Structured Query Language (SQL) was created, which formed an easy (erhhh, relatively easy) to use language for stating operations in a relational database. This opened up database usage to non-programmers (again, relatively speaking).
 * 3) Early Commercial Implementations. Codd worked at the IBM research laboratories (where SQL was also invented) and IBM soon developed a commercial implementation of the relational database (currently called DB2). But they were beaten to the punch by a small company (at the time) called Oracle.
 * 4) Research Implementations. The use of logical models attracted a large group of computer scientists to work on both database theory and implementation issues. Much of this work was done around a series of research-oriented databases that resulted in the Ingres, and later Postgres, databases.

Initially, relational databases performed more slowly than competing databases, but eventually they caught up, and their key advantage was their uniform treatment of all data as tables.

MIDTERM TEST FORMAT AND POLICY

 * 1) Midterm is in regular lecture class -- Tuesday, February 26th, 2p.m.
 * 2) You have the full 75 minutes.
 * 3) Format is Multiple Choice (20-25 questions). (so how many minutes/question???)
 * 4) Your answers are on "Bubble Sheets" -- make sure to fill out your info completely.
 * 5) Exams will not be handed back (as per finals) -- but an answer sheet will be available in the CPSC office.

TEXT READINGS
TIA 3rd Edn: Chapter 11. pp 462 -- 503

TIA 4th Edn: Chapter 11. pp 484 -- 525

Resources
E.F. Codd's original proposal for the Relational Data Model, "A Relational Model of Data for Large Shared Data Banks", can be found online at: http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf  It created the theory behind the modern databases used in large organizations today. While technical, the introductory parts of the article are accessible to the general lay reader and provide a good introduction to the thinking style needed to "Grok" databases.


 * The Database Relational Model. A Retrospective Review and Analysis. 2001. By C.J. Date
 * Practical Issues in Database Management -- A Reference for the Thinking Practitioner. 2000. By Fabian Pascal
 * The Essence of SQL. A Guide to Learning Most of SQL in the Least Amount of Time. 1996. By David Rozenshtein
 * SQL Visual Quickstart Guide. 2005. By Chris Fehily.