Courses/Computer Science/CPSC 203/CPSC 203 Template/Lecture Template/Lecture 9

Required Reading

 * fluency textbook: Chapter 16: A Table with a View pp 442- -- 445, 453--478


 * The original paper that introduced the Relational Database Model: E.F. Codd. 1970. A Relational Model of Data for Large Shared Data Banks: [Original Relational Model Paper]
 * Please Read sections 1.1 - 1.3 (pp. 377-381, top).

Introduction
Last class, a fairly large vocabulary was introduced around Set Theory. In this lecture, we will practice using that vocabulary, and slowly building up our understanding of relational databases as operating on sets.

Today we will continue practicing "Thinking in Sets" as a prelude to understanding relational databases. We will then proceed to slowly building up insight into the Relational Database Meta Model (introduced last class).

At the end of today's lecture you will have:
 * Further practiced solving simple problems using Set Theory.
 * Understood the three core concepts for relational databases
 * Gone through an example based introduction to the Relational Database Meta Model.

Extension from Lecture 8

 * Table - A row (case) by column (variable) display. An entity that has a group of related records. In Relational Databases, a table is often called a "Relation". It is also called an "Entity". Each variable is of a particular data type (domain)
 * Parent: A Parent-Table or Entity has it's primary key linked to one to many foreign keys in the Child table.
 * Child: A Child-Table or Entity has a foreign key(s) that link it back to one or more Parent Tables. The child tables foreign keys are primary keys in the parent tables.
 * Domain – “Data Type” = accepted values and operations. A set of values of a specific type with allowable operations that can apply to many attributes. Every field or column must be assigned a data type which is a domain (with specific rules) such as:
 * Text
 * Numeric
 * Integer
 * Date
 * Hyperlink
 * Attribute – a feature of an entity (a "variable")
 * Entity – an object in the world, which can have many relationships with other entities
 * Relationship – A link between two entities. If the entities are objects in the world, the relationship between them could be considered a verb. In relational databases, a relationship is represented via the intersection set of keys for 2 tables. A link between two entities.
 * Join – the relations between entities or Parent Table (on primary key) and Child Table (on foreign key)
 * One-to-One: Ex: a student can only borrow one book
 * One-to-Many: Ex a student can take many courses
 * Many-to-Many: Ex: many students can take many courses. Joins can break down into multiple One-to-Many joins
 * Cardinality (of a set). The number of elements in a set. If set A had 4 elements, we would express its cardinality as |A| = 4. See: http://en.wikipedia.org/wiki/Cardinality
 * Cartesian Product of Sets . Given two sets, S and T, the cartesian product pairs every element in S with every element in T. If we label an element in S as s, and an element in T as T, the cartesian product is the set of all ordered pairs (s,t).  In Relational Databases, a Cartesian product is the default result of including two tables in a query without a join. Essentially, it is multiplying the cardinality of each of the tables. See: http://en.wikipedia.org/wiki/Cartesian_product

Additional Database Concepts and Terminology

 * Meta-Model -- A model of a model. The Relational Database core model, is an example of a Meta-Model in that every specific relational data model conforms to it; i.e. every specific relational data model is constituted of Domains, Attributes, Entities, Relationships.
 * Query – A way of asking questions of a database. In general a query of a table(s) returns a subset of the table(s) for which the query parameters are true. For example, which people in the class listed Google as their most frequently visited website. More formally, a query is a proposition about data extracting a subset of information based on specific requirements using the primary and foreign keys. A query language is used to set up these requests and  the query can be repeated at different times for updated information
 * Tuple -- A sequence (ordered list) of values. In a database, the values for every row in a table or query can be called a tuple.
 * Key – a unique identifier: data value that makes the row of information completely different from any other row
 * Primary: primary unique id of an instance (case)
 * Foreign: key linked to Primary key of parent table
 * Index - A look up table for a column with repeating values that makes queries on this column more efficient.
 * Referential Integrity -- Put simply, the restriction that no entry in a child table can be made unless the foreign key for that entry matches an existing primary key in the parent table(s). See Wikipedia: http://en.wikipedia.org/wiki/Referential_Integrity
 *  Entity Relationship Diagram (ER-D)  -- A diagramatic summary of the tables in a database and their relationships. We use a simple version of ER-D in this course -- for more information see http://en.wikipedia.org/wiki/Entity-relationship_diagram

From the Relational Database Meta Model to Tables
(from last class)



Three Core Concepts for Relational Databases
We will cover the triad of core concepts:


 * 1) In A Database --- Everything is Abstacted to 2-D tables. ("It's Tables all the way down").
 * 2) A table represents a set of values for some object -- identified by the Primary Key. Each row in a table specifies a set of attributes for an "instance" of an object (an individual). The rows are often referred to as "tuples". Relationships between tables are via the intersection of Foreign Keys (of the child table) with the Primary Keys (of the parent table).
 * 3) All operations in a database are operations on sets of values.


 * The Relational Database Meta Model -- tells you 'what' a database is in the abstract. It represents sets, and mappings between sets.
 * In any specific relational database, the Relational Database Meta Model is itself implemented as a set of tables.
 * That is, there is an underlying database (from the meta-model) that keeps track of every specific database you build.
 * In this sense, the Relational Database Meta Model specifies a set of "meta-tables" needed for a relational database to maintain information about itself.

How did the Relational Database Model come about
The relational database model originated in E.F. Codd's 1970 paper, "A Relational Model of Data for Large Shared Data Banks". This paper is still available online at: http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

Codd noted that users of a database should not have to know the details of how a database is implemented in a particular computer. He introduced the relational model, as a logical framework for representing data, that is independant of the details of the actual computer implementation. In his model, information is represented in tables where rows are cases and columns are variables. He showed how large amounts of information could be organized into tables, and defined basic operations that can be done on those tables (which we will cover in future lectures). In essence, his notion of a table corresponded to the idea of a set, and so relational databases are essentially set oriented in their operations.

Why did the Relational Database Model eventually "win" over other models
Today, most corporate databases follow the relational model. However that was not always so. From the mid-70's to the mid-8-'s the relational database model began to dominate for several reasons:
 * 1) A Logical Model. As a logical model -- it allowed for multiple implementations. As long as an implementation followed the logical model it should (in theory) give the same results. This allowed it to be implemented in a large number of different types of computers.
 * 2) A Data Sublanguage Fairly soon after the introduction of the Relational Data Model, the Structured Query Language was created, which formed a easy (erhhh, relatively easy) to use language for stating operations in a relational database. This opened up database usage to non-programmers (again, relatively speaking).
 * 3) Early Commercial Implementations. Codd worked at the IBM research laboratories (where SQL was also invented) and IBM soon developed a commercial implementation of the relational database (currently called DB2). But they were beaten to the punch by a small company (at the time) called Oracle.
 * 4) Research Implementations The use of logical models attracted a large group of computer scientists to work on both database theory and implementation issues. Much of this work was done around a series of research oriented databases that resulted in Ingres, and later Postgres databases.

Initially, Relational Database performed slower than other competing databases, but eventually they caught up, and their key advantage was their uniform treatment of all data as tables.

Summary

 * To understand Relational Databases, you must think in terms of Sets, and Set Operations (union, intersection, difference, Cartesian Product)
 * These sets, are represented in relational databases as Tables
 * The Relational Database Meta Model specifies a set of "meta-tables" needed for a relational database to maintain information about itself.

Text Readings
See Required Readings above.

Resources
E.F. Codd's original proposal for the Relational Data Model can be found online at: http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf  "A Relational Model of Data For Large Shared Data Banks"  created the theory behind the modern databases used in large organizations today. While technical -- the introductory parts of the article are accessible to the general lay reader and provide a good introduction to the thinking style needed to "Grok" databases.


 * The Database Relational Model. A Retrospective Review and Analysis. 2001. By C.J. Date
 * Practical Issue in Database Management -- A Reference for the thinking Practioner.2000. By Fabian Pascal
 * The Essence of SQL. A Guide to Learning the Most SQL in the Least Amount of Time. 1996. By David Rozenshtein
 * SQL Visual Quickstart Guide. 2005. By Chris Fehily.

Database Questions

 * 1) Table A has 8 rows. Table B has 10 rows. If you were to do a query on Table A and Table B together, without a join -- how many rows would be returned?
 * 2) A table is made up of tuples. Each row of data is a tuple. The unique identifier for each tuple is the Primary Key. A query returns a sub-set of the rows in a table(s) based on the Primary Key. With respect to queries -- what would the Power Set of the Primary Key for a table represent?
 * 3) Table A is a table about Publishers. it has 100 rows. Table B is a table about books. It has 20 rows. The primary key for Table A is "Publishers". The Primary key for Table B is "Books". Table B has a Foreign Key, "Publishers". There are no publishers in common between Table A and Table B. How many rows will be returned for the join on these 2 tables?
 * 4) How many primary keys can a table have?
 * 5) Distinguish Primary Keys and Foreign Keys.

More Set Theory and Venn Diagram Questions
(modified from :Wallis, W.D. 2003. A Beginner's Guide to Discrete Mathematics. Birkhauser).


 * 1) No students in this class are loud. John is a math major. All math major's are loud. So John is not a student of this class.
 * 2) Some animals walk on two legs. Human beings are animals. From these facts, can we conclude that human beings walk on two legs?
 * 3) All my friends are scientists. None of my neighbours are scientists. Ruth is my friend. Is Ruth a scientist?
 * 4) All expensive food contains cholesterol. Steak contains no cholesterol. Is steak expensive?
 * 5) All designer clothes are expensive. None of my clothes are expensive. All expensive clothes are well made. Based on these facts:
 * 6) Do I own designer clothes?
 * 7) Are all designer clothes well made?
 * 8) Are any of my clothes well made?

Questions

 * /Lecture 9 Questions