
Lecture 7
Today we recap the spreadsheet design principles from the last lecture, then focus on visually introducing a number of basic statistical terms and concepts.


 * Housekeeping
 * MIDTERM: Tuesday, February 26th. Note: I'll post some practice questions to BB about a week before.
 * PREREADING: Chapter 11: Databases and Information Systems, to prepare for next week
 * Participatory Exercises and Peer Review
 * No office hour Thursday, Feb 7th -- reschedule for today or next Tuesday.
 * Assignment 1 will be introduced on Thursday


 * Today's Topics
 * Quick Recap of Spreadsheet Design Principles -- illustrated with Presidents Data Set
 * A Visual Introduction to basic statistical terms and concepts

Lecture Glossary
Know both the meaning of these statistics, and how to access them in a spreadsheet:
 * Mean - the 'centre' of a set of values, aka the 'Average'
 * =AVERAGE(Cell:Cell)
 * Median - the middle value in a set of values
 * =MEDIAN(Cell:Cell)
 * Mode - the most frequently occurring value in a set of values
 * =MODE(Cell:Cell)
 * Standard Deviation - a statistical measure of how widely a set of values is spread on either side of the mean
 * =STDEV(Cell:Cell)
 * Count - counts the number of numeric values in a range of cells
 * =COUNT(Cell:Cell)
 * Sum - adds all the numbers in a range of cells
 * =SUM(Cell:Cell)
 * Min - the minimum in a set of values
 * =MIN(Cell:Cell)
 * Max - the maximum in a set of values
 * =MAX(Cell:Cell)
 * Range - Max minus Min
 * Precision -- The limits of our measuring instruments. The "box" within which all observations appear equal.
 * Scattergram -- A 2-dimensional display of data points on an X-Y plane.
 * Cartesian Plane -- A rectangular coordinate system that associates each point with a pair of numbers. The basis of Scattergrams. See http://dl.uncw.edu/digilib/mathematics/algebra/mat111hb/functions/coordinates/coordinates.html
 * Population -- All the observations we are interested in.
 * Sample -- A subset of the observations we are interested in. Usually created via some sampling process -- random sampling, stratified sampling, systematic sampling, etc. To make correct inferences from samples, we must assume they reflect the population we are interested in.
 * Correlation -- How strongly related two variables are.
 * Regression -- The degree and form to which a dependent variable (Y) is a function of an independent variable (X). The resulting regression equation expresses Y as a function of X, plus an error term.
 * Classification -- Individual cases (data observations) are placed into groups based on one or several variables. Classification is used to break a data set up into groups, often by searching for natural "breaks" in the data (i.e. areas with sparse data) that separate areas with dense data.
 * Outlier -- An observation that is numerically distant from the rest of the data.
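
For anyone who wants to check a spreadsheet result outside the spreadsheet, here is a minimal Python sketch of the summary statistics above, using the standard `statistics` module. The data values are made up for illustration (they are not from the Presidents Data Set), and the comments name the spreadsheet function each line roughly corresponds to.

```python
import statistics

# Hypothetical data values, standing in for a spreadsheet range such as A1:A8.
values = [2, 4, 4, 5, 7, 9, 4, 5]

summary = {
    "mean": statistics.mean(values),      # roughly =AVERAGE(A1:A8)
    "median": statistics.median(values),  # roughly =MEDIAN(A1:A8)
    "mode": statistics.mode(values),      # roughly =MODE(A1:A8)
    "stdev": statistics.stdev(values),    # sample standard deviation, roughly =STDEV(A1:A8)
    "count": len(values),                 # roughly =COUNT(A1:A8), since every value here is a number
    "sum": sum(values),                   # roughly =SUM(A1:A8)
    "min": min(values),                   # roughly =MIN(A1:A8)
    "max": max(values),                   # roughly =MAX(A1:A8)
}

# Range is not a single spreadsheet function here: it is Max minus Min.
summary["range"] = summary["max"] - summary["min"]

for name, value in summary.items():
    print(f"{name}: {value}")
```

Changing one value to something far from the rest (an Outlier) and re-running is a quick way to see which statistics move a lot and which barely move.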

This lecture is focussed on giving a visual introduction to these statistics. Note: the Mean, Median, Mode are all measures of 'location'.

A Visual Introduction to Statistics

 * Our data 'lives' in the Cartesian Plane (or extensions thereof).
 * The data could represent the Population or a Sample from the Population we are interested in.
 * We can represent this in 2D as a Scattergram.
 * The Min and Max set the boundaries for where data resides in our Scattergram.
 * The 'cells' in the Scattergram reflect the Precision of data.
 * The 'cell' with the most data is the Mode.
 * The Mean and Median are two ways of estimating where data is most frequent in a Scattergram, i.e. the central location
 * The  Standard Deviation  is a measure of how data varies around the Mean. It can be imagined as an ellipse drawn in a Scattergram.
 * We can imagine the Correlation between two variables as an ellipse drawn around the data: roughly, the narrower the ellipse, the stronger the correlation.
 * We can imagine the Regression between two variables as the line through the data that minimizes the (squared) deviations of Y from the line.
 * We can imagine a 'multivariate' data set as a matrix of scattergrams.
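
To make the last few pictures concrete, here is a minimal Python sketch that computes a correlation and a regression line by hand. It assumes the usual Pearson correlation coefficient and an ordinary least-squares line (the line that minimizes the squared vertical deviations of Y); the function name and the data points are hypothetical, chosen only for illustration.

```python
import math


def correlation_and_regression(xs, ys):
    """Return (r, slope, intercept) for paired observations xs and ys.

    r is the Pearson correlation coefficient; slope and intercept
    describe the least-squares line Y = intercept + slope * X.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squared deviations from the means, and of their cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    r = sxy / math.sqrt(sxx * syy)        # strength of the linear relationship
    slope = sxy / sxx                     # change in Y per unit change in X
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return r, slope, intercept


# Hypothetical scattergram points (X, Y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, slope, intercept = correlation_and_regression(xs, ys)
print(f"correlation r = {r:.3f}")
print(f"regression line: Y = {intercept:.2f} + {slope:.2f} * X")
```

The leftover differences between each actual Y and the value on the line are the 'error term' mentioned in the glossary entry for Regression.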

We will follow up today's visual intro on Thursday with an introduction to Charting, and the principles by which we make our Scattergram (and other charts) both visually appealing, and accurate in their communication of information.

Resources
See any "basic" statistics text for review if you are unfamiliar with any of the statistical terms covered.

"Presidents Data" from "Political Control of the Economy" by Edward R. Tufte. Princeton, 1978.