Courses/Computer Science/CPSC 203/CPSC 203 2007Fall L04/CPSC 203 2007Fall L04 Lectures/Lecture 20

Lecture 20
First we examine our in-class developed algorithms for finding "Keystone Websites"

We will illustrate the keystone species in a simple graphic at: http://www.vtaide.com/png/foodchains.htm

Then we look at how Search Technology answers this question, with special emphasis on Google's Page Rank system.

Finally we examine how the increasing pervasiveness of search technology in modern day life raises various control issues with respect to:
 * economics
 * privacy
 * freedom

The objectives of today's class are:


 * House Keeping


 * Group Project Presentations begin the week of Nov 26th. Will be by lab/tutorial sections. Schedule posted today. Group Projects websites will be marked based on version TA's download Dec 7th.
 * Assignment 2 due by midnight Nov. 23rd. TA's will focus on this for this week's tutorials.
 * Course Ratings Time -- win a prize!


 * Reminders:
 * Final Exam Date and Time has been set: Monday Dec 17, 12-2p.m. (room unknown).


 * Topics
 * Review of "Keystone Websites Algorithms"
 * How Web Search Engines Work
 * The Page Rank Algorithm
 * Search and Control

How Search Engines Work

 * Top Down Design Example
 * The Spider or Web Crawler
 * Searches pages
 * Follows Links
 * The Index
 * Transforms results from Spiders forays into a single index
 * The Query Interface
 * How to make Search Easy For Humans
 * Link between questions you can ask and the Index

Page Rank
As a formula: PR(u) = SUM (PR(v)/L(v))

In English: The Page Rank of a page, "u" is based on summing all the Page Ranks/ # of Links from each page "v". Where for a given Page "u", "v" are all the pages that link into it.

Technical Difficulty: This algorithm is actually recursive. So, to find the Page Ranks for "v", you have to apply the algorithm to the pages that link into v, and so on, and so on. This is called a "Recursive Algorithm".


 * illustrative example using Dots-and-Edges graph.

For more technical details go to: http://en.wikipedia.org/wiki/PageRank


 * Page Rank
 * How do we sort items in the Index?


 * Back to "Keystone Website". The Keystone Website would be that with the highest Page Rank.

Control Issues

 * Economics
 * Are the rankings objective
 * How do paid ads show up
 * What happens if a pages rank suddenly changes due to changes in Google's algorithm
 * Click Fraud
 * Link Spamming


 * Privacy
 * "I took it off the net, but now it's cached by Google."
 * Blogs and Emails often searchable
 * Moving Search Technology to the Desktop: google office, Google mail, etc.
 * Mining Search Data (see Searching the Searches)


 * Freedom
 * Political Control of Website Search Results
 * Searching the Searches


 * Big Picture -- Back to Scale Free Networks
 * Probability of a new site linking to existing site with many links ..... higher page rank.
 * So .... Does Page Rank algorithm promote Internet as a Scale Free Network.
 * and, do the "issues" described above affect structure of a scale free network.

I will leave you with that question to ponder.

Resources
The Google Story. 2005. By David A. Vise.

The Search. How Google and its Rivals Rewrote the Rules of Business and Transformed Our Culture. 2005. by John Batelle.

Information Politics on the Web. 2004. Ny Richard Rogers.