Courses/Computer Science/CPSC 441.W2014/Tutorials

= Introduction =

Hello, my name is Carrie Mah and I am currently in my 3rd year of Computer Science with a concentration in Human Computer Interaction. I am also an Executive Officer for the Computer Science Undergraduate Society. If you have any questions (whether they are CPSC-related, about events around the city, or an eclectic mix of random things), please do not hesitate to contact me.

I hope you find my notes useful, and if there are any errors please correct me by clicking "Edit" on the top of the page and making the appropriate changes. You should also "Watch this page" for the CPSC 441 page to check when I upload notes. Just click "Edit" and scroll down to check mark "Watch this page."

You are welcome to use my notes, just credit me when possible. If you have any suggestions or major concerns, please contact me at cmah[at]ucalgary[dot]ca. Thanks in advance for reading!

= Course Information =

Disclaimer: This is a page created by a student. Everything created and written by me is not associated with the Department of Computer Science or the University of Calgary. I am not being paid to do this, nor am I getting credit. I was a class scribe for CPSC 457 and enjoyed it so much that I wanted to continue with a related course. I find that writing Wiki notes for courses I am not familiar with is helpful for me, so I hope you find these notes helpful. I encourage you to still write your own notes, as it helps you retain information, but my notes are public and free to use!

This course is for W2014, taught by Dr. Carey Williamson.

The course website is here: http://pages.cpsc.ucalgary.ca/~carey/CPSC441

All tutorial slides are on the d2l website under "Tutorial Materials"

= Week 2 =

Tutorial 1: C Programming Review

 * Compiling C/C++
 * gcc (g++ for C++) is a driver which calls the preprocessor, compiler (cc1 or cc1plus), assembler, and linker as needed


 * Compiler options
 * -o: specifies output file for object or executable
 * -Wall: shows all warnings (highly recommended)
 * -l: links the named library (e.g. -lsocket)
 * If you get errors saying the library cannot be found, make sure the path is correctly set and you have the libraries you need

Class
int main(int argc, char *argv[])


 * argc: number of arguments passed to the program
 * argv: array of strings showing command line arguments
 * Name of executable and space-separated arguments
 * Name of executable is stored in argv[0]


 * Return value is type int
 * Convention: 0 means success, > 0 is some error

Primitive Data Types

 * char: character or smaller integer, 1 Byte
 * short int (short): short integer, >=2 Bytes
 * int: integer (most efficient), >=2, typically 4 Bytes
 * long int (long): long integer, >=4 Bytes
 * long long int: long long integer, >=8 Bytes
 * float: floating point number, 4 Bytes
 * double: double precision floating point number, 8 Bytes
 * long double: long double precision floating point number, >=8, typically 16 Bytes


 * Typecasting Example
 * Typecasting int to double (explicit)
 * Typecasting double to int (implicit)
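
The two conversions listed above can be sketched as follows. This is a minimal illustration; the function names average and truncate_to_int are made up for this example and do not come from the tutorial slides.

```c
/* Explicit cast: int -> double, so the division is done in floating point
 * instead of integer arithmetic. */
double average(int sum, int count)
{
    return (double)sum / count;
}

/* Implicit conversion: double -> int. No cast is written; the fractional
 * part is discarded (truncation toward zero). */
int truncate_to_int(double value)
{
    int result = value;
    return result;
}
```

Without the explicit cast, 7 / 2 would be evaluated as integer division and give 3 rather than 3.5.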

Arrays

 * Array declaration (on the stack), e.g. int a[10];
 * C/C++ arrays have no length attribute
 * Note: when passing an array to a function, typically have to pass the array size as a separate argument as well


 * You must take care of array bounds yourself: out-of-bounds access usually compiles without errors, but your program might behave unexpectedly or crash
 * In most expressions, an array's name is converted to a pointer to its first element
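
Because a function only receives a pointer to the first element, the length has to travel as a separate argument. A minimal sketch (sum_array is a hypothetical helper, not from the slides):

```c
#include <stddef.h>

/* The callee cannot ask the array for its length, so the caller passes it. */
int sum_array(const int *values, size_t length)
{
    int total = 0;
    for (size_t i = 0; i < length; i++) {
        total += values[i];   /* staying within [0, length) is our job */
    }
    return total;
}
```

In the scope where the array is declared, its element count can be computed as sizeof a / sizeof a[0]; inside sum_array that trick no longer works, because values is just a pointer.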

Structures

 * A C struct is a way to logically group related data
 * Similar to (not same as) C++/Java classes
 * Somewhat like a class without methods
 * Members are always public (no encapsulation concept in C)


 * A struct component can be of any type (including other struct types), but a struct cannot contain itself; it can only contain a pointer to its own type
 * In C, there are two different namespaces of types:
 * A namespace of struct/union/enum tag names
 * A namespace of typedef names
 * These can return something


 * In C++, all struct/union/enum/class declarations act like they are implicitly typedef'd, as long as the name is not hidden by another declaration with the same name
 * Example:

struct address {
    char *street;
    char *city;
    char *zip;
};

typedef struct {
    char *name;
    unsigned int ID;
    struct address Address;
} student_item;

struct link_list {
    student_item student_info;
    struct link_list *next;
};

typedef struct link_list student;

Pointers

 * A pointer is just an address to some memory location
 * Another variable
 * Some dynamically allocated memory
 * Some function
 * Even


 * Reference operator: &
 * Dereference operator: *
 * Examples:

int x = 4; int *p = &x;
 * p contains &x (the address of x); x contains 4

int *p = malloc(sizeof(int));
 * p contains the address of the allocated memory -> ? (the allocated memory itself is uninitialized)

Pointers in C

 * Declaration
 * Use * before variable name


 * Allocation
 * Allocate new memory to a pointer using malloc in C (new in C++)


 * Deallocation
 * Release the allocated memory when you are done using it (free in C, delete in C++), otherwise you have a memory leak


 * Dereferencing
 * Accessing data from the pointer


 * Referencing
 * Getting the memory location for the data
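
The five operations above fit into a few lines of C. This is a small sketch; make_int and set_to_zero are hypothetical names chosen for the example.

```c
#include <stdlib.h>

/* Allocation: reserves one int on the heap and stores value in it.
 * Returns NULL if malloc fails. The caller must free() the result
 * (deallocation), otherwise the memory leaks. */
int *make_int(int value)
{
    int *p = malloc(sizeof(int));   /* declaration + allocation */
    if (p != NULL) {
        *p = value;                 /* dereferencing: write through the pointer */
    }
    return p;
}

/* The caller passes &x (referencing), so the function can change x itself. */
void set_to_zero(int *target)
{
    *target = 0;
}
```

A typical use is int *p = make_int(4); followed later by free(p); and, for the second helper, int x = 7; set_to_zero(&x);.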

Strings

 * In C, a string is an array of char terminated with '\0' (a null terminator)
 * Example: "hello" is hello\0

String Library

 * Functions:
 * strcpy(dest, src): copies chars from the src array into the dest array, up to and including the null terminator
 * strncpy(dest, src, n): copies chars; stops after n chars if there is no null terminator before that (in that case dest is not null-terminated)
 * strlen(str): returns number of chars, excluding the null terminator
 * strchr(str, c): returns pointer to first occurrence of the char c in str; NULL if none
 * strstr(str, sub): returns pointer to first occurrence of the string sub in str; NULL if none
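
A short sketch of how these functions are typically combined. The helpers safe_copy and value_of are made up for this example; only strncpy and strchr are library calls.

```c
#include <string.h>

/* Copies src into dest (which has room for capacity bytes) and always
 * null-terminates. strncpy on its own leaves dest unterminated when src
 * has capacity - 1 or more characters, so the terminator is written by hand. */
void safe_copy(char *dest, size_t capacity, const char *src)
{
    if (capacity == 0) {
        return;
    }
    strncpy(dest, src, capacity - 1);
    dest[capacity - 1] = '\0';
}

/* For a "name=value" string, returns a pointer to the value part,
 * or NULL if the string contains no '='. */
const char *value_of(const char *pair)
{
    const char *eq = strchr(pair, '=');
    return eq != NULL ? eq + 1 : NULL;
}
```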

Formatting Strings

 * sscanf(str, format, ...): parse the contents of str according to format
 * Returns the number of successful conversions
 * Formatting codes:
 * %c: char, matches single character
 * %d: int, matches an integer in decimal
 * %f: float, matches a real number
 * %s: char *, matches a string up to a white space
 * %[^c]: char *, matches a string up to next c char
 * Output formatting codes (printf family): values normally right-justified; use negative field width to get left-justified
 * %nc: char, char in field of n spaces
 * %nd: int, integer in field of n spaces
 * %n.mf: float, double; real number in width n with m decimals
 * %n.ms: char *, first m chars from string in width n
 * %%: writes a single % to the stream


 * sprintf(buffer, format, ...): produce a string formatted according to format directives and place this string into buffer
 * Returns the number of characters written (excluding the terminating '\0')
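
As an illustration of parsing and formatting, here is a sketch that splits and rebuilds a "host:port" string. The input format and both function names are assumptions made for the example, and snprintf (the size-checked sibling of sprintf) is used instead of sprintf to avoid buffer overruns.

```c
#include <stdio.h>

/* Parses "host:port" into its two parts. host must have room for 64 bytes.
 * Returns 1 if both parts were converted, 0 otherwise. */
int parse_host_port(const char *text, char *host, int *port)
{
    /* %63[^:] matches up to 63 chars that are not ':'; %d matches the port. */
    return sscanf(text, "%63[^:]:%d", host, port) == 2;
}

/* Writes "host:port" into out (at most size bytes, always terminated).
 * Returns the length the full string needs, excluding the '\0'. */
int format_host_port(char *out, size_t size, const char *host, int port)
{
    return snprintf(out, size, "%s:%d", host, port);
}
```

Checking the return value of sscanf against the number of expected conversions is what catches malformed input such as a missing port.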

Standard C Library

 * Formatted I/O
 * scanf(format, ...): read from standard input and store according to format
 * printf(format, ...): write to standard output according to format


 * File I/O:
 * fopen(filename, mode): open a file and return a FILE * stream pointer (NULL on failure)
 * fclose(stream): close the file; return 0 if successful, EOF if not


 * Other I/O operations:
 * getc(stream): read the next character from stream; returns EOF if none
 * fgets(buf, n, stream): read the next line from a file into buf (at most n - 1 chars)
 * fputs(str, stream): output the string to a file, stopping at '\0'
 * Returns a non-negative value on success, or EOF on error


Internet Protocol Stack

 * Lowest, most basic layer: physical
 * Medium between laptops and people: air
 * Physical tool: vocal cords, ears
 * May be a middleman between the two machines


 * Highest layer: application
 * Virtual connection between machines
 * Nothing directly from your keyboard to Google's servers


 * Below that: transport layer
 * Envelope passed from browser to transport layer (handled by OS)
 * Virtual connection to another machine - send lots of envelopes in a large bag


 * Below that: network layer
 * Think of it like the main, large postal office
 * Speaks language that every computer in the world understands
 * IP protocol - common rules for different devices
 * Virtual connection between machines


 * Below that: link layer
 * Physical transmission, depends on the machine you have
 * Connects to physical layer

What is a Socket?

 * Interface between the application and the network (the lower levels of the protocol stack)
 * Application and transport layer
 * The application creates a socket
 * Asks the OS to create a socket to send/receive things
 * The socket type dictates the style of communication
 * Reliable (TCP) vs best effort (UDP)
 * OS makes sure things get sent
 * Connection-oriented vs connectionless


 * Once a socket is set up, the application can:
 * Pass data to the socket for network transmission
 * Receive data from the socket (transmitted through the network, sent by some other host)

Most Popular Types of Sockets

 * 1st assignment: TCP, 2nd assignment: may use UDP
 * TCP Socket
 * Type: SOCK_STREAM
 * Reliable delivery
 * In-order guaranteed
 * Connection-oriented
 * Bidirectional
 * When you send something, it is guaranteed to be delivered, or you get an error saying it could not be sent
 * Data arrives in the same order in which it was sent


 * UDP Socket
 * Type: SOCK_DGRAM
 * Unreliable delivery
 * No order guaranteed
 * No notion of 'connection' - app indicates destination for each packet
 * Can send or receive

Socket Creation in C

 * How to open a file in C?
 * Have a descriptor (integer), tells OS which file you're talking about


 * You need the socket library (sys/socket.h): int socket(int domain, int type, int protocol);
 * Return value: socket descriptor, an integer (like a file-handle)
 * domain: integer, communication domain
 * IPv4, IPv6 - network layer protocols; the main difference is the size of the address space
 * Example: AF_INET (IPv4 protocol) - typically used
 * type: communication type
 * UDP, TCP
 * SOCK_STREAM: reliable, two-way, connection-based service
 * SOCK_DGRAM: unreliable, connectionless
 * Other values: need root permission, rarely used, or obsolete
 * protocol: specifies protocol (see file /etc/protocols for a list of options) - usually set to 0


 * The socket call does not specify where data will be coming from, nor where it will be going to; it just creates the interface
 * Sets up interface, no 'virtual connection' established yet; just establishing the ones between layers of the same machine


 * Have not reserved a port yet (connection from transport -> application layer)
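
A minimal sketch of socket creation for the two common types. The wrapper names open_tcp_socket and open_udp_socket are hypothetical; socket, AF_INET, SOCK_STREAM and SOCK_DGRAM are the standard POSIX names.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Creates a TCP (stream) socket over IPv4.
 * Returns the socket descriptor, or -1 on error. */
int open_tcp_socket(void)
{
    return socket(AF_INET, SOCK_STREAM, 0);   /* 0: default protocol for the type */
}

/* Creates a UDP (datagram) socket over IPv4.
 * Returns the socket descriptor, or -1 on error. */
int open_udp_socket(void)
{
    return socket(AF_INET, SOCK_DGRAM, 0);
}
```

At this point no port is reserved and no remote address is known; the descriptor is only an interface to the transport layer.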

Ports

 * Each host machine has an IP address (or more)
 * Each host has 65 536 ports (2^16)
 * Some ports are reserved for specific apps
 * 20, 21: FTP
 * 23: Telnet
 * 80: HTTP
 * See RFC 1700 (about 2000 ports are reserved)


 * A socket provides an interface to send data to/from the network through a port
 * Can have multiple sockets opened in one program for OS

Addresses, Ports and Sockets

 * Like apartments and mailboxes
 * You are the application
 * Your apartment building address is the address
 * Your mailbox is the port
 * The post office is the network
 * The socket is the key that gives you access to the right mailbox (one difference: assume outgoing mail is placed by you in your mailbox)


 * How to choose which port a socket connects to?

The bind Function

 * int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
 * The bind function associates and (optionally exclusively) reserves a port for use by the socket
 * Return value: error status, -1 if bind failed


 * sockfd: the integer the socket call returned, the socket descriptor
 * addr: struct sockaddr *, the (IP) address and port of the machine (address usually set to INADDR_ANY - chooses a local address)
 * Particular structure that includes IP and port of machine that you're on
 * Want to own the port number, so that anything sent there is given to your socket
 * Need to tell the OS which IP and port to listen on, because one machine can have multiple IPs depending on the physical interfaces
 * INADDR_ANY: includes all IP addresses this machine is assigned; whatever arrives on the port, on any address, is sent to me


 * addrlen: size (in bytes) of the addr structure
 * Need both the struct and its size because & passes only a pointer; the size tells the OS how large the structure is
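
Binding a socket to a port on all local addresses can be sketched as below. The helper name bind_any is an assumption for this example; passing port 0 asks the OS to choose any free port.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Binds sock to the given port (host byte order) on every local address.
 * Returns the status of bind(): 0 on success, -1 on failure. */
int bind_any(int sock, unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));            /* clear all fields first */
    addr.sin_family = AF_INET;                 /* IPv4 */
    addr.sin_port = htons(port);               /* port in network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */

    /* bind takes a generic struct sockaddr *, hence the cast and the size. */
    return bind(sock, (struct sockaddr *)&addr, sizeof(addr));
}
```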

On the Connecting End

 * When connecting to another host (i.e. connecting end is the client and the receiving end is the server), the OS automatically assigns a free port for the outgoing connection
 * During connection setup, receiving end is informed of port
 * Server listens for incoming requests (for someone to connect to server)


 * You can bind to a specific port if need be
 * From machine2, moves through layers so machine1 establishes virtual connection to machine2 (using connect)

Connection Setup

 * A connection occurs between two ends
 * Server: waits for an active participant to request connection
 * Client: initiates connection request to passive side


 * Once connection is established, server and client ends are 'similar'
 * Both can send and receive data
 * Either can terminate the connection

Server and Clients
http://i.imgur.com/xpmgP8t.png


 * connect calls go down the protocol stack
 * accept waits until a request comes in and accepts it; the connection is now established and both OSes take care of the delivery of packets
 * When the system call returns, you can start to read/write (send whatever you want); both sides are now equal because both can read and write

Connection Setup Steps

 * Step 1: Server listens for incoming requests
 * Step 2: Client requests and establishes connection
 * Step 3: Server accepts a request
 * Step 4: Client and server send/receive
 * The accepted connection is on a new socket
 * The old socket continues to listen for other active participants
 * Server has a listening socket, l-sock (bound to a port)
 * Client gets a socket from its own OS and asks to connect to l-sock
 * When l-sock receives the client's request, a new socket exclusive to that client is created (l-sock keeps listening)

Server Socket: Listen & Accept

 * Called on server side: int listen(int sockfd, int backlog);


 * Return value: 0 if listening, -1 if error
 * sockfd: integer, socket descriptor
 * backlog: integer, number of active participants that can 'wait' for a connection
 * Several connection requests can arrive at once; this defines how many may wait in the queue before being served one by one
 * Handles up to queue-length pending connection attempts
 * listen is non-blocking: returns immediately


 * int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
 * Return value: integer, the new socket (used for data-transfer)
 * sockfd: integer, the original socket (being listened on)
 * Pass in the listening socket; accept returns another descriptor that takes care of a specific connection
 * addr: struct sockaddr *, address of the active participant
 * addrlen: size of the address structure - value/result parameter
 * Must be set appropriately before call
 * Adjusted by OS upon return
 * Picks a pending connection from the listen queue and establishes the connection
 * accept is blocking: waits for connection before returning
 * System call - tells OS "give me something from queue. If nothing, you're going to wait until something comes into the queue"

Connect

 * From client: int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
 * Don't need to pre-define a port; on connect, the OS assigns you a random free port, and you don't even need to know it
 * You don't need it because you're not listening; the OS is the one that needs to know about it, so there is no need to bind (it is implicit)
 * When you send data to the server, your port number is sent along; here we only focus on what connect does


 * Return value: 0 if successful connect, -1 otherwise
 * sockfd: integer, socket to be used in connection
 * addr: struct sockaddr * - address of server
 * addrlen: integer, size of the address structure


 * connect is blocking
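
The server-side and client-side calls can be tried out on a single machine over the loopback address. The sketch below is only an illustration under that assumption: the function names are invented here, the server port is chosen by the OS, and error handling is kept to the minimum.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Server side: socket + bind + listen on 127.0.0.1 with an OS-chosen port.
 * Stores the chosen port (host byte order) in *port.
 * Returns the listening socket, or -1 on error. */
int listen_on_loopback(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    if (lsock < 0) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       /* 0: any free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    if (bind(lsock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lsock, 5) < 0 ||                     /* queue length 5 */
        getsockname(lsock, (struct sockaddr *)&addr, &len) < 0) {
        close(lsock);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return lsock;
}

/* Client side: socket + connect to 127.0.0.1:port. No bind is needed;
 * the OS picks the local port. Returns the connected socket, or -1. */
int connect_to_loopback(unsigned short port)
{
    struct sockaddr_in server;
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        return -1;
    }
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port = htons(port);
    server.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (connect(sock, (struct sockaddr *)&server, sizeof(server)) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}
```

After connect_to_loopback succeeds, calling accept on the listening socket returns a new descriptor for that one connection, while the listening socket stays available for further clients.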

Sending/Receiving Data

 * Buffers: pointers, memory management
 * Get some memory from the OS, then pass a handle to that memory to send, along with the socket to send on
 * ssize_t send(int sockfd, const void *buf, size_t len, int flags);
 * If sockfd is not valid, you will get an error
 * Return value: number of bytes transmitted (-1 if error)
 * buf: void*, buffer to be transmitted
 * len: integer, length of buffer (in bytes) to transmit
 * flags: integer, special options, usually just 0
 * Controls how sending is done, e.g. whether to block and what kind of error reporting to receive


 * ssize_t recv(int sockfd, void *buf, size_t len, int flags);
 * Return value: number of bytes received (-1 if error)
 * buf: void*, stores received bytes
 * len: integer, size of the buffer in bytes (the maximum number of bytes to receive)
 * flags: integer, special options, usually just 0
 * To receive something, you need to get a portion of memory and pass it to the OS, which writes whatever comes in into that memory
 * Tell it how much space is available


 * Typecasting - send does not care about the type of the data; it just needs the raw bytes
 * Only needs a pointer to the data and its length in bytes
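
Since send reports how many bytes it actually transmitted, which can be fewer than requested on a stream socket, a small loop is a common companion. The helper send_all below is a sketch written for these notes, not part of the socket API.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Keeps calling send() until all len bytes of buf have been handed to the OS.
 * Returns 0 on success, -1 if send() reports an error. */
int send_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;               /* byte-wise view of the buffer */

    while (len > 0) {
        ssize_t sent = send(sock, p, len, 0);
        if (sent < 0) {
            return -1;
        }
        p += sent;                     /* skip what was already transmitted */
        len -= (size_t)sent;
    }
    return 0;
}
```

On the receiving side, recv(sock, buf, sizeof buf, 0) returns the number of bytes placed in buf, 0 when the peer has closed a stream connection, or -1 on error.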

Close

 * Close socket to release port
 * If you don't close it, the OS still thinks the port is in use
 * Closes connection, frees up the port used by the socket


 * int close(int fd);
 * Return value: 0 if successful, -1 if error
 * fd: the file descriptor (socket being closed)

The struct sockaddr_in

 * The struct to store the Internet address of a host:

struct sockaddr_in {
    short          sin_family;
    u_short        sin_port;
    struct in_addr sin_addr;
    char           sin_zero[8];
};


 * sin_family: specifies the address family
 * Example: AF_INET


 * sin_port: specifies the port number (0-65535)


 * sin_addr: specifies the IP address


 * sin_zero: unused!

Example
struct sockaddr_in server;                  // definition
memset(&server, 0, sizeof(server));         // init to 0
server.sin_family = AF_INET;                // address family
server.sin_port = htons(MYPORTNUM);         // port
server.sin_addr.s_addr = htonl(INADDR_ANY); // address


 * Host Byte-Ordering: the byte ordering used by a host (big-endian or little-endian)


 * Network Byte-Ordering: the byte ordering used by the network – always big-endian


 * Any words sent through the network should be converted to Network Byte-Order prior to transmission (and back to Host Byte-Order once received)

Network Byte-Ordering

 * Conversion routines: htons and htonl (host to network, 16-bit and 32-bit), ntohs and ntohl (network to host)
 * On big-endian machines, these routines do nothing
 * On little-endian machines, they reverse the byte order
 * Example:
 * The address 128.119.40.12 is stored as bytes 128, 119, 40, 12 on a big-endian machine
 * -> and as 12, 40, 119, 128 on a little-endian machine; htonl and ntohl convert between the two
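
The example address can be checked in code. The sketch below (pack_ipv4 is a made-up helper) builds the 32-bit value in host order and converts it with htonl, so the bytes in memory come out in network order on either kind of machine.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Packs the four bytes of a dotted address a.b.c.d into a 32-bit value
 * in network byte order (big-endian), ready for sin_addr.s_addr. */
uint32_t pack_ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    uint32_t host_order = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
                          ((uint32_t)c << 8)  |  (uint32_t)d;
    return htonl(host_order);   /* no-op on big-endian, byte swap on little-endian */
}
```

For pack_ipv4(128, 119, 40, 12), the first byte in memory is 128 and the last is 12 regardless of the host's endianness, and ntohl turns the value back into the host-order number 0x8077280C.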

Tips

 * Sometimes, an ungraceful exit from a program (e.g. ctrl-c) does not properly free up a port
 * Eventually (after a few minutes), the port will be freed
 * You can kill the process, or to reduce the likelihood of this problem, include the following code:
 * In header include:
 * In socket code add:
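
The code from the slides is not reproduced in these notes. As a substitute, and purely as an assumption on my part rather than the tutorial's own snippet, one widely used way to avoid "address already in use" errors after an ungraceful exit is to set the SO_REUSEADDR option on the socket before calling bind. The helper name allow_port_reuse is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Allows bind() to reuse a local port that the OS has not fully released yet.
 * Call this after socket() and before bind().
 * Returns 0 on success, -1 on error. */
int allow_port_reuse(int sock)
{
    int yes = 1;   /* non-zero enables the option */
    return setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
}
```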


 * Q: How to find the IP address of the machine my server program is running on?
 * Use 127.0.0.1 or localhost for accessing a server running on your local machine
 * For a remote server running Linux use the bash shell command: ifconfig
 * For Windows, use cmd to invoke: ipconfig


= Week 3 =

Demo:

 * Link
 * Get a socket handler
 * First: initialization, then parsing input arguments
 * If valid port not given, one will be generated within a range


 * Get a socket at line 47
 * Line 86: SOCK_STREAM is TCP - a stream, where the flow of data continues, vs UDP, which sends individual packets with no two-sided connection
 * Always check if this is successful or not


 * Bind to a port
 * Get variable in line 50
 * Structure needed for binding purposes


 * Line 77: memset sets the whole block of memory to 0
 * Line 79: the byte ordering on PCs and on the network can differ
 * Big endian vs little endian: 8-bit to 16-bit to 32-bit machines (0->32)
 * How many bits you read depends on bus size
 * Example: with a bus size of 16, reading a 32-bit value starts with the left-hand half, then the right
 * If you're on a little-endian machine and you're setting up an address, ensure the most and least significant bytes are properly ordered, so that when things travel between computers there's no confusion
 * Do the same thing when setting up the address - use any address which belongs to this machine

 * Check if socket exists
 * Line 94: binds socket
 * Check if bind function is -1 (unsuccessful)


 * Line 100: listen for incoming connection requests
 * Give socket you prepared and listen to queue length 5 (FIFO)
 * While serving incoming connection, if 10 other incoming requests are lined up before accepted, there's not going to be room for them


 * Line 116: want to know who client is - passing along an address structure to accept function
 * Active socket - the socket accept returns, end point for pipe between me and client
 * Line 108: ensure you're successful
 * Want to fork a process so the main process can still listen for other requests while the second process serves/talks to the client; this way, one main process listens in a loop and, when something comes in, it spawns a new process to communicate with that client


 * Line 112: the child process (created by fork) has no need for the listening socket
 * Line 117: Want to figure out IP and port of client, get some memory
 * Want to look at IP before, the actual address, and the other IP (want to be populated)


 * Read - this server receives something and sends it back
 * Want to have some memory allocated
 * Want to read from active socket. If any bytes received, sends back

echoserverHTTP demo


 * In browser: copy IP of machine you're talking to (ex. CPSC server) and port
 * Echoes what browser is sending
 * keep-alive: by default, keeps connection alive unless one is closed actively or times out
 * Use same pipe from same server otherwise you need to set up another connection

Introduction to HTTP

 * HTTP: HyperText Transfer Protocol
 * Communication protocol between clients and servers
 * Application layer protocol for www


 * Client/Server model
 * Server owns information and gives whatever you need, client doesn't provide information unless for credentials or info on server
 * Client: browser that requests, receives, displays object
 * Server: receives requests and responds to them


 * Protocol consists of various operations
 * Few for HTTP 1.0 (RFC 1945, published 1996)
 * Many more in HTTP 1.1 (RFC 2616, published 1999)

HTTP Request Generation

 * Scheme is first thing - what kind of protocol? http/https/ftp
 * Next thing after :// is the host you're trying to contact
 * Hostname is converted from a name to a 32-bit IP address (DNS lookup)
 * DNS lookup - servers that contain translation for all the things (register with some place that keeps track of all names)
 * Connection is established to server (TCP)


 * Usually if it's http, you're on port 80; can have servers that work on other ports
 * After, the path is meaningful to that particular server
 * Path from base folder ('home' folder) or looking up database and see what's the redirection

What Happens Next?

 * Client downloads HTML page
 * Sends back a file (or script), usually HTML file with some formatting
 * Typically in text format (ASCII)
 * Contains instructions for rendering (e.g. background color, frames)


 * Many have embedded objects
 * Images: GIF, JPG (logos, banner ads)
 * Embedded objects are usually retrieved automatically, so that you get the complete page
 * I.e. without user involvement
 * User may be able to change setting for automatic retrievals

Web Server Role

 * Respond to client requests, typically a browser
 * Could be a proxy sending request on behalf of browser, it forwards user requests
 * Could be a search engine spider or robot (e.g. google)


 * May have work to do on client's behalf
 * Is the client's cached copy still good?
 * Is client authorized to get this document?


 * Hundreds or thousands of simultaneous clients
 * Hard to predict how many will show up on some day (e.g. 'flash crowds', diurnal cycle, global presence)
 * Many requests are in progress concurrently

HTTP Request Types

 * Text-based
 * The first word of the request determines its structure
 * These are called methods
 * GET: retrieves a file (95% of requests)
 * Method sent from browser to web server
 * Some file or resource to be retrieved


 * HEAD: gets meta-data (e.g. modified time)
 * Asks the server to return the response headers only


 * POST: submitting a form/file to a server
 * PUT: stores enclosed document as URI
 * DELETE: removes named resource
 * LINK/UNLINK: in 1.0, gone in 1.1
 * TRACE: http 'echo' for debugging (added in 1.1)
 * CONNECT: used by proxies for tunnelling (1.1)
 * OPTIONS: request for server/proxy options (1.1)

HTTP Request Format

 * Format: keyword: value
 * Headers finish with a blank line (carriage return and line feed), which is how the server figures out where the header ends
 * Messages in ASCII (human-readable)
 * Headers may communicate private information (browser, OS, cookie information, etc.)

Response Format

 * The server generates a header that the browser isn't going to show
 * It contains a code that tells the browser what kind of response (successful, error, redirection) it is
 * The most important part is the first line (e.g. HTTP/1.1 200 OK)

HTTP Response Types

 * 1XX: Informational
 * 100 Continue, 101 Switching Protocols


 * 2XX: Success
 * 200 OK, 206 Partial Content


 * 3XX: Redirection
 * 301 Moved Permanently, 304 Not Modified


 * 4XX: Client error
 * 400 Bad Request, 403 Forbidden, 404 Not Found


 * 5XX: Server error
 * 500 Internal Server Error, 503 Service Unavailable, 505 HTTP Version Not Supported

HTTP Server in a Nutshell
Initialize;
Setup the listening socket;
forever do {
    get request;
    process;
    send response;
    log request (optional);
}


 * Get socket, bind
 * Forever loop
 * Accept incoming thing, get request; process what needs to be done; send response and maybe log (print) something

Initializing a Server
s = socket;              /* allocate listen socket */
bind(s, 80);             /* bind to TCP port 80 */
listen(s);               /* indicate willingness to accept */
while (1) {
    newconn = accept(s); /* accept new connection */
    ...
}


 * First allocate a socket and bind it to address
 * HTTP requests are usually sent on TCP to port 80
 * Other services use different ports (e.g. SSL is on 443)


 * Call listen on the socket to indicate willingness to receive requests
 * Call accept to wait for a request to come in (blocking)
 * When accept returns, we have a new socket which represents a new connection to a client

Processing a Request
remoteIP = getpeername(newconn);       /* address of the remote end */
remoteHost = gethostbyaddr(remoteIP);  /* reverse lookup: address to host name */
gettimeofday(currentTime);
read(newconn, reqBuffer, sizeof(reqBuffer));
reqInfo = serverParse(reqBuffer);


 * read is called on the new socket to retrieve the request
 * Type of request is determined by parsing the read data
 * For logging purposes (optional, but done by most):
 * Figure out the remote host name
 * Figure out the name of other end
 * Get time of request


 * Optional: who is calling (address, IP), read what client was sending
 * Return a file, figure if the file is there, if it's accessible and read it, send it to client

fileName = parseOutFileName(requestBuffer);
fileAttr = stat(fileName);
serverCheckFileStuff(fileName, fileAttr);
open(fileName);
 * Assuming the request is for a file (e.g. penguin.gif)
 * Test file path and meta-data
 * See if file exists/is accessible
 * Check permissions
 * Check meta-data: e.g. size of file, last modified time


 * Assuming all is OK: open the file

Responding to a Request
read(fileName, fileBuffer);
headerBuffer = serverFigureHeaders(fileName, reqInfo);
write(newSock, headerBuffer);
write(newSock, fileBuffer);
close(newSock);
close(fileName);
write(logFile, requestInfo);


 * Read the file into user space
 * Send HTTP headers on socket
 * Write the file on the socket
 * If connection is not persistent, close the socket
 * Close the open file descriptor
 * Write on the log file

Proxy Server

 * Two faces: one talks to client, another to server
 * Toward the client: plays the role of a server
 * Toward the server: plays the role of a client


 * For each HTTP request from the browser:
 * (1) Accepts HTTP requests from the browser
 * (2) Gets the data from the target web server (or from its cache), modifies the response if need be
 * (3) Sends the HTTP response with the data


 * Must handle concurrent browser requests
 * May be designed for different purposes


What is HTTP?

 * HTTP stands for Hypertext Transfer Protocol
 * Used to deliver virtually all files and other data (collectively called resources) on the World Wide Web
 * Usually, HTTP takes place through TCP/IP sockets


 * A browser is an HTTP client
 * It sends requests to an HTTP server (Web server)
 * The standard/default port for HTTP servers to listen on is 80


 * A resource is some chunk of data that is referred to by a URL
 * The most common kind of resource is a file
 * A resource may also be a dynamically-generated content, e.g. query result, CGI script output, etc.

Structure of HTTP Transactions

 * Start with method name (GET, etc.)
 * HTTP uses the client-server model
 * An HTTP client opens a connection and sends a request message to an HTTP server
 * The server then returns a response message, usually containing the resource that was requested
 * After delivering the response, the server closes the connection (except for persistent connections)


 * Format of the HTTP request and response messages
 * Almost the same, human readable (English-oriented)
 * An initial line specifying the method
 * Zero or more header lines
 * A blank line (i.e. a CRLF by itself)
 * An optional message body (e.g. a file, or query data, or query output)

Generic HTTP Header Format
 Header1: value1
 Header2: value2
 Header3: value3

Initial Request Line

 * A request line has three parts, separated by spaces
 * A method name,
 * The local path of the requested resource,
 * And the version of HTTP being used
 * Example: GET /path/f.htm HTTP/1.1


 * GET: most common HTTP request. Says "give me this resource"
 * Method names are always uppercase
 * The path is the part of the URL after the host name, also called the request URI
 * URI is like a URL, but more general


 * The HTTP version always takes the form HTTP/x.x, uppercase
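
Because the three parts are separated by spaces, a request line can be split with a single sscanf call. This is a minimal sketch; parse_request_line and the buffer sizes are assumptions chosen for the example, and a real server would also validate the method and version.

```c
#include <stdio.h>

/* Splits a request line such as "GET /path/f.htm HTTP/1.1" into its parts.
 * method and version must have room for 16 bytes, path for 256 bytes.
 * Returns 1 if all three parts were found, 0 otherwise. */
int parse_request_line(const char *line, char *method, char *path, char *version)
{
    /* %s stops at white space, so the trailing CRLF is not copied;
     * the field widths keep each part inside its buffer. */
    return sscanf(line, "%15s %255s %15s", method, path, version) == 3;
}
```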

Initial Response Line

 * Status line
 * The HTTP version
 * The response status code: result of the request
 * A reason phrase describing the status code


 * Response categories and most common status codes
 * 1xx: an informational message
 * 2xx: success of some kind
 * 200 OK: the request succeeded, and the resulting resource is returned in the message body
 * 3xx: redirections
 * 301 Moved Permanently
 * 302 Moved Temporarily
 * 303 See Other (HTTP 1.1 only): the resource has moved to another URL
 * 4xx: an error on the client's part
 * 404 Not Found
 * 5xx: an error on the server's part
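
A client can pull the status code out of the status line and classify it by its first digit. The sketch below uses invented helper names and only covers the categories listed above.

```c
#include <stdio.h>

/* Extracts the numeric status code from a status line such as
 * "HTTP/1.1 200 OK". Returns -1 if the line does not have that shape. */
int parse_status_code(const char *status_line)
{
    int major, minor, code;

    if (sscanf(status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3) {
        return -1;
    }
    return code;
}

/* Maps a status code to the name of its response category. */
const char *status_category(int code)
{
    switch (code / 100) {
    case 1:  return "informational";
    case 2:  return "success";
    case 3:  return "redirection";
    case 4:  return "client error";
    case 5:  return "server error";
    default: return "unknown";
    }
}
```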

The Message Body

 * When reading the body, you need to know when to stop reading, i.e. how many bytes to expect
 * This is what the Content-Length header and its value are for
 * After headers, there may be a body of data
 * In a response, may be:
 * Requested resource
 * Explanatory text if there's an error


 * In a request this may be:
 * The user-entered data
 * Uploaded files


 * If an HTTP message includes a body, there are usually header lines in the message that describe the body
 * Content-Type : the MIME-type of the data (e.g. text/html or image/gif)
 * Content-Length : the number of bytes in the body

Sample HTTP Exchange
 * HTTP Request

GET /path/f.htm HTTP/1.1
Host: www.host1.com:80
User-Agent: HTTPTool/1.0
[blank line here]

 * HTTP Response

HTTP/1.1 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

Happy New Millennium!
(more file contents)
...

The HEAD Method

 * Meta data of page
 * Need for caching
 * HEAD is like a GET request, except:
 * It asks the server to return the response headers only, not the actual resource (i.e. no message body)
 * Used to check characteristics of a resource without actually downloading it
 * HEAD is used when you don't actually need a file's contents


 * The response to a HEAD request must never contain a message body, just the status line and headers

The POST Method

 * POST is used to send data to the server
 * POST is different from a GET request:
 * Data is sent with the request, in the message body
 * There are usually extra headers to describe this message body (e.g. Content-Type and Content-Length)
 * The request URI is not a resource to retrieve; it's usually a program to handle the data you're sending
 * The HTTP response is normally program output, not a static file


 * Example: submitting HTML form data to CGI scripts
 * Content-Type: header is usually application/x-www-form-urlencoded
 * Content-Length: header gives the length of the URL-encoded form data


 * A password field is only hidden on screen: it is still sent in the message body (you don't see it, but it's there)
 * Accounts can be hacked easily when the password is sent this way

POST Method Example
 POST /login.jsp HTTP/1.1
 Host: www.mysite.com
 User-Agent: Mozilla/4.0
 Content-Length: 26
 Content-Type: application/x-www-form-urlencoded
 [blank line here]
 userid=me&password=guessme
 * A POST request can send whatever data you want, not just form submissions; just make sure the sender and the receiving program agree on the format
 * The GET method can also be used to submit forms: the form data is URL-encoded and appended to the request URI
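The important detail with POST is that Content-Length must match the body exactly, so it is safer to compute it than to type it by hand. Here is a small sketch, assuming the form data is already URL-encoded; the function name `build_post_request` and the path, host and form fields in the usage note are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Build a POST request carrying URL-encoded form data.
 * form_data must already be URL-encoded (e.g. "a=1&b=2").
 * Returns the total request length, or -1 if buf is too small. */
int build_post_request(char *buf, size_t buf_len,
                       const char *path, const char *host, const char *form_data)
{
    size_t body_len = strlen(form_data);   /* number of bytes in the body */
    int n = snprintf(buf, buf_len,
                     "POST %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Content-Type: application/x-www-form-urlencoded\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n"                /* blank line, then the body */
                     "%s",
                     path, host, body_len, form_data);
    if (n < 0 || (size_t)n >= buf_len)
        return -1;
    return n;
}
```

For instance, `build_post_request(buf, sizeof buf, "/form.cgi", "www.example.com", "name=alice&lang=c")` would produce a `Content-Length: 17` header, because the body is 17 bytes long.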

Persistent Connections

 * Persistent HTTP connection:
 * To increase performance, some servers allow persistent HTTP connections
 * The server does not immediately close the connection after sending the response
 * Responses should be sent back in the same order as the requests
 * The Connection: close header in a request indicates the final request for the connection
 * The server should close the connection after sending the response. Also, the server should close an idle connection after some timeout period

Caching

 * Either figure out locally whether the cached copy is up to date, or send a HEAD request so the server can tell us whether anything has changed
 * Saves bandwidth and improves efficiency
 * Proxy or web browser avoids transferring resources for which a local up-to-date copy exists
 * A copy of the previous content is saved in the cache
 * Upon a new request, first the cache is searched
 * If found in cache, return the content from cache
 * If not in cache, send request to the server


 * But what if the content is out of date?
 * We need to check if the content is modified since last access
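The search-the-cache-first idea can be sketched with a tiny in-memory table. This is a toy version under my own assumptions: a fixed number of slots, small fixed-size buffers, no eviction, and no freshness check yet. All names (`cache_lookup`, `cache_store`, the size constants) are hypothetical.

```c
#include <string.h>

#define CACHE_SLOTS     8
#define CACHE_URL_MAX   256
#define CACHE_BODY_MAX  1024

struct cache_entry {
    int  used;
    char url[CACHE_URL_MAX];
    char body[CACHE_BODY_MAX];
};

static struct cache_entry cache[CACHE_SLOTS];

/* Return the cached content for url, or NULL if it is not in the cache
 * (in which case the caller sends the request to the server). */
const char *cache_lookup(const char *url)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].used && strcmp(cache[i].url, url) == 0)
            return cache[i].body;
    return NULL;
}

/* Save a copy of the content. Overwrites an existing entry for the same
 * URL, otherwise takes the first free slot. Returns 0 on success, -1 if
 * the data does not fit or the cache is full. */
int cache_store(const char *url, const char *body)
{
    if (strlen(url) >= CACHE_URL_MAX || strlen(body) >= CACHE_BODY_MAX)
        return -1;

    int slot = -1;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && strcmp(cache[i].url, url) == 0) {
            slot = i;              /* same URL: reuse this entry */
            break;
        }
        if (!cache[i].used && slot < 0)
            slot = i;              /* remember the first free slot */
    }
    if (slot < 0)
        return -1;                 /* full; a real proxy would evict something */

    strcpy(cache[slot].url, url);
    strcpy(cache[slot].body, body);
    cache[slot].used = 1;
    return 0;
}
```

A real proxy would also store when each copy was fetched, so that it can later ask the server whether the content has been modified since then.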

The Date: Header

 * We need time-stamped responses for caching
 * Servers must timestamp every response with a Date: header containing the current time, e.g. Date: Fri, 31 Dec 1999 23:59:59 GMT
 * All responses except those with 100-level status (but including error responses) must include the Date: header
 * All time values in HTTP use Greenwich Mean Time
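A timestamp in this format can be produced with the standard C time functions. The sketch below assumes the program runs in the default "C" locale (so day and month names come out in English); `format_date_header` is my own name for the helper.

```c
#include <stdio.h>
#include <time.h>

/* Write a "Date: ..." header line for the given time into buf, in GMT.
 * Returns the number of characters written, or -1 on failure. */
int format_date_header(char *buf, size_t buf_len, time_t when)
{
    /* gmtime() converts to GMT/UTC; it returns a pointer to static
     * storage, so it is not safe to share between threads. */
    struct tm *tm_gmt = gmtime(&when);
    if (tm_gmt == NULL)
        return -1;

    size_t n = strftime(buf, buf_len,
                        "Date: %a, %d %b %Y %H:%M:%S GMT\r\n", tm_gmt);
    return n == 0 ? -1 : (int)n;
}
```

A server would normally call it as `format_date_header(buf, sizeof buf, time(NULL))` right before sending the response.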

Conditional Get Example
 * REQUEST
 GET /sample.html HTTP/1.1
 Host: example.com
 If-Modified-Since: Wed, 01 Sep 2004 13:24:52 GMT
 If-None-Match: "4135cda4"

 * RESPONSE
 HTTP/1.1 304 Not Modified
 Expires: Tue, 27 Dec 2005 11:25:19 GMT
 Date: Tue, 27 Dec 2005 05:25:19 GMT
 Server: Apache/1.3.33 (Unix) PHP/4.3.10
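In code, a conditional GET is a normal GET plus an If-Modified-Since header, and the status code of the reply tells the cache what to do. This is a sketch only: the names `build_conditional_get`, `action_for_status` and the `cache_action` enum are hypothetical, and the If-None-Match header is left out to keep it short.

```c
#include <stdio.h>

enum cache_action {
    USE_CACHED_COPY,      /* 304: the cached copy is still current */
    REPLACE_CACHED_COPY,  /* 200: a newer copy came back in the body */
    FORWARD_ONLY          /* anything else: pass the response along uncached */
};

/* Build a GET request that asks for the resource only if it has changed
 * since last_modified (an HTTP date string in GMT).
 * Returns the request length, or -1 if buf is too small. */
int build_conditional_get(char *buf, size_t buf_len, const char *path,
                          const char *host, const char *last_modified)
{
    int n = snprintf(buf, buf_len,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "If-Modified-Since: %s\r\n"
                     "\r\n",
                     path, host, last_modified);
    if (n < 0 || (size_t)n >= buf_len)
        return -1;
    return n;
}

/* Decide what the cache should do with the server's answer. */
enum cache_action action_for_status(int status)
{
    if (status == 304)
        return USE_CACHED_COPY;
    if (status == 200)
        return REPLACE_CACHED_COPY;
    return FORWARD_ONLY;
}
```

Note that a 304 response has no body, so the content sent to the browser has to come from the cached copy.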

Redirection Example
 * REQUEST 1
 GET /~carey/index.html HTTP/1.1
 Host: www.cpsc.ucalgary.ca
 Connection: keep-alive
 User-Agent: Mozilla/5.0 […]
 Accept: text/html,application/ […]
 Accept-Encoding: gzip,deflate,sdch […]
 \r\n

 * RESPONSE 1
 HTTP/1.1 302 Found
 Date: Sat, 21 Jan 2012 01:10:43 GMT
 Server: Apache/2.2.4 (Unix) mod_ssl/2.2.4 OpenSSL/0.9.7a PHP/5.2.9 mod_jk/1.2.25
 Location: http://pages.cpsc.ucalgary.ca/~carey/index.html
 \r\n

 * REQUEST 2
 GET /~carey/index.html HTTP/1.1
 Host: pages.cpsc.ucalgary.ca
 Connection: keep-alive
 User-Agent: Mozilla/5.0 […]
 Accept: text/html,application/ […]
 Accept-Encoding: gzip,deflate […]
 \r\n

 * RESPONSE 2
 HTTP/1.1 200 OK
 Date: Sat, 21 Jan 2012 01:11:49 GMT
 Server: Apache/2.2.4 (Unix) […]
 Last-Modified: Mon, 16 Jan 2012 05:40:45 GMT
 Content-Length: 3157
 Keep-Alive: timeout=5
 Connection: Keep-Alive
 Content-Type: text/html
 \r\n
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> […]
 \r\n
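To follow a redirection like the one above, the client (or proxy) has to read the Location header from the 302 response. A generic header lookup could look roughly like this; it is a sketch assuming the headers are in one NUL-terminated string, and `get_header` is a name I chose, not a library function.

```c
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Copy the value of header `name` (given without the colon) into out.
 * Returns 0 on success, -1 if the header is missing or out is too small. */
int get_header(const char *headers, const char *name, char *out, size_t out_len)
{
    size_t name_len = strlen(name);

    if (out_len == 0)
        return -1;
    for (const char *line = headers; line && *line; ) {
        if (*line == '\r' || *line == '\n')
            break;                          /* blank line: end of headers */
        if (strncasecmp(line, name, name_len) == 0 && line[name_len] == ':') {
            const char *v = line + name_len + 1;
            while (*v == ' ' || *v == '\t')
                v++;                        /* skip whitespace after the colon */
            size_t n = strcspn(v, "\r\n");  /* value runs to the end of line */
            if (n >= out_len)
                return -1;
            memcpy(out, v, n);
            out[n] = '\0';
            return 0;
        }
        line = strchr(line, '\n');          /* move on to the next line */
        if (line)
            line++;
    }
    return -1;
}
```

With RESPONSE 1 as input, `get_header(resp, "Location", url, sizeof url)` should fill `url` with the address used in REQUEST 2.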

HTTP 1.0 vs HTTP 1.1

 * Host header
 * HTTP 1.1 requires a Host header in every request


 * Persistent connections
 * By default, connections are assumed to be kept open after the transmission of a request and its response
 * The protocol permits closing of connections at any point
 * Connection: close header is used to inform the recipient that the connection will not be reused


 * Pipelining
 * A client need not wait to receive the response for one request before sending another request on the same connection


 * Strong cache support
 * Chunked transfer-encoding
 * OPTIONS method
 * A way for a client to learn about the capabilities of a server without actually requesting a resource
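The persistent-connection rule above can be written as a small decision function. This is a sketch with assumptions: the caller has already parsed the HTTP minor version and the Connection header value, the name `keep_connection_open` is hypothetical, and the `keep-alive` value is the usual way an HTTP 1.0 client asks for a persistent connection (the notes only cover the HTTP 1.1 default).

```c
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Decide whether to keep the connection open after the current response.
 * http_minor: 0 for HTTP/1.0, 1 for HTTP/1.1.
 * connection_value: value of the Connection header, or NULL if absent.
 * Returns 1 to keep the connection open, 0 to close it. */
int keep_connection_open(int http_minor, const char *connection_value)
{
    if (connection_value != NULL) {
        if (strcasecmp(connection_value, "close") == 0)
            return 0;      /* sender will not reuse the connection */
        if (strcasecmp(connection_value, "keep-alive") == 0)
            return 1;      /* explicit request to keep it open */
    }
    /* No explicit wish: HTTP 1.1 defaults to persistent, HTTP 1.0 does not. */
    return http_minor >= 1;
}
```

Even when this returns 1, the server can still close an idle connection after a timeout.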

Assignment Tips

 * In our proxy:
 * (1) Pass along the 300-level message
 * A GET request does not guarantee that the returned content matches the requested URL, since the response may be a redirection
 * Either handle the redirection locally in the proxy (make a new connection on behalf of the browser and get the content from the target server), or let the browser take care of it (send the 300-level message back to the browser)


 * If you want to do extra:
 * Read about select(): set up your sockets, hand them to select(), and run one process with one loop; select() tells you whether anything interesting has happened on any socket in the meantime while other work goes on (saves resources)
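For the select() idea, here is a minimal sketch of the waiting step. It assumes POSIX, descriptors smaller than FD_SETSIZE, and that we only care about readability; the wrapper name `wait_readable` and its return codes are my own choices, not part of the assignment.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Wait up to timeout_ms for any of the nfds descriptors in fds[] to become
 * readable. Returns the first readable descriptor, -1 on timeout, or -2 on
 * error. */
int wait_readable(const int *fds, int nfds, int timeout_ms)
{
    fd_set readset;
    int maxfd = -1;

    /* select() modifies the set, so it must be rebuilt before every call. */
    FD_ZERO(&readset);
    for (int i = 0; i < nfds; i++) {
        FD_SET(fds[i], &readset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* First argument is the highest descriptor number plus one. */
    int ready = select(maxfd + 1, &readset, NULL, NULL, &tv);
    if (ready < 0)
        return -2;
    if (ready == 0)
        return -1;

    for (int i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &readset))
            return fds[i];
    return -1;
}
```

In a proxy, the one loop would call something like this with the browser socket and the server socket, then `recv()` from whichever one is reported readable.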

 // make sure you have read enough
 // handle GET request
 // figure out the host
 // connect to the host
 // get response from host
 // do modification
 // send back response


= Week 4 =

= Week 6 =

= Week 8 =

= Week 9 =

= Week 10 =

= Week 11 =

= Week 12 =

= Week 13 =

= Week 14 =