Navigation: Tutorials

Assignment 1	Assignment 2	Important Information
Tut 1	Tut 6	Tut 8
Tut 2	Tut 7	Tut 9
Tut 3		Tut 10
Tut 4		Tut 11
Tut 5		Tut 12

Introduction

Hello, my name is Carrie Mah and I am currently in my 3rd year of Computer Science with a concentration in Human Computer Interaction. I am also an Executive Officer for the Computer Science Undergraduate Society. If you have any questions (whether it be CPSC-related, events around the city, or an eclectic of random things), please do not hesitate to contact me.

I hope you find my notes useful, and if there are any errors please correct me by clicking "Edit" on the top of the page and making the appropriate changes. You should also "Watch this page" for the CPSC 441 page to check when I upload notes. Just click "Edit" and scroll down to check mark "Watch this page."

You are welcome to use my notes, just credit me when possible. If you have any suggestions or major concerns, please contact me at cmah[at]ucalgary[dot]ca. Thanks in advance for reading!

Course Information

Disclaimer: This is a page created by a student. Everything created and written by me is not associated with the Department of Computer Science or the University of Calgary. I am not being paid to do this nor am getting credit. I was a class scribe for CPSC 457 and enjoyed it so much that I wanted to continue it with a course related to it. I find writing Wiki notes for courses I am not familiar with are helpful for me, so I hope you find these notes helpful. I encourage that you still write your own notes, as it helps you retain information, but my notes are public and free to use!

This course is for W2014, taught by Dr. Carey Williamson.

The course website is here: http://pages.cpsc.ucalgary.ca/~carey/CPSC441

All tutorial slides are on the d2l website under "Tutorial Materials"

Assignment 1: Web Proxy

Tutorial 1: C Programming Review

C versus Java
C Program	Java Program
Collection of functions One function `main()` is called by the operating system as the starting function Compile output: executable file Running the executable (default name `a.out`) starts main function Typically, single program with all user code linked in - but can be dynamic libraries (.dll, .so) Note: C++ is C extended with object oriented functionality (and more)	Collection of classes Class containing `main` method is the starting class Compile output: jar file Running `java StartClass` invokes `StartClass.main` method JVM loads other classes as required

Compiling C/C++

gcc is a driver which calls the preprocessor, compiler (cc1 or cc1plus), assembler, and linker as needed
gcc hello.c, a.out
gcc hello.c -o hello, ./hello

Compiler options

[-o <file>]: specifies output file for object or executable
[-Wall]: shows all warnings (highly recommended)
[-l libnam]: links the library libnam (e.g. -lsocket)

If you get errors saying the library cannot be found, make sure the path is correctly set and you have the libraries you need

`Main` Class

int main(int argc, char *argv[])

argc: number of arguments passed to the program
argv: array of strings showing command line arguments

Name of executable and space-separated arguments
Name of executable is stored in argv[0]

Return value is type int

Convention: 0 means success, > 0 is some error

Primitive Data Types

char: character or smaller integer, 1 Byte
short int (short): short integer, >=2 Bytes
int: integer (most efficient), >=2, typically 4 Bytes
long int (long): long integer, >=4 Bytes
long long int: long long integer, >=8 Bytes
float: floating point number, 4 Bytes
double: double precision floating point number, 8 Bytes

long double: long double precision floating point number, >=8, typically 16 Bytes

Typecasting Example

Typecasting int to double (explicit)

f = (double) i;

Typecasting double to int (implicit)

unsigned long int i; i = 1.2;

Arrays

Array declaration (on the stack): int a[];
C/C++ arrays have no length attribute

Note: when passing an array to a function, typically have to pass the array size as a separate argument as well

Must take care of array bounds yourself, as you may not run into compiling errors but your program might behave unexpectedly or crash
Array's name is a pointer to its first element

Structures

C struct is a way to logically group related types

Similar to (not same as) C++/Java classes
Somehow a class without methods
Members are always public (no encapsulation concept in C)

A struct component can be of any type (including other struct types), but cannot be recursive, unless it is a pointer to itself
In C, there are two different namespaces of types:

A namespace of struct/union/enum tag names
A namespace of typedef names

These can return something

In C++, all struct/union/enum/class declarations act like they are implicitly typedef'd, as long as the name is not hidden by another declaration with the same name
Example:

struct address {
   char* street;
   char* city;
   char* zip;
};

typedef struct {
   char* name;
   unsigned int ID;
   struct address Address;
} student_item;

struct link_list {
   student_item student_info;
   struct link_list *next;
};
typedef struct link_list student;

Pointers

A pointer is just an address to some memory location

Another variable
Some dynamically allocated memory
Some function
Even NULL

Reference operator: &
Dereference operator: *
Examples:

int *p = &x;
int x = 4;

&x(address of x), x contains 4

int *p = malloc(sizeof int);

Address of allocated memory -> ? (allocated memory)

Pointers in C

Declaration

Use * before variable name

int * ptr = NULL; // creates pointer for integer

Allocation

Allocate new memory to a pointer using malloc in C (new in C++)

int *p = malloc(sizeof(int));

int *p = (int *) malloc(10 * sizeof(int)); // array of int

Deallocation

Clear the allocated memory when you are using it, otherwise you have a memory leak

free(p);

Dereferencing

Accessing data from the pointer

x = *p;

Referencing

Getting the memory location for the data

p = &x;

Strings

In C, a string is an array of char terminated with \0 (a null terminator)

Example: "hello" is hello\0

String Library

#include <string.h>
Functions:

char *strcpy(char *dest, char *source)

Copies chars from source array into dest array up to NULL

char *strncpy(char *dest, char *source, int num)

Copies chars; stops after num chars if no NULL before that; appends NULL

int strlen(const char *source)

Returns number of chars, excluding NULL

char *strchr(const char *source, const char ch)

Returns pointer to first occurrence of ch in source; NULL if none

char *strstr(const char *source, const char *search)
Return pointer to first occurrence of search in source

Formatting Strings

int sscanf(char *string, char *format, ...)

Parse the contents of string according to format
Return the number of successful conversions
Formatting codes:

%c: char, matches single character
%d: int, matches an integer in decimal
%f: float, matches a real number
%s: char *, matches a string up to a white space
%[^c]: char *, matches a string up to next c char

Values normally right-justified; use negative field width to get left-justified

%nc: char, char in field of n spaces
%nd: int, integer in field of n spaces
%n.mf: float, double; real number in width n.m decimals
%n.ms: char *, first m chars from string in width n
%%: writes a single % to the stream

int sprintf(char *buffer, char *format, ...)

Produce a string formatted according to format directives and place this string into the buffer

return number of successful conversions

Standard C Library

#include <stdio.h>
Formatted I/O

int scanf(const char *format, ...)

Read from standard input and store according to format

int printf(const char *format, ...)

Write to standard output according to format

File I/O: FILE *

FILE *fopen(const char *path, const char *mode)

Open a file and return the file descriptor

int fclose(FILE *stream)

Close the file; return 0 if successful, EOF if not

Other I/O operations:

int getchar()

Read the next character from stdin; returns EOF if none

char *fgets(char *buf, int size, FILE *in)

Read the next line from a file into buf

int fputs(const char *str, FILE *out)

Output the string to a file, stopping at ‘\0’
Returns number of characters written or EOF

Back To Navigation

Tutorial 2: Socket Programming

Internet Protocol Stack

Basic, low level layer: physical

Medium between laptops and people: air
Physical tool: vocal chords, ears
May be a middle man between the two machines

Highest layer: application

Virtual connection between machines
Nothing directly from your keyboard to Google's servers

Below that: transport layer

Envelope passed from browser to transport layer (handled by OS)
Virtual connection to another machine - send lots of envelopes in a large bag

Below that: network layer

Think of it like the main, large postal office
Speaks language that every computer in the world understands
IP protocol - common rules for different devices
Virtual connection between machines

Below that: link layer

Physical transmission, depends on the machine you have
Connects to physical layer

What is a Socket?

Interface between the application and the network (the lower levels of the protocol stack)

Application and transport layer
The application creates a socket

Asks the OS to creates socket to send/receive things

The socket type dictates the style of communication

Reliable (TCP) vs best effort (UDP)

OS makes sure things get sent

Connection-oriented vs connectionless

Once a socket is setup the application can:

Pass data to the socket for network transmission
Receive data from the socket (transmitted through the network, sent by some other host)

Most Popular Types of Sockets

1st assignment: TCP, 2nd assignment: may use UDP
TCP Socket

Type: SOCK_STREAM
Reliable delivery
In-order guaranteed
Connection-oriented
Bidirectional
When you send something, guaranteed it's delivered or error message saying it can't be sent
Packets are in same order, sent/receive something

UDP Socket

Type: SOCK_DGRAM
Unreliable delivery
No order guaranteed
No notion of 'connection' - app indicates destination for each packet
Can send or receive

Socket Creation in C

How to open a file in C?

Have a descriptor (integer), tells OS which file you're talking about

int s = socket(domain, type, protocol);

You need socket library
s: socket descriptor, an integer (like a file-handle)
domain: integer, communication domain

IPv4, IPv6 - network layer, default protocols, difference is addressing space
Example: PF_INET (IPv4 protocol) - typically used

type: communication type

UDP, TCP
SOCK_STREAM: reliable, two-way, connection-based service
SOCK_DGRAM: unreliable, connectionless
Other values: need root permission, rarely used, or obsolete

protocol: specifies protocol (see file /etc/protocols for a list of options) - usually set to 0

socket call does not specify where data will be coming from, nor where it will be going to; it just creates the interface

Sets up interface, no 'virtual connection' established yet; just establishing the ones between layers of the same machine

Have not reserved a port yet (connection from transport -> application layer)

Ports

Each host machine has an IP address (or more)
Each host has 65 536 ports (2^16)
Some ports are reserved for specific apps

20, 21: FTP
23: Telnet
80: HTTP
See RFC 1700 (about 2000 ports are reserved)

A socket provides an interface to send data to/from the network through a port

Can have multiple sockets opened in one program for OS

Addresses, Ports and Sockets

Like apartments and mailboxes

You are the application
Your apartment building address is the address
Your mailbox is the port
The post office is the network
The socket is the key that gives you access to the right mailbox (one difference: assume outgoing mail is placed by you in your mailbox)

How to choose which port a socket connects to?

The `Bind` Function

Bind function associates and (can exclusively) reserves a port for use by the socket
int status = bind(sockid, &addrport, size);

status: error status, -1 if bind failed

sockid: integer the socket call returned, socket descriptor
addrport: struct sockaddr, the (IP) address and port of the machine (address usually set to INADDR_ANY - chooses a local address

Particular structure that includes IP and port of machine that you're on
Want to own port number, anything sent to there is given to your machine
Need to tell OS on which IP and port to listen to because one machine can have multiple IPs depending on the physical interfaces
INADDR_ANY: include all IP addresses this machine is assigned to; send whatever the port is, send to me

size: size (in bytes) of addrport structure

Need size and struct because & sends pointer only, size to deal with structure itself

On the Connecting End

When connecting to another host (i.e. connecting end is the client and the receiving end is the server), the OS automatically assigns a free port for the outgoing connection
During connection setup, receiving end is informed of port

Server listens for incoming requests (for someone to connect to server)

You can bind to a specific port if need be
From machine2, moves through layers so machine1 establishes virtual connection to machine2 (using connect)

Connection Setup

A connection occurs between two ends

Server: waits for an active participant to request connection
Client: initiates connection request to passive side

Once connection is established, server and client ends are 'similar'

Both can send and receive data
Either can terminate the connection

Server and Clients

connect() calls go down to protocol stack
accept() waits until something comes in and accepts; now connection established and both OS takes care of the delivery of packets

When syscall comes back, can start to read/write (send whatever you want) and now both sides are equal because both sides can read/write

Connection Setup Steps

Step 1: Server listens for incoming requests
Step 2: Client requests and establishes connection
Step 3: Server accepts a request
Step 4: Client and Server sends/receive
The accepted connection is on a new socket
The old socket continues to listen for other active participants
Server has a socket (bound to port)

Client, from its own OS gets a socket, wants to connect to l-sock
When l-sock receives client call, it creates a socket exclusive (but l-sock still listens)

Server Socket: Listen & Accept

Called on server side:

int status = listen(sock, queuelen);

status: 0 if listening, -1 if error
sock: integer, socket descripor
queuelen: integer, number of active participants that can 'wait' for a connection

Can access multiple calls, can define how many to wait in queue before serving one-by-one
Handles queue length number of trying to establish connection

Listen is non-blocking: returns immediately

int s = accept(sock, &addr, &addrlen);

s: integer, the new socket (used for data-transfer)
sock: integer, the original socket (being listened on)

Pass in sock (listening), spawns out another descriptor that takes care of a specific connection

addr: struct sockaddr, address of the active participant
addrlen: sizeof(addr) - value/result parameter
Must be set appropriately before call
Adjusted by OS upon return
Picks something from queue from listen and establish connection
Accept is blocking: waits for connection before returning

System call - tells OS "give me something from queue. If nothing, you're going to wait until something comes into the queue"

Connect

From client:
Don't need to pre-define port, connect() - OS assigns you a random free port, and you don't even know it

Don't need it because you're not even listening, the OS is the one that needs about it; no need to bind (implicit)
When you send data to server, port number is sent - only focused on what connect does

int status = connect(sock, &addr, addrlen);

status: 0 if successful connect, -1 otherwise
sock: integer, socket to be used in connection
addr: struct sockaddr - address of server
addrlen: integer, sizeof(addr)

connect is blocking

Sending/Receiving Data

Buffers: pointers, memory management
Get some memory from OS, pass on handle from memory with send and send on particular socket
If sock is not valid, will get error message
int count = send(sock, &buf, len, flags);

count: number of bytes transmitted (-1 if error)
buf: void*, buffer to be transmitted
len: integer, length of buffer (in bytes) to transmit
flags: integer, special options, usually just 0

How to do sending - block, type of error message to receive

int count = recv(sock, &buf, len, flags);

count: number of bytes received (-1 if error)
buf: void*, stores received bytes
len: integer, number of bytes received
flags: integer, special options, usually just 0
Want to receive something, need to get some portion of memory and pass function to OS to write whatever comes in in that bulk of memory
Tell how much space is available

Typecasting - sending doesn't matter, just needs to know about bit sections (word you're saying)

Only needs len

Close

Close socket to release port

If don't close, OS still thinks ports is used
Closes connection, frees up the port used by the socket

status = close(s);

status: 0 if successful, -1 if error
s: the file descriptor (socket being closed)

The `struct sockaddr`

The struct to store the Internet address of a host:

struct sockaddr_in {
   short          sin_family;
   u_short        sin_port;
   struct in_addr sin_addr;
   char           sin_zero[8];
};

sin_family

Specifies the address family
Example: AF_INET

sin_port

Specifies the port number (0-65535)

sin_addr

Specifies the IP address

sin_zero

Unused!

Example

struct sockaddr_in server;                  // definition
memset(&server, 0, sizeof(server));         // init to 0
server.sin_family = AF_INET;                // address family
server.sin_port = htons(MYPORTNUM);         // port
server.sin_addr.s_addr = htonl(INADDR_ANY); // address

Host Byte-Ordering: the byte ordering used by a host (big-endian or little-endian)

Extra Notes

Network Byte-Ordering: the byte ordering used by the network – always big-endian

Any words sent through the network should be converted to Network Byte-Order prior to transmission (and back to Host Byte-Order once received)

Network Byte-Ordering

u_long htonl(u_long x);
u_short htons(u_short x);
u_long ntohl(u_long x);
u_short ntohs(u_short x);
On big-endian machines, these routines do nothing
On little-endian machines, they reverse the byte order
Example:

128, 119, 40, 12 (htonl) - Big-Endian machine

-> 12, 40, 119, 128 (ntohl) - Little-Endian machine

Tips

Sometimes, an ungraceful exit from a program (e.g. ctrl-c) does not properly free up a port
Eventually (after a few minutes), the port will be freed
You can kill the process, or to reduce the likelihood of this problem, include the following code:

In header include:

#include <signal.h>

void cleanExit(){exit(0);}

In socket code add:

signal(SIGTERM, cleanExit);

signal(SIGINT, cleanExit);

Q: How to find the IP address of the machine my server program is running on?

Use 127.0.0.1 or localhost for accessing a server running on your local machine
For a remote server running Linux use the bash shell command: ifconfig
For Windows, use cmd to invoke: ipconfig

Back To Navigation

Tutorial 3: HTTP Servers and Proxies

Demo: `echoserver`

Link

Get a socket handler

First: initialization, then parsing input arguments

If valid port not given, one will be generated within a range

Get a socket at line 47 - listensocket

Line 86: SOCK_STREAM is TCP - a stream, flow of data will continue vs UDP which sends only one packet with no two-sided connection

Always check if this is successful or not

Bind to a port

Get variable in line 50
Structure needed for binding purposes

Line 77: memset has bulk of memory set to 0
Line 79: byte ordering between PCs and network machines are different

Big endian vs little endian: 8-bit to 16-bit to 32-bit machines (0->32)

How many bits you read depends on bus size
Example: bus size of 16 and want sto read 32-bit string, starts from left hand bulk then right

If you're a little endian machine and you're setting up address, ensure most and least significant bit are properly set up, so that when things travel between computers there's no confusion
Also do the same thing when setting up address - use any address whic hbelongs to this machine

Check if socket exists
Line 94: binds socket

Check if bind function is -1 (unsuccessful)

Line 100: listen for incoming connection requests

Give socket you prepared and listen to queue length 5 (FIFO)

While serving incoming connection, if 10 other incoming requests are lined up before accepted, there's not going to be room for them

Line 116: want to know who client is - passing along an address structure to accept function

Active socket - the socket accept returns, end point for pipe between me and client

Line 108: ensure you're successful

Want to port process so main process can still listen for other requests, and the second process serves/talk to the client; this way, one main process is listening in a loop. When something comes in, it spawns a new process to communicate to

Line 112: no need for listensocket because you're child process (fork)

Line 117: Want to figure out IP and port of client, get some memory
Want to look at IP before, the actual address, and the other IP (want to be populated)

Read - this server receives something and sends it back
Want to have some memory allocated
Want to read from active socket. If any bytes received, sends back

echoserverHTTP demo

In browser: copy IP of machine you're talking to (ex. CPSC server) and port
Echoes what browser is sending
keep-alive: by default, keeps connection alive unless one is closed actively or times out

Use same pipe from same server otherwise you need to set up another connection

Introduction to HTTP

HTTP: HyperText Transfer Protocol

Communication protocol between clients and servers
Application layer protocol for www

Client/Server model

Server owns information and gives whatever you need, client doesn't provide information unless for credentials or info on server
Client: browser that requests, receives, displays object
Server: receives requests and responds to them

Protocol consists of various operations

Few for HTTP 1.0 (RFC 1945, RFC 1996)
Many more in HTTP 1.1 (RFC 2616, RFC 1999)

HTTP Request Generation

Scheme is first thing - what kind of protocol? http/https/ftp
Next thing after :// is the host you're trying to contact

Hostname is converted from a name to a 32-bit IP address (DNS lookup)

DNS lookup - servers that contain translation for all the things (register with some place that keeps track of all names)

Connection is established to server (TCP)

Usually if it's http, you're on port 80; can have servers that work on other ports
After /, the path is meaningful to that particular server

Path from base folder ('home' folder) or looking up database and see what's the redirection

What Happens Next?

Client downloads HTML page

Sends back a file (or script), usually HTML file with some formatting
Typically in text format (ASCII)
Contains instructions for rendering (e.g. background color, frames)

Many have embedded objects

Images: GIF, JPG (logos, banner ads)
Embedded objects usually automatic as you can ensure you get page
Usually automatically retrieved

I.e. without user involvement
User may be able to change setting for automatic retrievals

Web Server Role

Respond to client requests, typically a browser

Could be a proxy sending request on behalf of browser, it forwards user requests
Could be a search engine spider or robot (e.g. google)

May have work to do on client's behalf

Is the client's cached copy still good?
Is client authorized to get this document?

Hundreds or thousands of simultaneous clients

Hard to predict how many will show up on some day (e.g. 'flash crowds', diurnal cycle, global presence)
Many requests are in progress concurrently

HTTP Request Types

Text-based thing
First words of request to shape structure
These are called methods
GET: retrieves a file (95% of requests)

Method sent from browser to web server
Some file or resource to be retrieved

HEAD: gets meta-data (e.g. modified time)

Asks the server to return the response headers only

POST: submitting a form/file to a server
PUT: stores enclosed document as URI
DELETE removed name resource
LINK/UNLINK: in 1.0, gone in 1.1
TRACE: http 'echo' for debugging (added in 1.1)
CONNECT: used by proxies for tunnelling (1.1)
OPTIONS: request for server/proxy options (1.1)

HTTP Request Format

Format: key word, value
Finish with carriage return and line-feed - server figures out where header ended
Messages in ASCII (human-readable)
Headers may communicate private information (browser, OS, cookie information, etc.)

Response Format

When server generates - header browser isn't going to show
Code that tells browser what kind of response (successful, error, redirection) it is
Only thing important is first line (e.g. HTTP/1.0 200 OK)

HTTP Response Types

1XX: Informational

100 Continue, 101 Switching Protocols

2XX: Success

200 OK, 206 Partial Content

3XX: Redirection

301 Moved Permanently, 304 Not Modified

4XX: Client error

400 Bad Request, 403 Forbidden, 404 Not Found

5XX: Server error

500 Internal Server Error, 503 Services Unavailable, 505 HTTP Version Not Supported

HTTP Server in a Nutshell

Initialize;
Setup the listening socket;
forever do {
  get request;
  process;
  send response;
  log request (optional);
}

Get socket, bind
Forever loop

Accept incoming thing, get request; process what needs to be done; send response and maybe log (print) something

Initializing a Server

s = socket(); /* allocate listen socket */ 
bind(s, 80); /* bind to TCP port 80 */ 
listen(s); /* indicate willingness to accept */ 
while (1) { 
  newconn = accept(s);/* accept new connection */
  ...
}

First allocate a socket and bind() it to address

HTTP requests are usually sent on TCP to port 80
Other services use different ports (e.g. SSL is on 443)

Call listen() on the socket to indicate willingness to receive requests
Call accept() to wait for a request to come in (blocking)

When the accept() returns, we have a new socket which represented a new connection to a client

Processing a Request

remoteIP = getsockname(newconn);
remoteHost = gethostbyname(remoteIP);
gettimeofday(currentTime);
read(newconn, reqBuffer, sizeof(reqBuffer));
reqInfo = serverParse(reqBuffer);

read() is called on a new socket to retrieve request
Type of request is determined by parsing the read data
For logging purposes (optional, but done by most):

Figure out the remote host name
Figure out the name of other end
Get time of request

Optional: who is calling (address, IP), read what client was sending
Return a file, figure if the file is there, if it's accessible and read it, send it to client

fileName = parseOutFileName(requestBuffer); 
fileAttr = stat(fileName); 
serverCheckFileStuff(fileName, fileAttr); 
open(fileName);

Assuming the request is for a file (e.g. penguin.gif)
Test file path and meta-data

See if file exists/is accessible
Check permissions
Check meta-data: e.g. size of file, last modified time

Assuming all is OK: open the file

Responding to a Request

read(fileName, fileBuffer);
headerBuffer = serverFigureHeaders(fileName, reqInfo);
write(newSock, headerBuffer);
write(newSock, fileBuffer);
close(newSock);
close(fileName);
write(logFile, requestInfo);

Read the file into user space
Send HTTP headers on socket
Write the file on the socket
If connection is not persistent, close the socket
Close the open file descriptor
Write on the log file

Proxy Server

Two faces: one talks to client, another to server

Client: plays role of a server
Server: plays role of a client

For each HTTP request from the browser:

(1) Accepts HTTP requests from the browser

(2) Gets the data from the target web server (or from its cache), modifies the response if need be

(3) Sends HTTP respond with the data

Must handle concurrent browser requests
May be designed for different purposes

Back to Navigation

Tutorial 4: HTTP Protocol Specification

What is HTTP?

HTTP stands for Hypertext Transfer Protocol

Used to deliver virtually all files and other data (collectively called resources) on the World Wide Web
Usually, HTTP takes place through TCP/IP sockets

A browser is an HTTP client

It sends requests to an HTTP server (Web server)
The standard/default port for HTTP servers to listen on is 80

A resource is some chunk of data that is referred to by a URL

The most common kind of resource is a file
A resource may also be a dynamically-generated content, e.g. query result, CGI script output, etc.

Structure of HTTP Transactions

Start with method name (GET, etc.)
HTTP uses the client-server model

An HTTP client opens a connection and sends a request message to an HTTP server
The server then returns a response message, usually containing the resource that was requested
After delivering the response, the server closes the connection (except for persistent connections)

Format of the HTTP request and response messages

Almost the same, human readable (English-oriented)
An initial line specifying the method
Zero or more header lines
A blank line (i.e. a CRLF by itself)
An optional message body (e.g. a file, or query data, or query output)

Generic HTTP Header Format

<initial line, different for request vs. response> 
Header1: value1 
Header2: value2 
Header3: value3 
<optional message body, like file or query data; may be many lines, may be binary $&*%@!^$@>

Initial Request Line

A request line has three parts, separated by spaces

A method name,
The local path of the requested resource,
And the version of HTTP being used
Example: GET /path/to/file/index.html HTTP/1.1

GET: most common HTTP request. Says "give me this resource"
Method names are always uppercase
The path is the part of the URL after the host name, also called the request URI

URI is like a URL, but more general

The HTTP version always takes the form HTTP/x.x, uppercase

Initial Response Line

Status line

The HTTP version
The response status code: result of the request
A reason phrase describing the status code

Response categories and most common status codes

1xx: an informational message
2xx: success of some kind

200 OK: the request succeeded, and the resulting resource is returned in the message body

3xx: redirections

301 Moved Permanently
302 Moved Temporarily
303 See Other (HTTP 1.1 only): the resource has moved to another URL

4xx: an error on the client's part

404 Not Found

5xx: an error on the server's part

The Message Body

Reading of a string, know when to stop reading/expect how many come
Need content length and value
After headers, there may be a body of data
In a response, may be:

Requested resource
Explanatory text if there's an error

In a request this may be:

The user-entered data
Uploaded files

If an HTTP message includes a body, there are usually header lines in the message that describes the body

Content-Type: the MIME-type of the data (e.g. text/html or image/gif)
Content-Length: the number of bytes in the body

Sample HTTP Exchange

HTTP Request

GET /path/f.htm HTTP/1.1
Host: www.host1.com:80
User-Agent: HTTPTool/1.0 
[blank line here]

HTTP Response

HTTP/1.1 200 OK 
Date: Fri, 31 Dec 1999 
23:59:59 GMT 
Content-Type: text/html 
Content-Length: 1354 
<html> 
<body> 
< h1>Happy New Millennium!</ h1> 
(more file contents) ...
</body> 
</html>

The HEAD Method

Meta data of page
Need for caching
HEAD is like a GET request, except:

It asks the server to return the response headers only, not the actual resource (i.e. no message body)
Used to check characteristics of a resource without actually downloading it
HEAD is used when you don't actually need a file's contents

The response to a HEAD request must never contain a message body, just the status line and headers

The POST Method

POST is used to send data to the server
POST is different from a GET request:

Data is sent with the request, in the message body
There are usually extra headers to describe this message body (e.g. Content-Type and Content-Length)
The request URI is not a resource to retrieve; it's usually a program to handle the data you're sending
The HTTP response is normally program output, not a static file

Example: submitting HTML form data to CGI scripts

Content-Type: header is usually application/x-www-form-urlencoded
Content-Length: header gives the length of the URL-encoded form data

Blank user password - transparent (don't see it but it's there)

Accounts get hacked easily

POST Method Example

POST /login.jsp HTTP/1.1
Host: www.mysite.com
User-Agent: Mozilla/4.0
Content-Length: 27
Content-Type: application/x-www-form-urlencoded
userid=me&password=guessme

Can use POST request to send whatever data you want, not just form submissions. Just make sure the sender and the receiving program agree on the format
GET method can also be used to submit forms. The form data is URL-encoded and appended to the request URI

Persistent Connections

Persistent HTTP connection:

Used to increase performance, some servers allow persistent HTTP connections
The server does not immediately close the connection after sending the response
The response should be sent back in the same order as requests
The Connection:close header in a request indicates the final request for the connection

The server should close the connection after sending the response. Also, the server should close an idle connection after some timeout period

Caching

Locally figure out what is up to date, or send a HEAD request and server will tell us if there are any changes
Saves bandwidth and improves efficiency
Proxy or web browser avoids transferring resources for which a local up-to-date copy exists

A copy of the previous content is saved in the cache
Upon a new request, first the cache is searched
If found in cache, return the content from cache
If not in cache, send request to the server

But what if the content is out of date?

We need to check if the content is modified since last access

The Date: Header

We need time-stamped responses for caching
Servers must timestamp every response with a Date: header containing the current time e.g. Date: Fri, 31 Dec 1999 23:59:59 GMT
All responses except those with 100-level status (but including error responses) must clude the Date: header
All time values in HTTP use Greenwich Mean Time

Conditional Get Example

REQUEST

GET /sample.html HTTP/1.1
Host: example.com
If-Modified-Since: Wed, 01 Sep 2004 13:24:52 GMT
If-None-Match: “4135cda4″

RESPONSE

HTTP/1.1 304 Not Modified
Expires: Tue, 27 Dec 2005 11:25:19 GMT
Date: Tue, 27 Dec 2005 05:25:19 GMT
Server: Apache/1.3.33 (Unix) PHP/4.3.10

Redirection Example

REQUEST 1

GET /~carey/index.html HTTP/1.1
Host: www.cpsc.ucalgary.ca
Connection: keep-alive
User-Agent: Mozilla/5.0 […]
Accept: text/html,application/ 
[…]
Accept-Encoding: gzip,deflate,sdch
[…]
\r\n

RESPONSE 1

HTTP/1.1 302 Found
Date: Sat, 21 Jan 2012 01:10:43 GMT
Server: Apache/2.2.4 (Unix) 
   mod_ssl/2.2.4OpenSSL/0.9.7a 
   PHP/5.2.9 mod_jk/1.2.25
Location: http://pages.cpsc.ucalgary.ca/~carey/index.html
\r\n

REQUEST 2

GET /~carey/index.html HTTP/1.1
Host: pages.cpsc.ucalgary.ca
Connection: keep-alive
User-Agent: Mozilla/5.0 […]
Accept: text/html,application/ 
[…]
Accept-Encoding:gzip,deflate
[…]
\r\n

RESPONSE 2

HTTP/1.1 200 OK
Date: Sat, 21 Jan 2012 01:11:49 GMT
Server: Apache/2.2.4 (Unix) […]
Last-Modified: Mon, 16 Jan 2012 05:40:45 GMT
Content-Length: 3157
Keep-Alive: timeout=5
Connection: Keep-Alive
Content-Type: text/html
\r\n
<!DOCTYPE HTML PUBLIC "-
//W3C//DTD HTML 4.0 
Transitional//EN">
<html>
[…]
</html>
\r\n

HTTP 1.0 vs HTTP 1.1

Host header

HTTP 1.1 has a required host header

Persistent connections

By default, connections are assumed to be kept open after the transmission of a request and its response
The protocol permits closing of connections at any point
Connection: close header is used to inform the recipient that the connection will not be reused

Pipelining

A client need not wait to receive the response for one request before sending another request on the same connection

Strong cache support
Chunked transfer-encoding
OPTIONS method

A way for a client to learn about the capabilities of a server without actually requesting a resource

Assignment Tips

In our proxy:

(1) Pass along the 300-level message

GET request - does not ensure contents match as it's a redirection

Handle it locally in proxy and make a new connection on behalf of browser and get content from target server, or let browser take care of itself (send message to browser)

If you want to do extra:

Read about SELECT - set up sockets, give sockets the function and have one process and one loop; SELECT tells you if anything interesting has happened in the meantime while other stuff happens (saves resources)

// make sure you have read enough
// handle GET request
   // figure out he host
       // connect to the host
       // get response from host
       // do modification
       // send back response

Back to Navigation

Tutorial 5: Wireshark

Using Wireshark to Capture and Analyze Network Data

Wireshark (originally named Ethereal)is a free and open-source packet analyzer
It is used for network troubleshooting, analysis, software and communication protocol development, and education
It has a graphical front-end, and many more information sorting and filtering options.
Tool to capture traffic

Passive: gets a copy of protocol stack

Features and Functions

Wireshark is software that "understands" the structure of different networking protocols. Thus, it is able to display the encapsulation and the fields along with their meanings of different packets specified by different networking protocols.
Physical layer frame has header and footer, Wireshark can figure that out and its payload

Looking at header, can tell what kind of package it is; goes until application layer to figure out the protocol it's using

Works with live data and recorded/captured data
Data display can be refined using a display filter

Before Capturing Data

Not always allowed to capture data, follow laws!
Are you allowed to do this?

Ensure that you have permission to capture packets from the network you are connected with
Corporate policies or applicable laws may prohibit capturing data from the network

General Setup

Operating system must support packet capturing, e.g. capture support is enabled :* You must have sufficient privileges to capture packets, e.g. root / administrator privileges
Your computer's time and time zone settings should be correct

Example: Wireshark Demonstration

Detects interfaces (can be associated with Wi-Fi, ethernet, etc.)

Capture -> Interfaces

See traffic on interfaces (figure out the active one)

Start (can stop whenever)

Fills page with packets and information

NBNS: helps with discovery of devices (routers, iOS, etc.)
ARP: routing purposes (adhoc network)
Can filter in Options -> Capture Filter: <words> then Start

Capture filter: capture the ones I tell you

Example: (host <IP address>)

Or filter by Start then Filter: <words>

Captures everything and just show what you want

Example: HTTP

Wireshark Filters

Wireshark has two types of filters:

Capture Filters

A powerful capture filter engine helps remove unwanted packets from a packet trace and only retrieve the packets of interest

Display Filters

Let you compare the fields within a protocol against a specific value, compare fields against other fields, and check the existence of specified fields or protocols

Comparison Operators

Fields can also be compared against values. The comparison operators can be expressed either through English-like abbreviations or through C-like symbols

eq, == Equal
ne, != Not Equal
gt, > Greater Than
lt, < Less Than
ge, >= Greater than or Equal to
le, <= Less than or Equal to

Logical Expressions

Tests can be combined using logical expressions. These too are expressible in C-like syntax or with English-like abbreviations:

and, && Logical AND
or, || Logical OR
not, ! Logical NOT

Some valid display filters

tcp.port == 80 and ip.src == 192.168.2.1
http and frame[100-199] contains "wireshark"

Slice Operator

You can take a slice of a field if the field is a text string or a byte array.
For example, you can filter the HTTP header fields

Here the header “location” indicates that redirection happens: http.location[0:12]=="http://pages"
Another example: http.content_type[0:4] == "text"

Capture Filters


Syntax	Protocol	Direction	Host(s)	Logical Operand	Other Expressions
Example	tcp	dst	136.159.5.20	and	host 136.159.5.6

Protocol

Values can be ether, fddi, ip, arp, rarp, decnet, lat, sca, moprc, mopdl, tcp and udp
If no protocol is specified, all the protocols are used

Direction

Values can be src, dst, src and dst, src or dst
If no source or destination is specified, the "src or dst" keywords are applied
For example, “host 136.159.5.20” is equivalent to “src or dst host 136.159.5.20”

Host(s)

Values can be net, port, host, portrange
If no host(s) is specified, the "host" keyword is used
For example, "src 136.159.5.20" is equivalent to "src host 136.159.5.20"

Logical Operations

Values can be not, and, or
Negation ("not") has highest precedence. Alternation ("or") and concatenation ("and") have equal precedence and associate left to right
For example, "not tcp port 3128 and tcp port 80" is equivalent to "(not tcp port 3128) and tcp port 80"

Examples

tcp port 80

Displays packets with tcp protocol on port 80

ip src host 136.159.5.20

Displays packets with source IP address equals to 136.159.5.20

host 136.159.5.1

Displays packets with source or destination IP address equals to 136.159.5.1

src portrange 2000-2500

Displays packets with source UDP or TCP ports in the 2000-2500 range

src host 136.159.5.20 and not dst host 136.159.5.1

Displays packets with source IP address equals to 136.159.5.20 and in the same time not with the destination IP address 136.159.5.1

(src host 136.159.5.1 or src host 136.159.5.3) and tcp dst portrange 200-10000 and dst host 136.159.5.2

Displays packets with source IP address 136.159.5.1 or source address 136.159.5.3, the result is then concatenated with packets having destination TCP port range from 200 to 10000 and destination IP address136.159.5.2

Display Filters


Syntax	Protocol	.	String 1	.	String 2	Comparison Operators	Value	Logical Operators	Other Expressions
Example	http	.	request	.	method	==	get	or	tcp.port == 80

String1, String2 (Optional settings)

Sub protocol categories inside the protocol. To find them, look for a protocol and then click on the "+" character

Examples

ip.addr == 136.159.5.20

Displays the packets with source or destination IP address equals to 136.159.5.20

http.request.version=="HTTP/1.1"

Display HTTP requests with version 1.1

tcp.dstport == 25

Display TCP packets with destination port equal to 25

tcp.flags

Display packets having a TCP flag

tcp.flags.syn == 0x02

Display packets with a TCP SYN flag

Back to Navigation

Assignment 2: TCP Wrangling

Tutorial 6: UDP Protocol Specification

TCP has a lot of overhead

In streaming, fragmentation isn't as bad as loss of information (which happens in TCP), that is why UDP is chosen

What is UDP?

User Datagram Protocol
Name for what to put in an envelope for every layer

Application: message
Transport: segments (TCP), datagram (UDP)

TCP: stream, segments are part of it; datagram is wrapped up data that can be handled separately

Protocol for transport layer in protocol stack

Alternative to TCP and together with IP, also referred to as UDP/IP
Programs on networked computers can send short messages sometimes known as datagrams (using datagram sockets) to one another

Features of UDP

Minimal transport layer protocol
Ex: Two computers connected by Ethernet

Routers congested, packets get dropped
Buffers at every layer exist, especially IP layer (routers have limited buffer amount)

Unreliable and connection-less

UDP provides no guarantees to the upper layer protocol for message delivery
The UDP protocol retains no state of UDP messages once sent
Messages may be delivered out of order

UDP provides some integrity verification (via checksum) of the header and payload

Problem 1: packet gets dropped
Problem 2: bits get flipped due to errors, problems, etc.

UDP provides application multiplexing (via port numbers)

Both TCP, UDP use this
TCP: OS handles this (one instance of socket library part of OS)

OS knows how to handle these by ports to decide which packet belong to which processes on this machine

Encapsulation

Process creates a message, into a buffer

-> UDP encapsulates, creates a header (UDP datagram) -> IP data, IP header -> Frame data, frame header -> Decapsulation occurs, first to frame data, frame header -> IP data, IP header -> UDP data, UDP header -> Message to process -> Process

UDP Header

offset (bits)  0       7 8     15 16    23 24     31
               +--------+--------+--------+--------+
       0       |   src port num  |  dest port num  |
               +--------+--------+--------+--------+
       32      |      length     |     checksum    |
               +--------+--------+--------+--------+
       64+     |    ...       data        ...      |
               +--------+--------+--------+--------+

HTTP headers were text-based, varying size, no clear boundary

Payload, message to send

UDP header is very fixed number of bytes allocated for each part of the header
Some are optional - bits must exist, but they can be zero'd

Not mandatory for bits to be there for functionality

Unsure of IP, so send pseudo headers
UDP header consists of 4 fields, each of which is 2 bytes

The use of two of those is optional in IPv4 (src port num and checksum), but in IPv6 only the source port num is optional

Source Port Number

Bit 0-15, 2 bytes (1 word)
Identifies the sender's port, should be assumed to be the port to reply to if needed
If not used, then it should be zero

Destination Port Number

Identifies the receiver's port and is required

Length

Length of message in bytes; considers the entire datagram: header and data
The minimum length is 8 bytes = length of the header
The field size sets a theoretical limit of 65 535 bytes (8 byte header + 65 527 bytes of data) for a UDP datagram
Imposes limit on how many bytes the payload can be, if it's bigger then break it into two

Checksum

Verification of contents whether bits have been flipped

Error checking of the header and data. Mandatory for IPv6

Not just payload, but also pseudo header
If no checksum is generated by the transmitter, the field should be set to all-zeros
UDP uses pseudo header to define the checksum. It is calculated over the combination of pseudo header and UDP message
Pseudo header contains:

IP source address field
IP destination address field
IP protocol field
UDP length field

Data

IPv4 is 2^16 for one
Padding must be added to make the data a multiple of 16 bits

TCP vs UDP

TCP

Reliable (like a pipe)
Connection-oriented
Segment retransmission and flow control through windowing

Cut in the pipe, things stop working - it will retransmit or give you a message

Segment sequencing
Acknowledge segments

UDP

Unreliable
Connectionless
No windowing or retransmission
No sequencing, ordering for datagrams
No acknowledgement

Why Still Use UDP?

UDP's advantage over TCP

UDP's header is smaller than TCP's. The header applies to every segment, so it adds up
Generating UDP header has much simpler processing steps
UDP has no connection setup overhead, while TCP requires 3-way handshake

UDP is widely used and recommended for cases where:

Speed is more important than reliablity. An application values timely delivery over reliable delivery

Saves bandwidth since more overhead in TCP

Data exchanges are short and the order of reception of datagram does not mater
Best-effort delivery is enough
Applications require multicast or broadcast transmissions, not supported by TCP

TCP: two ends, can't broadcast

Popular Applications Using UDP

Domain Name System (DNS)
Simple Network Management Protocol (SNMP)

Wireshark, see lots of these messages between routers and network

Dynamic Host Configuration Protocol (DHCP)

Gets dedicated IP

Routing Information Protocol (RIP)
Many multimedia applications like streaming video, VOIP

“Real-time video and audio streaming protocols are designed to handle occasional lost packets, so only slight degradation in quality occurs, rather than large delays if lost packets were retransmitted.” [Wikipedia]

Checksum

“Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data, padded with zero octets at the end (if necessary) to make a multiple of two octets.” [RFC 768]
Checksum is calculated for UDP header and data

IP header also has a checksum, but it doesn't cover data

To create, must maintain length (less than max pad with zeros)
UDP checksum test is performed only at the sender and receiver end stations

The IP checksum test is performed in every intermediate node (router)

UDP check sum is performed over a pseudo header

In addition to UDP header and data, and the source and the destination IP address
Prevents misrouting: in case the destination IP address in IP header was corrupted, and it was not discovered by the IP checksum test, the UDP datagram would arrive to the wrong IP address. UDP can detect this and silently drop the datagram

Pseudo Header

0      7 8     15 16    23 24    31
+--------+--------+--------+--------+
|         source IP address	    |
+--------+--------+--------+--------+
|      destination IP address       |
+--------+--------+--------+--------+
| zeroes |protocol| UDP total length|
+--------+--------+--------+--------+

Above the UDP header
IP layer - passes
These are all in the checksum

Reason: check if not actually the receiver

Comparison of IPv4 vs IPv6

Bigger, many are mandatory in IPv6

Sample Checksum Calculation

“Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data, padded with zero octets at the end (if necessary) to make a multiple of two octets.” [RFC 768]
For every 16 bit (word/row) take the 1's complement, add them, then take 1's complement of the result
Yellow: pseudo-header, gray: data

Socket Programming with UDP

No setup
Server: open socket (same), bind (for multiplexing) but don't listen/accept

Just send and receive data, then close connection

Client

No connect, since no setting up
Just open socket, send and receive data, close connection

UDP Server and Client Sockets

Server

(1) Open datagram socket with socket()

(2) Name the socket with bind(), using SOCKADDR structure for the address parameter

(3) Exchange data with a client using sento() and recvfrom()

(4) Close the connection with closesocket()

Client

(1) Open a datagram socket with socket()

(2) Exchange data with server using sendto() and recvfrom()

(3) Close the connection with closesocket()

Learn who other end is (accept)

No information, thus every time we do a recvfrom(), learn about sender

Can be multiple senders on same socket

Sending and Receiving Data

int sendto(int sockid, const void *sendbuf, int sendlen, int flags, const struct sockaddr *to, int tolen);

sockid: integer, socket descriptor
sendbuf: buffer containing the data to be transmitted
sendlen: size in bytes of the data in the buf buffer
flags: indicator specifying the way in which the call is made
to: struct sockaddr, the address of the target
tolen: the size (in bytes) of the addrport structure

int recvfrom(int sockid, const void *recvbuf, int recvlen, int flags, const struct sockaddr *from, int fromlen);

sockid and flags are same as above
recvbuf and recvlen: buffer containing the received data and its size
from and fromlen: the address of the source and its size

Return value: If no error occurs, these function returns the total number of bytes sent/recieved, if an error occurs, SOCKET_ERROR (- 1) is returned.

Sample Code

Example: wordserver

Ask for a word
Setup socket, error checking
Setup addr struct
Bind, error checking

Anyone can send

Also setup client sockaddr
Receives some string in the code; but you are sending/reading bytes so cannot use strlength

Example: simpleChat

Has both server/client in one program
Pay attention that it sends ASCII, but you have to deal with bytes
Client struct for peers
Each process has setting - input arguments to tell which instance of the program you will run
Need to run both at same time

Assignment Tips

Create program with both server and client
Write similar to BitTorrent
File segmented into chunks

Pull design: who has what, fear the other client who has piece I want

This server now, I am client; send a message requesting piece and they start sending
Multiple processes running

A binary file (P1)

Chunks: c1, c2, c3, c4
Only have c4
Server side: select, bind(port1)

recvfrom(...)

Design message so P2 wants c4

sendto(chunk, P2)

Another binary file (P2)

Chunks: c1, c2,c3, c4
Have c1, c3, wants c4
Client Side: socket()

sendto(requestmsg, addr P1)

Create your own protocol

Server sde: socket, bind(port2)

Another (P3)

Loop send/recv properly

Blockings occur, might be slow

Challenge: multiple clients, designing

TCP: easily fork, pass on handling of client to child process
UDP: no child process; use multi-threading (receive request, handle sending in another thread)

Instead of VM (which can cause network issues), can use cygwin

In Developer Tools, specify you want gdb
Gets Bash functionality

Back to Navigation

Tutorial 7: TCP Protocol Specification

At transport layer, data handled by UDP is datagram but for IP these are also datagram (different)

Overview

Connection-oriented, point-to-point protocol

Connection establishment and teardown phases
'Phone-like' circuit abstraction (application-layer view)
One sender, one receiver
Called 'reliable byte stream' protocol
General purpose (for any network environment)

Features

TCP provides the following facilities to:

Stream Data Transfer

Transfers a contiguous stream of bytes in TCP segments

Multiplexing

Allow for many processes within a single host to use TCP communication facilities simultaneously
Port numbers important to determine which process is bein used

Reliability
Flow Control and congestion control

Originally optimized for certain kinds of transfer:

Telnet (interactive remote login)
FTP (long, slow transfers)

Large amounts of data to transfer between two ends; overhead acceptable with reliability

Web is like neither of these!

Originally optimized for wired networks

Provides a reliable, in-order, byte stream abstraction:

Recover lost packets and detect/drop duplicates
Detect and drop corrupted packets
Preserve order in byte stream; no “message boundaries”
Full-duplex: bi-directional data flow in same connection

Flow and congestion control:

Flow control: sender will not overwhelm receiver
Congestion control: sender will not overwhelm the network
Sliding window flow control
Congestion control done via adaptive flow control window size
Send and receive buffers

TCP Header

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port Number      |    Destination Port Number    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|     Receiver window size      |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |      Pointer urgent data      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Options (variable length)            |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               applcation data (variable length)               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

More than UDP
Figure out-of-order or missing pieces: other end didn't receive acknowledgement
'Piggy-back' - if you have something to send to me, create TCP segment but acknowledge something along the way

Set ACK flag and it means something

To close: send ACK, then receive an acknowledgement of ACK, then close
Source/Destination Port Number

These two fields plus the source and destination IP addresses, combine to uniquely identify each TCP socket

Sequence Number

The byte in the stream of data from the sending TCP to the receiving TCP that the first byte of data in this segment represents

Acknowledgement Number

Contains the next sequence number that he sender expects to receive. This acknowledges receipt of all prior bytes. This field is valid only if the ACK flag is on

Data Offset/Header Length

Gives the length of the header in 32-bit words

Flags (6 bits)

UAPRSF used
URG

Segment contains urgent data. When this flag is set, the UrgPtr field indicates where the non-urgent data contained in this segment begins

ACK

Indicates that the Acknowledgement field is significant. All packets after the initial SYN packet sent by the client should have this flag set

PSH

Push function. Asks to push the buffered data to the receiving application

RST

Reset the connection

SYN

Synchronize sequence numbers. Only the first packet sent from each end should have this flag set

FIN

No more data from sender

Receiver Window Size

The number of bytes that the receiver is currently willing to receive

Checksum

The 16-bit checksum field is used for error-checking of the header and data

Option Field

Has many different options. For example MSS (maximum segment size) option, which specifies the maximum sized segment the sender wants to receive

The data portion of the TCP segment is optional

Establishing a TCP Connection

3-way handshake

Client sends SYN with initial sequence number (ISN = X)

Label of data being sent

Server responds (listen() on port 80) with its own SYN with sequence number Y and ACK of client ISN with (X + 1) (next expected byte)

Acknowledgement number is the label of the expected packet

Client ACKs server's ISN with Y + 1 (server accept() or read())
X and Y are randomly chosen
All modulo 32-bit arithmetic

Sending Data

Sender TCP passes segments to IP to transmit

Keeps a copy in buffer at send side in case of loss
Called a 'reliable byte stream' protocol
Sender must obey receiver advertised window

Receiver sends acknowledgements (ACKs)

ACKs can be piggybacked on data going the other way
Protocol allows receiver to ACK every other packet in attempt to reduce ACK traffic (delayed ACKs)
Delay should not be more than 500 ms (typically 200 ms)

Keep copy in buffer, wait for acknowledgement before removing it
Keep data, holes when out-of-order happens

Example

Preventing Congestion

Sender may not only overrun receiver, but may also overrun intermediate routers

No way to explicitly know router buffer occupancy, so we need to infer it from packet losses
Assumption is that losses stem from congestion, namely, that intermediate routers have no available buffers

Sender maintains a congestion window (CW)

Never have more than CW of un-acknowledged data outstanding (or RWIN data; minimum of the two)
Successive ACKs from receiver causes CW to grow

CW grows based on one of two phases:

Slow-start: initial state
Congestion avoidance: steady-state
Switch between the two when CW > slow-start threshold

Congestion Control Principles

Lack of congestion control would lead to congestion collapse (Jacobson 88)
Idea is to be a 'good network citizen'
Would like to transmit as fast as possible without loss
Probe network to find available bandwidth
In steady-state: linear increase in CW per RTT
After loss event: CW is halved
This general approach is called Addictive Increase and Multiplicative Decrease (AIMD)
Helps lead to network stability

Diagram: Comparison

Bigger Image

Slow Start

Initial CW = 1
After each ACK:

CW += 1

Continue until:

Loss occurs OR
CW > slow start threshold

Then switch to congestion avoidance
If we detect loss, cut CW in half
Exponential increase in window size per RTT

Congestion Avoidance

Until (loss) {
  after CW packets ACKed:
     CW += 1;
}
ssthresh = CW / 2;
// Depending on loss type:
CW /= 2; continue; // SACK/fast retransmit
CW = 1; go to slow start; // Course grained timeout

This is for TCP Reno/SACK: TCP Tahoe always sets CW = 1 after a loss

How Are Losses Recovered?

Coarse-grained timeout:

Sender does not receive ACK after some period of time
Event is called a retransmission time-out (RTO)
RTO value is based on estimated round-trip time (RTT)
RTT is adjusted over time using exponential weighted moving average:

RTT = (1-x) * RTT + (x) * sample

x is typically 0.1

First done in TCP Tahoe

Fast Retransmit

Receiver expects N, gets N + 1:

Immediately sends ACK(N)
This is called a duplicate ACK
Does not delay ACKs here
Continue sending duplicate ACKs for each subsequent packet (not N)

Sender gets 3 duplicate ACKs:

Infers N is lost and resends
3 chosen so out-of-order packets don't trigger Fast Retransmit accidentally
Called 'fast' since we don't need to wait for a full RTT

Connection Termination

Either side may terminate a connection. In fact, connection can stay half-closed. Let's say the server closes (typical in WWW)
Server sends FIN with sequence number (SN + 1) (i.e., FIN is a byte in sequence)
Client ACK's the FIN with (SN + 2) ('next expected')
Client sends its own FIN when ready
Server ACK's client FIN as well with (SN + 1)

The TCP State Machine

TCP uses a Finite State Machine, kept by each side of a connection, to keep track of what state a connection is in
State transitions reflect inherent races that can happen in the network, e.g. two FIN's passing each other in the network
Certain things can go wrong along the way, i.e. packets can be dropped or corrupted
Machine is not perfect; certain problems can arise not anticipated in the original RFC
Timers come into play then

TCP Connection Establishment

CLOSED: more implied than actual, i.e. no connection
LISTEN: willing to receive connections (Accept call)
SYN_SENT: sent a SYN, waiting for SYN-ACK
SYN_RECEIVED: received a SYN, waiting for an ACK of our SYN
ESTABLISHED: connection ready for data transfer

TCP Connection Termination

FIN_WAIT_1: we closed first, waiting for ACK of our FIN (active close)
FIN_WAIT_2: we closed first, other side has ACK'd our FIN, but not yet FIN'd
CLOSING: other side closed before it received our FIN
TIME_WAIT: we closed, other side closed, got ACK of our FIN
CLOSE_WAIT: other side sent FIN first, not us (passive close)
LAST_ACK: other side sent FIN, then we did, now waiting for ACK

Back to Navigation

Other Information

Tutorial 8: The Internet Protocol

The Network Layer

IP (Internet Protocol) is a Network Layer Protocol
RFC 791 provides the specification for IP
Sends datagrams (like UDP in transport layer)
At end points, assume virtual connection that abstracts out what's happening underneath

Goes through ISP, to major routers, to another ISP (for target)

These machines don't implement transport or application layer (end point does this)

The Waist of of the Hourglass

IP is the waist of the hourglass of the Internet protocol stack
Multiple higher-layer protocols
Multiple lower-layer protocols
One common protocol at the network layer for data transmission

Glue point: IP
Every other layer, might have multiple layers within it
Only layer where everyone must implement same protocol
From developer's POV, clean; network providers (telephone companies) hate this

They have no say for any layer (above: users, lower: hardware, physical link), and they have a lot of things to manage within IP

IPv4, IPv6 designed to live with each other - if they can't support one, they use the other (similar to IP protocol)

Routers

IP is the highest layer protocol which is implemented at both routers and hosts

Best Effort Protocol

IP provides an unreliable, connectionless, best effort service (also called: "datagram service")

Unreliable: no guarantee for delivery of packets
Connectionless: things delivered out of order

Each packet ("datagram") is handled independently. IP is not aware that packets between hosts may be sent in a logical sequence

Best effort: no guarantee on anything

IP does not make guarantees on the service (no throughput guarantee, no delay guarantee, etc.)

Consequences: Higher layer protocols have to take care of delivery guarantees

IPv4 and IPv6 Datagram

IPv4: address is 32 bits
IPv6: address is 128 bits

IP Versions

Authority that assigns IP addresses to institutions
The first publicly used version of the Internet Protocol was version 4 (IPv4)

Address space: 32 bits (~4.3 billion addresses)
Initially it was thought to be enough

Address exhaustion

On February 3, 2011, the Internet Assigned Numbers Authority (IANA) officially depleted the global pool of completely fresh blocks of addresses
Address exhaustion was a concern as early as the 1990s

IPv6 is the next generation IP that tries to address the shortcomings of IPv4

Address space: 128 bits (~79 octillion times more than IPv4)
Designed to live alongside IPv4

Version 5:

Doesn't exist; skipped to avoid confusion

IPv5 relates to experimental TCP/IP protocol called the Internet Stream Protocol, Version 2 (originally RFC 1190)
Protocol seen by some as being a peer of IP at the Internet Layer in TCP/IP architecture, and packets assigned IPv5 to differentiate them from 'normal' IPv4 packets
Protocol never went anywhere, so to avoid confusion IPv6 was used

IPv4 Datagram Fields

     0                   1                   2                   3
Bit  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0    |Version|  IHL  |    DCP    |ECN|          Total Length         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
32   |         Identification        |Flags|     Fragment offset     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
64   |  Time to Live |    Protocol   |        Header Checksum        | 
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
96   |                      Source IP Address                        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
128  |                    Destination IP Address                     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
160  |                     Options (if IHL > 5)                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Explicit Congestion Notification(ECN): 2 bits

An optional feature that is defined by RFC 3168 for notification of network congestion without dropping packets
Both endpoints must support it and be willing to use it
Only effective when supported by the underlying network

Total Length: 16 bits

The entire IP datagram size, including the header and payload
Minimum-length is 20 bytes (minimal header with no payload)

Identification: 16 bits

Used primarily for uniquely identifying the group of fragments of a single IP datagram
Unique identification of a datagram from a host
Incremented whenever a datagram is transmitted

Flags: 3 bits

Bit field used to control or identify fragments

Bit 0: Reserved; must be zero
Bit 1: Don’t fragment (DF)

If set, packets are dropped if they need to be fragmented

Bit 2: More fragments (MF)

Zero for non-fragmented packets; for fragmented packets, all but the last packet has this flag set; the last packet will have a non-zero "Fragment Offset" field

Fragment Offset: 13 bits

Measured in units of 64-bit words (8 byte)

Time to Live: 8 bits

Limits a datagram’s lifetime to break routing circles

Specified in seconds but in practice is used as a hop count (decrement by 1 at each router) and set to 64 at the start
When TTL is zero, the router should discard the packet; typically an ICMP Time Exceeded message is sent to the sender

Protocol: 8 bits

Defines the protocol used in the payload
There are over 140 protocols defined (TCP is 0x06; UDP is 0x11)

Header Checksum

The 16-bit one's complement of the one's complement sum of all 16-bit words in the header
For computing the checksum, the value of the checksum field is

zero

Options: not often used

Used to control fragmenting, routing, debugging, security, etc.
Must be padded so that the header is divisible by 32 bits (4 bytes)

Maximum Transmission Unit

Maximum size of IP datagram is 65535, but the data link layer protocol generally imposes a limit that is much smaller
Ethernet frames have a maximum payload of 1500 bytes

IP datagrams encapsulated in Ethernet frame cannot be longer than 1500 bytes

The limit on the maximum IP datagram size, imposed by the data link protocol is called maximum transmission unit (MTU)

MTUs for various data link protocols
Ethernet	FDDI	ATM AAL5	802.3	802.5	802.11 (WLAN)
1500	4352	9180	1492	4464	2272

IP Fragmentation

If size of IP datagram exceeds MTU, IP datagram is fragmented into smaller units
Can transfer data over heterogenous types of networks (with different MTUs)
Fragmentaion:

IP router splits the datagram into several datagram
Fragments are reassembled at receiver

Analogy: want to send an air bus (that can't fly yet) from Paris to Calgary

From Paris to Toronto/Montreal, they can ship the whole thing without breaking it down (a big ship carries this bus parcel)
On arrival from Montreal, to send to Calgary, there is not a big enough carrier that can carry the bus

Disassemble into pieces and reassemble in Calgary

Fragmentation can be done at the sender or at intermediate routers
The same datagram can be fragmented several times
Reassembly of original datagram done only at destination hosts
How big your data that you want to transfer, the network you want to pass through cannot handle it
Multiple hops: each part can give you a different type of network
Sender is agnostic of what's going on in the network
Whoever is gateway between two types of network and sees a packet that cannot travel over a specific one, it is responsible for fragmentation
A network will not reassemble the packet (extra work), it maintains its fragmented state and destination reassembles it - no one in the middle does that
Middle man will fragment it depending where it's needed

Example of Fragmentation

User is agnostic to this, network layer worries about it
Set fragmentation flag and ID (common between all of them so know pieces belong together)
Offset - 1480 because something small taken out of header

IPv6 Datagram Fields

     0                   1                   2                   3
Bit  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 0   |Version| Traffic Class |             Flow Control              |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 32  |         Payload Length        |  Next Header  |   Hop Limit   |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 64  |                                                               |
 96  |                     Source IPv6 Address                       |
 128 |                          128 bits                             |
 160 |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 192 |                                                               |
 224 |                    Destination IPv6 Address                   |
 256 |                            128 bits                           |
 288 |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Version: 4 bits

For IPv6, this has a value of 6 (0110)

Traffic Class: 8 bits

The same as the redefined IPv4 fields:

The first 6 bits are differentiated services for real-time data streaming
The last 2 bits are for ECN (Explicit Congestion Notification)

Flow Label: 20 bits

Originally created for giving real-time applications special service
When set to a non-zero value, it serves as a hint to routers and switches with multiple outbound paths that these packets should stay on the same path so that they will not be reordered
It has further been suggested that the flow label be used to help detect spoofed packets

Payload Length: 16 bits

The size of the payload in octets, including any extension headers
This is different from IPv4 as it does not include the fixed IPv6 header
The length is set to zero when a Hop-by-Hop extension header carries a Jumbo Payload option

A Jumbo Payload has a 32 bit length in the Hop-By-Hop Options extension header allowing packets up to 4GB in size!

Next Header: 8 bits

The same as the IPv4 Protocol field
The extension headers are described here as protocols

Hop Limit: 8 bits

Replaces the time to live field of IPv4
This value is decremented by one at each intermediate node visited by the packet
When the counter reaches 0 the packet is discarded

Fragmented Packets

Notice there is no fragmentation fields, so routers cannot fragment IPv6 packets as they do for IPv4
Hosts may use the fragmentation extension to send packets larger than an MTU

IPv6 also does not have a checksum field

Back to Navigation

Tutorial 9: Ethernet LANs

Introduction of Ethernet

Networks existed before Internet

At that time, they were classified in terms of local area (same building), wide area (clusters of LANs in a city), metropolitan (more than core downtown), Internet (connect these)

One machine has a modem which can dial up to internet and it becomes a gateway so other computers can connect to this one and use the Internet - painfully slow
Ethernet, defined under IEEE 802.3, is one of today's most widely used data communications standards

It finds its major use in Local Area Networks (LAN)
It has largely replaced competing wired LAN technologies

Founded by Xerox Palo Alto Research Center in 1975
Originally designed as a 2.94 Mbps system to connect 100 computers on a 1 km cable
Later, Xerox, Intel and DEC drew up a standard to support 10 Mbps, which was later basis for the IEEE’s 802.3 specification

Main Elements

(1) The network nodes

The points to and from which the communication takes place
Data Terminal Equipment: devices such as PCs, file servers, print servers
Data Communications Equipment: devices that receive and forward the data frames across the network, e.g. repeaters, switches, routers

(2) Interconnecting media

The cable that connects the network nodes, the type determines the speed at which the data may be transmitted
Coaxial Cable
Twisted Pair Cables
Fiber Optic Cables

Network Topologies

Point to point: simplest configuration as only two network hosts are used
Coaxial bus: broadcast LAN - all transmitted frames travel to and are processed by all adapters connected to bus
Hub-based star network: also broadcast LAN - hosts are directly connected to a hub with twisted-pair copper wire. Hub sends a copy out on all of its other interfaces
Switch-based star network: hub replaced by switch, which sends a copy to the target host

Ethernet IEEE 802.3 Standards

802.3 standard defines both MAC and physical layer details
Ethernet naming, three parts:

E.g. 10Base-T and 100Bast-T

First number:

10, 100, or 1000
Indicates transmission speed in megabits per second

Second number indicates transmission type

BASE = baseband (digital signal, single channel)
BROAD = broadband (analog signal, multiple channel)

Third number refers to physical media itself

T: means unshielded twisted-pair cables. Further numbers indicate the number of twisted pairs available. For example in 100BASE-T4, the T4 indicates four twisted pairs

MAC Addresses

Media Access Control Address:

A unique identifier assigned to network interfaces
Assigned by the manufacturer, stored in the hardware
Usually permanent, no duplication
6-byte address expressed in 12-digit hexadecimal numbers

The first 24 bits identify the manufacturer
The second half of the address is known as the extension of board ID

E.g. broadcast address FF-FF-FF-FF-FF-FF

Link-layer addressing scheme, used as a network address for most IEEE 802 network technologies, including Ethernet
Mac address is analogous to a person’s SIN number, while IP address is analogous to postal address

Ethernet IEEE 802.3 Frame Format/Structure

Frame structures are developed within the MAC layer of the protocol stack
10 / 100 Mbps Ethernet MAC data frame format:

Header

Preamble (PRE) - informs the receiving stations that a frame is starting as well as enabling synchronization
Start Of Frame delimiter (SOF)
Destination Address (DA) – first bit: 0-an individual address, 1-a group address. The next bit into the DA indicates whether the address is globally administered (0), or local(1). 46 remaining bits-destination address
Source Address (SA) - always an individual address the left most bit is always a zero
Length / Type - provides MAC information and indicates the number of client data types that are contained in the data field of the frame

Payload

Data - minimum of 46 bytes, up to 1500 bytes long

Trailer:

Frame Check Sequence (FCS) - This field is four bytes long. It contains a 32 bit Cyclic Redundancy Check (CRC)

1000 MBps Ethernet MAC data frame format

Extension: When using the 1000Base-X standard, there is a minimum frame size of 416 bytes, and for 1000Base-T there is a minimum frame size of 520bytes

Ethernet Media Access Control Method

Ethernet uses Carrier Sense Multiple Access / Collision Detection (CSMA/CD)
Carrier Sense: each station listens on the network for traffic and it can detect when the network is quiet
Multiple Access: the stations are able to determine for themselves whether they should transmit
Collision Detect:

It is still possible that two stations will start to transmit at virtually the same time
If this occurs then the stations detect collision and will stop transmitting. They then back off a random amount of time before attempting a retransmission

Fast Ethernet

100BaseT Ethernet (Fast Ethernet) is defined under the 802.3 family of standards under 802.3u
One of the most widely used forms of Ethernet
All the nodes within the network share the 100 Mbps bandwidth
It uses the CSMA/CD access method, but there are some minor differences in the way the overall system operates
It runs on UTP or optical fiber cable and uses a star topology

Gigabit Ethernet

The next development of the Ethernet standard beyond the popular 100Base-T version
Provides for half and full duplex operation at speeds of 1000 Mbps
It is particularly easy to install because the 1000Base-T variant is designed to run over Cat 5 UTP (unshielded twisted pair) that is widely and cheaply available
Uses the 802.3 Ethernet frame formats
Uses the CSMA/CD access method with support for one repeater per collision domain
Provides backward compatibility with 10BASE-T and 100BASE-T technologies

CSMA/CD Algorithm

Overview:

Detailed:

Back to Navigation

Tutorial 10: Hubs, Switches, and Bridges

LAN Interconnection

We need to break down big networks to sub-LANs

Limited amount of supportable traffic: on single LAN, all stations must share bandwidth
Limited length: 802.3 (Ethernet) specifies maximum cable length. For 10 Mbps:

Maximum length of the wire: 2,500 meter

Large "collision domain" (can collide with many stations)

Hubs

Physical Layer devices
Essentially repeaters operating at bit levels: repeat received bits on one interface to all other interfaces
Hubs can be arranged in a hierarchy (or multi-tier design), with backbone hub at its top
Each connected LAN referred to as LAN segment

Advantages:

Simple, inexpensive device

Multi-tier provides graceful degradation: portions of the LAN continue to operate if one hub malfunctions
Extends maximum distance between node pairs (100m per Hub)
Disadvantages:

Hubs do not isolate collision domains: node may collide with any node residing at any segment in LAN

Single collision domain results in no increase in max throughput
Multi-tier throughput same as single segment throughput
Individual LAN restrictions pose limits on number of nodes in same collision domain and on total allowed geographical coverage
Cannot connect different Ethernet types (e.g., 10BaseT and 100baseT)

Bridges

Link-layer devices:

Store, forward Ethernet frames
Examine incoming frame’s MAC address, selectively forward frame to one-or-more outgoing links when frame is to be forwarded on segment, uses CSMA/CD to access segment

Advantages:

Isolates collision domains resulting in higher total max throughput, and does not limit the number of nodes nor geographical coverage
Can connect different type Ethernet since it is a store and forward device
Transparent: no need for any change to hosts LAN adapters

Switches

A switch could be considered a bridge with numerous ports
Switch, or Layer 2 switch is often used interchangeably with bridge
Plug-and-play, self-learning

Switches do not need to be configured

Allows multiple simultaneous transmissions

Hosts have dedicated, direct connection to switch
Switches buffer packets
Ethernet protocol used on each incoming link, but no collisions; full duplex

Each link is its own collision domain

Switching: A-to-A' and B-to-B' simultaneously without collisions (not possible with dumb hub)

Switch Table

Each entry in table has (MAC address of host, interface to reach host, time stamp)
Looks like a routing table
Entries created, maintained in switch table like a routing protocol

Self-Learning

Switch learns which hosts can be reached through which interfaces

When frame received, switch 'learns' location of sender: incoming LAN segment
Records sender/location pair in switch table (initially is empty)

Frame Filtering/Forwarding

When frame received:

record link associated with sending host
index switch table using MAC destination address
IF entry found for destination
THEN
   IF destination on segment from which frame arrived
      THEN drop the frame
      ELSE forward the frame on interface indicated
   ELSE flood

Forward on all but the interface on which the frame arrived (flooding)

Interconnecting Switches

Switches can be connected together
Self-learning to know where to send frames

With loops, incorrect learning occurs

Spanning Trees

Allow a path between every LAN without causing loops (loop-free environment)
BRidges communicate with special configuration messages (BPDUs)
Standardized by IEEE 802.1D
Requirements:

Each bridge is assigned a unique identifier
A broadcast address for bridges on a LAN
A unique port identifier for all ports on all bridges

MAC address
Bridge ID + port number

Example: Spanning Tree

Spanning Tree Algorithm

Overview

(1) Determine root bridge among all bridges

(2) Each bridge determines its root port

The port in the direction of the root bridge

(3) Determine the designated bridge on each LAN

The bridge which accepts frames to forward towards the root bridge
The frames are sent on the root port of the designated bridge

Selecting root bridge

Initially, each bridge considers itself to be the root bridge
Bridges send BDPU frames to its attached LANs

The bridge and port ID of the sending bridge
The bridge and port ID of the bridge the sending bridge considers root
The root path cost for the sending bridge

Best one wins: (lowest root ID/cost/priority)
Selecting root ports

Each bridge selects one of its ports which has the minimal cost to the root bridge
In case of a tie, the lowest uplink (transmitter) bridge ID is used
In case of another tie, the lowest port ID is used

Selecting designated bridges forwarding/blocking state

Same as selecting the root bridge:

Initially, each bridge considers itself to be the designated bridge, send BDPU frames to attached LANs, best one wins
Root and designated bridges will forward frames to and from their attached LANs
All other ports are in the blocking state

Example: Spanning Tree Execution

Switches vs Routers

Both store-and-forward devices

Routers: network layer devices (examine network layer headers)
Switches: link layer devices

Routers maintain routing tables, implement routing algorithms
Switches maintain switch tables, implement filtering, learning algorithms

Back to Navigation

Tutorial 11: Wi-Fi LANs

Introduction

Wi-Fi or WiFi: defined as any "wireless local area network (WLAN) products that are based on the IEEE 802.11 standards
Popular technology that allows an electronic device to exchange data wirelessly (using radio waves) over a computer network
IEEE established the 802.11 Group in 1990. Specifications for standard ratified in 1997

Initial speeds were 1 and 2 Mbps
IEEE modified the standard in 1999 to include 802.11a (54 Mbps) and 802.11b (11 Mbps). Incidentally, 802.11b equipment was available before 802.11a
802.11g (54 Mbps) was added in 2003

The Basics

In many respects, the IEEE 802.11b wireless LAN (WLAN) standard is similar to that of classic IEEE 802.3 (Ethernet) LANs
Similarities:

LAN with limited geographic coverage
Multiple stations, with 48-bit MAC addresses
Shared transmission medium (broadcast technology)
CSMA-based Medium Access Control protocol
Comparable data rates (11 Mbps vs 10 Mbps)

But there are also many distinct differences:

Wireless (air interface) versus wired (coax)
Wireless propagation environment (multipath)
Higher error rate due to interference, fading, etc.
Successful frames are ACKed by receiver
Mobile stations versus fixed stations
Half-duplex versus full-duplex operation
"Hidden node" and "exposed node" problems (exposed node problem occurs when a node is prevented from sending packets to other nodes due to a neighboring transmitter)
Potential asymmetries of links
CSMA/CA versus CSMA/CD
Multiple data transmission rates (1, 2, 5.5, 11)

Wi-Fi Features

Infrastructure mode vs "ad hoc" mode:

Devices in a wireless network are set up to either communicate indirectly through a central place — an access point — or directly, one to the other

Access Point (AP) sends "beacon frames" – all the information about the network (e.g. Timestamp Capability information)

Mobiles choose AP based on signal strength

Multiple channel access protocols supported

CSMA/CA (DCF – Distributed coordination function); PCF (Point Coordination Function); RTS/CTS (Request to Send/Clear to Send)

MAC-layer can provide error control, retransmission, rate adaptation, etc.
Direct Sequence Spread Spectrum (DSSS)

Modulation technique
Signal spread across 22 MHz with each channel based around a center frequency. There are fourteen 22-MHz channels

Where does Wireless RF Live?

Protocol Stack View

Wireless Cells

In Canada/US, there are eleven 802.11b/g channels
Only channels 1, 6 and 11 are non-overlapping
Computers can roam between cells

Medium Access Control (MAC)

Carrier Sense Multiple Access with Collision Avoidance (CSMA-CA)

Device wanting to transmit senses the medium (air)
If medium is busy, it defers
If medium is free for certain period (DIFS – Distributed Inter-Frame Space), it transmits frame
DIFS is approximately 128 µs
Latency can increase if "air" is very busy since devices will have a hard time finding "open air" to send frames

MAC Protocol

DIFS – Distributed Inter-Frame Space (approx. 128 µs)
SIFS – Short Inter-Frame Space (approx. 28 µs)
Every frame is ACKed except broadcast/multicast

MAC Retransmission

If no ACK received 'right away', then the sender retransmits the frame again at the MAC layer

Indicates frame (or ACK) was lost/corrupted
Very short timeout (e.g., 1 ms)
Exponential back off (doubling) if repeated loss

Typically recovers before TCP would notice
Max retransmission limit (e.g. 8)
May do MAC-layer rate adaptation or frame fragmentation if channel error rate is high

Other MAC Protocols Supported

Point Coordination Function (PCF)

AP polls stations in turn to see if there’s frames to send
Useful for real-time traffic

Request-To-Send/Clear-To-Send (RTS/CTS)

Reservation-based approach (ask permission)
Useful for very large frames
Useful for solving the "hidden node" problem (when a node is visible from a wireless access point (AP), but not from other nodes)
Request asks for clearance (permission) to send
Request also indicates time required for transmit

Frame Formats

Two frame formats available:

Long preamble
Short preamble

Configuration option for NIC (Network interface controller) and AP
Variable-size frames (maximum of 2312 bytes for payload)
16-bit Cyclic Redundancy Code (CRC) for error checking of frames

Preamble

Long

Long Preamble = 144 bits

Interoperable with older 802.11 devices
Entire Preamble and 48 bit PLCP (Physical Layer Convergence Protocol) header sent at 1 Mbps

Short

Short Preamble = 72 bits

Preamble transmitted at 1 Mbps
PLCP (Physical Layer Convergence Protocol) header transmitted at 2 Mbps
More efficient than long preamble

More Features

Power Management

Mobile nodes can 'sleep' to save power

Accomplished by using DTIM (Delivery Traffic Indication Message) intervals

AP will buffer frames until client requests them
AP can use virtual bitmap field in beacons to indicate which stations have data waiting

Security

Wired Equivalent Privacy (WEP)

Not very secure at all!

To address this weakness, there is WPA2 (IEEE 802.11i)

Uses CCMP, an AES-based encryption mode with strong security

Back to Navigation

Tutorial 12: Transmission Media

The Physical Layer

Transfer information over a transmission medium
Converts bit streams into electrical or optical signals (and back)
The signal propagates over the transmission medium at different speeds depending on the physical characteristics of the medium

An electromagnetic wave propagates through vacuum at a speed of c=3*108 m/s and at lower speeds in other materials

Quality of Transmission Media

A good transmission medium should provide communication with good quality at long distance
For voice communication, quality of communication is determined by the voice quality
For data communication, however, the quality of communication is mainly determined by the effective data rate of communication

Classes of Transmission Media

Conducted or guided media

Use a conductor such as a wire or a fiber optic cable to move the signal from sender to receiver

Wireless or unguided media

Use radio waves of different frequencies and do not need a wire or cable conductor to transmit signals

Physical Layer Transmission Media

Design Factors

Bandwidth: All other factors remaining constant, the greater the band-width of a signal, the higher the data rate that can be achieved
Transmission impairments: Limit the distance a signal can travel
Interference: Competing signals in overlapping frequency bands can distort or wipe out a signal
Number of receivers: Each attachment introduces some attenuation and distortion, limiting distance and/or data rate

Guided Transmission Media

Transmission capacity depends on the distance and on whether the medium is point-to-point or multipoint
Twisted pair wires

Consists of two insulated copper wires arranged in a regular spiral pattern to minimize the electromagnetic interference between adjacent pairs
Low frequency transmission medium (traditional phone wires, 10 or 100 Mbps Ethernet, enhanced Cat 5 and 6 can handle gigabit+ Ethernet)
Unshielded Twisted Pair (UTP)

Each wire is insulated with plastic wrap, but the pair is encased in an outer covering
Category 3 UTP (16MHz Bandwidth; 10BASE-T/100BASE-T4)
Category 5 UTP (100MHz Bandwidth; 100BASE-T/1GBASE-T with Cat 5e)

More tightly twisted than Category 3 cables
Four pairs of copper wire

Category 6 UTP (250MHz Bandwidth; 10GBASE-T up to 55 meters)

Shielded Twisted Pair (STP)

The pair is wrapped with metallic foil or braid to insulate the pair from electromagnetic interference
Category 6a STP (500MHz Bandwidth; 10GBASE-T up to 100 meters)

Advantages

Inexpensive and readily available
Flexible and light weight
Easy to work with and install

Disadvantages

Susceptibility to interference and noise
Attenuation problem
For analog, repeaters needed every 5-6km
For digital, repeaters needed every 2-3km
Relatively low bandwidth

Coaxial Cable

Also known as Coax
Used for cable television, LANs, telephony
Has an inner conductor surrounded by a braided mesh
Both conductors share a common center axial, hence the term "co-axial"
Characteristics

Higher bandwidth

400 to 600Mhz
Up to 10,800 voice conversations

Can be tapped easily
Less susceptible to interference than twisted pair
High attenuation rate makes it expensive over long distance (needs amplifiers every few km)

Fiber Optic Cable

Glass fiber carrying light pulses, each pulse a bit
Greater capacity: high-speed point-to-point transmission (10’s-100’s Gbps)
Smaller size and lighter weight
Lower attenuation (fewer repeaters)
Low error rate (immune to electromagnetic noise)
Hard to tap
Multimode step-index fiber

The reflective walls of the fiber move the light pulses to the receiver

Multimode graded-index fiber

Acts to refract the light toward the center of the fiber by variations in the density

Single mode fiber

The light is guided down the center of an extremely narrow core

Unguided Media: Wireless

Radio Link Types:

Terrestrial microwave

Up to 45 Mbps channels

LAN (e.g., WiFi)

11 Mbps, 54 Mbps

Wide-area (e.g., cellular)

3G: ~ 1 Mbps
LTE: ~ 300 (DL), 75 (UL) Mbps

Satellite

Kbps to 45 Mbps channel (or multiple smaller channels)
270 msec end-end delay
Geosynchronous versus low altitude

Common characteristics

Signal carried in electromagnetic spectrum
No physical “wire”
Multidirectional
Propagation environment effects:

Reflection
Obstruction by objects
Interference

Data Encoding Techniques

Digital Data, Analog Signals [modem]
Digital Data, Digital Signals [wired LAN]
Analog Data, Digital Signals [codec]

Frequency Division Multiplexing (FDM)
Wave Division Multiplexing (WDM) [fiber]
Time Division Multiplexing (TDM)
Pulse Code Modulation (PCM) [T1] (digital transmission systems developed by Bell Labs)
Delta Modulation

Digital Data, Digital Signals

Techniques used in a number of LANs
Digital signal – is a sequence of discrete, discontinuous voltage pulses
Bit duration: the time it takes for the transmitter to emit the bit
Issues

Bit timing
Recovery from signal
Noise immunity

Binary Encoding

NRZ (non-return to zero)
NRZI (NRZ inverted)
Manchester (used by IEEE 802.3, 10 Mbps Ethernet)

Non-Return to Zero (NRZ)

Encode binary data onto signals

E.g. 0 as low signal and 1 as high signal
Voltage does not return to zero between bits

Known as Non-Return to Zero (NRZ)

Problem: consecutive 1s or 0s

Low signal (0) may be interpreted as no signal
High signal (1) leads to baseline wander
Unable to recover clock

Sender’s and receiver’s clock have to be precisely synchronized
Receiver resynchronizes on each signal transition
Clock drift in long periods without transition

Non-Return to Zero Inverted (NRZI)

Has a transition at a clock boundary if the bit being transmitted is "1"
Stay at current signal (maintain voltage level) to encode/transmit a "0"
Solves the problem of consecutive ones (shifts to 0s)
NRZI can have long series of zeros, still unable to recover clock

Manchester

In IEEE 802.3 - 10 MBps Ethernet
Split cycle into two parts

Send high--low for "1", low--high for "0"
Transmit XOR of NRZ encoded data and the clock

Clock signal can be recovered from the encoded data
Only 50% efficient (1/2 bit per transition): double the transmission rate

Different Encoding Schemes

Back to Navigation

Courses/Computer Science/CPSC 441.W2014/Tutorials

Contents

Introduction

Course Information

Assignment 1: Web Proxy

Tutorial 1: C Programming Review

Main Class

Primitive Data Types

Arrays

Structures

Pointers

Pointers in C

Strings

String Library

Formatting Strings

Standard C Library

Tutorial 2: Socket Programming

Internet Protocol Stack

What is a Socket?

Most Popular Types of Sockets

Socket Creation in C

Ports

Addresses, Ports and Sockets

The Bind Function

On the Connecting End

Connection Setup

Server and Clients

Connection Setup Steps

Server Socket: Listen & Accept

Connect

Sending/Receiving Data

Close

The struct sockaddr

Example

Network Byte-Ordering

Tips

Tutorial 3: HTTP Servers and Proxies

Demo: echoserver

Introduction to HTTP

HTTP Request Generation

What Happens Next?

Web Server Role

HTTP Request Types

HTTP Request Format

Response Format

HTTP Response Types

HTTP Server in a Nutshell

Initializing a Server

Processing a Request

Responding to a Request

Proxy Server

Tutorial 4: HTTP Protocol Specification

What is HTTP?

Structure of HTTP Transactions

Generic HTTP Header Format

Initial Request Line

Initial Response Line

The Message Body

Sample HTTP Exchange

The HEAD Method

The POST Method

POST Method Example

Persistent Connections

Caching

The Date: Header

Conditional Get Example

Redirection Example

HTTP 1.0 vs HTTP 1.1

Assignment Tips

Tutorial 5: Wireshark

Using Wireshark to Capture and Analyze Network Data

Features and Functions

Before Capturing Data

Example: Wireshark Demonstration

Wireshark Filters

Comparison Operators

Logical Expressions

Slice Operator

Capture Filters

Display Filters

`Main` Class

The `Bind` Function

The `struct sockaddr`

Demo: `echoserver`