Courses/Computer Science/CPSC 441.W2014/Tutorials

From wiki.ucalgary.ca
< Courses‎ | Computer Science‎ | CPSC 441.W2014
Revision as of 05:04, 14 January 2015 by Cmah (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Navigation: Tutorials

Assignment 1

Assignment 2

Important Information

Tut 1
Tut 6
Tut 8
Tut 2
Tut 7
Tut 9
Tut 3
Tut 10
Tut 4
Tut 11
Tut 5
Tut 12

Contents

Introduction

Hello, my name is Carrie Mah and I am currently in my 3rd year of Computer Science with a concentration in Human Computer Interaction. I am also an Executive Officer for the Computer Science Undergraduate Society. If you have any questions (whether it be CPSC-related, events around the city, or an eclectic of random things), please do not hesitate to contact me.

I hope you find my notes useful, and if there are any errors please correct me by clicking "Edit" on the top of the page and making the appropriate changes. You should also "Watch this page" for the CPSC 441 page to check when I upload notes. Just click "Edit" and scroll down to check mark "Watch this page."

You are welcome to use my notes, just credit me when possible. If you have any suggestions or major concerns, please contact me at cmah[at]ucalgary[dot]ca. Thanks in advance for reading!

Course Information

Disclaimer: This is a page created by a student. Everything created and written by me is not associated with the Department of Computer Science or the University of Calgary. I am not being paid to do this nor am getting credit. I was a class scribe for CPSC 457 and enjoyed it so much that I wanted to continue it with a course related to it. I find writing Wiki notes for courses I am not familiar with are helpful for me, so I hope you find these notes helpful. I encourage that you still write your own notes, as it helps you retain information, but my notes are public and free to use!

This course is for W2014, taught by Dr. Carey Williamson.

The course website is here: http://pages.cpsc.ucalgary.ca/~carey/CPSC441

All tutorial slides are on the d2l website under "Tutorial Materials"

Assignment 1: Web Proxy

Tutorial 1: C Programming Review

C versus Java
C Program Java Program
  • Collection of functions
  • One function main() is called by the operating system as the starting function
  • Compile output: executable file
  • Running the executable (default name a.out) starts main function
  • Typically, single program with all user code linked in - but can be dynamic libraries (.dll, .so)
  • Note: C++ is C extended with object oriented functionality (and more)
  • Collection of classes
  • Class containing main method is the starting class
  • Compile output: jar file
  • Running java StartClass invokes StartClass.main method
  • JVM loads other classes as required
  • Compiling C/C++
  • gcc is a driver which calls the preprocessor, compiler (cc1 or cc1plus), assembler, and linker as needed
  • gcc hello.c, a.out
  • gcc hello.c -o hello, ./hello
  • Compiler options
  • [-o <file>]: specifies output file for object or executable
  • [-Wall]: shows all warnings (highly recommended)
  • [-l libnam]: links the library libnam (e.g. -lsocket)
  • If you get errors saying the library cannot be found, make sure the path is correctly set and you have the libraries you need

Main Class

int main(int argc, char *argv[])
  • argc: number of arguments passed to the program
  • argv: array of strings showing command line arguments
  • Name of executable and space-separated arguments
  • Name of executable is stored in argv[0]
  • Return value is type int
  • Convention: 0 means success, > 0 is some error

Primitive Data Types

  • char: character or smaller integer, 1 Byte
  • short int (short): short integer, >=2 Bytes
  • int: integer (most efficient), >=2, typically 4 Bytes
  • long int (long): long integer, >=4 Bytes
  • long long int: long long integer, >=8 Bytes
  • float: floating point number, 4 Bytes
  • double: double precision floating point number, 8 Bytes
  • long double: long double precision floating point number, >=8, typically 16 Bytes


  • Typecasting Example
  • Typecasting int to double (explicit)
  • f = (double) i;
  • Typecasting double to int (implicit)
  • unsigned long int i; i = 1.2;

Arrays

  • Array declaration (on the stack): int a[];
  • C/C++ arrays have no length attribute
  • Note: when passing an array to a function, typically have to pass the array size as a separate argument as well
  • Must take care of array bounds yourself, as you may not run into compiling errors but your program might behave unexpectedly or crash
  • Array's name is a pointer to its first element

Structures

  • C struct is a way to logically group related types
  • Similar to (not same as) C++/Java classes
  • Somehow a class without methods
  • Members are always public (no encapsulation concept in C)
  • A struct component can be of any type (including other struct types), but cannot be recursive, unless it is a pointer to itself
  • In C, there are two different namespaces of types:
  • A namespace of struct/union/enum tag names
  • A namespace of typedef names
  • These can return something
  • In C++, all struct/union/enum/class declarations act like they are implicitly typedef'd, as long as the name is not hidden by another declaration with the same name
  • Example:
struct address {
   char* street;
   char* city;
   char* zip;
};

typedef struct {
   char* name;
   unsigned int ID;
   struct address Address;
} student_item;

struct link_list {
   student_item student_info;
   struct link_list *next;
};
typedef struct link_list student;

Pointers

  • A pointer is just an address to some memory location
  • Another variable
  • Some dynamically allocated memory
  • Some function
  • Even NULL
  • Reference operator: &
  • Dereference operator: *
  • Examples:
int *p = &x;
int x = 4;
  • &x(address of x), x contains 4
int *p = malloc(sizeof int);
  • Address of allocated memory -> ? (allocated memory)

Pointers in C

  • Declaration
  • Use * before variable name
  • int * ptr = NULL; // creates pointer for integer
  • Allocation
  • Allocate new memory to a pointer using malloc in C (new in C++)
  • int *p = malloc(sizeof(int));
int *p = (int *) malloc(10 * sizeof(int)); // array of int
  • Deallocation
  • Clear the allocated memory when you are using it, otherwise you have a memory leak
  • free(p);
  • Dereferencing
  • Accessing data from the pointer
  • x = *p;
  • Referencing
  • Getting the memory location for the data
  • p = &x;

Strings

  • In C, a string is an array of char terminated with \0 (a null terminator)
  • Example: "hello" is hello\0

String Library

  • #include <string.h>
  • Functions:
  • char *strcpy(char *dest, char *source)
  • Copies chars from source array into dest array up to NULL
  • char *strncpy(char *dest, char *source, int num)
  • Copies chars; stops after num chars if no NULL before that; appends NULL
  • int strlen(const char *source)
  • Returns number of chars, excluding NULL
  • char *strchr(const char *source, const char ch)
  • Returns pointer to first occurrence of ch in source; NULL if none
  • char *strstr(const char *source, const char *search)
  • Return pointer to first occurrence of search in source

Formatting Strings

  • int sscanf(char *string, char *format, ...)
  • Parse the contents of string according to format
  • Return the number of successful conversions
  • Formatting codes:
  • %c: char, matches single character
  • %d: int, matches an integer in decimal
  • %f: float, matches a real number
  • %s: char *, matches a string up to a white space
  • %[^c]: char *, matches a string up to next c char
  • Values normally right-justified; use negative field width to get left-justified
  • %nc: char, char in field of n spaces
  • %nd: int, integer in field of n spaces
  • %n.mf: float, double; real number in width n.m decimals
  • %n.ms: char *, first m chars from string in width n
  • %%: writes a single % to the stream
  • int sprintf(char *buffer, char *format, ...)
  • Produce a string formatted according to format directives and place this string into the buffer

return number of successful conversions

Standard C Library

  • #include <stdio.h>
  • Formatted I/O
  • int scanf(const char *format, ...)
  • Read from standard input and store according to format
  • int printf(const char *format, ...)
  • Write to standard output according to format
  • File I/O: FILE *
  • FILE *fopen(const char *path, const char *mode)
  • Open a file and return the file descriptor
  • int fclose(FILE *stream)
  • Close the file; return 0 if successful, EOF if not
  • Other I/O operations:
  • int getchar()
  • Read the next character from stdin; returns EOF if none
  • char *fgets(char *buf, int size, FILE *in)
  • Read the next line from a file into buf
  • int fputs(const char *str, FILE *out)
  • Output the string to a file, stopping at ‘\0’
  • Returns number of characters written or EOF

Back To Navigation

Tutorial 2: Socket Programming

Internet Protocol Stack

  • Basic, low level layer: physical
  • Medium between laptops and people: air
  • Physical tool: vocal chords, ears
  • May be a middle man between the two machines
  • Highest layer: application
  • Virtual connection between machines
  • Nothing directly from your keyboard to Google's servers
  • Below that: transport layer
  • Envelope passed from browser to transport layer (handled by OS)
  • Virtual connection to another machine - send lots of envelopes in a large bag
  • Below that: network layer
  • Think of it like the main, large postal office
  • Speaks language that every computer in the world understands
  • IP protocol - common rules for different devices
  • Virtual connection between machines
  • Below that: link layer
  • Physical transmission, depends on the machine you have
  • Connects to physical layer

What is a Socket?

  • Interface between the application and the network (the lower levels of the protocol stack)
  • Application and transport layer
  • The application creates a socket
  • Asks the OS to creates socket to send/receive things
  • The socket type dictates the style of communication
  • Reliable (TCP) vs best effort (UDP)
  • OS makes sure things get sent
  • Connection-oriented vs connectionless
  • Once a socket is setup the application can:
  • Pass data to the socket for network transmission
  • Receive data from the socket (transmitted through the network, sent by some other host)

Most Popular Types of Sockets

  • 1st assignment: TCP, 2nd assignment: may use UDP
  • TCP Socket
  • Type: SOCK_STREAM
  • Reliable delivery
  • In-order guaranteed
  • Connection-oriented
  • Bidirectional
  • When you send something, guaranteed it's delivered or error message saying it can't be sent
  • Packets are in same order, sent/receive something
  • UDP Socket
  • Type: SOCK_DGRAM
  • Unreliable delivery
  • No order guaranteed
  • No notion of 'connection' - app indicates destination for each packet
  • Can send or receive

Socket Creation in C

  • How to open a file in C?
  • Have a descriptor (integer), tells OS which file you're talking about
  • int s = socket(domain, type, protocol);
  • You need socket library
  • s: socket descriptor, an integer (like a file-handle)
  • domain: integer, communication domain
  • IPv4, IPv6 - network layer, default protocols, difference is addressing space
  • Example: PF_INET (IPv4 protocol) - typically used
  • type: communication type
  • UDP, TCP
  • SOCK_STREAM: reliable, two-way, connection-based service
  • SOCK_DGRAM: unreliable, connectionless
  • Other values: need root permission, rarely used, or obsolete
  • protocol: specifies protocol (see file /etc/protocols for a list of options) - usually set to 0
  • socket call does not specify where data will be coming from, nor where it will be going to; it just creates the interface
  • Sets up interface, no 'virtual connection' established yet; just establishing the ones between layers of the same machine
  • Have not reserved a port yet (connection from transport -> application layer)

Ports

  • Each host machine has an IP address (or more)
  • Each host has 65 536 ports (2^16)
  • Some ports are reserved for specific apps
  • 20, 21: FTP
  • 23: Telnet
  • 80: HTTP
  • See RFC 1700 (about 2000 ports are reserved)
  • A socket provides an interface to send data to/from the network through a port
  • Can have multiple sockets opened in one program for OS

Addresses, Ports and Sockets

  • Like apartments and mailboxes
  • You are the application
  • Your apartment building address is the address
  • Your mailbox is the port
  • The post office is the network
  • The socket is the key that gives you access to the right mailbox (one difference: assume outgoing mail is placed by you in your mailbox)
  • How to choose which port a socket connects to?

The Bind Function

  • Bind function associates and (can exclusively) reserves a port for use by the socket
  • int status = bind(sockid, &addrport, size);
  • status: error status, -1 if bind failed
  • sockid: integer the socket call returned, socket descriptor
  • addrport: struct sockaddr, the (IP) address and port of the machine (address usually set to INADDR_ANY - chooses a local address
  • Particular structure that includes IP and port of machine that you're on
  • Want to own port number, anything sent to there is given to your machine
  • Need to tell OS on which IP and port to listen to because one machine can have multiple IPs depending on the physical interfaces
  • INADDR_ANY: include all IP addresses this machine is assigned to; send whatever the port is, send to me
  • size: size (in bytes) of addrport structure
  • Need size and struct because & sends pointer only, size to deal with structure itself

On the Connecting End

  • When connecting to another host (i.e. connecting end is the client and the receiving end is the server), the OS automatically assigns a free port for the outgoing connection
  • During connection setup, receiving end is informed of port
  • Server listens for incoming requests (for someone to connect to server)
  • You can bind to a specific port if need be
  • From machine2, moves through layers so machine1 establishes virtual connection to machine2 (using connect)

Connection Setup

  • A connection occurs between two ends
  • Server: waits for an active participant to request connection
  • Client: initiates connection request to passive side
  • Once connection is established, server and client ends are 'similar'
  • Both can send and receive data
  • Either can terminate the connection

Server and Clients

xpmgP8t.png

  • connect() calls go down to protocol stack
  • accept() waits until something comes in and accepts; now connection established and both OS takes care of the delivery of packets
  • When syscall comes back, can start to read/write (send whatever you want) and now both sides are equal because both sides can read/write

Connection Setup Steps

  • Step 1: Server listens for incoming requests
  • Step 2: Client requests and establishes connection
  • Step 3: Server accepts a request
  • Step 4: Client and Server sends/receive
  • The accepted connection is on a new socket
  • The old socket continues to listen for other active participants
  • Server has a socket (bound to port)
  • Client, from its own OS gets a socket, wants to connect to l-sock
  • When l-sock receives client call, it creates a socket exclusive (but l-sock still listens)

Server Socket: Listen & Accept

  • Called on server side:
  • int status = listen(sock, queuelen);
  • status: 0 if listening, -1 if error
  • sock: integer, socket descripor
  • queuelen: integer, number of active participants that can 'wait' for a connection
  • Can access multiple calls, can define how many to wait in queue before serving one-by-one
  • Handles queue length number of trying to establish connection
  • Listen is non-blocking: returns immediately
  • int s = accept(sock, &addr, &addrlen);
  • s: integer, the new socket (used for data-transfer)
  • sock: integer, the original socket (being listened on)
  • Pass in sock (listening), spawns out another descriptor that takes care of a specific connection
  • addr: struct sockaddr, address of the active participant
  • addrlen: sizeof(addr) - value/result parameter
  • Must be set appropriately before call
  • Adjusted by OS upon return
  • Picks something from queue from listen and establish connection
  • Accept is blocking: waits for connection before returning
  • System call - tells OS "give me something from queue. If nothing, you're going to wait until something comes into the queue"

Connect

  • From client:
  • Don't need to pre-define port, connect() - OS assigns you a random free port, and you don't even know it
  • Don't need it because you're not even listening, the OS is the one that needs about it; no need to bind (implicit)
  • When you send data to server, port number is sent - only focused on what connect does
  • int status = connect(sock, &addr, addrlen);
  • status: 0 if successful connect, -1 otherwise
  • sock: integer, socket to be used in connection
  • addr: struct sockaddr - address of server
  • addrlen: integer, sizeof(addr)
  • connect is blocking

Sending/Receiving Data

  • Buffers: pointers, memory management
  • Get some memory from OS, pass on handle from memory with send and send on particular socket
  • If sock is not valid, will get error message
  • int count = send(sock, &buf, len, flags);
  • count: number of bytes transmitted (-1 if error)
  • buf: void*, buffer to be transmitted
  • len: integer, length of buffer (in bytes) to transmit
  • flags: integer, special options, usually just 0
  • How to do sending - block, type of error message to receive
  • int count = recv(sock, &buf, len, flags);
  • count: number of bytes received (-1 if error)
  • buf: void*, stores received bytes
  • len: integer, number of bytes received
  • flags: integer, special options, usually just 0
  • Want to receive something, need to get some portion of memory and pass function to OS to write whatever comes in in that bulk of memory
  • Tell how much space is available
  • Typecasting - sending doesn't matter, just needs to know about bit sections (word you're saying)
  • Only needs len

Close

  • Close socket to release port
  • If don't close, OS still thinks ports is used
  • Closes connection, frees up the port used by the socket
  • status = close(s);
  • status: 0 if successful, -1 if error
  • s: the file descriptor (socket being closed)

The struct sockaddr

  • The struct to store the Internet address of a host:
struct sockaddr_in {
   short          sin_family;
   u_short        sin_port;
   struct in_addr sin_addr;
   char           sin_zero[8];
};
  • sin_family
  • Specifies the address family
  • Example: AF_INET
  • sin_port
  • Specifies the port number (0-65535)
  • sin_addr
  • Specifies the IP address
  • sin_zero
  • Unused!

Example

struct sockaddr_in server;                  // definition
memset(&server, 0, sizeof(server));         // init to 0
server.sin_family = AF_INET;                // address family
server.sin_port = htons(MYPORTNUM);         // port
server.sin_addr.s_addr = htonl(INADDR_ANY); // address
  • Host Byte-Ordering: the byte ordering used by a host (big-endian or little-endian)
  • Network Byte-Ordering: the byte ordering used by the network – always big-endian
  • Any words sent through the network should be converted to Network Byte-Order prior to transmission (and back to Host Byte-Order once received)

Network Byte-Ordering

  • u_long htonl(u_long x);
  • u_short htons(u_short x);
  • u_long ntohl(u_long x);
  • u_short ntohs(u_short x);
  • On big-endian machines, these routines do nothing
  • On little-endian machines, they reverse the byte order
  • Example:
  • 128, 119, 40, 12 (htonl) - Big-Endian machine
-> 12, 40, 119, 128 (ntohl) - Little-Endian machine

Tips

  • Sometimes, an ungraceful exit from a program (e.g. ctrl-c) does not properly free up a port
  • Eventually (after a few minutes), the port will be freed
  • You can kill the process, or to reduce the likelihood of this problem, include the following code:
  • In header include:
#include <signal.h>
void cleanExit(){exit(0);}
  • In socket code add:
signal(SIGTERM, cleanExit);
signal(SIGINT, cleanExit);
  • Q: How to find the IP address of the machine my server program is running on?
  • Use 127.0.0.1 or localhost for accessing a server running on your local machine
  • For a remote server running Linux use the bash shell command: ifconfig
  • For Windows, use cmd to invoke: ipconfig

Back To Navigation

Tutorial 3: HTTP Servers and Proxies

Demo: echoserver


  • Get a socket handler
  • First: initialization, then parsing input arguments
  • If valid port not given, one will be generated within a range
  • Get a socket at line 47 - listensocket
  • Line 86: SOCK_STREAM is TCP - a stream, flow of data will continue vs UDP which sends only one packet with no two-sided connection
  • Always check if this is successful or not


  • Bind to a port
  • Get variable in line 50
  • Structure needed for binding purposes
  • Line 77: memset has bulk of memory set to 0
  • Line 79: byte ordering between PCs and network machines are different
  • Big endian vs little endian: 8-bit to 16-bit to 32-bit machines (0->32)
  • How many bits you read depends on bus size
  • Example: bus size of 16 and want sto read 32-bit string, starts from left hand bulk then right
  • If you're a little endian machine and you're setting up address, ensure most and least significant bit are properly set up, so that when things travel between computers there's no confusion
  • Also do the same thing when setting up address - use any address whic hbelongs to this machine


  • Check if socket exists
  • Line 94: binds socket
  • Check if bind function is -1 (unsuccessful)
  • Line 100: listen for incoming connection requests
  • Give socket you prepared and listen to queue length 5 (FIFO)
  • While serving incoming connection, if 10 other incoming requests are lined up before accepted, there's not going to be room for them
  • Line 116: want to know who client is - passing along an address structure to accept function


  • Active socket - the socket accept returns, end point for pipe between me and client
  • Line 108: ensure you're successful
  • Want to port process so main process can still listen for other requests, and the second process serves/talk to the client; this way, one main process is listening in a loop. When something comes in, it spawns a new process to communicate to
  • Line 112: no need for listensocket because you're child process (fork)
  • Line 117: Want to figure out IP and port of client, get some memory
  • Want to look at IP before, the actual address, and the other IP (want to be populated)


  • Read - this server receives something and sends it back
  • Want to have some memory allocated
  • Want to read from active socket. If any bytes received, sends back

echoserverHTTP demo

  • In browser: copy IP of machine you're talking to (ex. CPSC server) and port
  • Echoes what browser is sending
  • keep-alive: by default, keeps connection alive unless one is closed actively or times out
  • Use same pipe from same server otherwise you need to set up another connection

Introduction to HTTP

  • HTTP: HyperText Transfer Protocol
  • Communication protocol between clients and servers
  • Application layer protocol for www
  • Client/Server model
  • Server owns information and gives whatever you need, client doesn't provide information unless for credentials or info on server
  • Client: browser that requests, receives, displays object
  • Server: receives requests and responds to them
  • Protocol consists of various operations
  • Few for HTTP 1.0 (RFC 1945, RFC 1996)
  • Many more in HTTP 1.1 (RFC 2616, RFC 1999)

HTTP Request Generation

  • Scheme is first thing - what kind of protocol? http/https/ftp
  • Next thing after :// is the host you're trying to contact
  • Hostname is converted from a name to a 32-bit IP address (DNS lookup)
  • DNS lookup - servers that contain translation for all the things (register with some place that keeps track of all names)
  • Connection is established to server (TCP)
  • Usually if it's http, you're on port 80; can have servers that work on other ports
  • After /, the path is meaningful to that particular server
  • Path from base folder ('home' folder) or looking up database and see what's the redirection

What Happens Next?

  • Client downloads HTML page
  • Sends back a file (or script), usually HTML file with some formatting
  • Typically in text format (ASCII)
  • Contains instructions for rendering (e.g. background color, frames)
  • Many have embedded objects
  • Images: GIF, JPG (logos, banner ads)
  • Embedded objects usually automatic as you can ensure you get page
  • Usually automatically retrieved
  • I.e. without user involvement
  • User may be able to change setting for automatic retrievals

Web Server Role

  • Respond to client requests, typically a browser
  • Could be a proxy sending request on behalf of browser, it forwards user requests
  • Could be a search engine spider or robot (e.g. google)
  • May have work to do on client's behalf
  • Is the client's cached copy still good?
  • Is client authorized to get this document?
  • Hundreds or thousands of simultaneous clients
  • Hard to predict how many will show up on some day (e.g. 'flash crowds', diurnal cycle, global presence)
  • Many requests are in progress concurrently

HTTP Request Types

  • Text-based thing
  • First words of request to shape structure
  • These are called methods
  • GET: retrieves a file (95% of requests)
  • Method sent from browser to web server
  • Some file or resource to be retrieved
  • HEAD: gets meta-data (e.g. modified time)
  • Asks the server to return the response headers only
  • POST: submitting a form/file to a server
  • PUT: stores enclosed document as URI
  • DELETE removed name resource
  • LINK/UNLINK: in 1.0, gone in 1.1
  • TRACE: http 'echo' for debugging (added in 1.1)
  • CONNECT: used by proxies for tunnelling (1.1)
  • OPTIONS: request for server/proxy options (1.1)

HTTP Request Format

  • Format: key word, value
  • Finish with carriage return and line-feed - server figures out where header ended
  • Messages in ASCII (human-readable)
  • Headers may communicate private information (browser, OS, cookie information, etc.)

Response Format

  • When server generates - header browser isn't going to show
  • Code that tells browser what kind of response (successful, error, redirection) it is
  • Only thing important is first line (e.g. HTTP/1.0 200 OK)

HTTP Response Types

  • 1XX: Informational
  • 100 Continue, 101 Switching Protocols
  • 2XX: Success
  • 200 OK, 206 Partial Content
  • 3XX: Redirection
  • 301 Moved Permanently, 304 Not Modified
  • 4XX: Client error
  • 400 Bad Request, 403 Forbidden, 404 Not Found
  • 5XX: Server error
  • 500 Internal Server Error, 503 Services Unavailable, 505 HTTP Version Not Supported

HTTP Server in a Nutshell

Initialize;
Setup the listening socket;
forever do {
  get request;
  process;
  send response;
  log request (optional);
}
  • Get socket, bind
  • Forever loop
  • Accept incoming thing, get request; process what needs to be done; send response and maybe log (print) something

Initializing a Server

s = socket(); /* allocate listen socket */ 
bind(s, 80); /* bind to TCP port 80 */ 
listen(s); /* indicate willingness to accept */ 
while (1) { 
  newconn = accept(s);/* accept new connection */
  ...
}
  • First allocate a socket and bind() it to address
  • HTTP requests are usually sent on TCP to port 80
  • Other services use different ports (e.g. SSL is on 443)
  • Call listen() on the socket to indicate willingness to receive requests
  • Call accept() to wait for a request to come in (blocking)
  • When the accept() returns, we have a new socket which represented a new connection to a client

Processing a Request

remoteIP = getsockname(newconn);
remoteHost = gethostbyname(remoteIP);
gettimeofday(currentTime);
read(newconn, reqBuffer, sizeof(reqBuffer));
reqInfo = serverParse(reqBuffer); 
  • read() is called on a new socket to retrieve request
  • Type of request is determined by parsing the read data
  • For logging purposes (optional, but done by most):
  • Figure out the remote host name
  • Figure out the name of other end
  • Get time of request
  • Optional: who is calling (address, IP), read what client was sending
  • Return a file, figure if the file is there, if it's accessible and read it, send it to client
fileName = parseOutFileName(requestBuffer); 
fileAttr = stat(fileName); 
serverCheckFileStuff(fileName, fileAttr); 
open(fileName);

  • Assuming the request is for a file (e.g. penguin.gif)
  • Test file path and meta-data
  • See if file exists/is accessible
  • Check permissions
  • Check meta-data: e.g. size of file, last modified time
  • Assuming all is OK: open the file

Responding to a Request

read(fileName, fileBuffer);
headerBuffer = serverFigureHeaders(fileName, reqInfo);
write(newSock, headerBuffer);
write(newSock, fileBuffer);
close(newSock);
close(fileName);
write(logFile, requestInfo);
  • Read the file into user space
  • Send HTTP headers on socket
  • Write the file on the socket
  • If connection is not persistent, close the socket
  • Close the open file descriptor
  • Write on the log file

Proxy Server

  • Two faces: one talks to client, another to server
  • Client: plays role of a server
  • Server: plays role of a client
  • For each HTTP request from the browser:
(1) Accepts HTTP requests from the browser
(2) Gets the data from the target web server (or from its cache), modifies the response if need be
(3) Sends HTTP respond with the data
  • Must handle concurrent browser requests
  • May be designed for different purposes

Back to Navigation

Tutorial 4: HTTP Protocol Specification

What is HTTP?

  • HTTP stands for Hypertext Transfer Protocol
  • Used to deliver virtually all files and other data (collectively called resources) on the World Wide Web
  • Usually, HTTP takes place through TCP/IP sockets
  • A browser is an HTTP client
  • It sends requests to an HTTP server (Web server)
  • The standard/default port for HTTP servers to listen on is 80
  • A resource is some chunk of data that is referred to by a URL
  • The most common kind of resource is a file
  • A resource may also be a dynamically-generated content, e.g. query result, CGI script output, etc.

Structure of HTTP Transactions

  • Start with method name (GET, etc.)
  • HTTP uses the client-server model
  • An HTTP client opens a connection and sends a request message to an HTTP server
  • The server then returns a response message, usually containing the resource that was requested
  • After delivering the response, the server closes the connection (except for persistent connections)
  • Format of the HTTP request and response messages
  • Almost the same, human readable (English-oriented)
  • An initial line specifying the method
  • Zero or more header lines
  • A blank line (i.e. a CRLF by itself)
  • An optional message body (e.g. a file, or query data, or query output)

Generic HTTP Header Format

<initial line, different for request vs. response> 
Header1: value1 
Header2: value2 
Header3: value3 
<optional message body, like file or query data; may be many lines, may be binary $&*%@!^$@>

Initial Request Line

  • A request line has three parts, separated by spaces
  • A method name,
  • The local path of the requested resource,
  • And the version of HTTP being used
  • Example: GET /path/to/file/index.html HTTP/1.1
  • GET: most common HTTP request. Says "give me this resource"
  • Method names are always uppercase
  • The path is the part of the URL after the host name, also called the request URI
  • URI is like a URL, but more general
  • The HTTP version always takes the form HTTP/x.x, uppercase

Initial Response Line

  • Status line
  • The HTTP version
  • The response status code: result of the request
  • A reason phrase describing the status code
  • Response categories and most common status codes
  • 1xx: an informational message
  • 2xx: success of some kind
  • 200 OK: the request succeeded, and the resulting resource is returned in the message body
  • 3xx: redirections
  • 301 Moved Permanently
  • 302 Moved Temporarily
  • 303 See Other (HTTP 1.1 only): the resource has moved to another URL
  • 4xx: an error on the client's part
  • 404 Not Found
  • 5xx: an error on the server's part

The Message Body

  • Reading of a string, know when to stop reading/expect how many come
  • Need content length and value
  • After headers, there may be a body of data
  • In a response, may be:
  • Requested resource
  • Explanatory text if there's an error
  • In a request this may be:
  • The user-entered data
  • Uploaded files
  • If an HTTP message includes a body, there are usually header lines in the message that describes the body
  • Content-Type: the MIME-type of the data (e.g. text/html or image/gif)
  • Content-Length: the number of bytes in the body

Sample HTTP Exchange

  • HTTP Request
GET /path/f.htm HTTP/1.1
Host: www.host1.com:80
User-Agent: HTTPTool/1.0 
[blank line here]
  • HTTP Response
HTTP/1.1 200 OK 
Date: Fri, 31 Dec 1999 
23:59:59 GMT 
Content-Type: text/html 
Content-Length: 1354 
<html> 
<body> 
< h1>Happy New Millennium!</ h1> 
(more file contents) ...
</body> 
</html>

The HEAD Method

  • Meta data of page
  • Need for caching
  • HEAD is like a GET request, except:
  • It asks the server to return the response headers only, not the actual resource (i.e. no message body)
  • Used to check characteristics of a resource without actually downloading it
  • HEAD is used when you don't actually need a file's contents
  • The response to a HEAD request must never contain a message body, just the status line and headers

The POST Method

  • POST is used to send data to the server
  • POST is different from a GET request:
  • Data is sent with the request, in the message body
  • There are usually extra headers to describe this message body (e.g. Content-Type and Content-Length)
  • The request URI is not a resource to retrieve; it's usually a program to handle the data you're sending
  • The HTTP response is normally program output, not a static file
  • Example: submitting HTML form data to CGI scripts
  • Content-Type: header is usually application/x-www-form-urlencoded
  • Content-Length: header gives the length of the URL-encoded form data
  • Blank user password - transparent (don't see it but it's there)
  • Accounts get hacked easily

POST Method Example

POST /login.jsp HTTP/1.1
Host: www.mysite.com
User-Agent: Mozilla/4.0
Content-Length: 27
Content-Type: application/x-www-form-urlencoded
userid=me&password=guessme

  • Can use POST request to send whatever data you want, not just form submissions. Just make sure the sender and the receiving program agree on the format
  • GET method can also be used to submit forms. The form data is URL-encoded and appended to the request URI

Persistent Connections

  • Persistent HTTP connection:
  • Used to increase performance, some servers allow persistent HTTP connections
  • The server does not immediately close the connection after sending the response
  • The response should be sent back in the same order as requests
  • The Connection:close header in a request indicates the final request for the connection
  • The server should close the connection after sending the response. Also, the server should close an idle connection after some timeout period

Caching

  • Locally figure out what is up to date, or send a HEAD request and server will tell us if there are any changes
  • Saves bandwidth and improves efficiency
  • Proxy or web browser avoids transferring resources for which a local up-to-date copy exists
  • A copy of the previous content is saved in the cache
  • Upon a new request, first the cache is searched
  • If found in cache, return the content from cache
  • If not in cache, send request to the server
  • But what if the content is out of date?
  • We need to check if the content is modified since last access

The Date: Header

  • We need time-stamped responses for caching
  • Servers must timestamp every response with a Date: header containing the current time e.g. Date: Fri, 31 Dec 1999 23:59:59 GMT
  • All responses except those with 100-level status (but including error responses) must clude the Date: header
  • All time values in HTTP use Greenwich Mean Time

Conditional Get Example

  • REQUEST
GET /sample.html HTTP/1.1
Host: example.com
If-Modified-Since: Wed, 01 Sep 2004 13:24:52 GMT
If-None-Match: “4135cda4″
  • RESPONSE
HTTP/1.1 304 Not Modified
Expires: Tue, 27 Dec 2005 11:25:19 GMT
Date: Tue, 27 Dec 2005 05:25:19 GMT
Server: Apache/1.3.33 (Unix) PHP/4.3.10

Redirection Example

  • REQUEST 1
GET /~carey/index.html HTTP/1.1
Host: www.cpsc.ucalgary.ca
Connection: keep-alive
User-Agent: Mozilla/5.0 […]
Accept: text/html,application/ 
[…]
Accept-Encoding: gzip,deflate,sdch
[…]
\r\n
  • RESPONSE 1
HTTP/1.1 302 Found
Date: Sat, 21 Jan 2012 01:10:43 GMT
Server: Apache/2.2.4 (Unix) 
   mod_ssl/2.2.4OpenSSL/0.9.7a 
   PHP/5.2.9 mod_jk/1.2.25
Location: http://pages.cpsc.ucalgary.ca/~carey/index.html
\r\n
  • REQUEST 2
GET /~carey/index.html HTTP/1.1
Host: pages.cpsc.ucalgary.ca
Connection: keep-alive
User-Agent: Mozilla/5.0 […]
Accept: text/html,application/ 
[…]
Accept-Encoding:gzip,deflate
[…]
\r\n
  • RESPONSE 2
HTTP/1.1 200 OK
Date: Sat, 21 Jan 2012 01:11:49 GMT
Server: Apache/2.2.4 (Unix) […]
Last-Modified: Mon, 16 Jan 2012 05:40:45 GMT
Content-Length: 3157
Keep-Alive: timeout=5
Connection: Keep-Alive
Content-Type: text/html
\r\n
<!DOCTYPE HTML PUBLIC "-
//W3C//DTD HTML 4.0 
Transitional//EN">
<html>
[…]
</html>
\r\n

HTTP 1.0 vs HTTP 1.1

  • Host header
  • HTTP 1.1 has a required host header
  • Persistent connections
  • By default, connections are assumed to be kept open after the transmission of a request and its response
  • The protocol permits closing of connections at any point
  • Connection: close header is used to inform the recipient that the connection will not be reused
  • Pipelining
  • A client need not wait to receive the response for one request before sending another request on the same connection
  • Strong cache support
  • Chunked transfer-encoding
  • OPTIONS method
  • A way for a client to learn about the capabilities of a server without actually requesting a resource

Assignment Tips

  • In our proxy:
(1) Pass along the 300-level message
  • GET request - does not ensure contents match as it's a redirection
  • Handle it locally in proxy and make a new connection on behalf of browser and get content from target server, or let browser take care of itself (send message to browser)
  • If you want to do extra:
  • Read about SELECT - set up sockets, give sockets the function and have one process and one loop; SELECT tells you if anything interesting has happened in the meantime while other stuff happens (saves resources)
// make sure you have read enough
// handle GET request
   // figure out he host
       // connect to the host
       // get response from host
       // do modification
       // send back response

Back to Navigation

Tutorial 5: Wireshark

Using Wireshark to Capture and Analyze Network Data

  • Wireshark (originally named Ethereal)is a free and open-source packet analyzer
  • It is used for network troubleshooting, analysis, software and communication protocol development, and education
  • It has a graphical front-end, and many more information sorting and filtering options.
  • Tool to capture traffic
  • Passive: gets a copy of protocol stack

Features and Functions

  • Wireshark is software that "understands" the structure of different networking protocols. Thus, it is able to display the encapsulation and the fields along with their meanings of different packets specified by different networking protocols.
  • Physical layer frame has header and footer, Wireshark can figure that out and its payload
  • Looking at header, can tell what kind of package it is; goes until application layer to figure out the protocol it's using
  • Works with live data and recorded/captured data
  • Data display can be refined using a display filter

Before Capturing Data

  • Not always allowed to capture data, follow laws!
  • Are you allowed to do this?
  • Ensure that you have permission to capture packets from the network you are connected with
  • Corporate policies or applicable laws may prohibit capturing data from the network
  • General Setup
  • Operating system must support packet capturing, e.g. capture support is enabled :* You must have sufficient privileges to capture packets, e.g. root / administrator privileges
  • Your computer's time and time zone settings should be correct

Example: Wireshark Demonstration

  • Detects interfaces (can be associated with Wi-Fi, ethernet, etc.)
  • Capture -> Interfaces
  • See traffic on interfaces (figure out the active one)
  • Start (can stop whenever)
  • Fills page with packets and information
  • NBNS: helps with discovery of devices (routers, iOS, etc.)
  • ARP: routing purposes (adhoc network)
  • Can filter in Options -> Capture Filter: <words> then Start
  • Capture filter: capture the ones I tell you
  • Example: (host <IP address>)
  • Or filter by Start then Filter: <words>
  • Captures everything and just show what you want
  • Example: HTTP

Wireshark Filters

  • Wireshark has two types of filters:
  • Capture Filters
  • A powerful capture filter engine helps remove unwanted packets from a packet trace and only retrieve the packets of interest
  • Display Filters
  • Let you compare the fields within a protocol against a specific value, compare fields against other fields, and check the existence of specified fields or protocols

Comparison Operators

  • Fields can also be compared against values. The comparison operators can be expressed either through English-like abbreviations or through C-like symbols
  • eq, == Equal
  • ne, != Not Equal
  • gt, > Greater Than
  • lt, < Less Than
  • ge, >= Greater than or Equal to
  • le, <= Less than or Equal to

Logical Expressions

  • Tests can be combined using logical expressions. These too are expressible in C-like syntax or with English-like abbreviations:
  • and, && Logical AND
  • or, || Logical OR
  • not, ! Logical NOT
  • Some valid display filters
  • tcp.port == 80 and ip.src == 192.168.2.1
  • http and frame[100-199] contains "wireshark"

Slice Operator

  • You can take a slice of a field if the field is a text string or a byte array.
  • For example, you can filter the HTTP header fields
  • Here the header “location” indicates that redirection happens: http.location[0:12]=="http://pages"
  • Another example: http.content_type[0:4] == "text"

Capture Filters

Syntax Protocol Direction Host(s) Logical Operand Other Expressions
Example tcp dst 136.159.5.20 and host 136.159.5.6
  • Protocol
  • Values can be ether, fddi, ip, arp, rarp, decnet, lat, sca, moprc, mopdl, tcp and udp
  • If no protocol is specified, all the protocols are used
  • Direction
  • Values can be src, dst, src and dst, src or dst
  • If no source or destination is specified, the "src or dst" keywords are applied
  • For example, “host 136.159.5.20” is equivalent to “src or dst host 136.159.5.20”
  • Host(s)
  • Values can be net, port, host, portrange
  • If no host(s) is specified, the "host" keyword is used
  • For example, "src 136.159.5.20" is equivalent to "src host 136.159.5.20"
  • Logical Operations
  • Values can be not, and, or
  • Negation ("not") has highest precedence. Alternation ("or") and concatenation ("and") have equal precedence and associate left to right
  • For example, "not tcp port 3128 and tcp port 80" is equivalent to "(not tcp port 3128) and tcp port 80"


  • Examples
  • tcp port 80
  • Displays packets with tcp protocol on port 80
  • ip src host 136.159.5.20
  • Displays packets with source IP address equals to 136.159.5.20
  • host 136.159.5.1
  • Displays packets with source or destination IP address equals to 136.159.5.1
  • src portrange 2000-2500
  • Displays packets with source UDP or TCP ports in the 2000-2500 range
  • src host 136.159.5.20 and not dst host 136.159.5.1
  • Displays packets with source IP address equals to 136.159.5.20 and in the same time not with the destination IP address 136.159.5.1
  • (src host 136.159.5.1 or src host 136.159.5.3) and tcp dst portrange 200-10000 and dst host 136.159.5.2
  • Displays packets with source IP address 136.159.5.1 or source address 136.159.5.3, the result is then concatenated with packets having destination TCP port range from 200 to 10000 and destination IP address136.159.5.2

Display Filters

Syntax Protocol . String 1 . String 2 Comparison Operators Value Logical Operators Other Expressions
Example http . request . method == get or tcp.port == 80
  • String1, String2 (Optional settings)
  • Sub protocol categories inside the protocol. To find them, look for a protocol and then click on the "+" character
  • Examples
  • ip.addr == 136.159.5.20
  • Displays the packets with source or destination IP address equals to 136.159.5.20
  • http.request.version=="HTTP/1.1"
  • Display HTTP requests with version 1.1
  • tcp.dstport == 25
  • Display TCP packets with destination port equal to 25
  • tcp.flags
  • Display packets having a TCP flag
  • tcp.flags.syn == 0x02
  • Display packets with a TCP SYN flag

Back to Navigation

Assignment 2: TCP Wrangling

Tutorial 6: UDP Protocol Specification

  • TCP has a lot of overhead
  • In streaming, fragmentation isn't as bad as loss of information (which happens in TCP), that is why UDP is chosen

What is UDP?

  • User Datagram Protocol
  • Name for what to put in an envelope for every layer
  • Application: message
  • Transport: segments (TCP), datagram (UDP)
  • TCP: stream, segments are part of it; datagram is wrapped up data that can be handled separately
  • Protocol for transport layer in protocol stack
  • Alternative to TCP and together with IP, also referred to as UDP/IP
  • Programs on networked computers can send short messages sometimes known as datagrams (using datagram sockets) to one another

Features of UDP

  • Minimal transport layer protocol
  • Ex: Two computers connected by Ethernet
  • Routers congested, packets get dropped
  • Buffers at every layer exist, especially IP layer (routers have limited buffer amount)
  • Unreliable and connection-less
  • UDP provides no guarantees to the upper layer protocol for message delivery
  • The UDP protocol retains no state of UDP messages once sent
  • Messages may be delivered out of order
  • UDP provides some integrity verification (via checksum) of the header and payload
  • Problem 1: packet gets dropped
  • Problem 2: bits get flipped due to errors, problems, etc.
  • UDP provides application multiplexing (via port numbers)
  • Both TCP, UDP use this
  • TCP: OS handles this (one instance of socket library part of OS)
  • OS knows how to handle these by ports to decide which packet belong to which processes on this machine

Encapsulation

  • Process creates a message, into a buffer

-> UDP encapsulates, creates a header (UDP datagram) -> IP data, IP header -> Frame data, frame header -> Decapsulation occurs, first to frame data, frame header -> IP data, IP header -> UDP data, UDP header -> Message to process -> Process

UDP Header

offset (bits)  0       7 8     15 16    23 24     31
               +--------+--------+--------+--------+
       0       |   src port num  |  dest port num  |
               +--------+--------+--------+--------+
       32      |      length     |     checksum    |
               +--------+--------+--------+--------+
       64+     |    ...       data        ...      |
               +--------+--------+--------+--------+
  • HTTP headers were text-based, varying size, no clear boundary
  • Payload, message to send
  • UDP header is very fixed number of bytes allocated for each part of the header
  • Some are optional - bits must exist, but they can be zero'd
  • Not mandatory for bits to be there for functionality
  • Unsure of IP, so send pseudo headers
  • UDP header consists of 4 fields, each of which is 2 bytes
  • The use of two of those is optional in IPv4 (src port num and checksum), but in IPv6 only the source port num is optional
  • Source Port Number
  • Bit 0-15, 2 bytes (1 word)
  • Identifies the sender's port, should be assumed to be the port to reply to if needed
  • If not used, then it should be zero
  • Destination Port Number
  • Identifies the receiver's port and is required
  • Length
  • Length of message in bytes; considers the entire datagram: header and data
  • The minimum length is 8 bytes = length of the header
  • The field size sets a theoretical limit of 65 535 bytes (8 byte header + 65 527 bytes of data) for a UDP datagram
  • Imposes limit on how many bytes the payload can be, if it's bigger then break it into two
  • Checksum
  • Verification of contents whether bits have been flipped
  • Error checking of the header and data. Mandatory for IPv6
  • Not just payload, but also pseudo header
  • If no checksum is generated by the transmitter, the field should be set to all-zeros
  • UDP uses pseudo header to define the checksum. It is calculated over the combination of pseudo header and UDP message
  • Pseudo header contains:
  • IP source address field
  • IP destination address field
  • IP protocol field
  • UDP length field
  • Data
  • IPv4 is 2^16 for one
  • Padding must be added to make the data a multiple of 16 bits

TCP vs UDP

TCP

  • Reliable (like a pipe)
  • Connection-oriented
  • Segment retransmission and flow control through windowing
  • Cut in the pipe, things stop working - it will retransmit or give you a message
  • Segment sequencing
  • Acknowledge segments

UDP

  • Unreliable
  • Connectionless
  • No windowing or retransmission
  • No sequencing, ordering for datagrams
  • No acknowledgement

Why Still Use UDP?

  • UDP's advantage over TCP
  • UDP's header is smaller than TCP's. The header applies to every segment, so it adds up
  • Generating UDP header has much simpler processing steps
  • UDP has no connection setup overhead, while TCP requires 3-way handshake
  • UDP is widely used and recommended for cases where:
  • Speed is more important than reliablity. An application values timely delivery over reliable delivery
  • Saves bandwidth since more overhead in TCP
  • Data exchanges are short and the order of reception of datagram does not mater
  • Best-effort delivery is enough
  • Applications require multicast or broadcast transmissions, not supported by TCP
  • TCP: two ends, can't broadcast

Popular Applications Using UDP

  • Domain Name System (DNS)
  • Simple Network Management Protocol (SNMP)
  • Wireshark, see lots of these messages between routers and network
  • Dynamic Host Configuration Protocol (DHCP)
  • Gets dedicated IP
  • Routing Information Protocol (RIP)
  • Many multimedia applications like streaming video, VOIP
  • “Real-time video and audio streaming protocols are designed to handle occasional lost packets, so only slight degradation in quality occurs, rather than large delays if lost packets were retransmitted.” [Wikipedia]

Checksum

  • “Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data, padded with zero octets at the end (if necessary) to make a multiple of two octets.” [RFC 768]
  • Checksum is calculated for UDP header and data
  • IP header also has a checksum, but it doesn't cover data
  • To create, must maintain length (less than max pad with zeros)
  • UDP checksum test is performed only at the sender and receiver end stations
  • The IP checksum test is performed in every intermediate node (router)
  • UDP check sum is performed over a pseudo header
  • In addition to UDP header and data, and the source and the destination IP address
  • Prevents misrouting: in case the destination IP address in IP header was corrupted, and it was not discovered by the IP checksum test, the UDP datagram would arrive to the wrong IP address. UDP can detect this and silently drop the datagram

Pseudo Header

0      7 8     15 16    23 24    31
+--------+--------+--------+--------+
|         source IP address	    |
+--------+--------+--------+--------+
|      destination IP address       |
+--------+--------+--------+--------+
| zeroes |protocol| UDP total length|
+--------+--------+--------+--------+
  • Above the UDP header
  • IP layer - passes
  • These are all in the checksum
  • Reason: check if not actually the receiver

Comparison of IPv4 vs IPv6

svLyzIS.png
  • Bigger, many are mandatory in IPv6

Sample Checksum Calculation

k4yBeqW.png
  • “Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data, padded with zero octets at the end (if necessary) to make a multiple of two octets.” [RFC 768]
  • For every 16 bit (word/row) take the 1's complement, add them, then take 1's complement of the result
  • Yellow: pseudo-header, gray: data

Socket Programming with UDP

  • No setup
  • Server: open socket (same), bind (for multiplexing) but don't listen/accept
  • Just send and receive data, then close connection
  • Client
  • No connect, since no setting up
  • Just open socket, send and receive data, close connection

UDP Server and Client Sockets

  • Server
(1) Open datagram socket with socket()
(2) Name the socket with bind(), using SOCKADDR structure for the address parameter
(3) Exchange data with a client using sento() and recvfrom()
(4) Close the connection with closesocket()
  • Client
(1) Open a datagram socket with socket()
(2) Exchange data with server using sendto() and recvfrom()
(3) Close the connection with closesocket()
  • Learn who other end is (accept)
  • No information, thus every time we do a recvfrom(), learn about sender
  • Can be multiple senders on same socket

Sending and Receiving Data

  • int sendto(int sockid, const void *sendbuf, int sendlen, int flags, const struct sockaddr *to, int tolen);
  • sockid: integer, socket descriptor
  • sendbuf: buffer containing the data to be transmitted
  • sendlen: size in bytes of the data in the buf buffer
  • flags: indicator specifying the way in which the call is made
  • to: struct sockaddr, the address of the target
  • tolen: the size (in bytes) of the addrport structure
  • int recvfrom(int sockid, const void *recvbuf, int recvlen, int flags, const struct sockaddr *from, int fromlen);
  • sockid and flags are same as above
  • recvbuf and recvlen: buffer containing the received data and its size
  • from and fromlen: the address of the source and its size
  • Return value: If no error occurs, these function returns the total number of bytes sent/recieved, if an error occurs, SOCKET_ERROR (- 1) is returned.

Sample Code

Example: wordserver

  • Ask for a word
  • Setup socket, error checking
  • Setup addr struct
  • Bind, error checking
  • Anyone can send
  • Also setup client sockaddr
  • Receives some string in the code; but you are sending/reading bytes so cannot use strlength

Example: simpleChat

  • Has both server/client in one program
  • Pay attention that it sends ASCII, but you have to deal with bytes
  • Client struct for peers
  • Each process has setting - input arguments to tell which instance of the program you will run
  • Need to run both at same time

Assignment Tips

  • Create program with both server and client
  • Write similar to BitTorrent
  • File segmented into chunks
  • Pull design: who has what, fear the other client who has piece I want
  • This server now, I am client; send a message requesting piece and they start sending
  • Multiple processes running


  • A binary file (P1)
  • Chunks: c1, c2, c3, c4
  • Only have c4
  • Server side: select, bind(port1)
  • recvfrom(...)
  • Design message so P2 wants c4
  • sendto(chunk, P2)
  • Another binary file (P2)
  • Chunks: c1, c2,c3, c4
  • Have c1, c3, wants c4
  • Client Side: socket()
  • sendto(requestmsg, addr P1)
  • Create your own protocol
  • Server sde: socket, bind(port2)
  • Another (P3)


  • Loop send/recv properly
  • Blockings occur, might be slow
  • Challenge: multiple clients, designing
  • TCP: easily fork, pass on handling of client to child process
  • UDP: no child process; use multi-threading (receive request, handle sending in another thread)


  • Instead of VM (which can cause network issues), can use cygwin
  • In Developer Tools, specify you want gdb
  • Gets Bash functionality

Back to Navigation

Tutorial 7: TCP Protocol Specification

  • At transport layer, data handled by UDP is datagram but for IP these are also datagram (different)

Overview

  • Connection-oriented, point-to-point protocol
  • Connection establishment and teardown phases
  • 'Phone-like' circuit abstraction (application-layer view)
  • One sender, one receiver
  • Called 'reliable byte stream' protocol
  • General purpose (for any network environment)
hTAU8fz.png

Features

  • TCP provides the following facilities to:
  • Stream Data Transfer
  • Transfers a contiguous stream of bytes in TCP segments
  • Multiplexing
  • Allow for many processes within a single host to use TCP communication facilities simultaneously
  • Port numbers important to determine which process is bein used
  • Reliability
  • Flow Control and congestion control
  • Originally optimized for certain kinds of transfer:
  • Telnet (interactive remote login)
  • FTP (long, slow transfers)
  • Large amounts of data to transfer between two ends; overhead acceptable with reliability
  • Web is like neither of these!
  • Originally optimized for wired networks
18Ovgl4.png
  • Provides a reliable, in-order, byte stream abstraction:
  • Recover lost packets and detect/drop duplicates
  • Detect and drop corrupted packets
  • Preserve order in byte stream; no “message boundaries”
  • Full-duplex: bi-directional data flow in same connection
  • Flow and congestion control:
  • Flow control: sender will not overwhelm receiver
  • Congestion control: sender will not overwhelm the network
  • Sliding window flow control
  • Congestion control done via adaptive flow control window size
  • Send and receive buffers

TCP Header

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port Number      |    Destination Port Number    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|     Receiver window size      |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |      Pointer urgent data      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Options (variable length)            |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               applcation data (variable length)               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  • More than UDP
  • Figure out-of-order or missing pieces: other end didn't receive acknowledgement
  • 'Piggy-back' - if you have something to send to me, create TCP segment but acknowledge something along the way
  • Set ACK flag and it means something
  • To close: send ACK, then receive an acknowledgement of ACK, then close
  • Source/Destination Port Number
  • These two fields plus the source and destination IP addresses, combine to uniquely identify each TCP socket
  • Sequence Number
  • The byte in the stream of data from the sending TCP to the receiving TCP that the first byte of data in this segment represents
  • Acknowledgement Number
  • Contains the next sequence number that he sender expects to receive. This acknowledges receipt of all prior bytes. This field is valid only if the ACK flag is on
  • Data Offset/Header Length
  • Gives the length of the header in 32-bit words
  • Flags (6 bits)
  • UAPRSF used
  • URG
  • Segment contains urgent data. When this flag is set, the UrgPtr field indicates where the non-urgent data contained in this segment begins
  • ACK
  • Indicates that the Acknowledgement field is significant. All packets after the initial SYN packet sent by the client should have this flag set
  • PSH
  • Push function. Asks to push the buffered data to the receiving application
  • RST
  • Reset the connection
  • SYN
  • Synchronize sequence numbers. Only the first packet sent from each end should have this flag set
  • FIN
  • No more data from sender
  • Receiver Window Size
  • The number of bytes that the receiver is currently willing to receive
  • Checksum
  • The 16-bit checksum field is used for error-checking of the header and data
  • Option Field
  • Has many different options. For example MSS (maximum segment size) option, which specifies the maximum sized segment the sender wants to receive
  • The data portion of the TCP segment is optional

Establishing a TCP Connection

  • 3-way handshake
  • Client sends SYN with initial sequence number (ISN = X)
  • Label of data being sent
  • Server responds (listen() on port 80) with its own SYN with sequence number Y and ACK of client ISN with (X + 1) (next expected byte)
  • Acknowledgement number is the label of the expected packet
  • Client ACKs server's ISN with Y + 1 (server accept() or read())
  • X and Y are randomly chosen
  • All modulo 32-bit arithmetic

Sending Data

  • Sender TCP passes segments to IP to transmit
  • Keeps a copy in buffer at send side in case of loss
  • Called a 'reliable byte stream' protocol
  • Sender must obey receiver advertised window
  • Receiver sends acknowledgements (ACKs)
  • ACKs can be piggybacked on data going the other way
  • Protocol allows receiver to ACK every other packet in attempt to reduce ACK traffic (delayed ACKs)
  • Delay should not be more than 500 ms (typically 200 ms)
  • Keep copy in buffer, wait for acknowledgement before removing it
  • Keep data, holes when out-of-order happens

Example

jBdOTBI.png

Preventing Congestion

  • Sender may not only overrun receiver, but may also overrun intermediate routers
  • No way to explicitly know router buffer occupancy, so we need to infer it from packet losses
  • Assumption is that losses stem from congestion, namely, that intermediate routers have no available buffers
  • Sender maintains a congestion window (CW)
  • Never have more than CW of un-acknowledged data outstanding (or RWIN data; minimum of the two)
  • Successive ACKs from receiver causes CW to grow
  • CW grows based on one of two phases:
  • Slow-start: initial state
  • Congestion avoidance: steady-state
  • Switch between the two when CW > slow-start threshold

Congestion Control Principles

  • Lack of congestion control would lead to congestion collapse (Jacobson 88)
  • Idea is to be a 'good network citizen'
  • Would like to transmit as fast as possible without loss
  • Probe network to find available bandwidth
  • In steady-state: linear increase in CW per RTT
  • After loss event: CW is halved
  • This general approach is called Addictive Increase and Multiplicative Decrease (AIMD)
  • Helps lead to network stability

Diagram: Comparison

0dDrEik.png

Bigger Image

Slow Start

  • Initial CW = 1
  • After each ACK:
  • CW += 1
  • Continue until:
  • Loss occurs OR
  • CW > slow start threshold
  • Then switch to congestion avoidance
  • If we detect loss, cut CW in half
  • Exponential increase in window size per RTT

Congestion Avoidance

Until (loss) {
  after CW packets ACKed:
     CW += 1;
}
ssthresh = CW / 2;
// Depending on loss type:
CW /= 2; continue; // SACK/fast retransmit
CW = 1; go to slow start; // Course grained timeout
  • This is for TCP Reno/SACK: TCP Tahoe always sets CW = 1 after a loss

How Are Losses Recovered?

  • Coarse-grained timeout:
  • Sender does not receive ACK after some period of time
  • Event is called a retransmission time-out (RTO)
  • RTO value is based on estimated round-trip time (RTT)
  • RTT is adjusted over time using exponential weighted moving average:
  • RTT = (1-x) * RTT + (x) * sample
  • x is typically 0.1
  • First done in TCP Tahoe

Fast Retransmit

  • Receiver expects N, gets N + 1:
  • Immediately sends ACK(N)
  • This is called a duplicate ACK
  • Does not delay ACKs here
  • Continue sending duplicate ACKs for each subsequent packet (not N)
  • Sender gets 3 duplicate ACKs:
  • Infers N is lost and resends
  • 3 chosen so out-of-order packets don't trigger Fast Retransmit accidentally
  • Called 'fast' since we don't need to wait for a full RTT

Connection Termination

  • Either side may terminate a connection. In fact, connection can stay half-closed. Let's say the server closes (typical in WWW)
  • Server sends FIN with sequence number (SN + 1) (i.e., FIN is a byte in sequence)
  • Client ACK's the FIN with (SN + 2) ('next expected')
  • Client sends its own FIN when ready
  • Server ACK's client FIN as well with (SN + 1)

The TCP State Machine

  • TCP uses a Finite State Machine, kept by each side of a connection, to keep track of what state a connection is in
  • State transitions reflect inherent races that can happen in the network, e.g. two FIN's passing each other in the network
  • Certain things can go wrong along the way, i.e. packets can be dropped or corrupted
  • Machine is not perfect; certain problems can arise not anticipated in the original RFC
  • Timers come into play then

TCP Connection Establishment

VByKMlE.png
  • CLOSED: more implied than actual, i.e. no connection
  • LISTEN: willing to receive connections (Accept call)
  • SYN_SENT: sent a SYN, waiting for SYN-ACK
  • SYN_RECEIVED: received a SYN, waiting for an ACK of our SYN
  • ESTABLISHED: connection ready for data transfer

TCP Connection Termination

Mom1D7I.png
  • FIN_WAIT_1: we closed first, waiting for ACK of our FIN (active close)
  • FIN_WAIT_2: we closed first, other side has ACK'd our FIN, but not yet FIN'd
  • CLOSING: other side closed before it received our FIN
  • TIME_WAIT: we closed, other side closed, got ACK of our FIN
  • CLOSE_WAIT: other side sent FIN first, not us (passive close)
  • LAST_ACK: other side sent FIN, then we did, now waiting for ACK

Back to Navigation

Other Information

Tutorial 8: The Internet Protocol

The Network Layer

  • IP (Internet Protocol) is a Network Layer Protocol
  • RFC 791 provides the specification for IP
  • Sends datagrams (like UDP in transport layer)
  • At end points, assume virtual connection that abstracts out what's happening underneath
  • Goes through ISP, to major routers, to another ISP (for target)
  • These machines don't implement transport or application layer (end point does this)

The Waist of of the Hourglass

kWm9f5M.png
  • IP is the waist of the hourglass of the Internet protocol stack
  • Multiple higher-layer protocols
  • Multiple lower-layer protocols
  • One common protocol at the network layer for data transmission
  • Glue point: IP
  • Every other layer, might have multiple layers within it
  • Only layer where everyone must implement same protocol
  • From developer's POV, clean; network providers (telephone companies) hate this
  • They have no say for any layer (above: users, lower: hardware, physical link), and they have a lot of things to manage within IP
  • IPv4, IPv6 designed to live with each other - if they can't support one, they use the other (similar to IP protocol)

Routers

LJVZs6M.png
  • IP is the highest layer protocol which is implemented at both routers and hosts

Best Effort Protocol

  • IP provides an unreliable, connectionless, best effort service (also called: "datagram service")
  • Unreliable: no guarantee for delivery of packets
  • Connectionless: things delivered out of order
  • Each packet ("datagram") is handled independently. IP is not aware that packets between hosts may be sent in a logical sequence
  • Best effort: no guarantee on anything
  • IP does not make guarantees on the service (no throughput guarantee, no delay guarantee, etc.)
  • Consequences: Higher layer protocols have to take care of delivery guarantees

IPv4 and IPv6 Datagram

fnD0J2j.png
  • IPv4: address is 32 bits
  • IPv6: address is 128 bits

IP Versions

  • Authority that assigns IP addresses to institutions
  • The first publicly used version of the Internet Protocol was version 4 (IPv4)
  • Address space: 32 bits (~4.3 billion addresses)
  • Initially it was thought to be enough
  • Address exhaustion
  • On February 3, 2011, the Internet Assigned Numbers Authority (IANA) officially depleted the global pool of completely fresh blocks of addresses
  • Address exhaustion was a concern as early as the 1990s
  • IPv6 is the next generation IP that tries to address the shortcomings of IPv4
  • Address space: 128 bits (~79 octillion times more than IPv4)
  • Designed to live alongside IPv4
  • Version 5:
  • Doesn't exist; skipped to avoid confusion
  • IPv5 relates to experimental TCP/IP protocol called the Internet Stream Protocol, Version 2 (originally RFC 1190)
  • Protocol seen by some as being a peer of IP at the Internet Layer in TCP/IP architecture, and packets assigned IPv5 to differentiate them from 'normal' IPv4 packets
  • Protocol never went anywhere, so to avoid confusion IPv6 was used

IPv4 Datagram Fields

     0                   1                   2                   3
Bit  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0    |Version|  IHL  |    DCP    |ECN|          Total Length         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
32   |         Identification        |Flags|     Fragment offset     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
64   |  Time to Live |    Protocol   |        Header Checksum        | 
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
96   |                      Source IP Address                        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
128  |                    Destination IP Address                     |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
160  |                     Options (if IHL > 5)                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  • Explicit Congestion Notification(ECN): 2 bits
  • An optional feature that is defined by RFC 3168 for notification of network congestion without dropping packets
  • Both endpoints must support it and be willing to use it
  • Only effective when supported by the underlying network
  • Total Length: 16 bits
  • The entire IP datagram size, including the header and payload
  • Minimum-length is 20 bytes (minimal header with no payload)
  • Identification: 16 bits
  • Used primarily for uniquely identifying the group of fragments of a single IP datagram
  • Unique identification of a datagram from a host
  • Incremented whenever a datagram is transmitted
  • Flags: 3 bits
  • Bit field used to control or identify fragments
  • Bit 0: Reserved; must be zero
  • Bit 1: Don’t fragment (DF)
  • If set, packets are dropped if they need to be fragmented
  • Bit 2: More fragments (MF)
  • Zero for non-fragmented packets; for fragmented packets, all but the last packet has this flag set; the last packet will have a non-zero "Fragment Offset" field
  • Fragment Offset: 13 bits
  • Measured in units of 64-bit words (8 byte)
  • Time to Live: 8 bits
  • Limits a datagram’s lifetime to break routing circles
  • Specified in seconds but in practice is used as a hop count (decrement by 1 at each router) and set to 64 at the start
  • When TTL is zero, the router should discard the packet; typically an ICMP Time Exceeded message is sent to the sender
  • Protocol: 8 bits
  • Defines the protocol used in the payload
  • There are over 140 protocols defined (TCP is 0x06; UDP is 0x11)
  • Header Checksum
  • The 16-bit one's complement of the one's complement sum of all 16-bit words in the header
  • For computing the checksum, the value of the checksum field is

zero

  • Options: not often used
  • Used to control fragmenting, routing, debugging, security, etc.
  • Must be padded so that the header is divisible by 32 bits (4 bytes)

Maximum Transmission Unit

  • Maximum size of IP datagram is 65535, but the data link layer protocol generally imposes a limit that is much smaller
  • Ethernet frames have a maximum payload of 1500 bytes
  • IP datagrams encapsulated in Ethernet frame cannot be longer than 1500 bytes
  • The limit on the maximum IP datagram size, imposed by the data link protocol is called maximum transmission unit (MTU)
MTUs for various data link protocols
Ethernet FDDI ATM AAL5 802.3 802.5 802.11 (WLAN)

1500

4352 9180 1492 4464 2272

IP Fragmentation

  • If size of IP datagram exceeds MTU, IP datagram is fragmented into smaller units
  • Can transfer data over heterogenous types of networks (with different MTUs)
  • Fragmentaion:
  • IP router splits the datagram into several datagram
  • Fragments are reassembled at receiver
  • Analogy: want to send an air bus (that can't fly yet) from Paris to Calgary
  • From Paris to Toronto/Montreal, they can ship the whole thing without breaking it down (a big ship carries this bus parcel)
  • On arrival from Montreal, to send to Calgary, there is not a big enough carrier that can carry the bus
  • Disassemble into pieces and reassemble in Calgary


  • Fragmentation can be done at the sender or at intermediate routers
  • The same datagram can be fragmented several times
  • Reassembly of original datagram done only at destination hosts
  • How big your data that you want to transfer, the network you want to pass through cannot handle it
  • Multiple hops: each part can give you a different type of network
  • Sender is agnostic of what's going on in the network
  • Whoever is gateway between two types of network and sees a packet that cannot travel over a specific one, it is responsible for fragmentation
  • A network will not reassemble the packet (extra work), it maintains its fragmented state and destination reassembles it - no one in the middle does that
  • Middle man will fragment it depending where it's needed

Example of Fragmentation

2gjXgbU.png
  • User is agnostic to this, network layer worries about it
  • Set fragmentation flag and ID (common between all of them so know pieces belong together)
  • Offset - 1480 because something small taken out of header

IPv6 Datagram Fields

     0                   1                   2                   3
Bit  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 0   |Version| Traffic Class |             Flow Control              |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 32  |         Payload Length        |  Next Header  |   Hop Limit   |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 64  |                                                               |
 96  |                     Source IPv6 Address                       |
 128 |                          128 bits                             |
 160 |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 192 |                                                               |
 224 |                    Destination IPv6 Address                   |
 256 |                            128 bits                           |
 288 |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  • Version: 4 bits
  • For IPv6, this has a value of 6 (0110)
  • Traffic Class: 8 bits
  • The same as the redefined IPv4 fields:
  • The first 6 bits are differentiated services for real-time data streaming
  • The last 2 bits are for ECN (Explicit Congestion Notification)
  • Flow Label: 20 bits
  • Originally created for giving real-time applications special service
  • When set to a non-zero value, it serves as a hint to routers and switches with multiple outbound paths that these packets should stay on the same path so that they will not be reordered
  • It has further been suggested that the flow label be used to help detect spoofed packets
  • Payload Length: 16 bits
  • The size of the payload in octets, including any extension headers
  • This is different from IPv4 as it does not include the fixed IPv6 header
  • The length is set to zero when a Hop-by-Hop extension header carries a Jumbo Payload option
  • A Jumbo Payload has a 32 bit length in the Hop-By-Hop Options extension header allowing packets up to 4GB in size!
  • Next Header: 8 bits
  • The same as the IPv4 Protocol field
  • The extension headers are described here as protocols
  • Hop Limit: 8 bits
  • Replaces the time to live field of IPv4
  • This value is decremented by one at each intermediate node visited by the packet
  • When the counter reaches 0 the packet is discarded
  • Fragmented Packets
  • Notice there is no fragmentation fields, so routers cannot fragment IPv6 packets as they do for IPv4
  • Hosts may use the fragmentation extension to send packets larger than an MTU
  • IPv6 also does not have a checksum field

Back to Navigation

Tutorial 9: Ethernet LANs

Introduction of Ethernet

  • Networks existed before Internet
  • At that time, they were classified in terms of local area (same building), wide area (clusters of LANs in a city), metropolitan (more than core downtown), Internet (connect these)
  • One machine has a modem which can dial up to internet and it becomes a gateway so other computers can connect to this one and use the Internet - painfully slow
  • Ethernet, defined under IEEE 802.3, is one of today's most widely used data communications standards
  • It finds its major use in Local Area Networks (LAN)
  • It has largely replaced competing wired LAN technologies
  • Founded by Xerox Palo Alto Research Center in 1975
  • Originally designed as a 2.94 Mbps system to connect 100 computers on a 1 km cable
  • Later, Xerox, Intel and DEC drew up a standard to support 10 Mbps, which was later basis for the IEEE’s 802.3 specification

Main Elements

(1) The network nodes
  • The points to and from which the communication takes place
  • Data Terminal Equipment: devices such as PCs, file servers, print servers
  • Data Communications Equipment: devices that receive and forward the data frames across the network, e.g. repeaters, switches, routers
(2) Interconnecting media
  • The cable that connects the network nodes, the type determines the speed at which the data may be transmitted
  • Coaxial Cable
  • Twisted Pair Cables
  • Fiber Optic Cables

Network Topologies

vRRbvyj.png
  • Point to point: simplest configuration as only two network hosts are used
  • Coaxial bus: broadcast LAN - all transmitted frames travel to and are processed by all adapters connected to bus
  • Hub-based star network: also broadcast LAN - hosts are directly connected to a hub with twisted-pair copper wire. Hub sends a copy out on all of its other interfaces
  • Switch-based star network: hub replaced by switch, which sends a copy to the target host

Ethernet IEEE 802.3 Standards

RXtmcRa.png
  • 802.3 standard defines both MAC and physical layer details
  • Ethernet naming, three parts:
  • E.g. 10Base-T and 100Bast-T
  • First number:
  • 10, 100, or 1000
  • Indicates transmission speed in megabits per second
  • Second number indicates transmission type
  • BASE = baseband (digital signal, single channel)
  • BROAD = broadband (analog signal, multiple channel)
  • Third number refers to physical media itself
  • T: means unshielded twisted-pair cables. Further numbers indicate the number of twisted pairs available. For example in 100BASE-T4, the T4 indicates four twisted pairs

MAC Addresses

  • Media Access Control Address:
  • A unique identifier assigned to network interfaces
  • Assigned by the manufacturer, stored in the hardware
  • Usually permanent, no duplication
  • 6-byte address expressed in 12-digit hexadecimal numbers
  • The first 24 bits identify the manufacturer
  • The second half of the address is known as the extension of board ID
  • E.g. broadcast address FF-FF-FF-FF-FF-FF
  • Link-layer addressing scheme, used as a network address for most IEEE 802 network technologies, including Ethernet
  • Mac address is analogous to a person’s SIN number, while IP address is analogous to postal address

Ethernet IEEE 802.3 Frame Format/Structure

  • Frame structures are developed within the MAC layer of the protocol stack
  • 10 / 100 Mbps Ethernet MAC data frame format:
EkOGFgX.png
  • Header
  • Preamble (PRE) - informs the receiving stations that a frame is starting as well as enabling synchronization
  • Start Of Frame delimiter (SOF)
  • Destination Address (DA) – first bit: 0-an individual address, 1-a group address. The next bit into the DA indicates whether the address is globally administered (0), or local(1). 46 remaining bits-destination address
  • Source Address (SA) - always an individual address the left most bit is always a zero
  • Length / Type - provides MAC information and indicates the number of client data types that are contained in the data field of the frame
  • Payload
  • Data - minimum of 46 bytes, up to 1500 bytes long
  • Trailer:
  • Frame Check Sequence (FCS) - This field is four bytes long. It contains a 32 bit Cyclic Redundancy Check (CRC)
XvVVa25.png
  • 1000 MBps Ethernet MAC data frame format
  • Extension: When using the 1000Base-X standard, there is a minimum frame size of 416 bytes, and for 1000Base-T there is a minimum frame size of 520bytes

Ethernet Media Access Control Method

zUJrMN2.png
  • Ethernet uses Carrier Sense Multiple Access / Collision Detection (CSMA/CD)
  • Carrier Sense: each station listens on the network for traffic and it can detect when the network is quiet
  • Multiple Access: the stations are able to determine for themselves whether they should transmit
  • Collision Detect:
  • It is still possible that two stations will start to transmit at virtually the same time
  • If this occurs then the stations detect collision and will stop transmitting. They then back off a random amount of time before attempting a retransmission

Fast Ethernet

  • 100BaseT Ethernet (Fast Ethernet) is defined under the 802.3 family of standards under 802.3u
  • One of the most widely used forms of Ethernet
  • All the nodes within the network share the 100 Mbps bandwidth
  • It uses the CSMA/CD access method, but there are some minor differences in the way the overall system operates
  • It runs on UTP or optical fiber cable and uses a star topology

Gigabit Ethernet

  • The next development of the Ethernet standard beyond the popular 100Base-T version
  • Provides for half and full duplex operation at speeds of 1000 Mbps
  • It is particularly easy to install because the 1000Base-T variant is designed to run over Cat 5 UTP (unshielded twisted pair) that is widely and cheaply available
  • Uses the 802.3 Ethernet frame formats
  • Uses the CSMA/CD access method with support for one repeater per collision domain
  • Provides backward compatibility with 10BASE-T and 100BASE-T technologies

CSMA/CD Algorithm

Overview:

7D3CTBB.png

Detailed:

mDPfzc4.png

Back to Navigation

Tutorial 10: Hubs, Switches, and Bridges

LAN Interconnection

  • We need to break down big networks to sub-LANs
  • Limited amount of supportable traffic: on single LAN, all stations must share bandwidth
  • Limited length: 802.3 (Ethernet) specifies maximum cable length. For 10 Mbps:
  • Maximum length of the wire: 2,500 meter
  • Large "collision domain" (can collide with many stations)

Hubs

Le1des2.png
  • Physical Layer devices
  • Essentially repeaters operating at bit levels: repeat received bits on one interface to all other interfaces
  • Hubs can be arranged in a hierarchy (or multi-tier design), with backbone hub at its top
  • Each connected LAN referred to as LAN segment
  • Advantages:
  • Simple, inexpensive device
  • Multi-tier provides graceful degradation: portions of the LAN continue to operate if one hub malfunctions
  • Extends maximum distance between node pairs (100m per Hub)
  • Disadvantages:
  • Hubs do not isolate collision domains: node may collide with any node residing at any segment in LAN
  • Single collision domain results in no increase in max throughput
  • Multi-tier throughput same as single segment throughput
  • Individual LAN restrictions pose limits on number of nodes in same collision domain and on total allowed geographical coverage
  • Cannot connect different Ethernet types (e.g., 10BaseT and 100baseT)

Bridges

  • Link-layer devices:
  • Store, forward Ethernet frames
  • Examine incoming frame’s MAC address, selectively forward frame to one-or-more outgoing links when frame is to be forwarded on segment, uses CSMA/CD to access segment
  • Advantages:
  • Isolates collision domains resulting in higher total max throughput, and does not limit the number of nodes nor geographical coverage
  • Can connect different type Ethernet since it is a store and forward device
  • Transparent: no need for any change to hosts LAN adapters

Switches

  • A switch could be considered a bridge with numerous ports
  • Switch, or Layer 2 switch is often used interchangeably with bridge
  • Plug-and-play, self-learning
  • Switches do not need to be configured
5oXono1.png
  • Allows multiple simultaneous transmissions
  • Hosts have dedicated, direct connection to switch
  • Switches buffer packets
  • Ethernet protocol used on each incoming link, but no collisions; full duplex
  • Each link is its own collision domain
  • Switching: A-to-A' and B-to-B' simultaneously without collisions (not possible with dumb hub)

Switch Table

  • Each entry in table has (MAC address of host, interface to reach host, time stamp)
  • Looks like a routing table
  • Entries created, maintained in switch table like a routing protocol

Self-Learning

  • Switch learns which hosts can be reached through which interfaces
  • When frame received, switch 'learns' location of sender: incoming LAN segment
  • Records sender/location pair in switch table (initially is empty)

Frame Filtering/Forwarding

  • When frame received:
record link associated with sending host
index switch table using MAC destination address
IF entry found for destination
THEN
   IF destination on segment from which frame arrived
      THEN drop the frame
      ELSE forward the frame on interface indicated
   ELSE flood
  • Forward on all but the interface on which the frame arrived (flooding)

Interconnecting Switches

  • Switches can be connected together
  • Self-learning to know where to send frames
ftIKgeR.png
  • With loops, incorrect learning occurs

Spanning Trees

  • Allow a path between every LAN without causing loops (loop-free environment)
  • BRidges communicate with special configuration messages (BPDUs)
  • Standardized by IEEE 802.1D
  • Requirements:
  • Each bridge is assigned a unique identifier
  • A broadcast address for bridges on a LAN
  • A unique port identifier for all ports on all bridges
  • MAC address
  • Bridge ID + port number

Example: Spanning Tree

3ag5Mmv.png
Zh0EPe7.png

Spanning Tree Algorithm

  • Overview
(1) Determine root bridge among all bridges
(2) Each bridge determines its root port
  • The port in the direction of the root bridge
(3) Determine the designated bridge on each LAN
  • The bridge which accepts frames to forward towards the root bridge
  • The frames are sent on the root port of the designated bridge
  • Selecting root bridge
  • Initially, each bridge considers itself to be the root bridge
  • Bridges send BDPU frames to its attached LANs
  • The bridge and port ID of the sending bridge
  • The bridge and port ID of the bridge the sending bridge considers root
  • The root path cost for the sending bridge
  • Best one wins: (lowest root ID/cost/priority)
  • Selecting root ports
  • Each bridge selects one of its ports which has the minimal cost to the root bridge
  • In case of a tie, the lowest uplink (transmitter) bridge ID is used
  • In case of another tie, the lowest port ID is used
  • Selecting designated bridges forwarding/blocking state
  • Same as selecting the root bridge:
  • Initially, each bridge considers itself to be the designated bridge, send BDPU frames to attached LANs, best one wins
  • Root and designated bridges will forward frames to and from their attached LANs
  • All other ports are in the blocking state

Example: Spanning Tree Execution

0mA4Aiw.png

Switches vs Routers

  • Both store-and-forward devices
  • Routers: network layer devices (examine network layer headers)
  • Switches: link layer devices
  • Routers maintain routing tables, implement routing algorithms
  • Switches maintain switch tables, implement filtering, learning algorithms

Back to Navigation

Tutorial 11: Wi-Fi LANs

Introduction

  • Wi-Fi or WiFi: defined as any "wireless local area network (WLAN) products that are based on the IEEE 802.11 standards
  • Popular technology that allows an electronic device to exchange data wirelessly (using radio waves) over a computer network
  • IEEE established the 802.11 Group in 1990. Specifications for standard ratified in 1997
  • Initial speeds were 1 and 2 Mbps
  • IEEE modified the standard in 1999 to include 802.11a (54 Mbps) and 802.11b (11 Mbps). Incidentally, 802.11b equipment was available before 802.11a
  • 802.11g (54 Mbps) was added in 2003

The Basics

  • In many respects, the IEEE 802.11b wireless LAN (WLAN) standard is similar to that of classic IEEE 802.3 (Ethernet) LANs
  • Similarities:
  • LAN with limited geographic coverage
  • Multiple stations, with 48-bit MAC addresses
  • Shared transmission medium (broadcast technology)
  • CSMA-based Medium Access Control protocol
  • Comparable data rates (11 Mbps vs 10 Mbps)
  • But there are also many distinct differences:
  • Wireless (air interface) versus wired (coax)
  • Wireless propagation environment (multipath)
  • Higher error rate due to interference, fading, etc.
  • Successful frames are ACKed by receiver
  • Mobile stations versus fixed stations
  • Half-duplex versus full-duplex operation
  • "Hidden node" and "exposed node" problems (exposed node problem occurs when a node is prevented from sending packets to other nodes due to a neighboring transmitter)
  • Potential asymmetries of links
  • CSMA/CA versus CSMA/CD
  • Multiple data transmission rates (1, 2, 5.5, 11)

Wi-Fi Features

  • Infrastructure mode vs "ad hoc" mode:
  • Devices in a wireless network are set up to either communicate indirectly through a central place — an access point — or directly, one to the other
  • Access Point (AP) sends "beacon frames" – all the information about the network (e.g. Timestamp Capability information)
  • Mobiles choose AP based on signal strength
  • Multiple channel access protocols supported
  • CSMA/CA (DCF – Distributed coordination function); PCF (Point Coordination Function); RTS/CTS (Request to Send/Clear to Send)
  • MAC-layer can provide error control, retransmission, rate adaptation, etc.
  • Direct Sequence Spread Spectrum (DSSS)
  • Modulation technique
  • Signal spread across 22 MHz with each channel based around a center frequency. There are fourteen 22-MHz channels
qIJCSbh.png

Where does Wireless RF Live?

ZhsihE8.png

Protocol Stack View

3B8hz4l.png

Wireless Cells

A1OFdcT.png
  • In Canada/US, there are eleven 802.11b/g channels
  • Only channels 1, 6 and 11 are non-overlapping
  • Computers can roam between cells

Medium Access Control (MAC)

  • Carrier Sense Multiple Access with Collision Avoidance (CSMA-CA)
  • Device wanting to transmit senses the medium (air)
  • If medium is busy, it defers
  • If medium is free for certain period (DIFS – Distributed Inter-Frame Space), it transmits frame
  • DIFS is approximately 128 µs
  • Latency can increase if "air" is very busy since devices will have a hard time finding "open air" to send frames

MAC Protocol

NPvCRZH.png
  • DIFS – Distributed Inter-Frame Space (approx. 128 µs)
  • SIFS – Short Inter-Frame Space (approx. 28 µs)
  • Every frame is ACKed except broadcast/multicast

MAC Retransmission

  • If no ACK received 'right away', then the sender retransmits the frame again at the MAC layer
  • Indicates frame (or ACK) was lost/corrupted
  • Very short timeout (e.g., 1 ms)
  • Exponential back off (doubling) if repeated loss
  • Typically recovers before TCP would notice
  • Max retransmission limit (e.g. 8)
  • May do MAC-layer rate adaptation or frame fragmentation if channel error rate is high

Other MAC Protocols Supported

  • Point Coordination Function (PCF)
  • AP polls stations in turn to see if there’s frames to send
  • Useful for real-time traffic
  • Request-To-Send/Clear-To-Send (RTS/CTS)
  • Reservation-based approach (ask permission)
  • Useful for very large frames
  • Useful for solving the "hidden node" problem (when a node is visible from a wireless access point (AP), but not from other nodes)
  • Request asks for clearance (permission) to send
  • Request also indicates time required for transmit

Frame Formats

  • Two frame formats available:
  • Long preamble
  • Short preamble
  • Configuration option for NIC (Network interface controller) and AP
  • Variable-size frames (maximum of 2312 bytes for payload)
  • 16-bit Cyclic Redundancy Code (CRC) for error checking of frames

Preamble

Long

3DZuwLb.png
  • Long Preamble = 144 bits
  • Interoperable with older 802.11 devices
  • Entire Preamble and 48 bit PLCP (Physical Layer Convergence Protocol) header sent at 1 Mbps

Short

dK1kJ6f.png
  • Short Preamble = 72 bits
  • Preamble transmitted at 1 Mbps
  • PLCP (Physical Layer Convergence Protocol) header transmitted at 2 Mbps
  • More efficient than long preamble

More Features

  • Power Management
  • Mobile nodes can 'sleep' to save power
  • Accomplished by using DTIM (Delivery Traffic Indication Message) intervals
  • AP will buffer frames until client requests them
  • AP can use virtual bitmap field in beacons to indicate which stations have data waiting
  • Security
  • Wired Equivalent Privacy (WEP)
  • Not very secure at all!
  • To address this weakness, there is WPA2 (IEEE 802.11i)
  • Uses CCMP, an AES-based encryption mode with strong security

Back to Navigation

Tutorial 12: Transmission Media

The Physical Layer

  • Transfer information over a transmission medium
  • Converts bit streams into electrical or optical signals (and back)
  • The signal propagates over the transmission medium at different speeds depending on the physical characteristics of the medium
  • An electromagnetic wave propagates through vacuum at a speed of c=3*108 m/s and at lower speeds in other materials

Quality of Transmission Media

  • A good transmission medium should provide communication with good quality at long distance
  • For voice communication, quality of communication is determined by the voice quality
  • For data communication, however, the quality of communication is mainly determined by the effective data rate of communication

Classes of Transmission Media

  • Conducted or guided media
  • Use a conductor such as a wire or a fiber optic cable to move the signal from sender to receiver
  • Wireless or unguided media
  • Use radio waves of different frequencies and do not need a wire or cable conductor to transmit signals

Physical Layer Transmission Media

RK3U37m.png

Design Factors

  • Bandwidth: All other factors remaining constant, the greater the band-width of a signal, the higher the data rate that can be achieved
  • Transmission impairments: Limit the distance a signal can travel
  • Interference: Competing signals in overlapping frequency bands can distort or wipe out a signal
  • Number of receivers: Each attachment introduces some attenuation and distortion, limiting distance and/or data rate

Guided Transmission Media

  • Transmission capacity depends on the distance and on whether the medium is point-to-point or multipoint
  • Twisted pair wires
  • Consists of two insulated copper wires arranged in a regular spiral pattern to minimize the electromagnetic interference between adjacent pairs
  • Low frequency transmission medium (traditional phone wires, 10 or 100 Mbps Ethernet, enhanced Cat 5 and 6 can handle gigabit+ Ethernet)
  • Unshielded Twisted Pair (UTP)
  • Each wire is insulated with plastic wrap, but the pair is encased in an outer covering
  • Category 3 UTP (16MHz Bandwidth; 10BASE-T/100BASE-T4)
  • Category 5 UTP (100MHz Bandwidth; 100BASE-T/1GBASE-T with Cat 5e)
  • More tightly twisted than Category 3 cables
  • Four pairs of copper wire
  • Category 6 UTP (250MHz Bandwidth; 10GBASE-T up to 55 meters)
  • Shielded Twisted Pair (STP)
  • The pair is wrapped with metallic foil or braid to insulate the pair from electromagnetic interference
  • Category 6a STP (500MHz Bandwidth; 10GBASE-T up to 100 meters)
  • Advantages
  • Inexpensive and readily available
  • Flexible and light weight
  • Easy to work with and install
  • Disadvantages
  • Susceptibility to interference and noise
  • Attenuation problem
  • For analog, repeaters needed every 5-6km
  • For digital, repeaters needed every 2-3km
  • Relatively low bandwidth
  • Coaxial Cable
  • Also known as Coax
  • Used for cable television, LANs, telephony
  • Has an inner conductor surrounded by a braided mesh
  • Both conductors share a common center axial, hence the term "co-axial"
  • Characteristics
  • Higher bandwidth
  • 400 to 600Mhz
  • Up to 10,800 voice conversations
  • Can be tapped easily
  • Less susceptible to interference than twisted pair
  • High attenuation rate makes it expensive over long distance (needs amplifiers every few km)
  • Fiber Optic Cable
  • Glass fiber carrying light pulses, each pulse a bit
  • Greater capacity: high-speed point-to-point transmission (10’s-100’s Gbps)
  • Smaller size and lighter weight
  • Lower attenuation (fewer repeaters)
  • Low error rate (immune to electromagnetic noise)
  • Hard to tap
  • Multimode step-index fiber
  • The reflective walls of the fiber move the light pulses to the receiver
  • Multimode graded-index fiber
  • Acts to refract the light toward the center of the fiber by variations in the density
  • Single mode fiber
  • The light is guided down the center of an extremely narrow core

Unguided Media: Wireless

  • Radio Link Types:
  • Terrestrial microwave
  • Up to 45 Mbps channels
  • LAN (e.g., WiFi)
  • 11 Mbps, 54 Mbps
  • Wide-area (e.g., cellular)
  • 3G: ~ 1 Mbps
  • LTE: ~ 300 (DL), 75 (UL) Mbps
  • Satellite
  • Kbps to 45 Mbps channel (or multiple smaller channels)
  • 270 msec end-end delay
  • Geosynchronous versus low altitude
  • Common characteristics
  • Signal carried in electromagnetic spectrum
  • No physical “wire”
  • Multidirectional
  • Propagation environment effects:
  • Reflection
  • Obstruction by objects
  • Interference

Data Encoding Techniques

  • Digital Data, Analog Signals [modem]
  • Digital Data, Digital Signals [wired LAN]
  • Analog Data, Digital Signals [codec]
  • Frequency Division Multiplexing (FDM)
  • Wave Division Multiplexing (WDM) [fiber]
  • Time Division Multiplexing (TDM)
  • Pulse Code Modulation (PCM) [T1] (digital transmission systems developed by Bell Labs)
  • Delta Modulation

Digital Data, Digital Signals

  • Techniques used in a number of LANs
  • Digital signal – is a sequence of discrete, discontinuous voltage pulses
  • Bit duration: the time it takes for the transmitter to emit the bit
  • Issues
  • Bit timing
  • Recovery from signal
  • Noise immunity

Binary Encoding

  • NRZ (non-return to zero)
  • NRZI (NRZ inverted)
  • Manchester (used by IEEE 802.3, 10 Mbps Ethernet)

Non-Return to Zero (NRZ)

tyhOdPl.png
  • Encode binary data onto signals
  • E.g. 0 as low signal and 1 as high signal
  • Voltage does not return to zero between bits
  • Known as Non-Return to Zero (NRZ)
Yge13Ty.png
  • Problem: consecutive 1s or 0s
  • Low signal (0) may be interpreted as no signal
  • High signal (1) leads to baseline wander
  • Unable to recover clock
  • Sender’s and receiver’s clock have to be precisely synchronized
  • Receiver resynchronizes on each signal transition
  • Clock drift in long periods without transition

Non-Return to Zero Inverted (NRZI)

TyiZxaN.png
  • Has a transition at a clock boundary if the bit being transmitted is "1"
  • Stay at current signal (maintain voltage level) to encode/transmit a "0"
  • Solves the problem of consecutive ones (shifts to 0s)
  • NRZI can have long series of zeros, still unable to recover clock

Manchester

7oK4Q9N.png
  • In IEEE 802.3 - 10 MBps Ethernet
  • Split cycle into two parts
  • Send high--low for "1", low--high for "0"
  • Transmit XOR of NRZ encoded data and the clock
  • Clock signal can be recovered from the encoded data
  • Only 50% efficient (1/2 bit per transition): double the transmission rate

Different Encoding Schemes

99dSdfg.png

Back to Navigation