Courses/Computer Science/CPSC 441.W2014/Chapter 2: Application Layer
Course Overview |
Application Layer |
Transport Layer |
Network Layer |
Datalink Layer |
Advanced Topics |
Extra |
---|---|---|---|---|---|---|
Chapter 2 |
||||||
Contents
- 1 Introduction
- 2 Course Information
- 3 Chapter 2: Application Layer
- 3.1 Section 2.1: Principles of Network Applications
- 3.1.1 Creating a Network App
- 3.1.2 Client-Server Architecture
- 3.1.3 P2P Architecture
- 3.1.4 Processes Communicating
- 3.1.5 Sockets
- 3.1.6 Code Examples: Demo
- 3.1.7 Addressing Processes
- 3.1.8 App-layer Protocol Defines
- 3.1.9 What Transport Services Does an App Need?
- 3.1.10 Transport Service Requirements: Common Apps
- 3.1.11 Internet Transport Protocols Services
- 3.1.12 Securing TCP
- 3.1 Section 2.1: Principles of Network Applications
- 4 HTTP over TCP
- 4.1 Diagram: Comparison of Approaches
- 4.2 Summary of Web and HTTP
- 4.3 Section 2.2: Web and HTTP
- 4.3.1 HTTP Overview
- 4.3.2 HTTP Connections
- 4.3.3 Non-persistent HTTP
- 4.3.4 Persistent HTTP
- 4.3.5 HTTP Request Message
- 4.3.6 Uploading Form Input
- 4.3.7 HTTP Response Message
- 4.3.8 HTTP Response Status Codes
- 4.3.9 Trying HTTP (Client Side)
- 4.3.10 User-Server State: Cookies
- 4.3.11 Web Caches (Proxy Server)
- 4.3.12 Example: Caching
- 4.3.13 Conditional GET
- 4.4 Section 2.3: File Transfer Protocol (FTP)
- 4.5 Section 2.4: Electronic Mail
- 4.6 Section 2.5: DNS
- 4.6.1 Domain Name System/Service (DNS)
- 4.6.2 Services, Structures
- 4.6.3 A Distributed, Hierarchical Database
- 4.6.4 Root Name Servers
- 4.6.5 TLD, Authoritative Servers
- 4.6.6 Local DNS Name Server
- 4.6.7 DNS Name Resolution Example
- 4.6.8 Caching, Updating Records
- 4.6.9 DNS Records
- 4.6.10 DNS Protocol, Messages
- 4.6.11 Example: nslookup
- 4.6.12 Example: dig
- 4.6.13 Linuxconfig files
- 4.6.14 Inserting Records into DNS
- 4.6.15 Attacking DNS
- 4.7 Section 2.6.1: P2P Architecture
Introduction
Hello, my name is Carrie Mah and I am currently in my 3rd year of Computer Science with a concentration in Human Computer Interaction. I am also an Executive Officer for the Computer Science Undergraduate Society. If you have any questions (whether it be CPSC-related, events around the city, or an eclectic of random things), please do not hesitate to contact me.
I hope you find my notes useful, and if there are any errors please correct me by clicking "Edit" on the top of the page and making the appropriate changes. You should also "Watch this page" for the CPSC 441 page to check when I upload notes. Just click "Edit" and scroll down to check mark "Watch this page."
You are welcome to use my notes, just credit me when possible. If you have any suggestions or major concerns, please contact me at cmah[at]ucalgary[dot]ca. Thanks in advance for reading!
Course Information
Disclaimer: This is a page created by a student. Everything created and written by me is not associated with the Department of Computer Science or the University of Calgary. I am not being paid to do this nor am getting credit. I was a class scribe for CPSC 457 and enjoyed it so much that I wanted to continue it with a course related to it. I find writing Wiki notes for courses I am not familiar with are helpful for me, so I hope you find these notes helpful. I encourage that you still write your own notes, as it helps you retain information, but my notes are public and free to use!
This course is for W2014, taught by Dr. Carey Williamson.
The course website is here: http://pages.cpsc.ucalgary.ca/~carey/CPSC441
Chapter 2: Application Layer
- Notes adapted from slides created by JFK/KWR and lectures held by Dr. Carey Williamson
- All material copyright 1996-2012 © J.F Kurose and K.W. Ross, All Rights Reserved
Section 2.1: Principles of Network Applications
- Some network apps
- Big three:
- FTP, e-mail, remote login
- Web, text messaging, multi-user network games, streaming stored video (Youtube, Hulu, Netflix), voice over IP (e.g. Skype), real-time video conferencing, social networking, search…
Creating a Network App
- Write programs that:
- Run on (different) end systems
- Communicate over network
- Obey vertical API of how messages get handed down in OS kernel for transmission across network
- E.g. web server software communicates with browser software
- No need to write software for network-core devices
- Network-core devices do not run user applications
- Applications on end systems allows for rapid app
- Make sure messages get passed down the right API
- End systems have full stack, communicate in illusion of horizontal communication
- Web browser talks to web server, illusion talks horizontally without magic underneath; in reality there’s the passing though
- Application architectures
- Possible structure of applications:
- Client-server
- Peer-to-peer (P2P)
Client-Server Architecture
- Server:
- Always-on host
- Permanent IP address
- Data centers for scaling
- Clients:
- Communicate with server
- May be intermittently connected
- May have dynamic IP address
- Server is special, and client is cheap and ubiquitous
- File server, web server, video server, name server, file server
- Put extra investments and resources that have high storage capacity, server gets permanent IP address, always on
- Clients: from anywhere, clients are plentiful and cheap; they come and go
P2P Architecture
- No always-on server
- Arbitrary end systems directly communicate
- Peers request service from other peers, provide service in return to other peers
- Self scalability: new peers bring new service capacity, as well as new service demands
- When you join a P2P network, you’re requesting stuff from other people; but you actually share stuff with that network
- Content, storage you share
- Peers are intermittently connected and change IP addresses
- End systems can talk to other end systems and negotiate sharing of content
- User-generated content, P2P sharing, copyright, etc.
- Everyone is equal, can share/download content, play symmetrical role of being a provider to some others and being a consumer from others – make social good better for everybody
Processes Communicating
- Process: program running within a host
- Within same host, two process communicate using inter-process communication (defined by OS)
- Processes in different hosts communicate by exchanging messages
- Client process: process that initiates communications
- Server process: process that waits to be contacted
- Aside: applications with P2P architectures have client processes and server
- Built using socket programming API
- Way to send data from on process to another
- Socket: magic door – stick data in socket and magically whisked away by data fairies from other end
Sockets
- Process sends/receivers messages to/from its socket
- Socket analogous to door
- Sending process shoves message out the door
- Sending process relies on transport infrastructure on other side of door to deliver message to socket at receiving process
- Socket API – programmer invokes socket-based communication to invoke data in/out of kernel and network interface
Code Examples: Demo
- wordlengthserver code and wordlengthclient code
- Give it a text string, and it tells you how many characters there are
- Server using TCP, sitting in a passive loop waiting for clients to contact it
- Compile with
–lsocket
on server - Compile with
–lsocket –lnsl
for client
- One screen with server, another with client
- Every time talk to server, it spawns a new child process to communicate with the client
- “CAT” is 3 letters; server sends the answer 3 and client receives the answer
- When server sends 3, sends an ASCII string with the ‘3’ character
- Main listener in server; forks a new child process when communicating with client
Addressing Processes
- To receive messages, process must have identifier
- Host device has unique 32-bit IP address
- Q: Does IP address of host on which process runs suffice for identifying the process?
- A: No, many processes can be running on same host
- Identifier includes both IP address and port numbers associated with process host
- Example port numbers:
- HTTP server: 80
- Mail server: 25
- To send HTTP message to gaia.cs.umass.edu web server:
- IP address: 128.119.245.12
- Port number: 08
App-layer Protocol Defines
- Variety of commands that you can send back and forth
- Types of messages exchanged
- E.g. request, response
- Message syntax
- What fields in messages and how fields are delineated
- Where does URL go, what commands go, key-value pairs
- Message semantics
- Meaning of information in fields
- Example: GET <retrieve file>
- Rules for when and how processes send and respond to messages
- Who talks first, what to do when message comes in and how to handle wacky behavior
- Open protocols
- Defined in RFCs
- Allows for interoperability
- Everyone knows how server and client should talk – public protocol
- E.g. HTTP, SMTP
- Proprietary protocols
- Don’t want to tell anyone how it works, internal secret protocol used
- E.g. Skype
What Transport Services Does an App Need?
- Data integrity
- Some apps (e.g. file transfer, web transactions) require 100% reliable data transfer
- Other apps (e.g. audio) can tolerate some loss
- Reliable data exchange required as client wants exact copy of web page that server has
- Timing
- Some apps (e.g. Internet telephony, interactive games) require low delay to be ‘effective’
- Some applications need fast response, some are elastic (depends on Internet speed)
- Elastic example: a file
- Stock exchange need to be updated – hard to do with TCP
- Throughput
- Some apps (e.g. multimedia) require minimum amount of throughput to be ‘effective’
- Other apps (‘elastic apps’) make use of whatever throughput they get
- How much data does app need to pump
- Example: Youtube, with video and a certain video rate – needs to be delivered consistently across the network. If network unable to sustain throughput, you might see pauses or glitches
- Security
- On network – encrypted form so no one can steal it
Transport Service Requirements: Common Apps
application | data loss | throughput | time sensitive |
---|---|---|---|
File transfer | no loss | elastic | no |
no loss | elastic | no | |
Web documents | no loss | elastic | no |
Real-time audio/video | loss-tolerant | audio: 5kbps-1Mbps video:10kbps-5Mbps |
yes, 100s msec |
Stored audio/video | loss-tolerant | audio: 5kbps-1Mbps video:10kbps-5Mbps |
yes, few secs |
Interactive games | loss-tolerant | few kbps up | yes, 100s msec |
Text messaging | no loss | elastic | yes and no |
- File transfer: reliable, want perfect data; delay-tolerant
- Video: want guaranteed throughput (live video) and low latency hopefully (screen is actually what is happening)
- Stored video (Youtube): throughput, tolerate more latency – pause, rewind
- Gaming: low bandwidth, low latency
- FPS: want shot to hit person, don’t want latency
Internet Transport Protocols Services
- Layer 4 – what transport layer protocol to invoke?
- Reliable, TCP vs unreliable, UDP
TCP service:
- Gets all of your data there eventually (slow)
- Reliable transport between sending and receiving process
- Flow control: sender won’t overwhelm receiver
- Congestion control: throttle sender when network overloaded
- Does not provide: timing, minimum throughput guarantee, security
- Connection-oriented: setup required between client and server processes
UDP service:
- Gets most of your data there quickly (fast)
- Minimal mechanism, generates data
- Unreliable data transfer between sending and receiving process
- Does not provide: reliability, flow control, congestion control, timing, throughput, guarantee, security, error-checking, connection setup
application | application layer protocol | underlying transport protocol |
---|---|---|
SMTP [RFC 2821] | TCP | |
Remote terminal access | Telnet [RFC 854] | TCP |
Web | HTTP [RFC 2616] | TCP |
File transfer | FTP [RFC 959] | TCP |
Streaming multimedia | HTTP (e.g. Youtube), RTP [RFC 1889] | TCP or UDP |
Internet telephony | SIP, RTP, proprietary (e.g. Skype) | TCP or UDP |
Securing TCP
- TCP & UDP
- No encryption
- Cleartext passwds sent into socket traverse Internet in cleartext
- SSL
- Provides encrypted TCP connection
- Data integrity
- End-point authentication
- SSL is at app layer
- Apps use SSL libraries, which ‘talk’ to TCP
- SSL socket API
- cleartext passwds sent into socket traverse Internet encrypted
- Internet protocol stack – no security unless you build it
- SSL allows encryption near top of protocol stack – API for program to do encryption and have data handled by lower level of stack
HTTP over TCP
- Recall previous notes
- Example: page.html with two images (.jpg)
Diagram: Comparison of Approaches
HTTP/1.0 (Classic) | Parallel | Persistent | HTTP/1.1 (Pipelining) |
---|---|---|---|
|
|
|
|
Summary of Web and HTTP
- The major application on the Internet
- Majority of traffic is HTTP (or HTTP-related)
- Client/server model:
- Clients make requests, servers respond to them
- Done mostly in ASCII text (helps debugging)
- Various headers and commands
- Original version built by physicist had 1.0 paradigm – works but it’s slow and clunky
Section 2.2: Web and HTTP
- Implementation in early 90s, idea about 50 years ago
- Host stores object, path name contains files
- Web page consists of objects
- Object can be HTML file, JPEF image, Java applet, audio file…
- Web page consists of base HTML-file which includes several referenced objects
- Each object is addressable by a URL, e.g.
- [www.someschool.edu]/someDept/pic.gif
- [host name] path name
HTTP Overview
- HTTP: hypertext transfer protocol
- Web’s application layer protocol
- Client/server model
- Client: browser that requests, receivers, (using HTTP protocol) and ‘displays’ Web objects
- Server: Web server sends (using HTTP protocol) objects in response to requests
- Uses TCP:
- Client initiated TCP connection (creates socket) to server, port 80
- Server accepts TCP connection from client
- HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
- TCP connection closed
- HTTP is ‘stateless’
- Server maintains no information about past client requests
- Protocols that maintain ‘state’ are complex
- Past history (state) must be maintained
- If server/client crashes, their views of ‘state’ may be inconsistent, must be reconciled
HTTP/1.0 | HTTP/1.1 |
---|---|
|
|
HTTP Connections
- Non-persistent HTTP
- At most one object sent over TCP connection
- Connection then closed
- Downloading multiple objects required multiple connections
- Persistent HTTP
- Multiple objects can be sent over single TCP connection between client, server
Non-persistent HTTP
- Suppose user enters URL:
www.someschool.edu/somedepartment/home.index
(contains text, references to 10 jpeg images)
- 1a) HTTP client initiates TCP connection to HTTP server (process) at
www.someschool.edu
on port 80 - 1b) HTTP server at host
www.someschool.edu
waiting for TCP connection at port 80. ‘Accepts’ connection, notifying client - 2) HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object
somedepartment/home.index
- 3) HTTP server receives request message, forms response message containing requested object, and sends message into its socket
- 4) HTTP server closes TCP connection
- 5) HTTP client receives response message containing HTML file, displays HTML. Parsing HTML file, finds 10 referenced JPEG objects
- 6) Steps 1-5 repeated for each of 10 JPEG objects
- Response time
- RTT: time for a small packet to travel from client to server and back
- HTTP response time:
- One RTT to initiate TCP connection
- One RTT for HTTP request and first few bytes of HTTP response to return
- File transmission time (time from server to send file to client)
- Non-persistent HTTP response time = 2RTT + file transmission time
Persistent HTTP
- Non-persistent HTTP issues
- Requires 2 RTTs per object
- OS overhead for each TCP connection
- Browsers often open parallel TCP connections to fetch referenced objects
- Persistent HTTP
- Server leaves connection open after sending response
- Subsequent HTTP messages between same client/server sent over open connection
- Client sends requests as soon as it encounters a referenced object
- As little as one RTT for all the referenced objects
HTTP Request Message
- Two types of HTTP messages: request, response
- HTTP request message: ASCII (human-readable format)
GET /index.html HTTP/1.1\r\n Host: www-net.cs.umass.edu\r\n User-Agent: Firefox/3.6.10\r\n Accept: text/html,application/xhtml+xml\r\n Accept-Language: en-us,en;q=0.5\r\n Accept-Encoding: gzip,deflate\r\n Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n Keep-Alive: 115\r\n Connection: keep-alive\r\n \r\n
-
GET
: request line -
\r\n
: carriage return character, line-feed character -
Host…Connection
: header lines -
\r\n + <space>
: carriage return, line-feed at start of line indicates end of header lines
- Blank line after request delineates header and data of subsequent responses
Uploading Form Input
-
POST
method
- Web page often includes form input
- Input is uploaded to server in entity body
-
URL
method
- Uses
GET
method (retrieves web pages) - Input is uploaded in URL field of request line
-
www.somesite.com/animalsearch?monkeys&banana
-
- Uses
HTTP/1.0 | HTTP/1.1 |
---|---|
|
|
HTTP Response Message
HTTP/1.1 200 OK\r\n Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n Server: Apache/2.0.52 (CentOS)\r\n Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n ETag: "17dc6-a5c-bf716880"\r\n Accept-Ranges: bytes\r\n Content-Length: 2652\r\n Keep-Alive: timeout=10, max=100\r\n Connection: Keep-Alive\r\n Content-Type: text/html; charset=ISO-8859-1\r\n \r\n data data data data data ...
-
HTTP/1.1 200 OK\r\n
: status line (protocol status code, status phrase) -
Date…\r\n
: header lines -
data data data data data ...
: data, e.g. Requested HTML file
HTTP Response Status Codes
- Status code appears in first line in server-to-client response message
- Some sample codes:
-
200 OK
- Request succeeded, requested object later in this message
-
301 Moved Permanently
- Requested object moved, new location specified later in this message (
Location:
)
- Requested object moved, new location specified later in this message (
-
400 Bad Request
- Request message not understood by server
-
404 Not Found
- Requested document not found on this server
-
505 HTTP Version Not Supported
-
Trying HTTP (Client Side)
(1) Telnet to your favorite Web server:
-
telnet cis.poly.edu 80
- Opens TCP connection to port 80
- (Default HTTP server port) at cis.poly.edu
- Anything typed in sent to port 80 at cis.poly.edu
(2) Type in a GET HTTP request:
-
GET /~ross/ HTTP/1.1
-
Host: cis.poly.edu
- By typing this in (hit carriage return twice), you send this minimal (but complete) GET request to HTTP server
(3) Look at response message sent by HTTP server
- When you type up something random, you will get a BAD REQUEST (did not talk HTTP language)
-
GET /~carey/index.htm HTTP/1.0
- Sends back content
-
HEAD /~carey/index.htm HTTP/1.0
- Metadata
- In HTTP/1.1 host header is mandatory
-
/~carey/CPSC441/oddtest1.html HTTP/1.1
-
-
Host: asdasda
- Pauses for 5 seconds because of timeout to retrieve multiple objects on same connection
-
Host:
- Doesn’t respond because no extra blank line after
- Five seconds before interaction ends (can do GET request)
-
/~carey/CPSC441/oddtest4.html HTTP/1.0
- 404 Not Found
-
/~carey/CPSC441/oddtest3.html HTTP/1.0
- 403 Forbidden: exists, but no access permission
- Public webserver www.cpsc.ucalgary.ca but redirects to internal server with different name and server actually hosting content
-
GET /~carey/CPSC441/oddtest1.html HTTP/1.0
-
Host: foople
-
Range: bytes=250-30
- Give bytes from location 250 – 300
- 206 Partial Content: successful, returns something but not entire object, just part of it
User-Server State: Cookies
- Many Web sites use cookies
- Four components:
- (1) Cookie header line of HTTP response message
- (2) Cookie header line in next HTTP request message
- (3) Cookie file kept on user’s host, managed by user’s browser
- (4) Back-end database at Web site
- Example:
- Susan always accesses Internet from PC
- Visits specific e-commerce site for first time
- When initial HTTP requests arrives at site, site creates: unique ID and entry in backend database for ID
- Header is a set-cookie header
- Amazon, unique identifier
- Browser sending cookie header – know what you bought, send targeted ads
- Messy when companies send cookie database to another organization
- What cookies can be used for
- Authorization
- Shopping carts
- Recommendations
- User session state (Web e-mail)
- How to keep ‘state’
- Protocol endpoints: maintain state at sender/receiver over multiple transactions
- Cookies: http messages carry state
- Cookies and privacy
- Cookies permit sites to learn a lot about you
- You may supply name and e-mail to sites
Web Caches (Proxy Server)
- Goal: satisfy client request without involving origin server
- User sets browser: Web accesses via cache
- Browser sends all HTTP requests to cache
- Object in cache: cache returns object
- Else cache requests object from origin server, then returns object to client
- Browser has a cache for images, etc.
- Important for efficiency of web
- Cache acts as both client and server
- Server for original requesting client
- Client to origin server
- Typically cache is installed by ISP (university, company, residential ISP)
- Web caching:
- Reduces response time for client request
- Reduces traffic on an institution’s access link
- Internet dense with caches: enables ‘poor’ content providers to effectively deliver content (so too does P2P file sharing)
Example: Caching
- Assumptions
- Average object size: 100K bits
- Average request rate from browsers to origin servers: 15/sec
- Average data rate to browsers: 1.50 Mbps
- RTT from institutional router to any origin server: 2 sec
- Access link rate: 1.54 Mbps
- Consequences
- LAN utilization: 15%
- Access link utilization = 99% (problem)
- Total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + usecs
- Bottleneck link
- Use a fatter access link
- Access link rate: 154 Mbps
- Access link utilization = 0.99%
- Total delay = Internet delay + access delay + LAN delay = 2 sec + msecs + usecs
- Cost: increased access link speed (not cheap!)
- Install local cache
- Access link utilization = ?
- Total delay = ?
- Cost: web cache (cheap!)
- Calculating access link utilization, delay with cache:
- Suppose cache hit rate is 0.4
- 40% requests satisfied at cache, 60% requests satisfied at origin
- Access link utilization:
- 60% of requests use access link
- Data rate to browsers over access link = 0.6*1.50 Mbps = .9 Mbps
- Utilization = 0.9/1.54 = .58
- Total delay
- = 0.6 * (delay from origin servers) + 0.4 * (delay when satisfied at cache)
- = 0.6 (2.01) + 0.4 (~msecs)
- = ~ 1.2 secs
- Less than with 154 Mbps link (and cheaper too!)
Conditional GET
- Goal: don’t send object if cache has up-to-date cached version
- No object transmission delay
- Lower link utilization
- Cache: specify date of cached copy in HTTP request
-
If-modified-since: <date>
-
- Server: response contains no object if cached copy is up-to-date
-
HTTP/1.0 304 Not Modified
-
HTTP/1.0 200 OK <date>
if cached copy has been modified after <date>
-
Section 2.3: File Transfer Protocol (FTP)
- Transfer file to/from remote host
- Client/server model
- Client: side that initiates transfer (either to/from remote)
- Server: remote host
- Text-based protocol
- User interface session part: log in, navigate around directories and issue requests
- Data transfer part: copy file from one machine to another
- ftp: RFC 959
- ftp server: port 21
FTP: Separate Control, Data Connections
- FTP client contacts FTP server at port 21, using TCP
- Client authorized over control connection
- Client browses remote directory, sends commands over control connection
- When server receives file transfer command, server opens 2nd TCP data connection (for file) to client
- After transferring one file, server closes data connection
- Server opens another TCP data connection to transfer another file
- Control connection: ‘out of band’
- FTP server maintains ‘state’: current directory, earlier authentication
- FTP control and data part separately done
Control part | Data part |
---|---|
|
|
FTP Commands, Responses
- Sample commands
- Sent as ASCII text over control channel
-
USER <username>
-
PASS <password>
-
LIST
returns list of file in current directory -
RETR filename
retrieves (gets) file -
STOR filename
stores (puts) file onto remote host
- Sample return codes
- Status code and phrase (as in HTTP)
- 331 Username OK, password required
- 125 data connection already open; transfer starting
- 425 Can’t open data connection
- 452 Error writing file
-
ftp ftp.cpsc.ucalgary.ca
- Login:
- (1) username: anonymous; password: <e-mail address>
- Can copy files to local directory
- (2) username: <name>; password: <password>
- Displays actual directory
- In browser:
ftp://ftp.cpsc.ucalgary.ca
- Forces browser to go from http to ftp
- Also:
ftp://<name>.ftp.cpsc.ucalgary.ca
Section 2.4: Electronic Mail
Good Old Days | Today |
---|---|
|
|
- Three major components:
- User agents
- Mail servers
- Simple mail transfer protocol (SMTP)
- User agent
- Aka ‘mail reader’
- Composing, editing, reading mail messages
- Outgoing, incoming messages stored on server
- E.g. Outlook, Thunderbird, iPhone mail client
- Mail servers
- Mailbox contains incoming messages for user
- Message queue of outgoing (to be sent) mail messages
- SMTP protocol between mail servers to send e-mail messages
- Client: sending mail server
- ‘Server’: receiving mail server
SMTP [RFC 2821]
- Uses TCP to reliably transfer email message from client to server, port 25
- Direct transfer: sending server to receiving server
- Three phases of transfer
- Handshaking (greeting)
- Transfer of messages
- Closure
- Command/response interaction (like HTTP, FTP)
- Commands: ASCII text
- Response: status code and phrase
- Messages must be in 7-bit ASCII
Example: Alice Sends Message to Bob
- (1) Alice uses UA to compose message ‘to’
bob@someschool.edu
- (2) Alice uploads to SMTP server; her UA sends message to her mail server, and message placed in message queue
- (3) Client side of SMTP opens TCP connection with Bob’s mail server
- SMTP has outbound message in message queue, try to find SMTP server (Bob’s)
- (4) SMTP client sends Alice’s message over the TCP connection
- (5) Bob’s mail server places the message in Bob’s mailbox
- (6) Bob invokes his user agent to read message
- User agent -> Alice’s mail server -> Bob’s mail server -> user agent
Example: SMTP Interaction
S: 220 hamburger.edu C: HELO crepes.fr S: 250 Hello crepes.fr, pleased to meet you C: MAIL FROM: <alice@crepes.fr> S: 250 alice@crepes.fr... Sender ok C: RCPT TO: <bob@hamburger.edu> S: 250 bob@hamburger.edu ... Recipient ok C: DATA S: 354 Enter mail, end with "." on a line by itself C: Do you like ketchup? C: How about pickles? C: . S: 250 Message accepted for delivery C: QUIT S: 221 hamburger.edu closing connection
-
cs.umass.edu
bulletproof’d their SMTP; must use proper mail exchange - Naive vulnerable protocol; spammers exploited vulnerabilities, driving e-mail from any place
SMTP
- Can send e-mail without using e-mail client (reader)
-
telnet servername 25
- See 220 reply from server
- Enter
HELO, MAIL FROM, RCPT TO, DATA, QUIT
commands
- SMTP uses persistent connections
- SMTP requires message (header & body) to be in 7-bit ASCII
- SMTP server uses
CRLF.CRLF
to determine end of message - Comparison with HTTP:
- HTTP: pull
- SMTP: push
- Both have ASCII command/response interaction, status codes
- HTTP: each object encapsulated in its own response message
- SMTP: multiple objects sent in multi-part message
Mail Message Format
- SMTP: protocol for exchanging email messages
- RFC 822: standard for text message format:
- Header lines, such as:
- To:
- From:
- Subject:
- Different from SMTP MAIL FROM, RCPT TO: commands!
- Blank line
- Body: the “message”
- ASCII characters only
Mail Access Protocols
- User agent ---SMTP---> Sender’s mail server ---SMTP---> Receiver’s mail server ---Mail access protocol---> User agent
- SMTP: delivery/storage to receiver’s server
- Mail access protocol: retrieval from server
- POP: Post Office Protocol [RFC 1939]
- Authorization, download
- Mail arrived at SMTP server
- Using device, when you retrieve messages, the message is now on the device used to retrieve; message is no longer at the server
- IMAP: Internet Mail Access Protocol [RFC 1730]
- More features, including manipulation of stored messages on server
- Remote manipulation of content; stays on server but remotely access and read from anywhere
- HTTP: HyperText Transfer Protocol
- Gmail, Hotmail, Yahoo! Mail, etc.
POP3 Protocol
S: +OK POP3 server ready C: user bob S: +OK C: pass hungry S: +OK user successfully logged on
- Authorization phase
- Client commands:
-
user
: declare username -
pass
: password
-
- Server responses
-
+OK
-
-ERR
-
C: list S: 1 498 S: 2 912 S: . C: retr 1 S: <message 1 contents> S: . C: dele 1 C: retr 2 S: <message 1 contents> S: . C: dele 2 C: quit S: +OK POP3 server signing off
- Transaction phase, client
-
list
: list message numbers -
retr
: retrieve message by number -
dele
: delete -
quit
-
POP3 and IMAP
- More about POP3
- Previous example uses POP3 'download and delete' mode
- Bob cannot re-read e-mail if he changes client
- POP3 'download-and-keep': copies of messages on different clients
- POP3 is stateless across sessions
- IMAP
- Keeps all messages in one place: at server
- Allows user to organize messages in folders
- Keeps user state across sessions:
- Names of folders and mappings between message IDs and folder name
Section 2.5: DNS
Domain Name System/Service (DNS)
- Mapping from human readable text name (www….) to an actual IP address, so your computer knows where the website is on the Internet and can get there
- For Internet hosts and routers, IP address (32 bit) used fo addressing datagrams, ‘name’ (e.g. www.yahoo.com) is used by humans
- Human readable name (high level) => mapping => unique IP address (low level identity)
- Client goes to DNS first before setting up a socket and GET from the Internet to retrieve data from Web server and get sent back
- Tool for doing this mapping, from human name spaces into IP name spaces
- Domain Name System:
- Distributed database implemented in hierarchy of many name servers
- Application-layer protocol: hosts, name servers communicate to resolve names (address/name translation)
- Note: core Internet function implemented as application-layer protocol
- Complexity at network’s edge
- Transaction-oriented application-layer protocol
- UDP-based (usually) because want low latency
- Runs on Port 53
Services, Structures
- DNS services
- Hostname to IP address translation
- Host aliasing: canonical, alias names
- Mail server aliasing
- Load distribution
- Replicated Web servers: many IP addresses correspond to one name
- Why not centralize DNS and make it a huge distributed database?
- Issue of scale
- Single point of failure
- Traffic volume
- Distant centralized database
- Maintenance
- If you centralize, massive amount of load by everyday people will be sent; single point of failure if someone were to attack DNS – doesn’t scale
A Distributed, Hierarchical Database
- Root DNS Servers
- -> com DNS servers
- -> yahoo.com DNS servers
- -> amazon.com DNS servers
- -> org DNS servers
- -> pbs.org DNS servers
- -> edu DNS servers
- -> poly.edu DNS servers
- -> umass.edu DNS servers
- Client wants IP for www.amazon.com, first approximation:
- Client queries root server to find com DNS server
- Client queries .com DNS server to get amazon.com DNS server
- Client queries amazon.com DNS server to get IP address for www.amazon.com
Root Name Servers
- Contacted by local name server that cannot resolve name
- Root name server:
- Contacts authoritative name server if name mapping not known
- Gets mapping
- Returns mapping to local name server
- Replicated by number of sites
[www].cpsc.ucalgary.(ca)
- [most specific] to (least specific, top level of where it is)
- Resolve from root level (right) and working the way down to get exact details
- 13 root name ‘servers’ worldwide
- (a) Verisign, Los Angeles, CA (5 other sites)
- (b) USC=ISI Marina del Rey, CA
- (c) Cogent, Herndon, VA (5 other sites)
- (d) U Maryland College Park, MD
- (e) NASA Mt. View, CA
- (f) Internet Software C. Palo Alto, CA (and 48 other sites)
- (g) US DoD Columbus, OH (5 other sites)
- (h) ARL Aberdeen, MD
- (i) Netnod, Stockholm (37 other sites)
- (j) Verisign, Dulles VA (69 other sites)
- (k) RIPE London (17 other sites)
- (l) ICANN Los Angeles, CA (41 other sites)
- (m) WIDE Tokyo (5 other sites)
TLD, Authoritative Servers
- Generic top-level domain (TLD) servers
- Generic categories for businesses/advertising
- Responsible for com, org, net, edu, aero, jobs, museums, and all top-level country domains (e.g. uk, fr, ca, jp)
- Network Solutions maintains servers for .com TLD
- Educause for .edu TLD
- Allow you to have .insurance, .xxx, etc.
- Authoritative DNS servers
- Local DNS server, that has real answers of ‘which computer and office does it reside in?’
- Name of every computer, IP address, who owns that computer (map to location/person)
- Organization’s own DNS server(s), providing authoritative hostname to IP mappings for organization’s named hosts
- Can be maintained by organization or service provider
- Non-authoritative server: anything that comes out of a cache
Local DNS Name Server
- Does not strictly belong to hierarchy
- Each ISP (residential ISP, company, university) has one
- Also called “default name server”
- When host makes DNS query, query is sent to its local DNS server
- Has local cache of recent name-to-address translation pairs (but may be out of date!)
- Acts as proxy, forwards query into hierarchy
DNS Name Resolution Example
- Host at cis.poly.edu wants IP address for gaia.cs.umass.edu
-
nslookup gaia.cs.umass.edu
- IP address: 128.119.245.12 (can initiate TCP connection)
- Answered by local DNS server (that already knew the answer; cached)
- Non-authoritative answer because it came out of local cache rather than getting definitive answer from umass server
- Iterated query
- Each step, different servers consulted – “I don’t have the answer, but I know where you should look”
- Originating DNS server does all heavy lifting
- Most commonly used
- Answer only ends up at local DNS server (because of space, query volume)
- (1) Requesting host (cis.poly.edu) -> local DNS server (dns.poly.edu): “I don’t know, I will find it for you”
- (2) Local DNS server -> root DNS server: “you need to go over to .edu”
- (3) Root DNS server -> local DNS server
- (4) Local DNS server -> TLD DNA server: tells where .umass is
- (5) TLD DNS server -> local DNS server
- (6) Local DNS server -> authoritative DNS server (dns.cs.umass.edu -> gaia.cs.umass.edu): talks to .umass and returns IP address
- (7) Authoritative DNS server-> local DNS server
- Recursive query
- Puts burden of name resolution on contacted name server
- Originating computer asks local DNS server, and it will find it for you – passes down to .umass
- Make each server along the chain do lots of work for you
- Answer generated at end, propagated through chain
- (1) Requesting host (cis.poly.edu) -> local DNS server (dns.poly.edu)
- (2) Local DNS server -> root DNS server
- (3) Root DNS server -> TLD DNS server
- (4) TLD DNS server -> authoritative DNS server (dns.cs.umass.edu -> gaia.cs.umass.edu)
- (5) Authoritative DNS server -> TLD DNS server
- (6) TLD DNS server -> root DNS server
- (7) Root DNS server -> local DNS server
- (8) Local DNS server -> requesting host
Caching, Updating Records
- Once (any) name server leans mapping, it caches mapping – lots of caching
- Cache entries timeout (disappear) after some time (TTL)
- Anytime you get an answer, it will be remembered (for a certain amount of time)
- TLD servers typically cached in local name servers
- Thus root name servers not often visited
- Cached entries may be out of date, but cached to avoid the trouble of getting the answer (best effort name-to-address translation)
- Mostly accurate, avoids unnecessary lookup
- If name host changes IP address, may not be know Internet-wide until all TTLs expire
- Fill DNS cache locally with good information, avoid necessary lookups if possible
- Update/notify mechanisms proposed IETF standard (RFC 2136)
DNS Records
- DNS: distributed db storing resource records (RR)
- RR format:
(name, value, type, ttl)
type=A | type=NS | type=CNAME | type=MX |
---|---|---|---|
|
|
|
|
DNS Protocol, Messages
- Query and reply messages, both with same message format
- Message header:
- Identification
- 2 bytes in size
- 16 bit number for query, reply to query uses same number
- Flags
- 2 bytes in size
- Query or reply
- Recursion available and desired
- Reply is authoritative
- Number of questions (2 bytes), number of answer resource records (RRs) (2 bytes)
- Number of authority RRs (2 bytes), number of additional RRs (2 bytes)
- Questions
- 4 bytes in size
- Variable number of questions
- Name, type fields for a query
- Answers
- 4 bytes in size
- Variable number of RRs
- RRs in response to query
- Authority
- 4 bytes in size
- Variable number of RRs
- Records for authoritative servers
- Additional Information
- 4 bytes in size
- Variable number of RRs
- Additional “helpful” info that may be used
- Sequence numbers
- Outbound query – sequence number incremented each time DNS query happens; match response with originating query and corresponds to the name
- Typical response: A few 100 bytes
- List cname, mx, etc. and info of who responded and which DNS server; if it was authoritative or non-authoritative
Example: nslookup
-
nslookup
-
www.cpsc.ucalgary.ca
- First looks at root level, then ends up at highest level
- Information from a cache
- Actual machine name is
web1.cpsc.ucalgary.ca
-
www.google.com</cde>
- Query <code>google.com, receive many answers
- Load balancing – uses multiple servers
- Numbers keep changing
- Ordering changed, for load balancing (round robin)
Example: dig
- Domain Information Groper
- Prints out resource records
-
dig gaia.cs.umass.edu
- qr – query response, rd – recursion denied
-
dig @cis.poly.edu gaia.cs.umass.edu
- Records kept for a few days at root level
Linuxconfig files
/etc/resolv.conf /etc/hosts /etc/protocols /etc/services
- Files used to resolve names
Inserting Records into DNS
- Example: new startup “Network Utopia”
- Register name networkutopia.com at DNS registrar (e.g. NetworkSolutions)
- Provide names, IP addresses of authoritative name server (primary and secondary)
- Registrar inserts two RRs into the .com TLD server:
-
(networkutopia.com, dns1.networkutopia.com, NS)
-
(dns1.networkutopia.com, 212.212.212.1, A)
-
- Create authoritative server type A record for www.networkutopia.com; type MX record for networkutopia.com
Attacking DNS
- Distributed Denial-of-Service (DDoS) attacks
- Bombard root servers with traffic
- Not successful to date
- Traffic filtering
- Local DNS servers cache IPs of TLD servers, allowing root server bypass
- There is so much caching, it’s hard to get a query to the root
- Bombard TLD servers
- Potentially more dangerous
- Redirect attacks
- Man-in-middle
- Intercept queries
- DNS poisoning
- Send bogus replies to DNS server, which caches
- Trick people into going to your DNS site
- Software might put in DNS cache; name mapped to bogus IP
- Exploit DNS for DDoS
- Send queries with spoofed source address:target IP
- Requires amplification
- DNS sec is secure
- DNS can be fragile
Section 2.6.1: P2P Architecture
Pure P2P Architecture
- No always-on server
- Arbitrary end systems directly communicate
- Peers are intermittently connected and change IP addresses
- Instead of special server machines, we have ordinary peer machine
- Any peer can be consumer, producer, provider, etc. and can talk to other peers
- Arbitrary end hosts choosing to directly communicate with each other; may be transient, Wi-Fi, etc.
- Examples:
- File distribution (BitTorrent)
- File sharing
- Streaming (KanKan)
- VoIP (Skype)
Peer-to-Peer Design
Real world | Internet |
---|---|
|
|
- NW Apps
- Client-server
- HTTP, DNS
- P2P
- File Sharing
- Get illegal copyrighted content from other people on the network
- Hybrid
- Need server-hardware stuff, but need lots of bandwidth from peers
P2P File Sharing Applications
- Early ones: central design (Napster)
- Central server keeps track of who had what content willing to share, and gave IP address to that person so you could get that content yourself
- Catalogue, dictionary, a list of the content and who has it
- Distributed is sending of file
- Later ones: distributed design (eDonkey, KaZaa, Gnutella…)
- Structured style (CAN, Chord)
- Explicit location for every piece of content registered
- CAN: content addressable network
- Tells what file/movie you’re looking for based on that name of the object, they do hash of name and gives you unique identifier and tells you where exactly the content is stored in the P2P model
- Unstructured style
- Content can be anywhere and you have to search for it
- Anybody can host any file anywhere in the system, must do explicit flood-based search to find out who has what you need
- Sends out query of all participating nodes, find whoever has it, pursue content
“Early” | “Modern” |
---|---|
|
|
In Class Example
- Actual DVD copy of Lord of the Rings with a size of 430 mb
- Puts up on server at home (upload speed 2 mb/s), anyone can download for free
- Person 1: access speed 3 mb/s
- ~30 minutes – only person who wants it, gets full speed on Internet link
- Person 2, 3, 4 now wants it
- Server has to split the time up somehow – 2 hours
- 50 hours to download the movie if running client-server (Slow)
- Split into 100 pieces, 100 mb each
- Each person has a choice of 8 pieces
- Person 1-4 gets 1/8 of the movie (4 minutes)
- Person 3 picks a piece Person 4 has – piece 7
- When they want a portion a peer has, they don’t involve the server – they do peer-to-peer transfer
- When all pieces are sent out to peers, server can be shut down and the rest can happen
- When someone finishes all the pieces, they can shut off their services or remain in the system to serve capacity
- How much time to distribute file (size F) from one server to N peers?
- Peer upload/download capacity is limited resource
- us: server upload capacity
- dj: peer i download capacity
- ui: peer i upload capacity
Client-Server | P2P |
---|---|
|
|
- Client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us
- Client-server takes longer (linearly)
- P2P asymptotically lower bound (logarithmic)
P2P File Distribution: BitTorrent
- File divided into 256 kb chunks, peers in torrent send/receive file chunks
- Example: Alice arrives…obtains list of peers from tracker…and begins exchanging file chunks with peers in torrent
- Tracker: tracks peers participating in torrent
- Keeps track of name and IP address
- Torrent: group of peers exchanging chunks of a file
- Peer joining torrent
- Has no chunks, but will accumulate them over time from other peers
- Registrars with tracker to get list of peers, connects to subset of peers (‘neighbors’)
- While downloading, peer uploads chunks to other peers
- Peer may change peers with whom it exchanges chunks
- Churn: peers may come and go
- Once a peer has entire file, it may (selfishly) leave or (altruistically) remain in torrent
BitTorrent: Requesting, Sending File Chunks
- Requesting chunks
- At any given time, different peers have different subsets of file chunks
- Periodically, Alice asks each peer for list of chunks that they have
- Alice requests missing chunks from peers, rarest first
- Sending chunks: tit-for-tat
- Alice sends chunks to those four peers currently sending her chunks at highest rate
- Other peers are choked by Alice (do not receive chunks from her)
- Re-evaluate top 4 every 10 seconds
- Every 30 seconds: randomly selects another peer, starts sending chunks
- ‘Optimistically unchoke’ this peer
- Newly chosen peer may join top 4
- Server is a seed which has every piece
- Dynamic, unpredictable of what peer will provide
- No security or protection on corrupted or malicious files
- Tracker hosted on a site
BitTorrent: Tit-for-Tat
- (1) Alice ‘optimistically unchokes’ Bob
- (2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
- (3) Bob becomes one of Alice’s top-four providers
- Tit for tat deals: mutual incentive for an exchange (I have something you want, you want something I have)
- Pieces: Who has the piece you need; if you find someone that also needs your pieces - reciprocate and mutual incentive
- Service (throughput): if someone gives you good/lousy service, you return the same
- First Piece Problem
- When Alice joins, she has no pieces; how to get started?
- In BitTorrent, there is something called choking and unchoking
- Tracker provided list of 20-30 peers for Alice, she will have an active exchange with about 5
- 25 of them are choked; 5 are unchoked
- Do dynamically over time (every so minutes); you will allow random trades with someone
- Will see what pieces they have and what service they have; if the new person is better than the previous, they will shut the worse service and go to the better, new one
- Random choking allows new people to participate in the system
- Rarest First Piece Selection
- Identify which piece is least prevalent (i.e. fewest copies) and try to make another copy
- Find the piece that is least popular, and make another copy of it, in case someone leaves with the only piece
- Tracker would notice this; whenever you download it, you report to the tracker (up-to-date view of who has what) and it calculates how many copies there are
- Alice gets optimistically unchoked, now has copy of the least prevalent piece in system. Soon, everyone will want Alice’s piece
- Bootstrapping: everyone will contact the new one because of the rare piece they have
- Dynamic, adaptive protocol
- Dynamic Bandwidth Probing
- Bandwidth Throttling
- Last Piece Problem
- Start to lose parallelism because only one or two pieces needed; now down to 1-2 peers
- Find someone who has the piece you need, who has optimistically unchoked you
- “Harnesses bandwidth”
- BitTorrent
- If you have to get it sequentially then must wait for piece 1
- Might get the next piece in time while you’re watching
- Solution: prioritize the first bits so you can watch the beginning of it, ensure you get the next ones coming up
- Small numbers -> large numbers
- Example: Many people start to watch the movie (good for buffer latency)
- Person 2 wants Person 1’s piece, Person 1 no incentive for reciprocation
- Add some randomization so people can reciprocate
- Also people altruistically give pieces (daisy chain of performance)
- Stratification Approach: someone watches the movie first, and can give the extra pieces to another person and so forth
Media Streaming Apps
- Chapter 7
- Continuous media objects
- Audio: small, low bit rate
- 100 kb/s or less
- Video: large, high bit rate
- Mb/s or more
- Live or stored
- Live: real-time viewing of an event (sports, etc.)
- Stored: old content pre-recorded and pre-formatted, watching it on an on-demand basis (Youtube)
- Needs to be viewed sequentially to make sense
- Semantic meaning to them – watched in linear order in sequential fashion that makes sense for viewers
- Strict timing and synchronization requirements
- Video has a certain frame rate at which videos displayed (30 fps)
- Creates illusion of continuous movement; if less, than you get more stop-motion
- Synchronization of video and audio
to another person and so forth – Stratisfication Approach
Application Layer Protocols in Retrospect
- Architecture: Client-Server (hybrid) P2P
- What paradigm, architecture is it using? Is it server or P2P oriented?
- Duration
- Short (transaction-oriented) vs long (session-oriented)
- Short
- Web (retrieve web)
- DNS (host name, tell me what IP is)
- Long
- Streaming (Youtube)
- FTP (connect to server, session in which you exchange control information and commands
- Reliability
- Perfect data vs unreliable data
- TCP-based (web, e-mail)
- UDP-based (streaming)
- Bandwidth needs
- Low vs high
- Low: e-mail (send a few kb of stuff)
- High: HD streaming
- Data volume
- Low vs high
- Low: web page
- High: streaming
- Statefulness
- Connection-less vs connection-oriented
- Connection-less
- No setup or teardown; just do it
- Need IP address, done in short duration
- HTTP/1.0 (simple, get web page)
- Connection-oriented
- Explicit setup, transfer, teardown
- Overhead of TCP setup
- HTTP/1.1 (might remember it’s you, keep cookies, open the connection)
- Directionality
- One-way vs two-way
- One-way: Youtube
- Two-way: Skype
- Style
- Pull vs push
- Pull: file on web server pulled from server
- Push: file on computer, pushed onto SMTP server
- FTP has both two-way and pull, push
- Both deal with one way or two way exchange of data; and which direction the data is moving
- Number of participants
- One-to-one vs one-to-many
- Who’s involved in the conversation/application
- Many are one-to-one
- Can be multi-party conversation (cideo conferencing, broadcast, multi-layer gaming apps)
- Security
- Yes vs no
- Most don’t have it as everyone assumed they were a good citizen but now they are building SSL into protocol