Courses/Computer Science/CPSC 457.F2013/Lecture Notes/Scribe2

Jump to: navigation, search

Note from Scribe 2: You may notice links to non-existent files - these are for diagrams Michael draws on the whiteboard. The wiki doesn't allow file downloads (or there is some bug with it or something), so they're not up here. I believe Michael will upload them to his site and link to them sometime down the road. In the meantime if you really need them, email me:

Scribe2/Other Information ← summary of what to read
#September 16th, 2013
#September 18th, 2013
#September 20th, 2013
#September 23rd, 2013
#September 25th, 2013
#September 27th, 2013
#September 30th, 2013
#October 9th, 2013
#October 11th, 2013
#October 16th, 2013
#October 18th, 2013
#October 22nd, 2013
#October 25th, 2013
#October 28th, 2013
#October 30th, 2013
#November 4th, 2013
#November 13th, 2013
#November 15th, 2013
#November 18th, 2013
#November 20th, 2013
#November 22nd, 2013
#November 25th, 2013
#November 27th, 2013
#December 4th, 2013


Other Information

Refer to the Scribe2/Other Information page for list of readings by date

September 16th, 2013

Back To Top

Lecture Notes

  • One week at a time material will be posted
  • Some slides are just talking points for the lecture not real content: they will be excluded
  • Scribe notes are mostly what is said in class including Q and A

Programs, how they become processes

  • Simultaneously a demo on command line
  • Showing VM VirtualBox
  • We can do whatever we want in terms of platform for our homework, but we will need to submit homework that works on this platform

Command Line Exercise

Covering common commands

Ordered List Of History
Objective Command Comments
Show history of commands
Too many lines
Pagenate output of history
 history | more 
press 'q' to quit
Produce a frequency distribution of command history
 history | gawk '{print $2}'
awk is the grandfather of perl
Count the output
 history | gawk '{print $2}' | wc 
Get information about a command
 man wc 
displays a "manual": information about the command wc.
Sort output
 history | gawk '{print $2}' | sort 
remove duplicates
 history | gawk '{print $2}' | sort | uniq -c 
includes count of each entry
Sort output numerically
 history | gawk '{print $2}' | sort | uniq -c | sort -nr 
n for numeric, r for reverse order.
Common commands
Command description
ll an alias for long list 'ls -l'
cat concatenate the contents of a file, output to command line
ps lists running processes
ps aux lists all running processes (those in square brackets are kernel threads)
pstree used to show hierarchical relationship
top gives a live updated version of ps
df gives info on disk usage
ifconfig tells you the state of network adapters
diff handy utility for programming. To produce a patch or to see the difference between two files.
kill sends a signal to a process, especially to kill a process.
objdump -d dissasembles object
strace [command] traces behavior of a program

What Are These Commands

  • Some are programs
  • Some are built in functionality


A way for the output of one command to become the input of another command


Looking again at addi.c from prev. lecture

Out notion of source code is one thing, but after a translation process the program becomes something different.

Ex: interesting behavior with Hello World program.

#include <stdio.h>

int main(int argc,
  char* argv[])
  printf("Hello, 457!\n");
  return 0;

October 16th, 2013

Back To Top

lecure notes start at 12:30 (refer to other scribe notes for the full lecture)

In the Kernel

Process Dives Into Kernel

Line 2076ish

                  #          #                                            
                  #          #                                            
                  #   P      #                                            
                  #          #                                            
              # Kernel  |                          #                      
              #         |                          #                      
              #         |                          #                      
              #         V                          #                      
              #       ******                       #                      
              #    ***     **                      #                      
              #    *        *                      #                      
              #    *  brk   ------->*******        #                      
              #    **      **     **      **       #                      
              #     ***  **      ** do-brk *       #                      
              #        ***       **       *        #                      
              #                   ***    **        #                      
              #                      ****          #                      

The 'read' System Call

Line 291:

  • the file structure has a read method: more delegation!
  • click on the 'read' link to search for it:

What's going on:

the file system

READ is an implementation specific to a file system.

Kernel reading tip: Don't give up! keep digging through the source code to reach the answer. You will not get the whole answer on one screen.

Question: Macro or System call?

How do you know whether something is a macro or a system call?

  1. toss a coin
  2. you just learn what's what
  3. it doesn't matter

Kernel is a big mixture of macros, function calls, assembly

Practical advice: click on the name and it will probably show you.

About the Delegation Pattern

  • Keeps things manageable.
  • consider sys_read: a nice API
    • it delegates to VFS
      • VFS to all sorts of different file systems
    • File system doesn't care about the physical organization

Kernel Memory Management

Talking a look at the 'grab' program

We're looking at the source code

  • "ignore the signal handler stuff"
  • "what the heck is a struct aircraft"
    • has a few member variables
  • glibc_test accepts a pointer
  • it allocates memory off the heap
  • then prints out what is stored at each memory address within a relevant range
    • Then we read in the hexadecimal numbers what the contents of our aircraft struct are
    • that is, we read "deadbeef", the ascii code for 'a', etc.

September 18th, 2013

Back To Top

Logistics & Preamble

Question Asking Protocol

  • If you want to ask a question but Michael has not noticed, get his attention loudly
    • This is an interesting OS concept: use the 'interrupt' policy instead of relying on Michael 'polling'


  • Recap your knowledge of x86: "A Tiny Guide to Programming in 32-bit x86 Assembly Language"
    • ~5 page refresher.
  • "A Whirlwind Tour .... "
    • an excellent illustration of all layers down to os
    • system calls
    • how a program becomes a process
  • Read in textbook:
    • MOS 1.3, 1.6, 1.7


  • So far we have learned a little bit about a few topics that do not seem to be connected
  • by the end of the semester we'll see how it all connects
Why Command Line Agility
  • To show off
  • Demonstrate documentation built into machine
  • Demonstrate wide range of commands
  • Give practical skill

Learning Objectives

  • GEt an intuition for what is an os
    • components
    • What's the kernel
    • problems it solves

What is an OS - the big fuzzy picture

  • An OS does not execute programs, it rather gets out of the way - giving resources to the process that you want to execute
  • manages multiple processes

system diagram to come

Kernel Is a:

  • piece of software
  • interface to hardware
  • supervisor: manage important events
  • protection - maintaining the imaginary divide between separate processes, etc.

Why Not Put All Code In Kernel?

  • Complicates kernel, makes it more error prone
  • Mistakes in your code affect every process in the machine
    • A bug in your code would crash the machine if it was in the kernel


  • Allows us to observe the system calls interface in action


write(2) does not mean calling function 'write' with argument '2'

  • The number distinguishes which of several identically named methods is being referred to. It is the section in the manual that the target method is in:

command line utilities: section 1

system calls: section 2

something else: section 3

Important For Homework

Understanding the system calls that we read as the output of strace for the 'h' hello world example

September 20th, 2013

Back To Top

Questions from last class

The big picture thing.

  • two processes cannot communicate without asking the kernel to help tham

File:CPSC457 2013 09 20.jpg

How and why are the blue dashed lines/divisions enforced?


  • able to distinguish between the execution environment for userland processes and ??? (see main notes?)

Example: silly.c

  • looking at object dump, thinking about what we have access to.
  • GPRs:
    • eax
    • ebx
    • ecx
    • edx
    • edi
    • esi
  • Other Registers
    • ebp
    • esp → stack pointer
    • eip → frame pointer
    • eflags


  • breaks memory into different parts
  • start (base)
  • limit
  • DPL - descriptive privilege level
    • two bits (00, 01, 10, or 11)

File:CPSC457 2013 09 20B.jpg

The DPL bits are the only things enforcing the isolation between user level programs and the kernel

Two Most Basic Responsibilities of Kernel

Live in Kernel memory and contain pointers to kernel functions (IDT) and priviledge levels of different memory segments (DGT)

  • SEt up the GDT table
    • Enforces protection between itself and user level processes.
  • Set up the IDT table (interrupt descriptor table)
    • Allocates memory and populates table
    • Refer to diagram: Network card sends exception to CPU (curved arrow). CPU refers to the IDT table and executes the indicated function.

Supplementary Reading

September 23rd, 2013

Back To Top

Making System Calls

  • If system calls are privileged functionality, then how can applications invoke them?

Write system calls can be called from c by including a library (see man page).

  • However, we see in the object dump that it's actually calling a glibc function, not the system call directly

So we'll write an assembly program to directly call the system call:

;; write.asm
GLOBAL _start
;; write the arguments
;; become DPL 00
;; do this by invoking a interrupt
mov  eax, 4
mov  ebx, 1
mov  ecx, mesg
mov  edx, 0xb
int 0x80
mov  ebx, eax
mov  eax, 1

INT is in a DPL 11 code page

The path of the write command:

  • DPL 11 accessable interrupt command sent to CPU
  • CPU refers to the interrupt table and executes appropriate action
  • Note: No direct path from program to kernel

When would you want to do this?

  • Really fast or small code
  • Not for drivers or interrupts (that's kernel stuff)
  • If you're on a system that does not have glibc or the support library
  • real answer: if you want to write malware

Learning Objective

If we had to write a program as above that removes a file, could we?

;;try to call unlink on ?
;;before proceeding with coding, echo "secret" > secret.text
;;this is what we are trying to remove with unlink
;;then look up details on unlink with man 2 unlink
;; unlink("secret.txt");


GLOBAL _start
filename db "secret.txt"
  mov eax, 0xA
  ;; a few more lines of code

File:CPSC 2013 09 23 A.jpg

September 25th, 2013

Back To Top

Today we look at a very important topic in operating systems: making programs run.

Process Abstraction

Continuing/completing the discussion: how does a program become a process?


A generic term referring to a running process, whether or not it is a process, thread, kernel thread. A job needing doing.
a program in execution. Has several characteristics
  • instruction pointer: what instruction will be processed next
  • CPU context: the state of the CPU
  • set of memory pages (process memory space)
    • contains the program code, static data, etc
  • dynamically allocated memory space
  • set of resources: files, metadata about process state (PID)
does not have the heavyweight stuff a process has, but rather is part of a process
lightweight process
in linux there is no distinction between a thread and a process - both are called "lightweight process".

Our VM's kernel code:


  • kernel code is C
    • but they have their own style & idioms that may take some getting used to
    • and it's not thoroughly commented - see style guidelines
line 1215 onwards: struct task_struct

This is the beginning of this task_struct

  • 1216: volatile long state controls if the process should be running (as per the comments)
    • not all processes run at the same time
    • not all processes are capable of using the CPU - it may be waiting on something
  • 1271: struct mm_struct points to data
    • M. Lacosto could probably teach the entire semester from task_struct and mm_struct
  • 1288: pid (we're familiar with this already)
  • 1344ish: the owner of the task - who spawned it
Process States
  • TASK_RUNNING: process is eligible to have the CPU i.e run
  • TASK_INTERRUPTIBLE: process sleeping (e.g. waiting for I/O
  • TASK_UNINTERRUPTIBLE: same as above, but ignores signals
  • misc: traced, stopped, zombie

File:CPSC457 2013 09 25.gif

History through Process States

the first process to start after os startup is init.

init cannot start new programs with execve() because then init would be overwritten and at the termination of the new process, everything has stopped

instead init uses fork(2) and in the 2nd copy of init executes execve() to create the new process

Question: isn't it a bit of a waste to completely duplicate init using fork(2) then immediately overwrite it with execve()?

Answer: the solution is to duplicate the process control block ONLY

  • remember the mm_struct in the kernel - only the pointer is copied rather than a deep copy
  • this works UNTIL the new process attempts to write to the destination of that pointer, at that point it performs a deep copy.
  • this is called copy on write

Question: what happens if a child dies?

Answer: the child stays around as a zombie long enough for parent to gather some last information

Question: what happens if a parent dies with children still around?

Answer: the child is re-parented (usually with init)

Some Coding Example

not quite complete:


//main with command line args
pid_t pit = 0;
pid = fork();
if(0 == pid)
  // i'm the child
  printf(stdout, "hey i'm the child, my pid is %d", pid)
else if(pid > 0)
  // parent
  printf(stdout, "hey i'm the parent");

September 27th, 2013

Back To Top


Much of what an OS does

Namespace management:

  • alloc
  • project
  • cleanup/recycle

  • CPU
  • memory
  • files
  • users

What is a Process

File:CPSC 2013 09 27A.gif

What makes these lines real?

File:CPSC 2013 09 27B.gif

Process Address Abstraction

How do you multiplex memory??

Every program believes that it (and it only) has access to the 0 → 3G range in RAM File:CPSC 2013 09 27C.gif

Problems with direct mapping addresses

  • you cannot give each process (~140 of them) free access to all RAM
  • if you did, then changed RAM hardware, would have to change program

opening up the cat process

lists the allocation of memory for a running process

or you can execute cat /proc/self/maps (where self is the current process - cat)

what's in there?

  • libraries
    • repeated if you need different memory regions (-- different permissions)
    • r-x for code
    • r-- for constants
  • heap
  • stack

Commands from the context of PAS

  • fork(): create PAS
  • execve() replace PAS comments

September 30th, 2013

Back To Top

Process Address Space

how does your program address a piece of memory?

What the MMU Does

Translates from logical to linear to physical:

Logical Address
[SS]:[offset] Has several parts:
  • segment selector
  • 32 bit offset
The segment selector is implied
Linear Address / virtual address
[Dir|Table|Offset] Made up of three parts:
Physical Address
Exercise: translating linear address to physical address by hand


Offset: 12 bits = 3 nibbles

Dir and table are 10 bits each

[0000 1000 00] [00 0100 1001] [1000 0101 1000]

base 10: [ 32 | 64 + 8 + 1 | 2048 + 64 + 16 + 8 ]

[ 3210 | 7310 | 216410 ]

Returning to discussion of MMU

TLB caching speeds up the translation process which otherwise would be relatively costly because it requires 3 table lookups for each translation.

TLB stands for Translation Lookaside Buffer

Page Table and Page Directory Details

There is a page table and page directory for every running process

  • They are part of the kernel's memory

October 9th, 2013

Back To Top


Test 1

  • Dr. Locasto is stoked on the results: you never see such well distributed grades!
    • Pleasing spread
    • Also a number of people who did do quite well
  • For our individual scores: don't freak out. Remember that this is points out of the 1000 that you can score in the semester.
  • The test indicates that half the class is not necessarily cluing in to what Dr. Locasto is saying.
    • Some people didn't know enough
  • Some people were not precise enough in their answers
Exit Times and Scores

There doesn't seem to be a correlation between the two.

Conclusion: take your time if you need it.

Homework 2

Problem 1: testing our understanding of manipulating process address space in memory regions

  • generate and execute code at run time
  • straightforward if we get the right set of systems calls involved
  • might be useful to brush up on x86 assembly language

Problem 2: a number of questions worth small amounts of points

  • This is to get us looking in the right direction
  • Look at the linux kernel code to help solve the problem
  • there will be some twist that Dr. Locasto won't discuss yet

Problem 3: "The Meat"

  • we will extend basic kernel functionality
    • add system call that marks process in a certain way
    • if process marked, kernel fails all syscalls from that process
    • Use Robert Love's textbook
  • we will have to assess what we have done
  • The really tricky part is what you need to do in the kernel to filter the processes
Other Tips & Comments
  • start early and ask questions
  • we will have to learn through this assignment
    • In one month, hopefully 93% of us will be able to do this assignment
  • We will get little bits and pieces, aspects of assignment related material in class - but don't wait for that to complete the assignment.
  • The thing that makes this assignment difficult is that we are coding in an unfamiliar development environment: the kernel.
  • "There's homework 2. Enjoy"


Class exercise: two questions.

  1. If we didn't have computers, would we still need operating systems?
  2. What is the difference between
    • protected mode
    • kernel space
    • superuser mode
    • root user
    Do root's processes run in ring zero?

Q1: Would we need OS'

principles, concepts to multiplex resources to maximize utility.

  • In a business you need an "operating system" to multiplex human resources between projects, and restrict access to file folders, etc. in order to maintain confidentiality borders between projects.
  • politics: multiplexing $$
  • Library: multiplexing books

In this course we will learn the concepts related to managing complex systems. Using a computer is just one such case.


  • protected mode (one of several modes the architecture can be in)
    • student answer: A mode the CPU runs in that doesn't let certain things be accessed
    • prof answer: A mode in the architecture that supports isolation between PAS
  • kernel space
    • student answer: the space in which a kernel dwells
    • prof answer: memory segments with certain dpl bits
File:CPSC457 2013 10 09.gif
  • supervisor mode: ring 0, have access to privileged instructions
  • root user: user account
    • Just a little more special than the other users

Most processes do not run in ring zero

What's In The Kernel

File:CPSC457 2013 10 09B.gif

We should be able to list the components and describe how they relate to one another.

Memory Page Replacement Algorithms


October 11th, 2013

Back To Top

User Level Memory Management

  • API
  • Algorithms and data structures for user level memory management

The Problem

support growing and shrinking memory allocations (ie dynamically allocated memory)


  1. First-fit
  2. Next-fit
  3. Best fit
  4. Worst fit
  5. ?

Process Address Space

      #              #                                                    
      #              #                                                    
      #              #                                                    
      # PAS          #   memory region                                      
      #              #                                                    
      #              #                                                    
      # VM           #   memory address translation                                       
      #              #                                                    
      #              #                                                    
      # physical     #                                         
      #              #                                                    

Memory allocation in C

man malloc

malloc returns null if the OS could not / did not give you any memory

A Program That Does This Stuff

It's a linked list program - when new nodes are made, malloc is called.

Each node is 12 bytes long:

  • 1 for the char
  • 4 for the long
  • 4 for the pointer
  • 3 for padding so total is a multiple of 4 bytes

We want to check if the addresses of the next pointers are in heap.

  • ps aux | grep <program_name> to find pid
  • pmap <pid> to find map of memory for process
  • more stuff

Use strace to see what happens in terms of system calls while llist runs

  • We see that other than some initial setup, while the process runs the OS is not involved.
  • lesson no. 1: malloc does not directly manipulate the heap
  • glibc does it's own internal memory management.

What is happening?

  • memory allocated to process in page-sized chunks of memory
  • that process can do what it wants with it.

Back to the main point:

  • the question is how memory is dynamically allocated.

Dynamic Allocation as a Narrative

  • program asks for 12 bytes
  • OS gives bytes 0 through 11
  • program asks for another
  • OS gives bytes 12 through 23
  • a second program asks for 1k of memory
  • OS gives bytes 24 through 1047
  • program one frees second 12 bytes of memory: bytes 12 through 23
  • program one asks for another 12 bytes
  • OS has a tough decision: allocate at the end, byte 1048, or back at byte 12?

Then imagine 140 instead of 2 processes, doing this for days on end.

Problem: fragmented heap.

Solution: one of the 5 that were alluded to at the start.

Solutions cont'd

  1. First Fit
    Start at byte 0, move upwards, allocate the first block of the right size
  2. Next Fit
  3. Best Fit
    • finds all possible fits, allocates the one that is the tightest fit.
  4. Worst Fit
    • As above, but allocate the one that is the loosest fit.
Analysis of Solutions
  • Best and Worst: slow
  • All: listening to the user (developer) is dumb, because the developer asks for unusual sizes of memory.

This causes two problems:

Internal Fragmentation

See below

External Fragmentation

It is possible to have enough memory that if it were contiguous a serious memory request could be met - but since it is fragmented, it can be useless. Eg. 1G free in 10 byte chunks.

Improved Solutions

Bucket system:

  • A small bucket, a medium bucket, a large bucket.
  • Problem: give the user program 24 bytes when it asked for 12. now there are 12 bytes unused but allocated.
    • This is internal fragmentation

Looking at the brk command

Diving into the linux kernel

search for brk command:

  • way too many results

tip: system calls are named sys_NAME, such as sys_brk.

search for sys_brk

tip: sys_brk is just a symbol that refers to a location in kernel text. how that is associated with functionality requires a macro.

  • this macro has 6 versions
  • search 'DEFINE1(brk'
  • voila: the right one is the 'mm/mmap.c, line 246 (98%)' link.
Reading the Kernel

We can read this (almost) without Dr. Locasto's help, says him.

  • If you take it one line at a time, you can probably understand the semantics.


goto out;

if used in a right way is a great way to avoid code replication.

October 18th, 2013

Back To Top

Kernel Memory Management

         #           #                                                    
         #           #                                                    
         # malloc(3) #                                                    
         #           #                                                    
         #           #                                                    
 syscall #####################################################            
         #                                                   #            
         #  brk(2)    mmap(2)                                #            
         # |      \                                          #            
           |        \                                                     
 kernel  ##|#########\########################################            
         # |          \                                      #            
         # +===========+                                     #            
         # |           |                                     #            
         # | brk imple-|                                     #            
         # | mentation |                                     #            
         # |           |                                     #            
         # +===========+                                     #            
         #                                                   #            
         #                                                   #            
  • Why don't we code with the brk command?
    • We use malloc instead - why? (or new, or ... etc)

Malloc vs brk

void *malloc(sizet n)
  • malloc deals with pieces of memory that are smaller than one page - the size of the particular type that I am manipulating.
  • brk deals with page sized chunks

  • Recall the aircraft struct from last class - it has a certain arbitrary size.
  • let's allocate a new one with malloc:
    • malloc(sizeof(struct aircraft));

Does Malloc Call Brk? When?

  • Student answer: it calls brk when malloc runs out of page to allocate and needs a new page
  • malloc hands you a pointer to somewhere in the heap
  • then you can (with pointer arithmetic) access anywhere in heap
    • you're stopped if you access beyond the heap on either side
  • if you've used all of heap, then malloc calls brk to increase the heap

Malloc Implementation

  • we could implement our own memory manager.
void *give(site_t n);
void recycle(void *addr);

In psuedo-code:

void *give(site_t n)
  // needs to obtain memory
    // detect that you are uninitialized
    // call brk
  // needs some sort of data structure to track allocated blocks
  // finds n contiguous "available" bytes, return address of those bytes
    // choose from: 
    // first
    // last
    // best
    // worst
void recycle(void *addr)
  // do some error checking
  // need data structure to track allocated blocks
    // mark this as not allocated - addr and the number of following bytes
  // or keep track of a free list

The Bucket Allocation Idea, Revisited

Question: how does the kernel allocate memory for itself.

It does two important things:

  1. supports user space by implementing mmap and brk
  2. it's own internal memory allocation: kalloc
    • technically interesting details that we wont be expected to regurgitate on a test
    • has a generic piece, 'zone allocator'
    • buddy algorithm responsible for maintaining bucket allocation approach
    • also SLAB (original), SLOB (something else - uses next fit approach), SLUB (embedded systems)
    • maintains 11 buckets containing pages
      • If one bucket is all used up, it borrows from one bucket larger (and splits)
      • then when freed, it coalesces the memory to put back to the larger bucket size

Student question: malloc needs to malloc to have data structures. How?

  • actually, it's fancier than that:
    • The metadata about what blocks are free are actually embedded in the heap

Student question: what happens when kernel out of memory?

  • You're hosed.
  • Fork fails
  • "if the user does something ridiculous, you will have a failed system"

October 22nd, 2013

Back To Top

Recap: What have we done so far?

  1. What is an OS
  2. What is a process
  3. kernel roles
  4. mem management
    • user, kernel
    • internal, external
  5. virtual memory
  6. illusion of unlimited CPU/Mem
  7. system calls
  8. paging
  9. Protection
    • protection
    • system architecture

Suggestions for Assignments

  • Encouraged to ask questions in class, office hours, or on Piazza

October 25th, 2013

Back To Top

System Startup Leads To...

Transition from sequential code execution to concurrent execution:

  1. initializes the scheduler
  2. creates two processes (via do_fork and via sys_execve)
  3. enables interruts
  4. allows scheduler to choose between processes
  5. one more step (see course slides)

The "problem" of scheduling

Example: Dr. Locasto has 5 grad students who all want his attention - they all show up at his door at the same time.

Grad students:

  • R
  • Sa
  • St
  • B
  • A


  • Pick at random
  • first come first served
  • priority / urgency / favourites
  • Estimate workload and choose the one that would take the least time
  • Alphabetically
  • save history of service
    • Or work on nearly completed task, or one that is behind, etc.
  • Divide time into equal slots and rapidly switch between them - round robin

How this relates to the big picture

  • This is the challenge of scheduling that is faced

SCHEDULING: a mechanism that creates the illusion of sumultaneous execution of multiple processes

Dr. Locasto's Computer


  • running powerpoint
  • talking to external devices
  • etc.

Human beings can't do this well, but CPUs can!

Considerations for Scheduling

  • Fairness
  • Priority
  • Efficiency of scheduler
  • Quantum -> process should be allowed to run as long or longer than the scheduler takes to make a decision about who's next
  • Preemption
  • PRIMARY CONSIDERATION: progress through the aggregate work of all processes
  • If too short, context switch overhead is high
  • If too long, concurrency disappears
    • my_stopped_time = N * quantum.length // where N is the numberof processes running on the machine
  • Interactive applications should be visited more frequently so that it does not become unresponsive

The kernel can interrupt a running process at any point in time or when its quantum expires

  • Enables: "do something else more useful"
  • preemption does not imply suspension
  • preemption in linux 2.6 can happen either in user mode or kernel mode execution
  • "This is probably one of the coolest parts of the kernel"
Note: Process List vs Run Queue

The list of all processes is logically distinct from the data structure used to organize processes into groups ready to be scheduled

The concepts are different, but implementations may share a data structure or be two completely seperate lists.


Most scheduling approaches / algorithms can be boiled down to the concept of priority

  • The 'nice' command allows you to adjust the priority
    • it does not allow you to set the priority, it's just a parameter considered in priority setting
    • default nice value for a process is zero
  • e.g. the task for the window tha you have in the foreground is sometimes marked as higher priority
Types of Processes
  • CPU-bound (eg lots of calculations) OR I/O bound (lots of time waiting on external device read/writes)
  • interactive OR batch
    • A batch process can be I/O bound or CPU bound

Conflicting Objectives

  • Fast response time (process' seem responsive)
  • Good throughput for background processes
  • Avoid "starvation"

Scheduling policy: rules for balancing these needs and helping decide what taks gets CPU

Scheduling Algorithms

  • First Come First Served
  • Shortest Job Next
  • Round-robin
  • Some simple examples: R, A, B

Artificially Simplified Problem:

quantum = 5 work units
context switch = 1 work unit

                R        A         B
              =====    =====     =====
arrival time    0        0         0
work           10        3        15

Scheduling Policy:

  • FCFS
  • asess a tie with arbitrarily choosing one process
Running history:

Process   Running Time   Running Total
          (work units)   (work units)
=======   ============   =============
 A          3              3
context switch             4
 B          5              9
context switch            10
 R          5             15
context switch            16
 B          5             21
context switch            22
 R          5             27
context switch            28
 B          5             33

Note: the exit system call calls the scheduler

October 28th, 2013

Back To Top

Scheduling Algorithms

  • First come first served
  • shortest job next
  • round-robin

Some Simple Examples

         A   B   C
        === === ===
arrival  0   0   0
work     1   2   3

Comparing Scheduling Algorithms

What metrics do we use?

  • Wait time
    • how long from when task arrived until it was serviced
  • Turnaround time
    • Time from when task starts to when task is complete

Linux's Scheduling Support

Linux supports several different policies:

#define SCHED_NORMAL 0
#define SCHED_FIFO   1
#define SCHED_RR     2
#define SCHED_BATCH  3

These flags are used to select the type of scheduling

Implementing Scheduling

"who gets to run next?"

The 2.4 scheduler - a big loop - O(n)

  • iterates over the list of all tasks
    • calculates goodness
  • goodness
    • takes nice and a few other heuristics and returns a weight

Problem: it is a O(n) algorithm, doesn't perform well with large numbers of tasks (eg on a huge server)

The 2.6 scheduler: a priority queue - O(1)

  • maintain a priority queue
  • the work is done when a new process is added

What is it doing?

Maintaining "multi-level feedback"

  • every CPU has a run queue
  • in the queue there are a few fields pointing to a few arrays
    • a list of active entries - those who have not yet run
    • a list of _____ entries - those who have already completed
  • these arrays are about 140 entries long
  • The policy decision is determining which slot (array element) to put a task in.

Simple Problem

B                Round Robin (RR)
D               |                                              
E               |-----|-----|-----|-----|-----|-----|--->
.               |                                              
  • What if one user owns A-Y and another owns Z. Then owner one will get way more processing time.
  • There is a relation between the number of processes and the unit of time

  • fixed quantum - each process gets a useful amount
  • variable quantum - everyone gets a slice, but following the graph below, as N number of processes increases, the total amount of useful time decreases
                  Quantum - N relationship                                    
        # *                                                               
        # *                                                               
        # **                                                              
        #  **                                                             
        #   **                                                            
        #    **                                                           
  %cpu  #---------------------------------------                          
        #      **                                                         
        #       ***                                                       
        #          **                                                     
        #           **                                                    
        #             ****                                                
        #                **********************                           
        # --------------------------------------                          
                   N # of proc                                        

Whiteboard Notes

Policy vs Mechanism

  • scheduler structure
  • other areas:
    • mem management
    • disk
    • other resources

Process Lifecycle

      \                                                          /|       
       \                                                        /         
        \|                                                     /          
        -+                                        #############           
         ###########----------------------------> #           #           
         #         #                              # Running   #           
         # Ready   # <----------------------------#           #           
         #         #                              #           #           
         #         #                              #############           
         ########### +-                           /                       
                     |\                          /                        
                       \                        /                         
                        \                      /                          
                         \                    /                           
                          \                 |/                            
                           \ ############## +-                            
                             #            #                               
                             #            #                               
                             #  Sleep     #                               
                             #  Wait      #                               
                             #            #                               

October 30th, 2013

Back To Top

Concurrency As A Problem

           Pete                                   ############            
         ###########                              #          #            
         #         #                              #          #            
         #         #                              #          #            
         #         #                              #          #            
         #         #                              #          #            
         #         #\                          _  ############            
         ########### \                         /|                        
                      \                       /                          
                       \         Bob         /                           
                        _| ##################                            
                           #                #                             
                           #                #                             
                           #   the shared   #                             
                           #    buffer      #                             
                           #                #                             
                           #                #                             

Critical section: you could think of it as code - it is code. But a better definition is to describe it as a shared state between.

→ the question is: "is the data involved in that section of code manipulated by multiple threads?"

Race condition

Avoid it through mutual exclusion - don't let multiple threads access a critical section at the same time. That one task moves into the critical section, executes the code, and when complete there is the opportunity for another task to move in.

Command Line Concurrency: history | grep "awk"

The pipe is actually a shared buffer. Whatever history produces get put into the shared buffer, then grep attempts to read from that pipe.

     history                                                grep "awk"    
       |                                                        ^         
       |                                                        |         
       |                                                        |         
       |                ______________________________          |         
       |            ****                              **        |         
       |          **    ***                            **       |         
       |          *       **                            **      |         
       +--------------->   *        the pipe             **-----+         
                  *        *                             **               
                  **       *                             *                
                   **     **                            **                

This works because the os is delegated to to provide mutual exclusion to the critical section

Practical Concurrency Examples

  • Shell I/O redirection
  • command line pipes
  • named pipes (special files called FIFOs)
  • Courses/Computer_Science/CPSC_457.W2012/<some_file_name_I_didn-t_have_time_to_write>

Multithreading with fork to demonstrate this issue

How can we multithread without anything fancy?

  • fork? not really, each has it's own process address space
  • clone? yup. it exists to create another thread of control that executes in the same process address space.

This difference really illustrates the concept of a critical section.

Potential Solutions

  • spinlocks / busy waiting (wastful b/c CPU is busy doing "nothing")
  • lock variables (recursive problem)
  • disabling interrupts (not trustworthy, multi-CPU systems unnaffected)
    • do YOU trust programmer Pete to re-enable interrupts? not really

Much like the architecture provides protection and other things, it provides support for multi-threading which is the only way that modern computer works. MOS p.131 has code for correct entry and exit from a critical section of code

November 4th, 2013

Back To Top

Finishing our discussion of user level concurrency

Using Locks VS Not using locks

Unsafe Race

  • The routine has a for loop that counts to 2 billion
  • We want to see how fast it is
→ 0 s

Safe race is the same program but it uses locks

→ 4 s

What to learn:

  • well this is actually worth it
  • we see above that locking is "expensive"
    • one heck of a slow down
  • so make critical sections as small as possible - but don't sprinkle them all over your code
    • balance making them few and making them small
  • DO the locking, but realize you're imposing a performance cost.

Critical sections are slower than unsafe code, but of course unsafe code is unsafe.

Kernel Syncrhonization

Kernels are naturally concurrent

  • servicing many requests from processes
  • servicing many requests from hardware interrupts
  • these requests are not independant

BIG PICTURE: how can the kernel correctly service requests?

It is an event handling system: any function could be executing at any given time.

  • you have multiple control tasks all in different states, all being interwoven
  • below are the major concepts to take away:
Atomic Operations in x86

What does the x86 platform offer us in terms of ...?

Kernel Locking / Synchronization primitives

that is, the "API" that linux offers:

  • Atomic opperations
  • disable interrupts (cli/sti modify IF of eflags)
    • assembly instructions for this - on a per cpu basis
  • lock memory bus (x86 lock prefix)
    • this enforces system-wide serialization
  • spin locks (basically a mutex)
  • semaphores
    • an important skill to take away from this course is if you should use a mutex or a semaphore
  • sequence locks
  • barriers
    • the spin lock or semaphore makes sure one task or thread controls within one critical section, but a barrier accumulates them

Note: remember that if you understand the key data structures in the linux kernel - what it is, what it does, and what operates on it - you would understand everything you need to know for this course

down_read and up_read will be useful for homework 2 problem 2

  • totally voluntary - tells system that you're only reading
Spin Locks

Caution: we're looking at many layers of definitions. This may seem frustrating at first - but it is a good thing. It means there is a generic form that is adapted to each platform.

slock = 0 // locked

slock = 1 // unlocked

Spinlock API: After many layers of definition and wrapping we get down to assembly code. We should know the semantics of this:

  • try to aquire the mutex
  • test if you have it
  • if you do, carry on
  • if not, keep trying until you do
Kernel Preemption
  • the kernel can even preempt itself
  • treats readers and writers the same way intitially
  • you enter a critical section where you will manipulate data
  • when writers come in, they read and copy the data, operate, then update it when they're done.
The "big kernel lock"

in the beginning of the early days of the kernel there was one mutex

  • allows people to be lazy about determining what their critical section is


  • The kernel has to do synchronization
  • it handles it on its own based on what the machine offers.

November 13th, 2013

Back To Top

Course Scheduling

The home stretch begins on Friday and should be useful for homework 3


  • OS support for generating & delivering signals
  • case study of 'ptrace'


  • A simple, limited API for sending messages
  • basically a number delivered by OS between processes
  • source of signal
    • the user
    • another process
    • the kernel
  • relatively easy to know the semantics because it's a small, simple API
  • because the semantics are so well understood and standardized, you can implement a reasonable response to most signals.

'kill' process Sends A Message

  • it doesn't really kill a process, it just sends the message for a process to kill itself

Locasto ssh's into a lab machine...

  • several of his grad students are on the machine
  • after examining who's running stuff ('w'), the process list ('ps aux'), he chooses one process owned by another user to kill
  • he fires up 'man kill' then 'man 7 signal' to determine that he will have to use 'kill -9 <pid>' to kill process with pid <pid>
  • but the "Operation is not permitted" when you attempt to send the kill message to a process owned by another user
  • Side Note: When your ssh client has hung, sending the SIGHUP will instruct it to 'Hangup' and exit in a graceful way.
  • Most processes don't have custom signal handlers for extra signals
    • signals are convenient and well-defined, but only useful for sending information about exceptional conditions

How does the kernel handle the signal delivery task?

  • Figure 11-2 from (???) shows the control flow of catching a signal.
    • a signal handler is just a function
    • once it has finished executing,
   USERLAND                KERNEL SPACE
   ========                ============                                          
   start here                                                             
         |                                        P1 executes                        
         |                                        at X it jumps to the kernel                        
         |                     X                                          
         |                     |                                          
         |                     |                  which returns to the signal handler                        
         |                     |                                          
         | P1's                Y                                          
         | signal                                                        
         | handler                                                    
  _~~~~> X   Y                                                            
 (           |                                                            
  )          |                                                            
 (           |                                                            
  )          |                                    and then.... ? back to kernel,                         
 (           ?  ~~~~~~~~~~~~~> ?                                          
  )                                                then back to X - but should you continue?                        
 (_                          kernel                                       
  • can we handle exceptional conditions in a more graceful fashion?
    • the safest, although unsatisfactory answer is to exit, then 'pick up the pieces later'

Ptrace: a case study

  • it's a very versatile and useful system call
  • it wraps a complex api
  • it offers an interface for the process calling ptrace to communicate with another process indicated by pid
  • ptrace is a bad system call because it does a lot of stuff
  • it is a useful system call because it purposefully breaks the boundary between processes
    • with ptrace you can access all parts of another process - including system calls, etc.
  • if your process wants to ptrace Locasto's, and has the permission to do that, it asks the kernel to send signals to Locasto's process
    • things like gdb are built on top of this
  • You can read and write any address in another process' address space
    • refer to the man page for the full details on how to do this
  • Allows you to step between system calls. That is, the process runs until it gets to a system call, then stops. etc.
  • Or even, you can singlestep a process line to line
  • Q: isn't this a security risk - for malware
    • A: the user ids need to match - though root can ptrace anything. And it's also very useful for security.
    • At their core, debugging, malware, and security are the same task: monitoring and manipulating the innards of a running process.

sample code - something that uses ptrace


  • Dr. Locasto will put this on the wiki also.

side note: we could write strace right now using ptrace.

November 15th, 2013

Back To Top

"Welcome to the last sprint. "


  • USRI on Dec 2
  • Final Exam: Dec 18, 8-10 am

Persistent Storage - The Problem

Files. We've all seen files, likely made files. We interact with them daily.

What is behind them? How do we manipulate them?

We'll go through that today and end with directories (a special type of file)

We'll be traversing several layers of abstraction.

  • from userland utilities,
  • to the underlying system calls,
  • and what do those system calls do?

Q: How does the kernel provide support for managing persistent data?

A: The "file" abstraction

~ In linux, generic wisdom states that everything is a file.

Activity: writing things on a piece of paper

  • everyone writes a sentence on a piece of paper and passes it to the center row
  • Things noticed about these papers:

  • Dr. Locasto asked for someone to say what their sentence was. We got "Monkey Time"
  • he instructs the paper holding guys (two of them) to find the paper with "Monkey Time" on it.

  • Finding the papers is hard
  • the issues are the same as for file management: (full list on course notes ppt)
    • how do you find your piece of paper? - needs a name
    • how do you retrieve the 3rd word of your sentence. The 75th?
      • For "Monkey Time", there is no 3rd or 75th word
      • And a linear search is slow
      • Filesystems do this
    • how do I find duplicate sentences
    • how do I handle multiple requests for the same paper?
    • what if we had 1450, 14500, 1.5 million papers?
    • what if we were not co-located?
    • who can read or write?

Another angle: Answering why do we need files?

  • virtual memory isn't big enough
  • useful data needs to persist beyond the power cycle (ie reboot)
  • THE KEY THING for us to understand today is what the problem of persistent storage is.
  • → how do you find files, assert ownership control, dynamically grow and shrink the file, and do this in an efficient way

Persistent Storage - Some Definitions

  • What is a file?
  • What is in a file?

Class brainstorm guided by Locasto:

What is in a file?

  • relationship between contents
  • a "container"

What is a file?

  • bytes/data
  • metadata
  • header and 'list of allocated blocks'

... well these are not the same answer. What is the real answer?

Pretend that the paper stacks from the monkey time activity are disks, how do you point to a single file? You need to know the physical location.

Back to brainstorm:

  • Address/identifier
  • physical location
    • bytes of content

We need to handle collections of bytes in a collection of blocks (think: pages)

So to "point to a file" you would have to point to a lot of locations: picture using both hands, every finger pointing somewhere else.

How do we manipulate those bytes?

  • we don't want to see all files as a sequence of raw bytes.
  • we are familiar with a wide variety of file types.
  • the windows solution is to arbitrarily assign a three letter extension to differentiate between file types
  • in the unix world one way is not to mess with the file name, but to do something to the file itself.
    • the 'file' utility determines the type of a file.

Suppose we have a file named A. It's 4 mb. What type of file is it?

  • It would be quite difficult to determine it without knowing - it's a matter of language recognition.
  • remember looking at ELF files - the characters 'ELF' are written at the start
  • This task is hard or at times impossible depending on the file type

File Monipulation

  • the course notes has a long list of useful files for manipulating files from the command line.
  • 'chattr' and 'lsattr' are used for changin file attributes
    • most files do not use this set of attributes
    • but useful trick: you can chattr +i a file to make it immutable, even by root. This is good if you have a lot of management scripts running that are liable to delete files you want.

Looking at output of 'ls -l' unix file systems have a file owner and a group (which is supposed to be a collection of users)

the protection bits apply to user, group, other. Full read, write, execute for user, group, and other is: rwxrwxrwx

  • the right three are for world, middle three for group, left three for user
  • user and group are listed to the right of the permission bits in ls -l
 u   g   o
rwx rwx rwx

The 'd' at the left is to mark if it is a directory (d) or not (-)

Back to OS The operating system is responsible for enforcing these protections.

November 18th, 2013

Back To Top


  • recap
  • file ssytems
    • motivation: "file holes"
    • the file "address space"
    • indexed allocation

File Systems Continued

File systems and how they support the illusion of contiguous files.

What we may not realize as userland programmers is that there are many layers within the kernel required to deal with file system stuff.

Key Question: How can the kernel efficiently index and locate file content?

Many of the same issues that come up in memory management are present with persistent storage.

Put yourself in the shoes of an operating system developer. You're thinking about developing your own operating system. What you want to do is keep consistent with some of the linux patterns. You are therefore locked in to the linux file interface.

Dr. Locasto collected some data about one of his workstation computers, determined that the majority of files are a few kilobytes in size, and there are a fewer larger files.

Two questions arise:

  • Where is the content for these files?
  • How can you keep track of the space needed for growing these files?

Demo: lseek past the end of a file

The program writes some text to a file, seeks 1GB into the file, then writes a little more.

ll -h
shows that the file created is in fact 1GB
df -h
shows that essentially the same amount of space is taken up on hard drive as was before the 1 GB file was created.

This implies that there is some sort of mapping being done.

How do I do this mapping?

Each answer to this is a different file system. If you list the contents of fs in the linux kernel, you will see a long list of filesystems.

Clarifying a concept:

Directory Structure

  • we have a notion of files as being stored in a directory tree.
  • in linux, the root is denoted by '/'
                  |    |    |      |    |      |           
                 usr  bin  etc   var  dev    home                                                                            
                                            (and so on)                                                                     

You probably have multiple file systems attached to your directory tree.

  • and you can attach any file system to any point in your directory tree.
  • the integration is seamless - what filesystem is being used is meaningless to you as a userland programer or user.
  • use 'mount' command to show what file systems are currently mounted.
    • read the man page if you want to

What is a file system?

Note: File System, NOT FileSystem.

Has the same responsibilities of a memory manager.

Question: where is the metadata stored?

  • Note: even directories are files.

We have a data structure called inode that remembers important things for us.

  • unique identifier
  • pointer to contents of file
  • operations depending on filesystem

Hard Link and Soft link and inodes

using command

soft link refers to a new inode linking to the same content

hard link refers to the same inode linking to (obviously) the same content

Making Access Fast

The secret is multi-level index schemes with triple indirect addressing.

This is informed by our understanding of common file sizes

November 20th, 2013

Back To Top

Virtual File System - Conceptual

imagining what the 'cat' utility would do...

cat "/home/eye/file.txt"
    fopen(     )
      open("path", ... , ...)
      [open] [read] [write]
          →generic I/O
  • the VFS is an internal API
    • it's the one that deals with "what file system is this stored in"

VFS from kernel code

Now Dr. Locasto will attempt to show us that the above whiteboard diagram is in fact what actually happens. The goal is to leave today with some confidance in reading this code.

An inode is:

  • a generic concept,
  • a reference to the actual data
  • it is a central data structure that allows an os to reference a particular file
  • but it's not the only data structure
  • the dirty secret is that there are several versions of inode
    • there is an instance of inode created in kernel memory when a file is open
    • but if this was the extent of the inode life, that wouldn't be useful
    • so there are specific versions of inode that exist on disk - physical bytes on disk
      • EXT4 inode is around line 435 of fs/ext4/ext4.h

The 'open' system call

fs/open.c line: 1053

don't get confused by the syscall_define macro.

Then we go to line 1031 for 'do_sys_open'

  • the filename is turned into a file descriptor
  • we do_filp_open
    • for loading the necessary metadata - mapping between the struct file and the struct d
  • if do_filp_open succeeded, we do 'fd_install'

Then we go to line 1018 for 'fd_install'

  • adds the metadata to the list of open files at a certain index fd
    • now the file metadata can be accessed from the integer fd

Then we go to line 1669 for 'do_filp_open'

  • this is where the work for opening is done
  • we're not going to trace it, it's complex
  • the point: the reason that this is so much more work than the rest of the functions that we looked at is that this is where the 'rubber hits the road' and we're interacting with the hardware.

From 'open' to 'read' to 'close'

The 'read' system call

fs/read_write.c line 372

Again, not much work.

  1. vfs_read - we're delegating to the virtual file system

To vfs_read on line 277

  • if file->f_op->read is not null....
    • which means, if the function pointer for 'read' in the f_op function pointer table contained in the file struct is actually there...
  • then do it
  • otherwise, do the kernels default read: do_sync_read

To 'do_sync_read' on line 252

  • we go to the same file->f_op function pointer table, but look up aio_read (asynchronous input output)
  • this goes to ram

Back up to 'vfs_read'

  • read is a function pointer to a file system dependant read function
  • to find it, we'll choose any of the file systems (ext2) and look at its read function

To fs/ext2/file.c line 45

  • file operations table that we see before
  • it turns out that 'read' calls do_sync_read, which we know calls aio_read
  • this is because ext* file systems are the native file systems for linux
  • for other file systems, some other read function would be provided

November 22nd, 2013

Back To Top

Learn about particular features of file system: extended attributes and ACLs from multiple perspectives


  • We're familiar with ownership and permission bits
  • we use the following utilities for that:
    • chmod(1)
    • charrt(1)
    • stat(1)
    • shred(1)

What is an access control list? (ACL)


/etc/x /var/r /home/mike/hello.txt /l/a/cloud/h453
Mike r rw rwx
Alice rw r r rwx
Bob x r r
xdman x w wx

We can read this from two perspectives:

  • row wise: what capabilities do each user have?
  • column wise: what access is granted for each resource

Typically, the access control list is stored along with the resource (file)


is this approach better, equivalent, worse than linux permission bits?
  • What about user groups?
  • what about all other users?

In fact, this is a more general mechanism than the linux permission bits that we're used to.

Demo - permissions on virtual machine

$ ssh
The authenticity of host ' (' can't be established.
RSA key fingerprint is cc:36:a4:22:58:af:e1:38:56:b6:e6:5c:ec:72:b7:ca.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ',' (RSA) to the list of known hosts.'s password: 

Welcome to the University of Calgary Department of Computer Science
This system is for use by authorized users only.

         TECH SUPPORT:
                        help desk: MS 151

Please send bugs reports and package requests to

[abird@csl ~]$ cd /tmp
[abird@csl tmp]$ cat locasto.password.txt 
cat: locasto.password.txt: Permission denied
[abird@csl tmp]$ ls -l locasto.password.txt 
-rw------- 1 locasto profs 8 Nov 22 12:12 locasto.password.txt
[abird@csl tmp]$ # Then dr. Locasto changed the permissions: 
[abird@csl tmp]$ ls -l locasto.password.txt 
-rw----r-- 1 locasto profs 8 Nov 22 12:12 locasto.password.txt
[abird@csl tmp]$ cat locasto.password.txt 
[abird@csl tmp]$ groups
jr s301 c457 c471
[abird@csl tmp]$ # The groups I am in. 
[abird@csl tmp]$ # If I wanted to restrict permissions to just one group of people, 
[abird@csl tmp]$ #    for working on a group project or something...
[abird@csl tmp]$ groupadd birdsgroup
-bash: /usr/sbin/groupadd: Permission denied
[abird@csl tmp]$ # that doesn't work - but apparently using the access control list, 
[abird@csl tmp]$ #    we can give access only to a few users
[abird@csl tmp]$ #    even without having them in the group
[abird@csl tmp]$ cd michael/
-bash: cd: michael/: Permission denied
[abird@csl tmp]$ getfacl michael/
# file: michael/
# owner: locasto
# group: profs

[abird@csl tmp]$ cd michael/
-bash: cd: michael/: Permission denied
[abird@csl tmp]$ # Dr. Locasto gave read permission to world for his directory:
[abird@csl tmp]$ cd michael/
[abird@csl michael]$ ls
ls: cannot open directory .: Permission denied
[abird@csl michael]$ # I cannot list anything because I don't have read 
[abird@csl michael]$ #     permissions for any files in this directory
  • Dr. Locasto showed that he could grant file access to specific users
  • one student was given access, and once he could read the directory, and was added using setfacl, he could change the locasto.password.txt file.

Looking at ATTR things

  • must give it a valid namespace
  • must have permission to modify that namespace

The setfattr tool allows you to modify the metadata for any file - you can change the default application to open it, send secret messages, etc.

  • however, your file system needs to have support for this
    • it has to implement setfattr and getfattr system calls

Example - if you copy the file to another filesystem, the attributes are not necessarily copied

  • This is a difficulty that comes with transferring data between filesystems, or to different operating systems
  • because it's not ONLY moving it, it's converting it to a new format.

In fact, even if you're copying from one place into another within the same filesystem, those file attributes are not preserved.

  • but there is an option for 'cp' that preserves the file attributes

Note - the timestamps are stored in inode. But as with other attributes, when you copy a file, you're really making a new file with new timestamps.

November 25th, 2013

Back To Top

How Would We Design A Minimally Viable File System

Acknowledgement: 'viable' is only loosely defined.

Assume you have a well behaved storage mechanism which provides us with some number of available storage units.

It's our job to write a filesystem on it.

What are we supporting?

  • open
  • read
  • write
  • seek
  • close

Other Requirements

  • delete
  • expand

File Access Table: FAT

  • the file system cannot simply ram data onto the physical medium: you need metadata also.
  • one challenge: you have a file size limit because of the maximum address size
    • it also limits the number of files
      0  +------------------+      0 +----------------+                   
         |                  |        |                |                   
         |                  |        |                |                   
         +------------------+        |                |                   
         |                  |        |                |                   
  'one'  |    next 'block'  |        |                |                   
         +------------------+        |                |                   
         |                  |        |                |                   
         |                  |        |                |                   
         |                  |        |                |                   
         |                  |        |                |                   
         |                  |        |                |                   
         +------------------+        |                |                   
         |      EOF         |      M +----------------+                   
         +------------------+      max                                    
         |                  |      size                                   
         |                  |                                             
         |                  |                                             
         |                  |                                             
         |                  |                                             
         |                  |                                             
         |                  |                                             
      K  +------------------+                                             
   only store                                                             
 so much in RAM  


Envisions a file as a sequence of blocks

   | B1                |                                                  
   | B2                |                                                  
   | B3                |                                                  
   |                   |                                                  
   |                   |                                                  
   |                   |                                                  
   | Bn                |                                                  
   |                   |                                                  
   |                   |                                                  
   /boot                   block 1   ...       block n                    
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   0         /       \                                 M                  
            /         \                                                   
           /           \                                                  
          /             \                                                 
         /               \                                                
        /                 \                                               
       /                   \                                              
      |  |  |  |  |  |  |   |                                             
      |  |  |  |  |  |  |   |  ← metadata including                                       
      |  |  |  |  |  |  |   |             references to inodes               

Journaling Filesystem

  • keeping a track of the operations that they system is about to do
    • in a buffer
  • In some filesystems, both metadata and data. In others, just the metadata

November 27th, 2013

Back To Top

Design Challenge: operating system that bridges the gap between user level programs and hardware

What is /dev/null

  • we treat it like a file.
    • it looks like a file
    • we operate on it like a file
  • it is a bit of code that mimics a file
  • it does not act like a file

Devices are nothing more than files

  • name appears in file system
  • responds to read(2) and write(2)

the landscape of real devices

       |                              kernel|                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |  +--------+                        |                             
+---+  |  |device  |                        |                             
|CPU|→ +--+driver  +------------------------+                             
+---+     |        |                                                      
  ↑       |        |                                                      
  |       +-+------+                                                      
   \        |   ↑                                                         
    \       ↓   |                                                         
     \ +--------+---------------------------+                             
      \|           device controller        |                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       | device                             |                             

What is the CPU's role?

  • it handles interrupts from the device controller
  • this translates to a piece of kernel code responsible for responding to whatever action caused the interrupt
    • eg copy data from a buffer


A thing in this sense is not what it is named but rather what is done to it.

block device - data is accessed a block at a time; random access
character device - a stream of characters (or data values)

The important question is how the device imports and exports the data

Major and Minor numbers

cat /proc/self/maps

the column with ##:## is the major and minor number of the device in the format MJ:MN

to the kernel code! Device drivers

discussion of device driver writing.

Interesting point: the need to write device-specific sequences of bits to the device driver to communicate with the device.

December 4th, 2013

Back To Top

How does the Kernel support Networking


A Packet Traversing the Netfilter System:

                 |            ^
                 |            |
                 |         [ROUTE]
                 v            |
                [2]          [5]
                 |            ^
                 |            |
                 v            |
              inbound      outgoing

Return to Diagram from Last Week

              virtual machine                                              
               **         ***                                             
              **            **                                            
             **              *                                            
             **    *                                           
              *               *                                           
              *              **                                           
              **            **                                            
               **    |   ***                                              
                     | eth0                                               
                  __/ \___                                                
                 /        \                                               
                /          \                                              
                |          |                                              

Every modern unix box can do all of this

  • that is, the general architecture for receiving and intercepting packets

PORT: a destination within a machine. Determines which program the kernel will send the packet to.

  • applications bind to certain ports.
  • all of this is convention only - so you don't have to follow the convention if you don't want to, as long as both ends of the communication (client and server) agree.

The beautiful thing about networks is that you can just read what is there.

Catching some packets

  • Dr. Locasto ran the wireshark program, which is a GUI for tcp
  • he starts poking through some packets
  • back on his mac he uses tcpdump to take a look at network traffic
    • point made: isn't this illegal?
    • he used
      tcpdump -i en0 -v -X -A
  • what it does is instead of dropping all packets not destined for your own ip address, it takes a look at all of them.

One thing to note

  1. initiating a network connection and communicating with another machine takes several protocols
    • diagnosing network problems is rarely a one-problem problem.

Debugging network issues

  • using a packet inspector and strace you can poke in and look at what's happening and what not happening.