Courses/Computer Science/CPSC 457.F2013/Lecture Notes/Scribe2

Note from Scribe 2: You may notice links to non-existent files - these are for diagrams Michael draws on the whiteboard. The wiki doesn't allow file downloads (or there is some bug with it or something), so they're not up here. I believe Michael will upload them to his site and link to them sometime down the road. In the meantime if you really need them, email me: abird@ucalgary.ca.

Scribe2/Other Information ← summary of what to read

#September 16th, 2013

#September 18th, 2013

#September 20th, 2013

#September 23rd, 2013

#September 25th, 2013

#September 27th, 2013

#September 30th, 2013

Other Information

Refer to the Scribe2/Other Information page for list of readings by date

September 16th, 2013

Back To Top

Lecture Notes

One week at a time material will be posted
Some slides are just talking points for the lecture not real content: they will be excluded
Scribe notes are mostly what is said in class including Q and A

Programs, how they become processes

Simultaneously a demo on command line
Showing VM VirtualBox
We can do whatever we want in terms of platform for our homework, but we will need to submit homework that works on this platform

Command Line Exercise

Covering common commands

Ordered List Of History
Objective	Command	Comments
Show history of commands	history	Too many lines
Pagenate output of history	history \| more	press 'q' to quit
Produce a frequency distribution of command history	history \| gawk '{print $2}'	awk is the grandfather of perl
Count the output	history \| gawk '{print $2}' \| wc
Get information about a command	man wc	displays a "manual": information about the command wc.
Sort output	history \| gawk '{print $2}' \| sort
remove duplicates	history \| gawk '{print $2}' \| sort \| uniq -c	includes count of each entry
Sort output numerically	history \| gawk '{print $2}' \| sort \| uniq -c \| sort -nr	n for numeric, r for reverse order.

Common commands
Command	description
ls
cd
man
clear
emacs
make
ll	an alias for long list 'ls -l'
cat	concatenate the contents of a file, output to command line
ps	lists running processes
ps aux	lists all running processes (those in square brackets are kernel threads)
pstree	used to show hierarchical relationship
top	gives a live updated version of ps
df	gives info on disk usage
ifconfig	tells you the state of network adapters
diff	handy utility for programming. To produce a patch or to see the difference between two files.
yes
kill	sends a signal to a process, especially to kill a process.
objdump -d	dissasembles object
strace [command]	traces behavior of a program

What Are These Commands

Some are programs
Some are built in functionality

Pipes

A way for the output of one command to become the input of another command

Processes

Looking again at addi.c from prev. lecture

Out notion of source code is one thing, but after a translation process the program becomes something different.

Ex: interesting behavior with Hello World program.

#include <stdio.h>

int main(int argc,
  char* argv[])
{
  printf("Hello, 457!\n");
  return 0;
}

October 16th, 2013

Back To Top

lecure notes start at 12:30 (refer to other scribe notes for the full lecture)

In the Kernel

Process Dives Into Kernel

Line 2076ish

                  ############                                            
                  #          #                                            
                  #          #                                            
                  #   P      #                                            
                  #          #                                            
                  ############                                            
                        |                                                 
                        |                                                 
              ##########|###########################                      
              # Kernel  |                          #                      
              #         |                          #                      
              #         |                          #                      
              #         V                          #                      
              #       ******                       #                      
              #    ***     **                      #                      
              #    *        *                      #                      
              #    *  brk   ------->*******        #                      
              #    **      **     **      **       #                      
              #     ***  **      ** do-brk *       #                      
              #        ***       **       *        #                      
              #                   ***    **        #                      
              #                      ****          #                      
              ######################################

The 'read' System Call

Line 291:

the file structure has a read method: more delegation!
click on the 'read' link to search for it:

What's going on:

sys_r

vfs_read

read

the file system

device

READ is an implementation specific to a file system.

Kernel reading tip: Don't give up! keep digging through the source code to reach the answer. You will not get the whole answer on one screen.

Question: Macro or System call?

How do you know whether something is a macro or a system call?

toss a coin
you just learn what's what
it doesn't matter

Kernel is a big mixture of macros, function calls, assembly

Practical advice: click on the name and it will probably show you.

About the Delegation Pattern

Keeps things manageable.
consider sys_read: a nice API
- it delegates to VFS
  - VFS to all sorts of different file systems
- File system doesn't care about the physical organization

Kernel Memory Management

Talking a look at the 'grab' program

We're looking at the source code

"ignore the signal handler stuff"
"what the heck is a struct aircraft"
- has a few member variables
glibc_test accepts a pointer
it allocates memory off the heap
then prints out what is stored at each memory address within a relevant range
- Then we read in the hexadecimal numbers what the contents of our aircraft struct are
- that is, we read "deadbeef", the ascii code for 'a', etc.

September 18th, 2013

Back To Top

Logistics & Preamble

Question Asking Protocol

If you want to ask a question but Michael has not noticed, get his attention loudly
- This is an interesting OS concept: use the 'interrupt' policy instead of relying on Michael 'polling'

Readings

Recap your knowledge of x86: "A Tiny Guide to Programming in 32-bit x86 Assembly Language"
- ~5 page refresher.
"A Whirlwind Tour .... "
- an excellent illustration of all layers down to os
- system calls
- how a program becomes a process
Read in textbook:
- MOS 1.3, 1.6, 1.7

Recap

So far we have learned a little bit about a few topics that do not seem to be connected
by the end of the semester we'll see how it all connects

Why Command Line Agility

To show off
Demonstrate documentation built into machine
Demonstrate wide range of commands
Give practical skill

Learning Objectives

GEt an intuition for what is an os
- components
- What's the kernel
- problems it solves

What is an OS - the big fuzzy picture

An OS does not execute programs, it rather gets out of the way - giving resources to the process that you want to execute
manages multiple processes

system diagram to come

Kernel Is a:

piece of software
interface to hardware
supervisor: manage important events
protection - maintaining the imaginary divide between separate processes, etc.

Why Not Put All Code In Kernel?

Complicates kernel, makes it more error prone
Mistakes in your code affect every process in the machine
- A bug in your code would crash the machine if it was in the kernel

strace

Allows us to observe the system calls interface in action

Notation

write(2) does not mean calling function 'write' with argument '2'

The number distinguishes which of several identically named methods is being referred to. It is the section in the manual that the target method is in:

command line utilities: section 1

system calls: section 2

something else: section 3

Important For Homework

Understanding the system calls that we read as the output of strace for the 'h' hello world example

September 20th, 2013

Back To Top

Questions from last class

The big picture thing.

two processes cannot communicate without asking the kernel to help tham

File:CPSC457 2013 09 20.jpg

How and why are the blue dashed lines/divisions enforced?

LEARNING OBJECTIVE for today:

able to distinguish between the execution environment for userland processes and ??? (see main notes?)

Example: silly.c

looking at object dump, thinking about what we have access to.
GPRs:
- eax
- ebx
- ecx
- edx
- edi
- esi
Other Registers
- ebp
- esp → stack pointer
- eip → frame pointer
- eflags

Segments

breaks memory into different parts
start (base)
limit
DPL - descriptive privilege level
- two bits (00, 01, 10, or 11)

File:CPSC457 2013 09 20B.jpg

The DPL bits are the only things enforcing the isolation between user level programs and the kernel

Two Most Basic Responsibilities of Kernel

Live in Kernel memory and contain pointers to kernel functions (IDT) and priviledge levels of different memory segments (DGT)

SEt up the GDT table
- Enforces protection between itself and user level processes.
Set up the IDT table (interrupt descriptor table)
- Allocates memory and populates table
- Refer to diagram: Network card sends exception to CPU (curved arrow). CPU refers to the IDT table and executes the indicated function.

Supplementary Reading

http://www.intel.com/products/processor/manuals
- Section 2.1 (skim this)
- Section 2.2 (pay attention to description of pipelined microacrchitecture)
- one other
http://www/intel.com/products/processor/manuals
- three parts also (see main notes?)

September 23rd, 2013

Back To Top

Making System Calls

If system calls are privileged functionality, then how can applications invoke them?

Write system calls can be called from c by including a library (see man page).

However, we see in the object dump that it's actually calling a glibc function, not the system call directly

So we'll write an assembly program to directly call the system call:

;; write.asm
BITS 32
GLOBAL _start
SECTION .text
_start:
;; write the arguments
;; become DPL 00
;; do this by invoking a interrupt
mov  eax, 4
mov  ebx, 1
mov  ecx, mesg
mov  edx, 0xb
int 0x80
mov  ebx, eax
mov  eax, 1

INT is in a DPL 11 code page

The path of the write command:

DPL 11 accessable interrupt command sent to CPU
CPU refers to the interrupt table and executes appropriate action
Note: No direct path from program to kernel

When would you want to do this?

Really fast or small code
Not for drivers or interrupts (that's kernel stuff)
If you're on a system that does not have glibc or the support library
real answer: if you want to write malware

Learning Objective

If we had to write a program as above that removes a file, could we?

;;try to call unlink on ?
;;before proceeding with coding, echo "secret" > secret.text
;;this is what we are trying to remove with unlink
;;then look up details on unlink with man 2 unlink
;; unlink("secret.txt");

BITS 32

GLOBAL _start
SECTION .data
filename db "secret.txt"
SECTION .text
  mov eax, 0xA
  ;; a few more lines of code

File:CPSC 2013 09 23 A.jpg

September 25th, 2013

Back To Top

Today we look at a very important topic in operating systems: making programs run.

Process Abstraction

Continuing/completing the discussion: how does a program become a process?

Terminology

task

A generic term referring to a running process, whether or not it is a process, thread, kernel thread. A job needing doing.

process

a program in execution. Has several characteristics

instruction pointer: what instruction will be processed next
CPU context: the state of the CPU
set of memory pages (process memory space)
- contains the program code, static data, etc
dynamically allocated memory space
set of resources: files, metadata about process state (PID)

thread

does not have the heavyweight stuff a process has, but rather is part of a process

lightweight process

in linux there is no distinction between a thread and a process - both are called "lightweight process".

Our VM's kernel code:

goto http://lxr.cpsc.ucalgary.ca/lxr/#linux+v2.6.32/include/linux/sched.h#L1215

kernel code is C
- but they have their own style & idioms that may take some getting used to
- and it's not thoroughly commented - see style guidelines

line 1215 onwards: struct task_struct

This is the beginning of this task_struct

1216: volatile long state controls if the process should be running (as per the comments)
- not all processes run at the same time
- not all processes are capable of using the CPU - it may be waiting on something
1271: struct mm_struct points to data
- M. Lacosto could probably teach the entire semester from task_struct and mm_struct
1288: pid (we're familiar with this already)
1344ish: the owner of the task - who spawned it

Process States

TASK_RUNNING: process is eligible to have the CPU i.e run
TASK_INTERRUPTIBLE: process sleeping (e.g. waiting for I/O
TASK_UNINTERRUPTIBLE: same as above, but ignores signals
misc: traced, stopped, zombie

File:CPSC457 2013 09 25.gif

History through Process States

the first process to start after os startup is init.

init cannot start new programs with execve() because then init would be overwritten and at the termination of the new process, everything has stopped

instead init uses fork(2) and in the 2nd copy of init executes execve() to create the new process

Question: isn't it a bit of a waste to completely duplicate init using fork(2) then immediately overwrite it with execve()?

Answer: the solution is to duplicate the process control block ONLY

remember the mm_struct in the kernel - only the pointer is copied rather than a deep copy
this works UNTIL the new process attempts to write to the destination of that pointer, at that point it performs a deep copy.
this is called copy on write

Question: what happens if a child dies?

Answer: the child stays around as a zombie long enough for parent to gather some last information

Question: what happens if a parent dies with children still around?

Answer: the child is re-parented (usually with init)

Some Coding Example

not quite complete:

//include:
//stdio
//unistd

//main with command line args
{
pid_t pit = 0;
pid = fork();
if(0 == pid)
{
  // i'm the child
  printf(stdout, "hey i'm the child, my pid is %d", pid)
}
else if(pid > 0)
{
  // parent
  printf(stdout, "hey i'm the parent");
}
}

September 27th, 2013

Back To Top

Thoughts

Much of what an OS does

Namespace management:

alloc
project
cleanup/recycle

CPU
memory
files
users

What is a Process

File:CPSC 2013 09 27A.gif

What makes these lines real?

File:CPSC 2013 09 27B.gif

Process Address Abstraction

How do you multiplex memory??

Every program believes that it (and it only) has access to the 0 → 3G range in RAM File:CPSC 2013 09 27C.gif

Problems with direct mapping addresses

you cannot give each process (~140 of them) free access to all RAM
if you did, then changed RAM hardware, would have to change program

opening up the cat process

lists the allocation of memory for a running process

or you can execute cat /proc/self/maps (where self is the current process - cat)

what's in there?

libraries
- repeated if you need different memory regions (-- different permissions)
- r-x for code
- r-- for constants
heap
stack

Commands from the context of PAS

fork(): create PAS
execve() replace PAS comments

September 30th, 2013

Back To Top

Process Address Space

how does your program address a piece of memory?

What the MMU Does

Translates from logical to linear to physical:

Logical Address

[SS]:[offset] Has several parts:

segment selector
32 bit offset

The segment selector is implied

Linear Address / virtual address

[Dir|Table|Offset] Made up of three parts:

Physical Address

Exercise: translating linear address to physical address by hand

0x08049858

Offset: 12 bits = 3 nibbles

Dir and table are 10 bits each

[0000 1000 00] [00 0100 1001] [1000 0101 1000]

base 10: [ 32 | 64 + 8 + 1 | 2048 + 64 + 16 + 8 ]

[ 32₁₀ | 73₁₀ | 2164₁₀ ]

Returning to discussion of MMU

TLB caching speeds up the translation process which otherwise would be relatively costly because it requires 3 table lookups for each translation.

TLB stands for Translation Lookaside Buffer

Page Table and Page Directory Details

There is a page table and page directory for every running process

They are part of the kernel's memory

October 9th, 2013

Back To Top

Recap

Test 1

Dr. Locasto is stoked on the results: you never see such well distributed grades!
- Pleasing spread
- Also a number of people who did do quite well
For our individual scores: don't freak out. Remember that this is points out of the 1000 that you can score in the semester.
The test indicates that half the class is not necessarily cluing in to what Dr. Locasto is saying.
- Some people didn't know enough
Some people were not precise enough in their answers

Exit Times and Scores

There doesn't seem to be a correlation between the two.

Conclusion: take your time if you need it.

Homework 2

Problem 1: testing our understanding of manipulating process address space in memory regions

generate and execute code at run time
straightforward if we get the right set of systems calls involved
might be useful to brush up on x86 assembly language

Problem 2: a number of questions worth small amounts of points

This is to get us looking in the right direction
Look at the linux kernel code to help solve the problem
there will be some twist that Dr. Locasto won't discuss yet

Problem 3: "The Meat"

we will extend basic kernel functionality
- add system call that marks process in a certain way
- if process marked, kernel fails all syscalls from that process
- Use Robert Love's textbook
we will have to assess what we have done
The really tricky part is what you need to do in the kernel to filter the processes

Other Tips & Comments

start early and ask questions
we will have to learn through this assignment
- In one month, hopefully 93% of us will be able to do this assignment
We will get little bits and pieces, aspects of assignment related material in class - but don't wait for that to complete the assignment.
The thing that makes this assignment difficult is that we are coding in an unfamiliar development environment: the kernel.
"There's homework 2. Enjoy"

Activities

Class exercise: two questions.

If we didn't have computers, would we still need operating systems?
What is the difference between
- protected mode
- kernel space
- superuser mode
- root user
Do root's processes run in ring zero?

Q1: Would we need OS'

principles, concepts to multiplex resources to maximize utility.

In a business you need an "operating system" to multiplex human resources between projects, and restrict access to file folders, etc. in order to maintain confidentiality borders between projects.
politics: multiplexing $$
Library: multiplexing books

In this course we will learn the concepts related to managing complex systems. Using a computer is just one such case.

Q2:

protected mode (one of several modes the architecture can be in)
- student answer: A mode the CPU runs in that doesn't let certain things be accessed
- prof answer: A mode in the architecture that supports isolation between PAS
kernel space
- student answer: the space in which a kernel dwells
- prof answer: memory segments with certain dpl bits

File:CPSC457 2013 10 09.gif

supervisor mode: ring 0, have access to privileged instructions
root user: user account
- Just a little more special than the other users

Most processes do not run in ring zero

What's In The Kernel

File:CPSC457 2013 10 09B.gif

We should be able to list the components and describe how they relate to one another.

Memory Page Replacement Algorithms

Later

October 11th, 2013

Back To Top

User Level Memory Management

API
Algorithms and data structures for user level memory management

The Problem

support growing and shrinking memory allocations (ie dynamically allocated memory)

Solutions

First-fit
Next-fit
Best fit
Worst fit
?

Process Address Space

                                                                         
                                                                          
      ################                                                    
      #              #                                                    
      #              #                                                    
      ################                                                    
      #              #                                                    
      # PAS          #   memory region                                      
      #              #                                                    
      ################                                                    
      #              #                                                    
      # VM           #   memory address translation                                       
      #              #                                                    
      ################                                                    
      #              #                                                    
      # physical     #                                         
      #              #                                                    
      ################

Memory allocation in C

man malloc

malloc returns null if the OS could not / did not give you any memory

A Program That Does This Stuff

It's a linked list program - when new nodes are made, malloc is called.

Each node is 12 bytes long:

1 for the char
4 for the long
4 for the pointer
3 for padding so total is a multiple of 4 bytes

We want to check if the addresses of the next pointers are in heap.

ps aux | grep <program_name> to find pid
pmap <pid> to find map of memory for process
more stuff

Use strace to see what happens in terms of system calls while llist runs

We see that other than some initial setup, while the process runs the OS is not involved.
lesson no. 1: malloc does not directly manipulate the heap
glibc does it's own internal memory management.

What is happening?

memory allocated to process in page-sized chunks of memory
that process can do what it wants with it.

Back to the main point:

the question is how memory is dynamically allocated.

Dynamic Allocation as a Narrative

program asks for 12 bytes
OS gives bytes 0 through 11
program asks for another
OS gives bytes 12 through 23
a second program asks for 1k of memory
OS gives bytes 24 through 1047
program one frees second 12 bytes of memory: bytes 12 through 23
program one asks for another 12 bytes
OS has a tough decision: allocate at the end, byte 1048, or back at byte 12?

Then imagine 140 instead of 2 processes, doing this for days on end.

Problem: fragmented heap.

Solution: one of the 5 that were alluded to at the start.

Solutions cont'd

First Fit
Start at byte 0, move upwards, allocate the first block of the right size
Next Fit
Best Fit
- finds all possible fits, allocates the one that is the tightest fit.
Worst Fit
- As above, but allocate the one that is the loosest fit.

Analysis of Solutions

Best and Worst: slow
All: listening to the user (developer) is dumb, because the developer asks for unusual sizes of memory.

This causes two problems:

Internal Fragmentation

See below

External Fragmentation

It is possible to have enough memory that if it were contiguous a serious memory request could be met - but since it is fragmented, it can be useless. Eg. 1G free in 10 byte chunks.

Improved Solutions

Bucket system:

A small bucket, a medium bucket, a large bucket.
Problem: give the user program 24 bytes when it asked for 12. now there are 12 bytes unused but allocated.
- This is internal fragmentation

Looking at the brk command

Diving into the linux kernel

search for brk command:

way too many results

tip: system calls are named sys_NAME, such as sys_brk.

search for sys_brk

tip: sys_brk is just a symbol that refers to a location in kernel text. how that is associated with functionality requires a macro.

this macro has 6 versions
search 'DEFINE1(brk'
voila: the right one is the 'mm/mmap.c, line 246 (98%)' link.

Reading the Kernel

We can read this (almost) without Dr. Locasto's help, says him.

If you take it one line at a time, you can probably understand the semantics.

Tips

goto out;

if used in a right way is a great way to avoid code replication.

October 18th, 2013

Back To Top

Kernel Memory Management

                                                                          
                                                                          
                                                                          
         #############                                                    
         #           #                                                    
         #           #                                                    
         # malloc(3) #                                                    
         #           #                                                    
         #           #                                                    
         #############                                                    
                                                                          
 syscall #####################################################            
         #                                                   #            
         #  brk(2)    mmap(2)                                #            
         # |      \                                          #            
         ##|#######\##########################################            
           |        \                                                     
 kernel  ##|#########\########################################            
         # |          \                                      #            
         # +===========+                                     #            
         # |           |                                     #            
         # | brk imple-|                                     #            
         # | mentation |                                     #            
         # |           |                                     #            
         # +===========+                                     #            
         #                                                   #            
         #                                                   #            
         #####################################################

Why don't we code with the brk command?
- We use malloc instead - why? (or new, or ... etc)

Malloc vs brk

void *malloc(sizet n)

malloc deals with pieces of memory that are smaller than one page - the size of the particular type that I am manipulating.
brk deals with page sized chunks

Recall the aircraft struct from last class - it has a certain arbitrary size.
let's allocate a new one with malloc:
- ```
malloc(sizeof(struct aircraft));
```

Does Malloc Call Brk? When?

Student answer: it calls brk when malloc runs out of page to allocate and needs a new page
malloc hands you a pointer to somewhere in the heap
then you can (with pointer arithmetic) access anywhere in heap
- you're stopped if you access beyond the heap on either side
if you've used all of heap, then malloc calls brk to increase the heap

Malloc Implementation

we could implement our own memory manager.

//malloc:
void *give(site_t n);
//free:
void recycle(void *addr);

In psuedo-code:

void *give(site_t n)
{
  // needs to obtain memory
    // detect that you are uninitialized
    // call brk
  // needs some sort of data structure to track allocated blocks
  // finds n contiguous "available" bytes, return address of those bytes
    // choose from: 
    // first
    // last
    // best
    // worst
}
void recycle(void *addr)
{
  // do some error checking
  // need data structure to track allocated blocks
    // mark this as not allocated - addr and the number of following bytes
  // or keep track of a free list
}

The Bucket Allocation Idea, Revisited

Question: how does the kernel allocate memory for itself.

It does two important things:

supports user space by implementing mmap and brk
it's own internal memory allocation: kalloc
- technically interesting details that we wont be expected to regurgitate on a test
- has a generic piece, 'zone allocator'
- buddy algorithm responsible for maintaining bucket allocation approach
- also SLAB (original), SLOB (something else - uses next fit approach), SLUB (embedded systems)
- maintains 11 buckets containing pages
  - If one bucket is all used up, it borrows from one bucket larger (and splits)
  - then when freed, it coalesces the memory to put back to the larger bucket size

Student question: malloc needs to malloc to have data structures. How?

actually, it's fancier than that:
- The metadata about what blocks are free are actually embedded in the heap

Student question: what happens when kernel out of memory?

You're hosed.
Fork fails
"if the user does something ridiculous, you will have a failed system"

October 22nd, 2013

Back To Top

Recap: What have we done so far?

What is an OS
What is a process
kernel roles
mem management
- user, kernel
- internal, external
virtual memory
illusion of unlimited CPU/Mem
system calls
paging
Protection
- protection
- system architecture

Suggestions for Assignments

Encouraged to ask questions in class, office hours, or on Piazza

October 25th, 2013

Back To Top

System Startup Leads To...

Transition from sequential code execution to concurrent execution:

initializes the scheduler
creates two processes (via do_fork and via sys_execve)
enables interruts
allows scheduler to choose between processes
one more step (see course slides)

The "problem" of scheduling

Example: Dr. Locasto has 5 grad students who all want his attention - they all show up at his door at the same time.

Grad students:

R
Sa
St
B
A

Approaches

Pick at random
first come first served
priority / urgency / favourites
Estimate workload and choose the one that would take the least time
Alphabetically
save history of service
- Or work on nearly completed task, or one that is behind, etc.
Divide time into equal slots and rapidly switch between them - round robin

How this relates to the big picture

This is the challenge of scheduling that is faced

SCHEDULING: a mechanism that creates the illusion of sumultaneous execution of multiple processes

Dr. Locasto's Computer

Currently:

running powerpoint
talking to external devices
etc.

Human beings can't do this well, but CPUs can!

Considerations for Scheduling

Fairness
Priority
Efficiency of scheduler
Quantum -> process should be allowed to run as long or longer than the scheduler takes to make a decision about who's next
Preemption
PRIMARY CONSIDERATION: progress through the aggregate work of all processes

Quantum

If too short, context switch overhead is high
If too long, concurrency disappears
- my_stopped_time = N * quantum.length // where N is the numberof processes running on the machine
Interactive applications should be visited more frequently so that it does not become unresponsive

Preemption

The kernel can interrupt a running process at any point in time or when its quantum expires

Enables: "do something else more useful"
preemption does not imply suspension
preemption in linux 2.6 can happen either in user mode or kernel mode execution
"This is probably one of the coolest parts of the kernel"

Note: Process List vs Run Queue

The list of all processes is logically distinct from the data structure used to organize processes into groups ready to be scheduled

The concepts are different, but implementations may share a data structure or be two completely seperate lists.

Priority

Most scheduling approaches / algorithms can be boiled down to the concept of priority

The 'nice' command allows you to adjust the priority
- it does not allow you to set the priority, it's just a parameter considered in priority setting
- default nice value for a process is zero
e.g. the task for the window tha you have in the foreground is sometimes marked as higher priority

Types of Processes

CPU-bound (eg lots of calculations) OR I/O bound (lots of time waiting on external device read/writes)
interactive OR batch
- A batch process can be I/O bound or CPU bound

Conflicting Objectives

Fast response time (process' seem responsive)
Good throughput for background processes
Avoid "starvation"

Scheduling policy: rules for balancing these needs and helping decide what taks gets CPU

Scheduling Algorithms

First Come First Served
Shortest Job Next
Round-robin
Some simple examples: R, A, B

Artificially Simplified Problem:

quantum = 5 work units
context switch = 1 work unit

                R        A         B
              =====    =====     =====
arrival time    0        0         0
work           10        3        15

Scheduling Policy:

FCFS
asess a tie with arbitrarily choosing one process

Running history:

Process   Running Time   Running Total
          (work units)   (work units)
=======   ============   =============
 A          3              3
context switch             4
 B          5              9
context switch            10
 R          5             15
context switch            16
 B          5             21
context switch            22
 R          5             27
context switch            28
 B          5             33

Note: the exit system call calls the scheduler

October 28th, 2013

Back To Top

Scheduling Algorithms

First come first served
shortest job next
round-robin

Some Simple Examples

         A   B   C
        === === ===
arrival  0   0   0
work     1   2   3

Comparing Scheduling Algorithms

What metrics do we use?

Wait time
- how long from when task arrived until it was serviced
Turnaround time
- Time from when task starts to when task is complete

Linux's Scheduling Support

Linux supports several different policies:

#define SCHED_NORMAL 0
#define SCHED_FIFO   1
#define SCHED_RR     2
#define SCHED_BATCH  3

These flags are used to select the type of scheduling

Implementing Scheduling

"who gets to run next?"

The 2.4 scheduler - a big loop - O(n)

iterates over the list of all tasks
- calculates goodness
goodness
- takes nice and a few other heuristics and returns a weight

Problem: it is a O(n) algorithm, doesn't perform well with large numbers of tasks (eg on a huge server)

The 2.6 scheduler: a priority queue - O(1)

maintain a priority queue
the work is done when a new process is added

What is it doing?

Maintaining "multi-level feedback"

every CPU has a run queue
in the queue there are a few fields pointing to a few arrays
- a list of active entries - those who have not yet run
- a list of _____ entries - those who have already completed
these arrays are about 140 entries long
The policy decision is determining which slot (array element) to put a task in.

Simple Problem

A
B                Round Robin (RR)
C                                                              
D               |                                              
E               |-----|-----|-----|-----|-----|-----|--->
.               |                                              
.                                                              
.                                                              
Z

What if one user owns A-Y and another owns Z. Then owner one will get way more processing time.
There is a relation between the number of processes and the unit of time

fixed quantum - each process gets a useful amount
variable quantum - everyone gets a slice, but following the graph below, as N number of processes increases, the total amount of useful time decreases

                                                                          
                  Quantum - N relationship                                    
                  ========================                                
        # *                                                               
        # *                                                               
        # **                                                              
        #  **                                                             
        #   **                                                            
        #    **                                                           
  %cpu  #---------------------------------------                          
        #      **                                                         
        #       ***                                                       
        #          **                                                     
        #           **                                                    
        #             ****                                                
        #                **********************                           
        # --------------------------------------                          
        #                                                                 
        #                                                                 
        #                                                                 
        #                                                                 
        #                                                                 
        #                                                                 
        ########################################                          
                                                                          
                   N # of proc

Whiteboard Notes

Policy vs Mechanism

scheduler structure
other areas:
- mem management
- disk
- other resources

Process Lifecycle

                                                                         
                                                                 -+       
      \                                                          /|       
       \                                                        /         
        \|                                                     /          
        -+                                        #############           
         ###########----------------------------> #           #           
         #         #                              # Running   #           
         # Ready   # <----------------------------#           #           
         #         #                              #           #           
         #         #                              #############           
         ########### +-                           /                       
                     |\                          /                        
                       \                        /                         
                        \                      /                          
                         \                    /                           
                          \                 |/                            
                           \ ############## +-                            
                             #            #                               
                             #            #                               
                             #  Sleep     #                               
                             #  Wait      #                               
                             #            #                               
                             ##############

October 30th, 2013

Back To Top

Concurrency As A Problem

                                                                          
                                                                          
                                                     Carol               
           Pete                                   ############            
         ###########                              #          #            
         #         #                              #          #            
         #         #                              #          #            
         #         #                              #          #            
         #         #                              #          #            
         #         #\                          _  ############            
         ########### \                         /|                        
                      \                       /                          
                       \         Bob         /                           
                        _| ##################                            
                           #                #                             
                           #                #                             
                           #   the shared   #                             
                           #    buffer      #                             
                           #                #                             
                           #                #                             
                           ##################

Critical section: you could think of it as code - it is code. But a better definition is to describe it as a shared state between.

→ the question is: "is the data involved in that section of code manipulated by multiple threads?"

Race condition

Avoid it through mutual exclusion - don't let multiple threads access a critical section at the same time. That one task moves into the critical section, executes the code, and when complete there is the opportunity for another task to move in.

Command Line Concurrency: history | grep "awk"

The pipe is actually a shared buffer. Whatever history produces get put into the shared buffer, then grep attempts to read from that pipe.

                                                                          
                                                                          
     history                                                grep "awk"    
       |                                                        ^         
       |                                                        |         
       |                                                        |         
       |                ______________________________          |         
       |            ****                              **        |         
       |          **    ***                            **       |         
       |          *       **                            **      |         
       +--------------->   *        the pipe             **-----+         
                  *        *                             **               
                  **       *                             *                
                   **     **                            **                
                    *******____________________________**

This works because the os is delegated to to provide mutual exclusion to the critical section

Practical Concurrency Examples

Shell I/O redirection
command line pipes
named pipes (special files called FIFOs)
Courses/Computer_Science/CPSC_457.W2012/<some_file_name_I_didn-t_have_time_to_write>

Multithreading with fork to demonstrate this issue

How can we multithread without anything fancy?

fork? not really, each has it's own process address space
clone? yup. it exists to create another thread of control that executes in the same process address space.

This difference really illustrates the concept of a critical section.

Potential Solutions

spinlocks / busy waiting (wastful b/c CPU is busy doing "nothing")
lock variables (recursive problem)
disabling interrupts (not trustworthy, multi-CPU systems unnaffected)
- do YOU trust programmer Pete to re-enable interrupts? not really

Much like the architecture provides protection and other things, it provides support for multi-threading which is the only way that modern computer works. MOS p.131 has code for correct entry and exit from a critical section of code

November 4th, 2013

Back To Top

Finishing our discussion of user level concurrency

Using Locks VS Not using locks

Unsafe Race

The routine has a for loop that counts to 2 billion
We want to see how fast it is

→ 0 s

Safe race is the same program but it uses locks

→ 4 s

What to learn:

well this is actually worth it
we see above that locking is "expensive"
- one heck of a slow down
so make critical sections as small as possible - but don't sprinkle them all over your code
- balance making them few and making them small
DO the locking, but realize you're imposing a performance cost.

Critical sections are slower than unsafe code, but of course unsafe code is unsafe.

Kernel Syncrhonization

Kernels are naturally concurrent

servicing many requests from processes
servicing many requests from hardware interrupts
these requests are not independant

BIG PICTURE: how can the kernel correctly service requests?

It is an event handling system: any function could be executing at any given time.

you have multiple control tasks all in different states, all being interwoven
below are the major concepts to take away:

Atomic Operations in x86

What does the x86 platform offer us in terms of ...?

Kernel Locking / Synchronization primitives

that is, the "API" that linux offers:

Atomic opperations
disable interrupts (cli/sti modify IF of eflags)
- assembly instructions for this - on a per cpu basis
lock memory bus (x86 lock prefix)
- this enforces system-wide serialization
spin locks (basically a mutex)
semaphores
- an important skill to take away from this course is if you should use a mutex or a semaphore
sequence locks
barriers
- the spin lock or semaphore makes sure one task or thread controls within one critical section, but a barrier accumulates them

Note: remember that if you understand the key data structures in the linux kernel - what it is, what it does, and what operates on it - you would understand everything you need to know for this course

down_read and up_read will be useful for homework 2 problem 2

totally voluntary - tells system that you're only reading

Spin Locks

Caution: we're looking at many layers of definitions. This may seem frustrating at first - but it is a good thing. It means there is a generic form that is adapted to each platform.

slock = 0 // locked

slock = 1 // unlocked

Spinlock API: After many layers of definition and wrapping we get down to assembly code. We should know the semantics of this:

try to aquire the mutex
test if you have it
if you do, carry on
if not, keep trying until you do

Semaphores

Kernel Preemption

the kernel can even preempt itself

Read-Copy-Update

treats readers and writers the same way intitially
you enter a critical section where you will manipulate data
when writers come in, they read and copy the data, operate, then update it when they're done.

The "big kernel lock"

in the beginning of the early days of the kernel there was one mutex

allows people to be lazy about determining what their critical section is

Conclusion

The kernel has to do synchronization
it handles it on its own based on what the machine offers.

November 13th, 2013

Back To Top

Course Scheduling

The home stretch begins on Friday and should be useful for homework 3

Signals

OS support for generating & delivering signals
case study of 'ptrace'

Defined

A simple, limited API for sending messages
basically a number delivered by OS between processes
source of signal
- the user
- another process
- the kernel
relatively easy to know the semantics because it's a small, simple API
because the semantics are so well understood and standardized, you can implement a reasonable response to most signals.

'kill' process Sends A Message

it doesn't really kill a process, it just sends the message for a process to kill itself

Locasto ssh's into a lab machine...

several of his grad students are on the machine
after examining who's running stuff ('w'), the process list ('ps aux'), he chooses one process owned by another user to kill
he fires up 'man kill' then 'man 7 signal' to determine that he will have to use 'kill -9 <pid>' to kill process with pid <pid>
but the "Operation is not permitted" when you attempt to send the kill message to a process owned by another user
Side Note: When your ssh client has hung, sending the SIGHUP will instruct it to 'Hangup' and exit in a graceful way.
Most processes don't have custom signal handlers for extra signals
- signals are convenient and well-defined, but only useful for sending information about exceptional conditions

How does the kernel handle the signal delivery task?

Figure 11-2 from (???) shows the control flow of catching a signal.
- a signal handler is just a function
- once it has finished executing,

   USERLAND                KERNEL SPACE
   ========                ============                                          
   start here                                                             
                                                                          
         P1                                                               
         |                                        P1 executes                        
         |                                                                
         |                                        at X it jumps to the kernel                        
         |                     X                                          
         |                     |                                          
         |                     |                  which returns to the signal handler                        
         |                     |                                          
         | P1's                Y                                          
         | signal                                                        
         | handler                                                    
  _~~~~> X   Y                                                            
 (           |                                                            
  )          |                                                            
 (           |                                                            
  )          |                                    and then.... ? back to kernel,                         
 (           ?  ~~~~~~~~~~~~~> ?                                          
  )                                                then back to X - but should you continue?                        
 (_                          kernel                                       
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

can we handle exceptional conditions in a more graceful fashion?
- the safest, although unsatisfactory answer is to exit, then 'pick up the pieces later'

Ptrace: a case study

it's a very versatile and useful system call
it wraps a complex api
it offers an interface for the process calling ptrace to communicate with another process indicated by pid
ptrace is a bad system call because it does a lot of stuff
it is a useful system call because it purposefully breaks the boundary between processes
- with ptrace you can access all parts of another process - including system calls, etc.
if your process wants to ptrace Locasto's, and has the permission to do that, it asks the kernel to send signals to Locasto's process
- things like gdb are built on top of this
You can read and write any address in another process' address space
- refer to the man page for the full details on how to do this
Allows you to step between system calls. That is, the process runs until it gets to a system call, then stops. etc.
Or even, you can singlestep a process line to line

Q: isn't this a security risk - for malware
- A: the user ids need to match - though root can ptrace anything. And it's also very useful for security.
- At their core, debugging, malware, and security are the same task: monitoring and manipulating the innards of a running process.

sample code - something that uses ptrace

[view-source:http://tsg.cpsc.ucalgary.ca/teaching/ptrace/snyfer.c]

Dr. Locasto will put this on the wiki also.

side note: we could write strace right now using ptrace.

November 15th, 2013

Back To Top

"Welcome to the last sprint. "

Announcements

USRI on Dec 2
Final Exam: Dec 18, 8-10 am

Persistent Storage - The Problem

Files. We've all seen files, likely made files. We interact with them daily.

What is behind them? How do we manipulate them?

We'll go through that today and end with directories (a special type of file)

We'll be traversing several layers of abstraction.

from userland utilities,
to the underlying system calls,
and what do those system calls do?

Q: How does the kernel provide support for managing persistent data?

A: The "file" abstraction

~ In linux, generic wisdom states that everything is a file.

Activity: writing things on a piece of paper

everyone writes a sentence on a piece of paper and passes it to the center row
Things noticed about these papers:

Dr. Locasto asked for someone to say what their sentence was. We got "Monkey Time"
he instructs the paper holding guys (two of them) to find the paper with "Monkey Time" on it.

Finding the papers is hard
the issues are the same as for file management: (full list on course notes ppt)
- how do you find your piece of paper? - needs a name
- how do you retrieve the 3rd word of your sentence. The 75th?
  - For "Monkey Time", there is no 3rd or 75th word
  - And a linear search is slow
  - Filesystems do this
- how do I find duplicate sentences
- how do I handle multiple requests for the same paper?
- what if we had 1450, 14500, 1.5 million papers?
- what if we were not co-located?
- who can read or write?

Another angle: Answering why do we need files?

virtual memory isn't big enough
useful data needs to persist beyond the power cycle (ie reboot)
THE KEY THING for us to understand today is what the problem of persistent storage is.
→ how do you find files, assert ownership control, dynamically grow and shrink the file, and do this in an efficient way

Persistent Storage - Some Definitions

What is a file?
What is in a file?

Class brainstorm guided by Locasto:

What is in a file?

relationship between contents
a "container"

What is a file?

bytes/data
metadata
header and 'list of allocated blocks'

... well these are not the same answer. What is the real answer?

Pretend that the paper stacks from the monkey time activity are disks, how do you point to a single file? You need to know the physical location.

Back to brainstorm:

Address/identifier
physical location
- bytes of content

We need to handle collections of bytes in a collection of blocks (think: pages)

So to "point to a file" you would have to point to a lot of locations: picture using both hands, every finger pointing somewhere else.

How do we manipulate those bytes?

we don't want to see all files as a sequence of raw bytes.
we are familiar with a wide variety of file types.
the windows solution is to arbitrarily assign a three letter extension to differentiate between file types
in the unix world one way is not to mess with the file name, but to do something to the file itself.
- the 'file' utility determines the type of a file.

Suppose we have a file named A. It's 4 mb. What type of file is it?

It would be quite difficult to determine it without knowing - it's a matter of language recognition.
remember looking at ELF files - the characters 'ELF' are written at the start
This task is hard or at times impossible depending on the file type

File Monipulation

the course notes has a long list of useful files for manipulating files from the command line.
'chattr' and 'lsattr' are used for changin file attributes
- most files do not use this set of attributes
- but useful trick: you can chattr +i a file to make it immutable, even by root. This is good if you have a lot of management scripts running that are liable to delete files you want.

Looking at output of 'ls -l' unix file systems have a file owner and a group (which is supposed to be a collection of users)

the protection bits apply to user, group, other. Full read, write, execute for user, group, and other is: rwxrwxrwx

the right three are for world, middle three for group, left three for user
user and group are listed to the right of the permission bits in ls -l

 u   g   o
rwx rwx rwx

The 'd' at the left is to mark if it is a directory (d) or not (-)

Back to OS The operating system is responsible for enforcing these protections.

November 18th, 2013

Back To Top

Agenda

recap
file ssytems
- motivation: "file holes"
- the file "address space"
- indexed allocation

File Systems Continued

File systems and how they support the illusion of contiguous files.

What we may not realize as userland programmers is that there are many layers within the kernel required to deal with file system stuff.

Key Question: How can the kernel efficiently index and locate file content?

Many of the same issues that come up in memory management are present with persistent storage.

Put yourself in the shoes of an operating system developer. You're thinking about developing your own operating system. What you want to do is keep consistent with some of the linux patterns. You are therefore locked in to the linux file interface.

Dr. Locasto collected some data about one of his workstation computers, determined that the majority of files are a few kilobytes in size, and there are a fewer larger files.

Two questions arise:

Where is the content for these files?
How can you keep track of the space needed for growing these files?

Demo: lseek past the end of a file

The program writes some text to a file, seeks 1GB into the file, then writes a little more.

ll -h

shows that the file created is in fact 1GB

df -h

shows that essentially the same amount of space is taken up on hard drive as was before the 1 GB file was created.

This implies that there is some sort of mapping being done.

How do I do this mapping?

Each answer to this is a different file system. If you list the contents of fs in the linux kernel, you will see a long list of filesystems.

Clarifying a concept:

Directory Structure

we have a notion of files as being stored in a directory tree.
in linux, the root is denoted by '/'

                                '/'                                                            
                  ---------------^--------------     
                  |    |    |      |    |      |           
                 usr  bin  etc   var  dev    home                                                                            
                                                                                            
                                            (and so on)

You probably have multiple file systems attached to your directory tree.

and you can attach any file system to any point in your directory tree.
the integration is seamless - what filesystem is being used is meaningless to you as a userland programer or user.
use 'mount' command to show what file systems are currently mounted.
- read the man page if you want to

What is a file system?

Note: File System, NOT FileSystem.

Has the same responsibilities of a memory manager.

Question: where is the metadata stored?

Note: even directories are files.

We have a data structure called inode that remembers important things for us.

unique identifier
pointer to contents of file
operations depending on filesystem

Hard Link and Soft link and inodes

using command

ln TARGET LINK_NAME

soft link refers to a new inode linking to the same content

hard link refers to the same inode linking to (obviously) the same content

Making Access Fast

The secret is multi-level index schemes with triple indirect addressing.

This is informed by our understanding of common file sizes

November 20th, 2013

Back To Top

Virtual File System - Conceptual

imagining what the 'cat' utility would do...

cat "/home/eye/file.txt"

fopen(     )

open("path", ... , ...)
API:	[open] [read] [write]
API:	VFS
	→EXT2
	→EXT4
	→generic I/O
	→hardware

the VFS is an internal API
- it's the one that deals with "what file system is this stored in"

VFS from kernel code

Now Dr. Locasto will attempt to show us that the above whiteboard diagram is in fact what actually happens. The goal is to leave today with some confidance in reading this code.

An inode is:

a generic concept,
a reference to the actual data

it is a central data structure that allows an os to reference a particular file
but it's not the only data structure
the dirty secret is that there are several versions of inode
- there is an instance of inode created in kernel memory when a file is open
- but if this was the extent of the inode life, that wouldn't be useful
- so there are specific versions of inode that exist on disk - physical bytes on disk
  - EXT4 inode is around line 435 of fs/ext4/ext4.h

The 'open' system call

fs/open.c line: 1053

don't get confused by the syscall_define macro.

Then we go to line 1031 for 'do_sys_open'

the filename is turned into a file descriptor
we do_filp_open
- for loading the necessary metadata - mapping between the struct file and the struct d
if do_filp_open succeeded, we do 'fd_install'

Then we go to line 1018 for 'fd_install'

adds the metadata to the list of open files at a certain index fd
- now the file metadata can be accessed from the integer fd

Then we go to line 1669 for 'do_filp_open'

this is where the work for opening is done
we're not going to trace it, it's complex
the point: the reason that this is so much more work than the rest of the functions that we looked at is that this is where the 'rubber hits the road' and we're interacting with the hardware.

From 'open' to 'read' to 'close'

The 'read' system call

fs/read_write.c line 372

Again, not much work.

vfs_read - we're delegating to the virtual file system

To vfs_read on line 277

if file->f_op->read is not null....
- which means, if the function pointer for 'read' in the f_op function pointer table contained in the file struct is actually there...
then do it
otherwise, do the kernels default read: do_sync_read

To 'do_sync_read' on line 252

we go to the same file->f_op function pointer table, but look up aio_read (asynchronous input output)
this goes to ram

Back up to 'vfs_read'

read is a function pointer to a file system dependant read function
to find it, we'll choose any of the file systems (ext2) and look at its read function

To fs/ext2/file.c line 45

file operations table that we see before
it turns out that 'read' calls do_sync_read, which we know calls aio_read
this is because ext* file systems are the native file systems for linux
for other file systems, some other read function would be provided

November 22nd, 2013

Back To Top

Learn about particular features of file system: extended attributes and ACLs from multiple perspectives

Background

We're familiar with ownership and permission bits
we use the following utilities for that:
- chmod(1)
- charrt(1)
- stat(1)
- shred(1)

What is an access control list? (ACL)

Example:

	/etc/x	/var/r	/home/mike/hello.txt	/l/a/cloud/h453
Mike	r	rw	rwx
Alice	rw	r	r	rwx
Bob	x	r		r
xdman	x	w		wx

We can read this from two perspectives:

row wise: what capabilities do each user have?
column wise: what access is granted for each resource

Typically, the access control list is stored along with the resource (file)

Question:

is this approach better, equivalent, worse than linux permission bits?

What about user groups?
what about all other users?

In fact, this is a more general mechanism than the linux permission bits that we're used to.

Demo - permissions on virtual machine

$ ssh abird@csl.cpsc.ucalgary.ca
The authenticity of host 'csl.cpsc.ucalgary.ca (136.159.5.22)' can't be established.
RSA key fingerprint is cc:36:a4:22:58:af:e1:38:56:b6:e6:5c:ec:72:b7:ca.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'csl.cpsc.ucalgary.ca,136.159.5.22' (RSA) to the list of known hosts.
abird@csl.cpsc.ucalgary.ca's password: 

Welcome to the University of Calgary Department of Computer Science
This system is for use by authorized users only.

   DEPARTMENT WEBPAGE:  http://cpsc.ucalgary.ca/
 NEWS & ANNOUNCEMENTS:  http://cpsc.ucalgary.ca/news/
         TECH SUPPORT:  http://cpsc.ucalgary.ca/tech_support/
                        email: help@cpsc.ucalgary.ca
                        help desk: MS 151

Please send bugs reports and package requests to help@cpsc.ucalgary.ca

[abird@csl ~]$ cd /tmp
[abird@csl tmp]$ cat locasto.password.txt 
cat: locasto.password.txt: Permission denied
[abird@csl tmp]$ ls -l locasto.password.txt 
-rw------- 1 locasto profs 8 Nov 22 12:12 locasto.password.txt
[abird@csl tmp]$ # Then dr. Locasto changed the permissions: 
[abird@csl tmp]$ ls -l locasto.password.txt 
-rw----r-- 1 locasto profs 8 Nov 22 12:12 locasto.password.txt
[abird@csl tmp]$ cat locasto.password.txt 
secret7
[abird@csl tmp]$ groups
jr s301 c457 c471
[abird@csl tmp]$ # The groups I am in. 
[abird@csl tmp]$ # If I wanted to restrict permissions to just one group of people, 
[abird@csl tmp]$ #    for working on a group project or something...
[abird@csl tmp]$ groupadd birdsgroup
-bash: /usr/sbin/groupadd: Permission denied
[abird@csl tmp]$ # that doesn't work - but apparently using the access control list, 
[abird@csl tmp]$ #    we can give access only to a few users
[abird@csl tmp]$ #    even without having them in the group
[abird@csl tmp]$ cd michael/
-bash: cd: michael/: Permission denied
[abird@csl tmp]$ getfacl michael/
# file: michael/
# owner: locasto
# group: profs
user::rwx
group::---
other::---

[abird@csl tmp]$ cd michael/
-bash: cd: michael/: Permission denied
[abird@csl tmp]$ # Dr. Locasto gave read permission to world for his directory:
[abird@csl tmp]$ cd michael/
[abird@csl michael]$ ls
ls: cannot open directory .: Permission denied
[abird@csl michael]$ # I cannot list anything because I don't have read 
[abird@csl michael]$ #     permissions for any files in this directory

Dr. Locasto showed that he could grant file access to specific users
one student was given access, and once he could read the directory, and was added using setfacl, he could change the locasto.password.txt file.

Looking at ATTR things

must give it a valid namespace
must have permission to modify that namespace

The setfattr tool allows you to modify the metadata for any file - you can change the default application to open it, send secret messages, etc.

however, your file system needs to have support for this
- it has to implement setfattr and getfattr system calls

Example - if you copy the file to another filesystem, the attributes are not necessarily copied

This is a difficulty that comes with transferring data between filesystems, or to different operating systems
because it's not ONLY moving it, it's converting it to a new format.

In fact, even if you're copying from one place into another within the same filesystem, those file attributes are not preserved.

but there is an option for 'cp' that preserves the file attributes

Note - the timestamps are stored in inode. But as with other attributes, when you copy a file, you're really making a new file with new timestamps.

November 25th, 2013

Back To Top

How Would We Design A Minimally Viable File System

Acknowledgement: 'viable' is only loosely defined.

Assume you have a well behaved storage mechanism which provides us with some number of available storage units.

It's our job to write a filesystem on it.

What are we supporting?

open
read
write
seek
~~close~~

Other Requirements

delete
expand

File Access Table: FAT

the file system cannot simply ram data onto the physical medium: you need metadata also.
one challenge: you have a file size limit because of the maximum address size
- it also limits the number of files

        FAT32                                                            
      0  +------------------+      0 +----------------+                   
         |                  |        |                |                   
         |                  |        |                |                   
         +------------------+        |                |                   
         |                  |        |                |                   
  'one'  |    next 'block'  |        |                |                   
         +------------------+        |                |                   
         |                  |        |                |                   
         |                  |        |                |                   
         |                  |        |                |                   
         |                  |        |                |                   
         |                  |        |                |                   
         +------------------+        |                |                   
         |      EOF         |      M +----------------+                   
         +------------------+      max                                    
         |                  |      size                                   
         |                  |                                             
         |                  |                                             
         |                  |                                             
         |                  |                                             
         |                  |                                             
         |                  |                                             
      K  +------------------+                                             
   only store                                                             
 so much in RAM

EXT2

Envisions a file as a sequence of blocks

  File                                                                   
   +-------------------+                                                  
   | B1                |                                                  
   +-------------------+                                                  
   | B2                |                                                  
   +-------------------+                                                  
   | B3                |                                                  
   +-------------------+                                                  
   |                   |                                                  
   |                   |                                                  
   |                   |                                                  
   +-------------------+                                                  
   | Bn                |                                                  
   +-------------------+                                                  
   |                   |                                                  
   |                   |                                                  
   +-------------------+

       swap                                                             
   /boot                   block 1   ...       block n                    
   +-----+----+-----+-----+--------+---------+---------+                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   |     |    |     |     |        |         |         |                  
   +-----+----+-----+-----+--------+---------+---------+                  
   0         /       \                                 M                  
            /         \                                                   
           /           \                                                  
          /             \                                                 
         /               \                                                
        /                 \                                               
       /                   \                                              
      +--+--+--+--+--+--+---+                                             
      |  |  |  |  |  |  |   |                                             
      |  |  |  |  |  |  |   |  ← metadata including                                       
      |  |  |  |  |  |  |   |             references to inodes               
      +--+--+--+--+--+--+---+

Journaling Filesystem

keeping a track of the operations that they system is about to do
- in a buffer
In some filesystems, both metadata and data. In others, just the metadata

November 27th, 2013

Back To Top

Design Challenge: operating system that bridges the gap between user level programs and hardware

What is /dev/null

we treat it like a file.
- it looks like a file
- we operate on it like a file
it is a bit of code that mimics a file
it does not act like a file

Devices are nothing more than files

name appears in file system
responds to read(2) and write(2)

the landscape of real devices

       +------------------------------------+                             
       |                              kernel|                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |  +--------+                        |                             
+---+  |  |device  |                        |                             
|CPU|→ +--+driver  +------------------------+                             
+---+     |        |                                                      
  ↑       |        |                                                      
  |       +-+------+                                                      
   \        |   ↑                                                         
    \       ↓   |                                                         
     \ +--------+---------------------------+                             
      \|           device controller        |                             
       +------------------------------------+                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       |                                    |                             
       | device                             |                             
       +------------------------------------+

What is the CPU's role?

it handles interrupts from the device controller
this translates to a piece of kernel code responsible for responding to whatever action caused the interrupt
- eg copy data from a buffer

Types

A thing in this sense is not what it is named but rather what is done to it.

block device - data is accessed a block at a time; random access

character device - a stream of characters (or data values)

The important question is how the device imports and exports the data

Major and Minor numbers

cat /proc/self/maps

the column with ##:## is the major and minor number of the device in the format MJ:MN

to the kernel code! Device drivers

discussion of device driver writing.

Interesting point: the need to write device-specific sequences of bits to the device driver to communicate with the device.

December 4th, 2013

Back To Top

How does the Kernel support Networking

(from http://www.netfilter.org/documentation/HOWTO/netfilter-hacking-HOWTO-3.html)

A Packet Traversing the Netfilter System:

   --->[1]--->[ROUTE]--->[3]--->[4]--->
                 |            ^
                 |            |
                 |         [ROUTE]
                 v            |
                [2]          [5]
                 |            ^
                 |            |
                 v            |
              inbound      outgoing

Return to Diagram from Last Week

                                                                         
              virtual machine                                              
                                                                          
                **********                                                
               **         ***                                             
              **            **                                            
             **              *                                            
             **  10.0.15.2    *                                           
              *               *                                           
              *              **                                           
              **            **                                            
               **    |   ***                                              
                *****|****                                                
                     | eth0                                               
                    ↑                                                     
                  __/ \___                                                
                 /        \                                               
                /          \                                              
                |          |                                              
                           ↓

Every modern unix box can do all of this

that is, the general architecture for receiving and intercepting packets

PORT: a destination within a machine. Determines which program the kernel will send the packet to.

applications bind to certain ports.
all of this is convention only - so you don't have to follow the convention if you don't want to, as long as both ends of the communication (client and server) agree.

The beautiful thing about networks is that you can just read what is there.

Catching some packets

Dr. Locasto ran the wireshark program, which is a GUI for tcp
he starts poking through some packets
back on his mac he uses tcpdump to take a look at network traffic
- point made: isn't this illegal?
- he used
```
tcpdump -i en0 -v -X -A
```
what it does is instead of dropping all packets not destined for your own ip address, it takes a look at all of them.

One thing to note

initiating a network connection and communicating with another machine takes several protocols
- diagnosing network problems is rarely a one-problem problem.

Debugging network issues

using a packet inspector and strace you can poke in and look at what's happening and what not happening.

Courses/Computer Science/CPSC 457.F2013/Lecture Notes/Scribe2

Contents

Other Information

September 16th, 2013

Lecture Notes

Programs, how they become processes

Command Line Exercise

What Are These Commands

Pipes

Processes

October 16th, 2013

In the Kernel

Process Dives Into Kernel

The 'read' System Call

Question: Macro or System call?

About the Delegation Pattern

Kernel Memory Management

September 18th, 2013

Logistics & Preamble

Question Asking Protocol

Readings

Recap

Why Command Line Agility

Learning Objectives

What is an OS - the big fuzzy picture

Kernel Is a:

Why Not Put All Code In Kernel?

strace

Notation

Important For Homework

September 20th, 2013

Questions from last class

How and why are the blue dashed lines/divisions enforced?

Example: silly.c

Segments

Two Most Basic Responsibilities of Kernel

Supplementary Reading

September 23rd, 2013

Making System Calls

When would you want to do this?

Learning Objective

September 25th, 2013

Process Abstraction

Terminology

Our VM's kernel code:

line 1215 onwards: struct task_struct

Process States

History through Process States

Some Coding Example

September 27th, 2013

Thoughts

Much of what an OS does

What is a Process

What makes these lines real?

Process Address Abstraction

Problems with direct mapping addresses

opening up the cat process

Commands from the context of PAS

September 30th, 2013

Process Address Space

What the MMU Does

Exercise: translating linear address to physical address by hand

Returning to discussion of MMU

Page Table and Page Directory Details

October 9th, 2013

Recap

Test 1

Exit Times and Scores

Homework 2

Other Tips & Comments

Activities

Q1: Would we need OS'

Q2:

What's In The Kernel

Memory Page Replacement Algorithms

October 11th, 2013

User Level Memory Management

The Problem

Solutions

Process Address Space