Courses/Computer Science/CPSC 457.F2013/Lecture Notes/Scribe2
Note from Scribe 2: You may notice links to non-existent files - these are for diagrams Michael draws on the whiteboard. The wiki doesn't allow file downloads (or there is some bug with it or something), so they're not up here. I believe Michael will upload them to his site and link to them sometime down the road. In the meantime if you really need them, email me: abird@ucalgary.ca.
- Scribe2/Other Information ← summary of what to read
- #September 16th, 2013
- #September 18th, 2013
- #September 20th, 2013
- #September 23rd, 2013
- #September 25th, 2013
- #September 27th, 2013
- #September 30th, 2013
- #October 9th, 2013
- #October 11th, 2013
- #October 16th, 2013
- #October 18th, 2013
- #October 22nd, 2013
- #October 25th, 2013
- #October 28th, 2013
- #October 30th, 2013
- #November 4th, 2013
- #November 13th, 2013
- #November 15th, 2013
- #November 18th, 2013
- #November 20th, 2013
- #November 22nd, 2013
- #November 25th, 2013
- #November 27th, 2013
- #December 4th, 2013
Contents
- 1 Other Information
- 2 September 16th, 2013
- 3 October 16th, 2013
- 4 September 18th, 2013
- 5 September 20th, 2013
- 6 September 23rd, 2013
- 7 September 25th, 2013
- 8 September 27th, 2013
- 9 September 30th, 2013
- 10 October 9th, 2013
- 11 October 11th, 2013
- 12 October 18th, 2013
- 13 October 22nd, 2013
- 14 October 25th, 2013
- 15 October 28th, 2013
- 16 October 30th, 2013
- 17 November 4th, 2013
- 18 November 13th, 2013
- 19 November 15th, 2013
- 20 November 18th, 2013
- 21 November 20th, 2013
- 22 November 22nd, 2013
- 23 November 25th, 2013
- 24 November 27th, 2013
- 25 December 4th, 2013
Other Information
Refer to the Scribe2/Other Information page for list of readings by date
September 16th, 2013
Lecture Notes
- One week at a time material will be posted
- Some slides are just talking points for the lecture not real content: they will be excluded
- Scribe notes are mostly what is said in class including Q and A
Programs, how they become processes
- Simultaneously a demo on command line
- Showing VM VirtualBox
- We can do whatever we want in terms of platform for our homework, but we will need to submit homework that works on this platform
Command Line Exercise
Covering common commands
Ordered List Of History | ||
---|---|---|
Objective | Command | Comments |
Show history of commands | history |
Too many lines |
Pagenate output of history | history | more |
press 'q' to quit |
Produce a frequency distribution of command history | history | gawk '{print $2}' |
awk is the grandfather of perl |
Count the output | history | gawk '{print $2}' | wc |
|
Get information about a command | man wc |
displays a "manual": information about the command wc. |
Sort output | history | gawk '{print $2}' | sort |
|
remove duplicates | history | gawk '{print $2}' | sort | uniq -c |
includes count of each entry |
Sort output numerically | history | gawk '{print $2}' | sort | uniq -c | sort -nr |
n for numeric, r for reverse order. |
Common commands | |
---|---|
Command | description |
ls | |
cd | |
man | |
clear | |
emacs | |
make | |
ll | an alias for long list 'ls -l' |
cat | concatenate the contents of a file, output to command line |
ps | lists running processes |
ps aux | lists all running processes (those in square brackets are kernel threads) |
pstree | used to show hierarchical relationship |
top | gives a live updated version of ps |
df | gives info on disk usage |
ifconfig | tells you the state of network adapters |
diff | handy utility for programming. To produce a patch or to see the difference between two files. |
yes | |
kill | sends a signal to a process, especially to kill a process. |
objdump -d | dissasembles object |
strace [command] | traces behavior of a program |
What Are These Commands
- Some are programs
- Some are built in functionality
Pipes
A way for the output of one command to become the input of another command
Processes
Looking again at addi.c from prev. lecture
Out notion of source code is one thing, but after a translation process the program becomes something different.
Ex: interesting behavior with Hello World program.
#include <stdio.h> int main(int argc, char* argv[]) { printf("Hello, 457!\n"); return 0; }
October 16th, 2013
lecure notes start at 12:30 (refer to other scribe notes for the full lecture)
In the Kernel
Process Dives Into Kernel
Line 2076ish
############ # # # # # P # # # ############ | | ##########|########################### # Kernel | # # | # # | # # V # # ****** # # *** ** # # * * # # * brk ------->******* # # ** ** ** ** # # *** ** ** do-brk * # # *** ** * # # *** ** # # **** # ######################################
The 'read' System Call
Line 291:
- the file structure has a read method: more delegation!
- click on the 'read' link to search for it:
What's going on:
- sys_r
- vfs_read
- read
- the file system
- device
READ is an implementation specific to a file system.
Kernel reading tip: Don't give up! keep digging through the source code to reach the answer. You will not get the whole answer on one screen.
Question: Macro or System call?
How do you know whether something is a macro or a system call?
- toss a coin
- you just learn what's what
- it doesn't matter
Kernel is a big mixture of macros, function calls, assembly
Practical advice: click on the name and it will probably show you.
About the Delegation Pattern
- Keeps things manageable.
- consider sys_read: a nice API
- it delegates to VFS
- VFS to all sorts of different file systems
- File system doesn't care about the physical organization
- it delegates to VFS
Kernel Memory Management
Talking a look at the 'grab' program
We're looking at the source code
- "ignore the signal handler stuff"
- "what the heck is a struct aircraft"
- has a few member variables
- glibc_test accepts a pointer
- it allocates memory off the heap
- then prints out what is stored at each memory address within a relevant range
- Then we read in the hexadecimal numbers what the contents of our aircraft struct are
- that is, we read "deadbeef", the ascii code for 'a', etc.
September 18th, 2013
Logistics & Preamble
Question Asking Protocol
- If you want to ask a question but Michael has not noticed, get his attention loudly
- This is an interesting OS concept: use the 'interrupt' policy instead of relying on Michael 'polling'
Readings
- Recap your knowledge of x86: "A Tiny Guide to Programming in 32-bit x86 Assembly Language"
- ~5 page refresher.
- "A Whirlwind Tour .... "
- an excellent illustration of all layers down to os
- system calls
- how a program becomes a process
- Read in textbook:
- MOS 1.3, 1.6, 1.7
Recap
- So far we have learned a little bit about a few topics that do not seem to be connected
- by the end of the semester we'll see how it all connects
Why Command Line Agility
- To show off
- Demonstrate documentation built into machine
- Demonstrate wide range of commands
- Give practical skill
Learning Objectives
- GEt an intuition for what is an os
- components
- What's the kernel
- problems it solves
What is an OS - the big fuzzy picture
- An OS does not execute programs, it rather gets out of the way - giving resources to the process that you want to execute
- manages multiple processes
Kernel Is a:
- piece of software
- interface to hardware
- supervisor: manage important events
- protection - maintaining the imaginary divide between separate processes, etc.
Why Not Put All Code In Kernel?
- Complicates kernel, makes it more error prone
- Mistakes in your code affect every process in the machine
- A bug in your code would crash the machine if it was in the kernel
strace
- Allows us to observe the system calls interface in action
Notation
write(2) does not mean calling function 'write' with argument '2'
- The number distinguishes which of several identically named methods is being referred to. It is the section in the manual that the target method is in:
command line utilities: section 1
system calls: section 2
something else: section 3
Important For Homework
Understanding the system calls that we read as the output of strace for the 'h' hello world example
September 20th, 2013
Questions from last class
The big picture thing.
- two processes cannot communicate without asking the kernel to help tham
How and why are the blue dashed lines/divisions enforced?
LEARNING OBJECTIVE for today:
- able to distinguish between the execution environment for userland processes and ??? (see main notes?)
Example: silly.c
- looking at object dump, thinking about what we have access to.
- GPRs:
- eax
- ebx
- ecx
- edx
- edi
- esi
- Other Registers
- ebp
- esp → stack pointer
- eip → frame pointer
- eflags
Segments
- breaks memory into different parts
- start (base)
- limit
- DPL - descriptive privilege level
- two bits (00, 01, 10, or 11)
The DPL bits are the only things enforcing the isolation between user level programs and the kernel
Two Most Basic Responsibilities of Kernel
Live in Kernel memory and contain pointers to kernel functions (IDT) and priviledge levels of different memory segments (DGT)
- SEt up the GDT table
- Enforces protection between itself and user level processes.
- Set up the IDT table (interrupt descriptor table)
- Allocates memory and populates table
- Refer to diagram: Network card sends exception to CPU (curved arrow). CPU refers to the IDT table and executes the indicated function.
Supplementary Reading
- http://www.intel.com/products/processor/manuals
- Section 2.1 (skim this)
- Section 2.2 (pay attention to description of pipelined microacrchitecture)
- one other
- http://www/intel.com/products/processor/manuals
- three parts also (see main notes?)
September 23rd, 2013
Making System Calls
- If system calls are privileged functionality, then how can applications invoke them?
Write system calls can be called from c by including a library (see man page).
- However, we see in the object dump that it's actually calling a glibc function, not the system call directly
So we'll write an assembly program to directly call the system call:
;; write.asm BITS 32 GLOBAL _start SECTION .text _start: ;; write the arguments ;; become DPL 00 ;; do this by invoking a interrupt mov eax, 4 mov ebx, 1 mov ecx, mesg mov edx, 0xb int 0x80 mov ebx, eax mov eax, 1
INT is in a DPL 11 code page
The path of the write command:
- DPL 11 accessable interrupt command sent to CPU
- CPU refers to the interrupt table and executes appropriate action
- Note: No direct path from program to kernel
When would you want to do this?
- Really fast or small code
- Not for drivers or interrupts (that's kernel stuff)
- If you're on a system that does not have glibc or the support library
- real answer: if you want to write malware
Learning Objective
If we had to write a program as above that removes a file, could we?
;;try to call unlink on ? ;;before proceeding with coding, echo "secret" > secret.text ;;this is what we are trying to remove with unlink ;;then look up details on unlink with man 2 unlink ;; unlink("secret.txt"); BITS 32 GLOBAL _start SECTION .data filename db "secret.txt" SECTION .text mov eax, 0xA ;; a few more lines of code
September 25th, 2013
Today we look at a very important topic in operating systems: making programs run.
Process Abstraction
Continuing/completing the discussion: how does a program become a process?
Terminology
- task
- A generic term referring to a running process, whether or not it is a process, thread, kernel thread. A job needing doing.
- process
- a program in execution. Has several characteristics
- instruction pointer: what instruction will be processed next
- CPU context: the state of the CPU
- set of memory pages (process memory space)
- contains the program code, static data, etc
- dynamically allocated memory space
- set of resources: files, metadata about process state (PID)
- thread
- does not have the heavyweight stuff a process has, but rather is part of a process
- lightweight process
- in linux there is no distinction between a thread and a process - both are called "lightweight process".
Our VM's kernel code:
goto http://lxr.cpsc.ucalgary.ca/lxr/#linux+v2.6.32/include/linux/sched.h#L1215
- kernel code is C
- but they have their own style & idioms that may take some getting used to
- and it's not thoroughly commented - see style guidelines
line 1215 onwards: struct task_struct
This is the beginning of this task_struct
- 1216: volatile long state controls if the process should be running (as per the comments)
- not all processes run at the same time
- not all processes are capable of using the CPU - it may be waiting on something
- 1271: struct mm_struct points to data
- M. Lacosto could probably teach the entire semester from task_struct and mm_struct
- 1288: pid (we're familiar with this already)
- 1344ish: the owner of the task - who spawned it
Process States
- TASK_RUNNING: process is eligible to have the CPU i.e run
- TASK_INTERRUPTIBLE: process sleeping (e.g. waiting for I/O
- TASK_UNINTERRUPTIBLE: same as above, but ignores signals
- misc: traced, stopped, zombie
History through Process States
the first process to start after os startup is init.
init cannot start new programs with execve() because then init would be overwritten and at the termination of the new process, everything has stopped
instead init uses fork(2) and in the 2nd copy of init executes execve() to create the new process
Question: isn't it a bit of a waste to completely duplicate init using fork(2) then immediately overwrite it with execve()?
Answer: the solution is to duplicate the process control block ONLY
- remember the mm_struct in the kernel - only the pointer is copied rather than a deep copy
- this works UNTIL the new process attempts to write to the destination of that pointer, at that point it performs a deep copy.
- this is called copy on write
Question: what happens if a child dies?
Answer: the child stays around as a zombie long enough for parent to gather some last information
Question: what happens if a parent dies with children still around?
Answer: the child is re-parented (usually with init)
Some Coding Example
not quite complete:
//include: //stdio //unistd //main with command line args { pid_t pit = 0; pid = fork(); if(0 == pid) { // i'm the child printf(stdout, "hey i'm the child, my pid is %d", pid) } else if(pid > 0) { // parent printf(stdout, "hey i'm the parent"); } }
September 27th, 2013
Thoughts
Much of what an OS does
Namespace management:
- alloc
- project
- cleanup/recycle
- CPU
- memory
- files
- users
What is a Process
What makes these lines real?
Process Address Abstraction
How do you multiplex memory??
Every program believes that it (and it only) has access to the 0 → 3G range in RAM File:CPSC 2013 09 27C.gif
Problems with direct mapping addresses
- you cannot give each process (~140 of them) free access to all RAM
- if you did, then changed RAM hardware, would have to change program
opening up the cat process
lists the allocation of memory for a running process
or you can execute cat /proc/self/maps (where self is the current process - cat)
what's in there?
- libraries
- repeated if you need different memory regions (-- different permissions)
- r-x for code
- r-- for constants
- heap
- stack
Commands from the context of PAS
- fork(): create PAS
- execve() replace PAS comments
September 30th, 2013
Process Address Space
how does your program address a piece of memory?
What the MMU Does
Translates from logical to linear to physical:
- Logical Address
- [SS]:[offset]
Has several parts:
- segment selector
- 32 bit offset
- Linear Address / virtual address
- [Dir|Table|Offset] Made up of three parts:
- Physical Address
Exercise: translating linear address to physical address by hand
0x08049858
Offset: 12 bits = 3 nibbles
Dir and table are 10 bits each
[0000 1000 00] [00 0100 1001] [1000 0101 1000]
base 10: [ 32 | 64 + 8 + 1 | 2048 + 64 + 16 + 8 ]
[ 3210 | 7310 | 216410 ]
Returning to discussion of MMU
TLB caching speeds up the translation process which otherwise would be relatively costly because it requires 3 table lookups for each translation.
TLB stands for Translation Lookaside Buffer
Page Table and Page Directory Details
There is a page table and page directory for every running process
- They are part of the kernel's memory
October 9th, 2013
Recap
Test 1
- Dr. Locasto is stoked on the results: you never see such well distributed grades!
- Pleasing spread
- Also a number of people who did do quite well
- For our individual scores: don't freak out. Remember that this is points out of the 1000 that you can score in the semester.
- The test indicates that half the class is not necessarily cluing in to what Dr. Locasto is saying.
- Some people didn't know enough
- Some people were not precise enough in their answers
Exit Times and Scores
There doesn't seem to be a correlation between the two.
Conclusion: take your time if you need it.
Homework 2
Problem 1: testing our understanding of manipulating process address space in memory regions
- generate and execute code at run time
- straightforward if we get the right set of systems calls involved
- might be useful to brush up on x86 assembly language
Problem 2: a number of questions worth small amounts of points
- This is to get us looking in the right direction
- Look at the linux kernel code to help solve the problem
- there will be some twist that Dr. Locasto won't discuss yet
Problem 3: "The Meat"
- we will extend basic kernel functionality
- add system call that marks process in a certain way
- if process marked, kernel fails all syscalls from that process
- Use Robert Love's textbook
- we will have to assess what we have done
- The really tricky part is what you need to do in the kernel to filter the processes
Other Tips & Comments
- start early and ask questions
- we will have to learn through this assignment
- In one month, hopefully 93% of us will be able to do this assignment
- We will get little bits and pieces, aspects of assignment related material in class - but don't wait for that to complete the assignment.
- The thing that makes this assignment difficult is that we are coding in an unfamiliar development environment: the kernel.
- "There's homework 2. Enjoy"
Activities
Class exercise: two questions.
- If we didn't have computers, would we still need operating systems?
- What is the difference between
- protected mode
- kernel space
- superuser mode
- root user
- Do root's processes run in ring zero?
Q1: Would we need OS'
principles, concepts to multiplex resources to maximize utility.
- In a business you need an "operating system" to multiplex human resources between projects, and restrict access to file folders, etc. in order to maintain confidentiality borders between projects.
- politics: multiplexing $$
- Library: multiplexing books
In this course we will learn the concepts related to managing complex systems. Using a computer is just one such case.
Q2:
- protected mode (one of several modes the architecture can be in)
- student answer: A mode the CPU runs in that doesn't let certain things be accessed
- prof answer: A mode in the architecture that supports isolation between PAS
- kernel space
- student answer: the space in which a kernel dwells
- prof answer: memory segments with certain dpl bits
- supervisor mode: ring 0, have access to privileged instructions
- root user: user account
- Just a little more special than the other users
Most processes do not run in ring zero
What's In The Kernel
We should be able to list the components and describe how they relate to one another.
Memory Page Replacement Algorithms
Later
October 11th, 2013
User Level Memory Management
- API
- Algorithms and data structures for user level memory management
The Problem
support growing and shrinking memory allocations (ie dynamically allocated memory)
Solutions
- First-fit
- Next-fit
- Best fit
- Worst fit
- ?
Process Address Space
################ # # # # ################ # # # PAS # memory region # # ################ # # # VM # memory address translation # # ################ # # # physical # # # ################
Memory allocation in C
man malloc
malloc returns null if the OS could not / did not give you any memory
A Program That Does This Stuff
It's a linked list program - when new nodes are made, malloc is called.
Each node is 12 bytes long:
- 1 for the char
- 4 for the long
- 4 for the pointer
- 3 for padding so total is a multiple of 4 bytes
We want to check if the addresses of the next pointers are in heap.
- ps aux | grep <program_name> to find pid
- pmap <pid> to find map of memory for process
- more stuff
Use strace to see what happens in terms of system calls while llist runs
- We see that other than some initial setup, while the process runs the OS is not involved.
- lesson no. 1: malloc does not directly manipulate the heap
- glibc does it's own internal memory management.
What is happening?
- memory allocated to process in page-sized chunks of memory
- that process can do what it wants with it.
Back to the main point:
- the question is how memory is dynamically allocated.
Dynamic Allocation as a Narrative
- program asks for 12 bytes
- OS gives bytes 0 through 11
- program asks for another
- OS gives bytes 12 through 23
- a second program asks for 1k of memory
- OS gives bytes 24 through 1047
- program one frees second 12 bytes of memory: bytes 12 through 23
- program one asks for another 12 bytes
- OS has a tough decision: allocate at the end, byte 1048, or back at byte 12?
Then imagine 140 instead of 2 processes, doing this for days on end.
Problem: fragmented heap.
Solution: one of the 5 that were alluded to at the start.
Solutions cont'd
- First Fit
- Start at byte 0, move upwards, allocate the first block of the right size
- Next Fit
- Best Fit
- finds all possible fits, allocates the one that is the tightest fit.
- Worst Fit
- As above, but allocate the one that is the loosest fit.
Analysis of Solutions
- Best and Worst: slow
- All: listening to the user (developer) is dumb, because the developer asks for unusual sizes of memory.
This causes two problems:
- Internal Fragmentation
See below
- External Fragmentation
It is possible to have enough memory that if it were contiguous a serious memory request could be met - but since it is fragmented, it can be useless. Eg. 1G free in 10 byte chunks.
Improved Solutions
Bucket system:
- A small bucket, a medium bucket, a large bucket.
- Problem: give the user program 24 bytes when it asked for 12. now there are 12 bytes unused but allocated.
- This is internal fragmentation
Looking at the brk command
Diving into the linux kernel
search for brk command:
- way too many results
tip: system calls are named sys_NAME, such as sys_brk.
search for sys_brk
tip: sys_brk is just a symbol that refers to a location in kernel text. how that is associated with functionality requires a macro.
- this macro has 6 versions
- search 'DEFINE1(brk'
- voila: the right one is the 'mm/mmap.c, line 246 (98%)' link.
Reading the Kernel
We can read this (almost) without Dr. Locasto's help, says him.
- If you take it one line at a time, you can probably understand the semantics.
Tips
goto out;
if used in a right way is a great way to avoid code replication.
October 18th, 2013
Kernel Memory Management
############# # # # # # malloc(3) # # # # # ############# syscall ##################################################### # # # brk(2) mmap(2) # # | \ # ##|#######\########################################## | \ kernel ##|#########\######################################## # | \ # # +===========+ # # | | # # | brk imple-| # # | mentation | # # | | # # +===========+ # # # # # #####################################################
- Why don't we code with the brk command?
- We use malloc instead - why? (or new, or ... etc)
Malloc vs brk
void *malloc(sizet n)
- malloc deals with pieces of memory that are smaller than one page - the size of the particular type that I am manipulating.
- brk deals with page sized chunks
- Recall the aircraft struct from last class - it has a certain arbitrary size.
- let's allocate a new one with malloc:
malloc(sizeof(struct aircraft));
Does Malloc Call Brk? When?
- Student answer: it calls brk when malloc runs out of page to allocate and needs a new page
- malloc hands you a pointer to somewhere in the heap
- then you can (with pointer arithmetic) access anywhere in heap
- you're stopped if you access beyond the heap on either side
- if you've used all of heap, then malloc calls brk to increase the heap
Malloc Implementation
- we could implement our own memory manager.
//malloc: void *give(site_t n); //free: void recycle(void *addr);
In psuedo-code:
void *give(site_t n) { // needs to obtain memory // detect that you are uninitialized // call brk // needs some sort of data structure to track allocated blocks // finds n contiguous "available" bytes, return address of those bytes // choose from: // first // last // best // worst } void recycle(void *addr) { // do some error checking // need data structure to track allocated blocks // mark this as not allocated - addr and the number of following bytes // or keep track of a free list }
The Bucket Allocation Idea, Revisited
Question: how does the kernel allocate memory for itself.
It does two important things:
- supports user space by implementing mmap and brk
- it's own internal memory allocation: kalloc
- technically interesting details that we wont be expected to regurgitate on a test
- has a generic piece, 'zone allocator'
- buddy algorithm responsible for maintaining bucket allocation approach
- also SLAB (original), SLOB (something else - uses next fit approach), SLUB (embedded systems)
- maintains 11 buckets containing pages
- If one bucket is all used up, it borrows from one bucket larger (and splits)
- then when freed, it coalesces the memory to put back to the larger bucket size
Student question: malloc needs to malloc to have data structures. How?
- actually, it's fancier than that:
- The metadata about what blocks are free are actually embedded in the heap
Student question: what happens when kernel out of memory?
- You're hosed.
- Fork fails
- "if the user does something ridiculous, you will have a failed system"
October 22nd, 2013
Recap: What have we done so far?
- What is an OS
- What is a process
- kernel roles
- mem management
- user, kernel
- internal, external
- virtual memory
- illusion of unlimited CPU/Mem
- system calls
- paging
- Protection
- protection
- system architecture
Suggestions for Assignments
- Encouraged to ask questions in class, office hours, or on Piazza
October 25th, 2013
System Startup Leads To...
Transition from sequential code execution to concurrent execution:
- initializes the scheduler
- creates two processes (via do_fork and via sys_execve)
- enables interruts
- allows scheduler to choose between processes
- one more step (see course slides)
The "problem" of scheduling
Example: Dr. Locasto has 5 grad students who all want his attention - they all show up at his door at the same time.
Grad students:
- R
- Sa
- St
- B
- A
Approaches
- Pick at random
- first come first served
- priority / urgency / favourites
- Estimate workload and choose the one that would take the least time
- Alphabetically
- save history of service
- Or work on nearly completed task, or one that is behind, etc.
- Divide time into equal slots and rapidly switch between them - round robin
How this relates to the big picture
- This is the challenge of scheduling that is faced
SCHEDULING: a mechanism that creates the illusion of sumultaneous execution of multiple processes
Dr. Locasto's Computer
Currently:
- running powerpoint
- talking to external devices
- etc.
Human beings can't do this well, but CPUs can!
Considerations for Scheduling
- Fairness
- Priority
- Efficiency of scheduler
- Quantum -> process should be allowed to run as long or longer than the scheduler takes to make a decision about who's next
- Preemption
- PRIMARY CONSIDERATION: progress through the aggregate work of all processes
Quantum
- If too short, context switch overhead is high
- If too long, concurrency disappears
- my_stopped_time = N * quantum.length // where N is the numberof processes running on the machine
- Interactive applications should be visited more frequently so that it does not become unresponsive
Preemption
The kernel can interrupt a running process at any point in time or when its quantum expires
- Enables: "do something else more useful"
- preemption does not imply suspension
- preemption in linux 2.6 can happen either in user mode or kernel mode execution
- "This is probably one of the coolest parts of the kernel"
Note: Process List vs Run Queue
The list of all processes is logically distinct from the data structure used to organize processes into groups ready to be scheduled
The concepts are different, but implementations may share a data structure or be two completely seperate lists.
Priority
Most scheduling approaches / algorithms can be boiled down to the concept of priority
- The 'nice' command allows you to adjust the priority
- it does not allow you to set the priority, it's just a parameter considered in priority setting
- default nice value for a process is zero
- e.g. the task for the window tha you have in the foreground is sometimes marked as higher priority
Types of Processes
- CPU-bound (eg lots of calculations) OR I/O bound (lots of time waiting on external device read/writes)
- interactive OR batch
- A batch process can be I/O bound or CPU bound
Conflicting Objectives
- Fast response time (process' seem responsive)
- Good throughput for background processes
- Avoid "starvation"
Scheduling policy: rules for balancing these needs and helping decide what taks gets CPU
Scheduling Algorithms
- First Come First Served
- Shortest Job Next
- Round-robin
- Some simple examples: R, A, B
Artificially Simplified Problem:
quantum = 5 work units context switch = 1 work unit R A B ===== ===== ===== arrival time 0 0 0 work 10 3 15
Scheduling Policy:
- FCFS
- asess a tie with arbitrarily choosing one process
Running history: Process Running Time Running Total (work units) (work units) ======= ============ ============= A 3 3 context switch 4 B 5 9 context switch 10 R 5 15 context switch 16 B 5 21 context switch 22 R 5 27 context switch 28 B 5 33
Note: the exit system call calls the scheduler
October 28th, 2013
Scheduling Algorithms
- First come first served
- shortest job next
- round-robin
Some Simple Examples
A B C === === === arrival 0 0 0 work 1 2 3
Comparing Scheduling Algorithms
What metrics do we use?
- Wait time
- how long from when task arrived until it was serviced
- Turnaround time
- Time from when task starts to when task is complete
Linux's Scheduling Support
Linux supports several different policies:
#define SCHED_NORMAL 0 #define SCHED_FIFO 1 #define SCHED_RR 2 #define SCHED_BATCH 3
These flags are used to select the type of scheduling
Implementing Scheduling
"who gets to run next?"
The 2.4 scheduler - a big loop - O(n)
- iterates over the list of all tasks
- calculates goodness
- goodness
- takes nice and a few other heuristics and returns a weight
Problem: it is a O(n) algorithm, doesn't perform well with large numbers of tasks (eg on a huge server)
The 2.6 scheduler: a priority queue - O(1)
- maintain a priority queue
- the work is done when a new process is added
What is it doing?
Maintaining "multi-level feedback"
- every CPU has a run queue
- in the queue there are a few fields pointing to a few arrays
- a list of active entries - those who have not yet run
- a list of _____ entries - those who have already completed
- these arrays are about 140 entries long
- The policy decision is determining which slot (array element) to put a task in.
Simple Problem
A B Round Robin (RR) C D | E |-----|-----|-----|-----|-----|-----|---> . | . . Z
- What if one user owns A-Y and another owns Z. Then owner one will get way more processing time.
- There is a relation between the number of processes and the unit of time
- fixed quantum - each process gets a useful amount
- variable quantum - everyone gets a slice, but following the graph below, as N number of processes increases, the total amount of useful time decreases
Quantum - N relationship ======================== # * # * # ** # ** # ** # ** %cpu #--------------------------------------- # ** # *** # ** # ** # **** # ********************** # -------------------------------------- # # # # # # ######################################## N # of proc
Whiteboard Notes
Policy vs Mechanism
- scheduler structure
- other areas:
- mem management
- disk
- other resources
Process Lifecycle
-+ \ /| \ / \| / -+ ############# ###########----------------------------> # # # # # Running # # Ready # <----------------------------# # # # # # # # ############# ########### +- / |\ / \ / \ / \ / \ |/ \ ############## +- # # # # # Sleep # # Wait # # # ##############
October 30th, 2013
Concurrency As A Problem
Carol Pete ############ ########### # # # # # # # # # # # # # # # # # # # #\ _ ############ ########### \ /| \ / \ Bob / _| ################## # # # # # the shared # # buffer # # # # # ##################
Critical section: you could think of it as code - it is code. But a better definition is to describe it as a shared state between.
- → the question is: "is the data involved in that section of code manipulated by multiple threads?"
Race condition
Avoid it through mutual exclusion - don't let multiple threads access a critical section at the same time. That one task moves into the critical section, executes the code, and when complete there is the opportunity for another task to move in.
Command Line Concurrency: history | grep "awk"
The pipe is actually a shared buffer. Whatever history produces get put into the shared buffer, then grep attempts to read from that pipe.
history grep "awk" | ^ | | | | | ______________________________ | | **** ** | | ** *** ** | | * ** ** | +---------------> * the pipe **-----+ * * ** ** * * ** ** ** *******____________________________**
This works because the os is delegated to to provide mutual exclusion to the critical section
Practical Concurrency Examples
- Shell I/O redirection
- command line pipes
- named pipes (special files called FIFOs)
- Courses/Computer_Science/CPSC_457.W2012/<some_file_name_I_didn-t_have_time_to_write>
Multithreading with fork to demonstrate this issue
How can we multithread without anything fancy?
- fork? not really, each has it's own process address space
- clone? yup. it exists to create another thread of control that executes in the same process address space.
This difference really illustrates the concept of a critical section.
Potential Solutions
- spinlocks / busy waiting (wastful b/c CPU is busy doing "nothing")
- lock variables (recursive problem)
- disabling interrupts (not trustworthy, multi-CPU systems unnaffected)
- do YOU trust programmer Pete to re-enable interrupts? not really
Much like the architecture provides protection and other things, it provides support for multi-threading which is the only way that modern computer works. MOS p.131 has code for correct entry and exit from a critical section of code
November 4th, 2013
Finishing our discussion of user level concurrency
Using Locks VS Not using locks
Unsafe Race
- The routine has a for loop that counts to 2 billion
- We want to see how fast it is
- → 0 s
Safe race is the same program but it uses locks
- → 4 s
What to learn:
- well this is actually worth it
- we see above that locking is "expensive"
- one heck of a slow down
- so make critical sections as small as possible - but don't sprinkle them all over your code
- balance making them few and making them small
- DO the locking, but realize you're imposing a performance cost.
Critical sections are slower than unsafe code, but of course unsafe code is unsafe.
Kernel Syncrhonization
Kernels are naturally concurrent
- servicing many requests from processes
- servicing many requests from hardware interrupts
- these requests are not independant
BIG PICTURE: how can the kernel correctly service requests?
It is an event handling system: any function could be executing at any given time.
- you have multiple control tasks all in different states, all being interwoven
- below are the major concepts to take away:
Atomic Operations in x86
What does the x86 platform offer us in terms of ...?
Kernel Locking / Synchronization primitives
that is, the "API" that linux offers:
- Atomic opperations
- disable interrupts (cli/sti modify IF of eflags)
- assembly instructions for this - on a per cpu basis
- lock memory bus (x86 lock prefix)
- this enforces system-wide serialization
- spin locks (basically a mutex)
- semaphores
- an important skill to take away from this course is if you should use a mutex or a semaphore
- sequence locks
- barriers
- the spin lock or semaphore makes sure one task or thread controls within one critical section, but a barrier accumulates them
Note: remember that if you understand the key data structures in the linux kernel - what it is, what it does, and what operates on it - you would understand everything you need to know for this course
down_read and up_read will be useful for homework 2 problem 2
- totally voluntary - tells system that you're only reading
Spin Locks
Caution: we're looking at many layers of definitions. This may seem frustrating at first - but it is a good thing. It means there is a generic form that is adapted to each platform.
slock = 0 // locked
slock = 1 // unlocked
Spinlock API:
After many layers of definition and wrapping we get down to assembly code. We should know the semantics of this:
- try to aquire the mutex
- test if you have it
- if you do, carry on
- if not, keep trying until you do
Semaphores
Kernel Preemption
- the kernel can even preempt itself
Read-Copy-Update
- treats readers and writers the same way intitially
- you enter a critical section where you will manipulate data
- when writers come in, they read and copy the data, operate, then update it when they're done.
The "big kernel lock"
in the beginning of the early days of the kernel there was one mutex
- allows people to be lazy about determining what their critical section is
Conclusion
- The kernel has to do synchronization
- it handles it on its own based on what the machine offers.
November 13th, 2013
Course Scheduling
The home stretch begins on Friday and should be useful for homework 3
Signals
- OS support for generating & delivering signals
- case study of 'ptrace'
Defined
- A simple, limited API for sending messages
- basically a number delivered by OS between processes
- source of signal
- the user
- another process
- the kernel
- relatively easy to know the semantics because it's a small, simple API
- because the semantics are so well understood and standardized, you can implement a reasonable response to most signals.
'kill' process Sends A Message
- it doesn't really kill a process, it just sends the message for a process to kill itself
Locasto ssh's into a lab machine...
- several of his grad students are on the machine
- after examining who's running stuff ('w'), the process list ('ps aux'), he chooses one process owned by another user to kill
- he fires up 'man kill' then 'man 7 signal' to determine that he will have to use 'kill -9 <pid>' to kill process with pid <pid>
- but the "Operation is not permitted" when you attempt to send the kill message to a process owned by another user
- Side Note: When your ssh client has hung, sending the SIGHUP will instruct it to 'Hangup' and exit in a graceful way.
- Most processes don't have custom signal handlers for extra signals
- signals are convenient and well-defined, but only useful for sending information about exceptional conditions
How does the kernel handle the signal delivery task?
- Figure 11-2 from (???) shows the control flow of catching a signal.
- a signal handler is just a function
- once it has finished executing,
USERLAND KERNEL SPACE ======== ============ start here P1 | P1 executes | | at X it jumps to the kernel | X | | | | which returns to the signal handler | | | P1's Y | signal | handler _~~~~> X Y ( | ) | ( | ) | and then.... ? back to kernel, ( ? ~~~~~~~~~~~~~> ? ) then back to X - but should you continue? (_ kernel ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- can we handle exceptional conditions in a more graceful fashion?
- the safest, although unsatisfactory answer is to exit, then 'pick up the pieces later'
Ptrace: a case study
- it's a very versatile and useful system call
- it wraps a complex api
- it offers an interface for the process calling ptrace to communicate with another process indicated by pid
- ptrace is a bad system call because it does a lot of stuff
- it is a useful system call because it purposefully breaks the boundary between processes
- with ptrace you can access all parts of another process - including system calls, etc.
- if your process wants to ptrace Locasto's, and has the permission to do that, it asks the kernel to send signals to Locasto's process
- things like gdb are built on top of this
- You can read and write any address in another process' address space
- refer to the man page for the full details on how to do this
- Allows you to step between system calls. That is, the process runs until it gets to a system call, then stops. etc.
- Or even, you can singlestep a process line to line
- Q: isn't this a security risk - for malware
- A: the user ids need to match - though root can ptrace anything. And it's also very useful for security.
- At their core, debugging, malware, and security are the same task: monitoring and manipulating the innards of a running process.
sample code - something that uses ptrace
[view-source:http://tsg.cpsc.ucalgary.ca/teaching/ptrace/snyfer.c]
- Dr. Locasto will put this on the wiki also.
side note: we could write strace right now using ptrace.
November 15th, 2013
"Welcome to the last sprint. "
Announcements
- USRI on Dec 2
- Final Exam: Dec 18, 8-10 am
Persistent Storage - The Problem
Files. We've all seen files, likely made files. We interact with them daily.
What is behind them? How do we manipulate them?
We'll go through that today and end with directories (a special type of file)
We'll be traversing several layers of abstraction.
- from userland utilities,
- to the underlying system calls,
- and what do those system calls do?
Q: How does the kernel provide support for managing persistent data?
A: The "file" abstraction
~ In linux, generic wisdom states that everything is a file.
Activity: writing things on a piece of paper
- everyone writes a sentence on a piece of paper and passes it to the center row
- Things noticed about these papers:
- Dr. Locasto asked for someone to say what their sentence was. We got "Monkey Time"
- he instructs the paper holding guys (two of them) to find the paper with "Monkey Time" on it.
- Finding the papers is hard
- the issues are the same as for file management: (full list on course notes ppt)
- how do you find your piece of paper? - needs a name
- how do you retrieve the 3rd word of your sentence. The 75th?
- For "Monkey Time", there is no 3rd or 75th word
- And a linear search is slow
- Filesystems do this
- how do I find duplicate sentences
- how do I handle multiple requests for the same paper?
- what if we had 1450, 14500, 1.5 million papers?
- what if we were not co-located?
- who can read or write?
Another angle: Answering why do we need files?
- virtual memory isn't big enough
- useful data needs to persist beyond the power cycle (ie reboot)
- THE KEY THING for us to understand today is what the problem of persistent storage is.
- → how do you find files, assert ownership control, dynamically grow and shrink the file, and do this in an efficient way
Persistent Storage - Some Definitions
- What is a file?
- What is in a file?
Class brainstorm guided by Locasto:
What is in a file?
- relationship between contents
- a "container"
What is a file?
- bytes/data
- metadata
- header and 'list of allocated blocks'
... well these are not the same answer. What is the real answer?
Pretend that the paper stacks from the monkey time activity are disks, how do you point to a single file? You need to know the physical location.
Back to brainstorm:
- Address/identifier
- physical location
- bytes of content
We need to handle collections of bytes in a collection of blocks (think: pages)
So to "point to a file" you would have to point to a lot of locations: picture using both hands, every finger pointing somewhere else.
How do we manipulate those bytes?
- we don't want to see all files as a sequence of raw bytes.
- we are familiar with a wide variety of file types.
- the windows solution is to arbitrarily assign a three letter extension to differentiate between file types
- in the unix world one way is not to mess with the file name, but to do something to the file itself.
- the 'file' utility determines the type of a file.
Suppose we have a file named A. It's 4 mb. What type of file is it?
- It would be quite difficult to determine it without knowing - it's a matter of language recognition.
- remember looking at ELF files - the characters 'ELF' are written at the start
- This task is hard or at times impossible depending on the file type
File Monipulation
- the course notes has a long list of useful files for manipulating files from the command line.
- 'chattr' and 'lsattr' are used for changin file attributes
- most files do not use this set of attributes
- but useful trick: you can chattr +i a file to make it immutable, even by root. This is good if you have a lot of management scripts running that are liable to delete files you want.
Looking at output of 'ls -l' unix file systems have a file owner and a group (which is supposed to be a collection of users)
the protection bits apply to user, group, other. Full read, write, execute for user, group, and other is: rwxrwxrwx
- the right three are for world, middle three for group, left three for user
- user and group are listed to the right of the permission bits in ls -l
u g o rwx rwx rwx
The 'd' at the left is to mark if it is a directory (d) or not (-)
Back to OS The operating system is responsible for enforcing these protections.
November 18th, 2013
Agenda
- recap
- file ssytems
- motivation: "file holes"
- the file "address space"
- indexed allocation
File Systems Continued
File systems and how they support the illusion of contiguous files.
What we may not realize as userland programmers is that there are many layers within the kernel required to deal with file system stuff.
Key Question: How can the kernel efficiently index and locate file content?
Many of the same issues that come up in memory management are present with persistent storage.
Put yourself in the shoes of an operating system developer. You're thinking about developing your own operating system. What you want to do is keep consistent with some of the linux patterns. You are therefore locked in to the linux file interface.
Dr. Locasto collected some data about one of his workstation computers, determined that the majority of files are a few kilobytes in size, and there are a fewer larger files.
Two questions arise:
- Where is the content for these files?
- How can you keep track of the space needed for growing these files?
Demo: lseek past the end of a file
The program writes some text to a file, seeks 1GB into the file, then writes a little more.
ll -h
- shows that the file created is in fact 1GB
df -h
- shows that essentially the same amount of space is taken up on hard drive as was before the 1 GB file was created.
This implies that there is some sort of mapping being done.
How do I do this mapping?
Each answer to this is a different file system. If you list the contents of fs in the linux kernel, you will see a long list of filesystems.
Clarifying a concept:
Directory Structure
- we have a notion of files as being stored in a directory tree.
- in linux, the root is denoted by '/'
'/' ---------------^-------------- | | | | | | usr bin etc var dev home (and so on)
You probably have multiple file systems attached to your directory tree.
- and you can attach any file system to any point in your directory tree.
- the integration is seamless - what filesystem is being used is meaningless to you as a userland programer or user.
- use 'mount' command to show what file systems are currently mounted.
- read the man page if you want to
What is a file system?
Note: File System, NOT FileSystem.
Has the same responsibilities of a memory manager.
Question: where is the metadata stored?
- Note: even directories are files.
We have a data structure called inode that remembers important things for us.
- unique identifier
- pointer to contents of file
- operations depending on filesystem
Hard Link and Soft link and inodes
using commandln TARGET LINK_NAME
soft link refers to a new inode linking to the same content
hard link refers to the same inode linking to (obviously) the same content
Making Access Fast
The secret is multi-level index schemes with triple indirect addressing.
This is informed by our understanding of common file sizes
November 20th, 2013
Virtual File System - Conceptual
imagining what the 'cat' utility would do...
cat "/home/eye/file.txt"
fopen( )
open("path", ... , ...) | |
API: | [open] [read] [write] |
API: | VFS |
→EXT2 | |
→EXT4 | |
→generic I/O | |
→hardware |
- the VFS is an internal API
- it's the one that deals with "what file system is this stored in"
VFS from kernel code
Now Dr. Locasto will attempt to show us that the above whiteboard diagram is in fact what actually happens. The goal is to leave today with some confidance in reading this code.
An inode is:
- a generic concept,
- a reference to the actual data
- it is a central data structure that allows an os to reference a particular file
- but it's not the only data structure
- the dirty secret is that there are several versions of inode
- there is an instance of inode created in kernel memory when a file is open
- but if this was the extent of the inode life, that wouldn't be useful
- so there are specific versions of inode that exist on disk - physical bytes on disk
- EXT4 inode is around line 435 of fs/ext4/ext4.h
The 'open' system call
fs/open.c line: 1053
don't get confused by the syscall_define macro.
Then we go to line 1031 for 'do_sys_open'
- the filename is turned into a file descriptor
- we do_filp_open
- for loading the necessary metadata - mapping between the struct file and the struct d
- if do_filp_open succeeded, we do 'fd_install'
Then we go to line 1018 for 'fd_install'
- adds the metadata to the list of open files at a certain index fd
- now the file metadata can be accessed from the integer fd
Then we go to line 1669 for 'do_filp_open'
- this is where the work for opening is done
- we're not going to trace it, it's complex
- the point: the reason that this is so much more work than the rest of the functions that we looked at is that this is where the 'rubber hits the road' and we're interacting with the hardware.
From 'open' to 'read' to 'close'
The 'read' system call
fs/read_write.c line 372
Again, not much work.
- vfs_read - we're delegating to the virtual file system
To vfs_read on line 277
- if file->f_op->read is not null....
- which means, if the function pointer for 'read' in the f_op function pointer table contained in the file struct is actually there...
- then do it
- otherwise, do the kernels default read: do_sync_read
To 'do_sync_read' on line 252
- we go to the same file->f_op function pointer table, but look up aio_read (asynchronous input output)
- this goes to ram
Back up to 'vfs_read'
- read is a function pointer to a file system dependant read function
- to find it, we'll choose any of the file systems (ext2) and look at its read function
To fs/ext2/file.c line 45
- file operations table that we see before
- it turns out that 'read' calls do_sync_read, which we know calls aio_read
- this is because ext* file systems are the native file systems for linux
- for other file systems, some other read function would be provided
November 22nd, 2013
Learn about particular features of file system: extended attributes and ACLs from multiple perspectives
Background
- We're familiar with ownership and permission bits
- we use the following utilities for that:
- chmod(1)
- charrt(1)
- stat(1)
- shred(1)
What is an access control list? (ACL)
Example:
/etc/x | /var/r | /home/mike/hello.txt | /l/a/cloud/h453 | |
---|---|---|---|---|
Mike | r | rw | rwx | |
Alice | rw | r | r | rwx |
Bob | x | r | r | |
xdman | x | w | wx |
We can read this from two perspectives:
- row wise: what capabilities do each user have?
- column wise: what access is granted for each resource
Typically, the access control list is stored along with the resource (file)
Question:
- is this approach better, equivalent, worse than linux permission bits?
- What about user groups?
- what about all other users?
In fact, this is a more general mechanism than the linux permission bits that we're used to.
Demo - permissions on virtual machine
$ ssh abird@csl.cpsc.ucalgary.ca The authenticity of host 'csl.cpsc.ucalgary.ca (136.159.5.22)' can't be established. RSA key fingerprint is cc:36:a4:22:58:af:e1:38:56:b6:e6:5c:ec:72:b7:ca. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added 'csl.cpsc.ucalgary.ca,136.159.5.22' (RSA) to the list of known hosts. abird@csl.cpsc.ucalgary.ca's password: Welcome to the University of Calgary Department of Computer Science This system is for use by authorized users only. DEPARTMENT WEBPAGE: http://cpsc.ucalgary.ca/ NEWS & ANNOUNCEMENTS: http://cpsc.ucalgary.ca/news/ TECH SUPPORT: http://cpsc.ucalgary.ca/tech_support/ email: help@cpsc.ucalgary.ca help desk: MS 151 Please send bugs reports and package requests to help@cpsc.ucalgary.ca [abird@csl ~]$ cd /tmp [abird@csl tmp]$ cat locasto.password.txt cat: locasto.password.txt: Permission denied [abird@csl tmp]$ ls -l locasto.password.txt -rw------- 1 locasto profs 8 Nov 22 12:12 locasto.password.txt [abird@csl tmp]$ # Then dr. Locasto changed the permissions: [abird@csl tmp]$ ls -l locasto.password.txt -rw----r-- 1 locasto profs 8 Nov 22 12:12 locasto.password.txt [abird@csl tmp]$ cat locasto.password.txt secret7 [abird@csl tmp]$ groups jr s301 c457 c471 [abird@csl tmp]$ # The groups I am in. [abird@csl tmp]$ # If I wanted to restrict permissions to just one group of people, [abird@csl tmp]$ # for working on a group project or something... [abird@csl tmp]$ groupadd birdsgroup -bash: /usr/sbin/groupadd: Permission denied [abird@csl tmp]$ # that doesn't work - but apparently using the access control list, [abird@csl tmp]$ # we can give access only to a few users [abird@csl tmp]$ # even without having them in the group [abird@csl tmp]$ cd michael/ -bash: cd: michael/: Permission denied [abird@csl tmp]$ getfacl michael/ # file: michael/ # owner: locasto # group: profs user::rwx group::--- other::--- [abird@csl tmp]$ cd michael/ -bash: cd: michael/: Permission denied [abird@csl tmp]$ # Dr. Locasto gave read permission to world for his directory: [abird@csl tmp]$ cd michael/ [abird@csl michael]$ ls ls: cannot open directory .: Permission denied [abird@csl michael]$ # I cannot list anything because I don't have read [abird@csl michael]$ # permissions for any files in this directory
- Dr. Locasto showed that he could grant file access to specific users
- one student was given access, and once he could read the directory, and was added using setfacl, he could change the locasto.password.txt file.
Looking at ATTR things
- must give it a valid namespace
- must have permission to modify that namespace
The setfattr tool allows you to modify the metadata for any file - you can change the default application to open it, send secret messages, etc.
- however, your file system needs to have support for this
- it has to implement setfattr and getfattr system calls
Example - if you copy the file to another filesystem, the attributes are not necessarily copied
- This is a difficulty that comes with transferring data between filesystems, or to different operating systems
- because it's not ONLY moving it, it's converting it to a new format.
In fact, even if you're copying from one place into another within the same filesystem, those file attributes are not preserved.
- but there is an option for 'cp' that preserves the file attributes
Note - the timestamps are stored in inode. But as with other attributes, when you copy a file, you're really making a new file with new timestamps.
November 25th, 2013
How Would We Design A Minimally Viable File System
Acknowledgement: 'viable' is only loosely defined.
Assume you have a well behaved storage mechanism which provides us with some number of available storage units.
It's our job to write a filesystem on it.
What are we supporting?
- open
- read
- write
- seek
-
close
Other Requirements
- delete
- expand
File Access Table: FAT
- the file system cannot simply ram data onto the physical medium: you need metadata also.
- one challenge: you have a file size limit because of the maximum address size
- it also limits the number of files
FAT32 0 +------------------+ 0 +----------------+ | | | | | | | | +------------------+ | | | | | | 'one' | next 'block' | | | +------------------+ | | | | | | | | | | | | | | | | | | | | | | +------------------+ | | | EOF | M +----------------+ +------------------+ max | | size | | | | | | | | | | | | K +------------------+ only store so much in RAM
EXT2
Envisions a file as a sequence of blocks
File +-------------------+ | B1 | +-------------------+ | B2 | +-------------------+ | B3 | +-------------------+ | | | | | | +-------------------+ | Bn | +-------------------+ | | | | +-------------------+
swap /boot block 1 ... block n +-----+----+-----+-----+--------+---------+---------+ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +-----+----+-----+-----+--------+---------+---------+ 0 / \ M / \ / \ / \ / \ / \ / \ +--+--+--+--+--+--+---+ | | | | | | | | | | | | | | | | ← metadata including | | | | | | | | references to inodes +--+--+--+--+--+--+---+
Journaling Filesystem
- keeping a track of the operations that they system is about to do
- in a buffer
- In some filesystems, both metadata and data. In others, just the metadata
November 27th, 2013
Design Challenge: operating system that bridges the gap between user level programs and hardware
What is /dev/null
- we treat it like a file.
- it looks like a file
- we operate on it like a file
- it is a bit of code that mimics a file
- it does not act like a file
Devices are nothing more than files
- name appears in file system
- responds to read(2) and write(2)
the landscape of real devices
+------------------------------------+ | kernel| | | | | | | | | | | | | | | | +--------+ | +---+ | |device | | |CPU|→ +--+driver +------------------------+ +---+ | | ↑ | | | +-+------+ \ | ↑ \ ↓ | \ +--------+---------------------------+ \| device controller | +------------------------------------+ | | | | | | | | | device | +------------------------------------+
What is the CPU's role?
- it handles interrupts from the device controller
- this translates to a piece of kernel code responsible for responding to whatever action caused the interrupt
- eg copy data from a buffer
Types
A thing in this sense is not what it is named but rather what is done to it.
- block device - data is accessed a block at a time; random access
- character device - a stream of characters (or data values)
The important question is how the device imports and exports the data
Major and Minor numbers
cat /proc/self/maps
the column with ##:## is the major and minor number of the device in the format MJ:MN
to the kernel code! Device drivers
discussion of device driver writing.
Interesting point: the need to write device-specific sequences of bits to the device driver to communicate with the device.
December 4th, 2013
How does the Kernel support Networking
(from http://www.netfilter.org/documentation/HOWTO/netfilter-hacking-HOWTO-3.html)
A Packet Traversing the Netfilter System: --->[1]--->[ROUTE]--->[3]--->[4]---> | ^ | | | [ROUTE] v | [2] [5] | ^ | | v | inbound outgoing
Return to Diagram from Last Week
virtual machine ********** ** *** ** ** ** * ** 10.0.15.2 * * * * ** ** ** ** | *** *****|**** | eth0 ↑ __/ \___ / \ / \ | | ↓
Every modern unix box can do all of this
- that is, the general architecture for receiving and intercepting packets
PORT: a destination within a machine. Determines which program the kernel will send the packet to.
- applications bind to certain ports.
- all of this is convention only - so you don't have to follow the convention if you don't want to, as long as both ends of the communication (client and server) agree.
The beautiful thing about networks is that you can just read what is there.
Catching some packets
- Dr. Locasto ran the wireshark program, which is a GUI for tcp
- he starts poking through some packets
- back on his mac he uses tcpdump to take a look at network traffic
- point made: isn't this illegal?
- he used
tcpdump -i en0 -v -X -A
- what it does is instead of dropping all packets not destined for your own ip address, it takes a look at all of them.
One thing to note
- initiating a network connection and communicating with another machine takes several protocols
- diagnosing network problems is rarely a one-problem problem.
Debugging network issues
- using a packet inspector and strace you can poke in and look at what's happening and what not happening.