Courses/Computer Science/CPSC 457.F2013/Lecture Notes/Scribe1




Links

Directory of Readings, LXR Guide, Code Examples and Slides for Class
Each week below lists the extra readings, the LXR guide links and code examples, and the textbook readings.
Week 1
  • MOS: 1.1: What is an Operating System?
MOS: 1.2: History of Operating Systems
MOS: 1.4: The Operating System Zoo
MOS: 1.5: Operating System Concepts
MOS: 10.1: History of Unix and Linux
MOS: 13.1: The Nature of the Design Problem (optional)
Week 2
The Intel IA32 Developer Manuals, Volume 1
2.1: Brief History of Intel 64 and IA-32 Architecture (skim this)
2.2: More on Specific Advances (paying particular attention to the description of the pipelined microarchitecture)
3: Basic Execution Environment (paying particular attention to 3.1 (and Figure 3-1), 3.2, 3.3, 3.4)
The Intel IA32 Developer Manuals, Volume 3A
2: System Architecture Overview (paying special attention to Figure 2-1, 2.2, Figure 2-3, and 9.1)
  • MOS: 1.3: Computer Hardware Review
MOS: 1.6: System Calls
MOS: 1.7: Operating System Structure
  • LKD: 1: Introduction to the Linux Kernel
Week 3
LXR - x86_64 syscall table data structure declaration - kernel code information
LXR - System Call Numbers
LXR - do_fork - kernel code process information
LXR - copy_process - kernel code process information
LXR - load_elf_binary - kernel code process information
LXR - type definition of mm_struct - kernel code information
LXR - type definition of vm_area_struct - kernel code information
LXR - mprotect system call entry point - example of using these vm area structure fields and properties (particularly the flags)
LXR - allocation routine for the page table directory - kernel code information
  • MOS: 1.6: System Calls (you should have read this already)
LKD: 5: System Calls (this describes how to add a system call to your OS, something you'll have an opportunity to practice in your tutorials later and in HW3)
  • LKD: 3: Process Management
MOS: 2.1: Processes (this should be partly review of what we've talked about in class)
MOS:10.2.5: Kernel structure
MOS: 10.3.1: Fundamental Concepts (about processes in Linux, should reinforce some class discussion)
  • MOS: 3.1: No Memory Abstraction
MOS: 3.2: A Memory Abstraction: Address Spaces

Week 4

  • Code - one.c - example code program for address translation
  • Code - Kernel Ooops - segment registers information
  • Code - greedysbrk.c - example code program that calls sbrk() to expand the heap memory region and write to it via memset() to exhaust available memory
Sample Run of Code - example code program run
Sample Run of Activity 1 - example code program run
LXR - declaration of mem_map array - virtual memory information for iterating over page frame meta-data for the purposes of allocation management
LXR - struct page - virtual memory information representing the items in the mem_map array (i.e., meta-data about the physical page frames)
Code - oom_kill.c - code
  • MOS: 3.2.3: Managing Free Memory
LKD: 15, pages 320-322: The Process Address Space
  • MOS: 3.3: Virtual Memory (this subsection also contains a review of paging and TLBs)
MOS: 3.7.3: Segmentation with Paging: The Intel Pentium

Week 5

Output - llist.c - example code output
LXR - do_brk routine - brk(2) system call
  • MOS: 3.4: Page Replacement Algorithms
MOS: 3.5: Design Issues for Paging Systems
MOS: 3.6: Implementation Issues

Week 6

Reading /proc/slabinfo - related topics from article
Additional Allocators in Kernel - related topics from article
SLEB - related topics from article
SLQB - related topics from article
  • Code - grab.c - example code program for demonstrating compiler-declared size of userland structures and how malloc operates by exploring the data "around" a returned pointer
Details for grab.c - more details for grab.c
  • Code - break.c - example code program that invokes mmap to try to allocate a single byte
Output - break.c - example code program output
brk(2) entry point - brk(2) system call
do_brk() helper function - brk(2) system call
helper routine to find vma intersection - brk(2) system call
find_vma routine - brk(2) system call
  • LXR - struct page - kernel memory management information for holding metadata about physical page frames (and metadata used by the SLUB allocator)

Week 7

LXR - value of current from the CPU state - code
O(n) snippet
  • MOS: 10.3.5: Booting Linux
  • MOS: 2.4.1: Introduction to Scheduling (plus 2 intro paragraphs in Section 2.4)
MOS: 2.4.5: Policy vs. Mechanism
MOS: 10.3.4: Scheduling in Linux or LKD: 4: Process Scheduling
MOS: 2.7: Summary (this talks about some things we'll consider next)

Week 8

  • MOS: 5.5: Clocks
  • LKD: 9: An Introduction to Kernel Synchronization
MOS: 2.2: Threads
MOS: 2.3: Interprocess Communication
MOS: 2.5: Classical IPC Problems
Beware typos in Figure 2-23, pg 122 (semicolon placement)
  • MOS: 6.2: Introduction to Deadlocks (NB: MOS 6.2.1 should be memorized!)
MOS: 6.4: Deadlock Detection and Recovery
MOS: 6.5: Deadlock Avoidance
MOS: 6.6: Deadlock Prevention
Livelock: MOS: 6.7.3

Week 9

  • ULK: 5: Kernel Synchronization
Code - urace.c
  • LKD: 10: Kernel Synchronization Methods

Week 10

  • MOS: 4.1: Files
MOS: 4.2: Directories

Week 11

  • ULK: 1, subsection: An Overview of the Unix Filesystem (pp. 12-19)
  • MOS: 4.3: File System Implementation
MOS: 4.5: Example File Systems
MOS: 10.6: The Linux File System
  • LKD: 13: The Virtual File System

Week 12

man 5 fs
man 5 fstab
man 8 mount / man 2 mount
  • MOS: 5.1: Principles of I/O Hardware
MOS: 5.2: Principles of I/O Software
MOS 5.3: I/O Software Layers
  • MOS: 5.4.3: Disk Arm Scheduling Algorithm
MOS: 5.4: Disks
LKD: 14: The Block I/O Layer, especially subsection "I/O Schedulers"

Week 13


Introduction

Hello, my name is Carrie Mah and I am currently in my 3rd year of Computer Science with a concentration in Human Computer Interaction. I am also an Executive Officer for the Computer Science Undergraduate Society. If you have any questions (whether they are CPSC-related, about events around the city, or an eclectic mix of random things), please do not hesitate to contact me.

I hope you find my notes useful, and if there are any errors please correct them by clicking "Edit" at the top of the page and making the appropriate changes. You should also "Watch this page" for the CPSC 457 class to be notified when Professor Locasto edits anything in the Wiki page. Just click "Edit" and scroll down to check "Watch this page."

You are welcome to use my notes, just credit me when possible. If you have any suggestions or major concerns, please contact me at cmah[at]ucalgary[dot]ca. Thanks in advance for reading!

Week 1

Introductions were made and we learned a bit about operating systems. Take note of the 3 Fundamental Systems Principles we talked about, in addition to the three problems we outlined.

We also learned some new commands for the shell.

September 13: What Problem Do Operating Systems Solve?

Why Operating Systems?

  • Abstract OS concepts are taught relative to a mature, real-world, complex piece of software
  • Those concepts are translated to hardware with real limitations

Fundamental Systems Principles

  • OS is a term for the software that operates some interesting piece of hardware
  1. Measurement: the ability to observe the system -- its state, how fast it is running, how correctly it is working
  2. Concurrency: how to make it do what you want
  3. Resource management: memory, CPU, persistent data -- keep them coherent and separate

Main Problem

1. Manage time and space multiplexed resources
  • Multiplex: share expensive resources
  • Ex. Network (both, more time)
  • Ex. CPU (time, a bit of space because of cache and state)
  • Ex. Hard drive (space because of data storage)
2. Mediate hardware access for application programs
  • Higher level code does not need to worry about the hardware level, so the Operating System abstracts it away for us
3. Loads programs and manages execution
  • Turns a program from dead code into a running process

What's a Program?

  • Source code: the algorithm, expressed in some (any) language
  • Assembly code: lower level code
  • Machine/binary code: code the machine understands directly
  • A process: program in execution

Example: addi.c

int main (int argc, char* argv[]) 
{
  int i = 0;
  i = i++ + ++i;
  return i;
}
  • int argc: the number of arguments
  • char* argv[]: an array of pointers to the argument strings
  • Depending on the platform and compiler you are on, the returned i ranges from 0 to 4
  • An undefined behavior occurs: failure in abstraction because the compiler is making the decision
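A quick way to see this for yourself (a sketch; assumes gcc and a bash-like shell, and the printed value varies by compiler):

gcc -Wall -o addi addi.c
./addi
echo $?        # prints main's return value, i.e. whatever i ended up as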

Commands

General Commands Presented in Class
Command Purpose Notes
<text_editor> <file_name> & Opens text editor in background
  • The & runs programs in the background
touch <file_name>.c Creates a text file
emacs -nw <file_name>.c Creates a text file using emacs
./<file_name> Runs the file in the current directory
  • <file_name> by itself is the compiled machine code, usually an ELF file
file <file_name> Reports what the file's type is

Notes

Commands to Know Based on Lecture Notes
Command Purpose Notes
gcc -Wall -o <file_name> <file_name>.c Compiles and displays any warnings
  • -Wall enables all common warnings
  • -o names the output <file_name> instead of the default a.out
cat <file_name> Concatenates a file and displays to the terminal
hexdump -C <file_name> Displays the raw bytes of a file (such as an ELF executable) in three columns on the terminal
  • The columns are: offset, the bytes in hex, then the ASCII translation
objdump -d Displays the disassembly of code in three columns to the terminal
  • The columns are: virtual address, byte code, disassembled code in assembly


Week 2

A lot of new commands were introduced. You should understand what the kernel is, what's in it and what isn't. Understand the registers from x86 architecture in addition to the segment registers. Also know about the GDT and IDT, interrupts, and the DPL.

In tutorial, you should have set up your VM and be comfortable with using SVN.

September 16: From Programs to Processes

Introduction to the Command Line

  • Pipes are used when the user wants the output from one command to become the input of another command
  • Some commands are programs, while others are functionality built into the shell
  • Bash builtins (commands compiled into the bash executable) share a single section of the manual, so you have to search within it
  • Ex. write, read
  • A way to exit loops is Ctrl+C, however it is not the real answer
  • Ctrl+C is a keystroke sequence that the terminal turns into a signal: the kernel delivers it to the current foreground process, telling it to die (see the sketch below)
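To make the signal idea concrete, here is a minimal sketch (not from lecture; assumes a POSIX system) of a program that catches the Ctrl+C signal instead of dying:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* invoked when the kernel delivers SIGINT (the Ctrl+C signal) */
void on_sigint(int signum)
{
  write(1, "caught SIGINT\n", 14);  /* write(2) is safe inside a handler */
}

int main(void)
{
  signal(SIGINT, on_sigint);  /* register the handler */
  while (1)
    pause();                  /* sleep until any signal arrives */
  return 0;
}

Run it and press Ctrl+C: the message appears instead of the process dying (kill it from another terminal when done).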

Commands

Commands for history
Command Purpose Notes
history Displays history of all commands typed
history | more Splits the display of history into screen-sized chunks on the terminal
  • Hit enter to continue to next screen
  • Hit Q/q to quit the process
  • Piping history into further commands (below) produces a frequency distribution of commands, a good tool for testing purposes
history | gawk '{print $2}' | wc Prints the second field of each history entry (the command), then counts them
  • gawk parses the input into records and fields; history lines display three columns: the number, the command, and its parameters
  • print $2 prints the second field (the command)
  • wc is word count, which displays three columns: lines, words and bytes
history | gawk '{print $2}' | sort Displays the sorted history of commands to the terminal
  • Sorting groups duplicate commands together
history | gawk '{print $2}' | sort | uniq -c Displays the sorted history of commands and omits repeated lines to the terminal
  • -c prefixes each entry with the number of occurrences
history | gawk '{print $2}' | sort | uniq -c | sort -nr Displays a reverse numerical sort of history and omits repeated lines to the terminal
  • -n sorts in numerical order
  • -r sorts in reverse order
General Commands Presented in Class
Command Purpose Notes
ls Displays all contents in the current directory
cd <directory_name> Changes to the directory explicitly written
  • cd .. moves to the parent of the current directory
clear Clears the terminal screen (scrolls everything up so it appears like a fresh new terminal screen)
make <file_name> Runs a makefile, which controls the process of building a program
  • Saves you from retyping build commands in bash (no pressing up repeatedly)
exit Built-in shell command that terminates the shell
su Switches the user and makes a new sub-shell
  • Without arguments, the command automatically switches to root
stat Gets the status of a file, providing more information than ls
df Displays information of persistent storage (disk usage)
ifconfig Displays information about the state of network adapters
diff <file_name1> <file_name2> Produces a patch, or displays the difference between two files
  • Displays the difference from the previous source to the new source
yes Outputs a string repeatedly until killed
  • Useful for test cases, especially for loops

Notes

Commands to Know Based on Lecture Notes
Command Purpose Notes
man <command> Displays the UNIX Programmer's Manual page for the given command
ps Displays the processes that are currently running
  • The number of processes is unpredictable and depends on the role of the machine
pstree Displays the relationship between all the software running on the machine
  • All processes descend from a single ancestor process called init
top Displays a list of what programs are running and what resources are being used
  • In class, the Windows environment and the terminal were running
ps | grep yes Filters the process list, providing information like the process ID of a running program (here, yes)
  • Obtaining this information allows you to control the process
kill -<signal_number> <process_id> Kill sends a signal to a process
  • Signal numbers identify basic events that the OS can generate (on behalf of hardware or users) and deliver to a process
strace ./<file_name> Records what the program is asking the Operating System to do
objdump -d Displays the disassembly of code in three columns to the terminal
  • The columns are: virtual address, byte code, disassembled code in assembly
readelf -t <file_name> Displays the major portions of the file
  • ELF is a container for the different sections (a contract between the compiler and the Operating System)

Example: addi.c

int main (int argc, char* argv[]) 
{
  int i = 0;
  i = i++ + ++i;
  return i;
}
  • Intel syntax: data gets stored from the right hand side to the left (destination first)
  • AT&T syntax: data gets stored from the left hand side to the right (destination last)
  • When this example is disassembled, the AT&T syntax is used
  • If unsure whether a listing is Intel or AT&T syntax, look at the surrounding lines
  • In class, we saw that the destination was at the right hand side--thus AT&T syntax, with data being stored into the right side

Example: hello.c

#include <stdio.h>
int main (int argc, char* argv[])
{
  printf("Hello, 457!\n");
  return 0;
}

makefile

all: hello
hello: hello.c
       gcc -Wall -o h hello.c
clean: ...
  • Typing make in the command line does the work (what you coded) for you

More Commands

gcc -S hello.c
more hello.s
  • printf: formatted output conversion
  • puts: outputs characters and strings
  • The compiler changed the behavior (we wrote printf, but the hello.s file calls puts)


readelf -t h
  • The .text section of the ELF file is where the program code actually goes


objdump -d h
  • In main, a call to <HEX><puts@plt> happens
  • puts@plt: procedure linking table
  • The compiler doesn't know where puts lives; it is linked in from the C library
  • The call jumps through a table of locations usable by the program (a common way of calling libraries)


strace ./h
  • Check to see if the right system call has been made (It has, which is write())


September 18: What is an Operating System?

Question Protocol

  • Get Michael's attention - politely yell something like "question!"
  • Think of it like: The lecture is the CPU and we are the interrupts
  • Michael likes conversation more than a one-sided diatribe

Readings

  • This is not a computer architecture class (recall CPSC 325/359: Computing Machinery II), but the architecture is the API (application programming interface) the Operating System programs against
  • These are the control bits and pieces of the hardware that allow the Operating System to do things
  • Illustration of all the layers right down to the Operating System, and a nice tour of the loading mechanism (how a program becomes a process)
  • Good introduction of how system calls are made
  • Very amusing. Try to code along with it
  • MOS: 1.3 - Computer Hardware Review, 1.6 - System Calls, 1.7 - Operating System Structure
  • Finish Chapter 1 to reinforce lecture information

Recap: September 16 lecture

  • Command line agility, and the transformation of source code into an ELF file
  • Used the command line to look at how source code is transformed into a binary
  • Learn a bit about a few topics (breadth first)
  • Example of Cross-Layer Learning, which is a gradual progress on several inter-related topics at once

Reasons

a) Gives overview of some commands from the command line
b) Gives an idea of the rich variety of commands (from ps to sftp) of what the OS supports
c) Demonstrates the use of the UNIX manual (man(1) command)
d) Uses it to drive a discussion of processes (ps, pstree, top)
* Leads to what the OS does (which basically manages processes and allows multiple processes to run seemingly at the same time)
e) Observes the transformation of code to binary
f) Observes the coarse-level detail of an ELF file (i.e. sections)

Learning Objectives

  • Intuition for what an OS kernel is, and what its roles and defining characteristics are, by placing it in the context of managing the execution of processes on 'top' of some hardware
  • Manage several processes

Operating System Definition

  • Authors
  • MOS: Hard to pin down, other than saying it is the software running in kernel mode (the core internals)
  • LKD: The parts of the system responsible for basic use and administration, including the kernel and device drivers, boot loader, command shell or other user interface, and basic file and system utilities. It is the stuff you need--not a web browser or music player.
a) A pile of source code (LXR)
b) A program running directly on the hardware
c) Resource manager (multiplex resources, hardware)
d) An execution target (abstraction layer, API)
e) Privilege supervisor (LKD page 4)
f) Userland/kernel split

System Diagram


The Operating System Kernel

What is the Kernel

  • Piece of software

  • Part of it is machine language specific to what particular CPU you are running on (Linux runs on many CPUs, so many parts of it are specialized)
  • It is a pile of C code and a pile of assembly code
  • Interface to hardware

  • The kernel contains driver subsystems: self-contained pieces of software that talk to a particular piece of hardware
  • Mediates access and service requests
  • Supervisor

  • Central location for managing important events
  • Ex. A microkernel is a different style than Linux: a small piece of core code that uses various user-level processes to get the work done
  • A piece of privileged software that manages important devices
  • Abstraction

  • Exports some layer of abstraction to programs
  • System Call API: set of system calls (in one sense it looks like function calls into the kernel, but it's actually not)
  • Set of system calls: exports a stable interface to the application programs
  • Ex. A generic operation like write, where it writes some number of bytes to a file
  • If a piece of functionality is important, the OS implements it as one of some number of system calls that become part of the public interface
  • Services exported to processes
  • Protection

  • The isolation between processes
  • What if processes want to run simultaneously? They cannot all have the CPU at the same time. So who gets ownership of the CPU?
  • The OS kernel decides! It has to somehow enforce isolation/protection between processes
  • Also regards the example: "should process1 manipulate/read/write the state of process2?"
  • Typically not...but sometimes yes
  • The OS can't do it alone. It must rely on the architectural layers (which will be talked about later)

What Does Not Belong in the Kernel

  • The shell isn't in the kernel because it doesn't have to be
  • The shell is nothing more than a program, which becomes a user level process and gets equal time as everything else

What is in the Kernel

  • To decide what needs to be in the kernel, consider:
  • Complexity of the kernel
  • You can shove all services and major components into it (fast because of quick access and no separation between kernel components, plus it can do anything since it's privileged) but the result is unpredictable, and difficult to understand and debug
  • Typically do not want to add code in the kernel -- restrain yourself
  • You will make mistakes in the code you write inside the kernel (despite how good you are)
  • These mistakes affect everyone else, especially the scheduler
  • Kernel must be able to handle failure conditions. It needs to be stable
  • Ex. Bugs in the drivers with vulnerabilities; can take down the machine due to errors in kernel
  • Note: There is no hard and fast rule for what components make up the kernel
  • Still need small pieces of assembly code that manages parts of the CPU
  • Make a fundamental design decision: Monolithic Kernel vs Micro Kernel
  • Monolithic Kernel: All major subsystems (driver management, etc.) goes into the same executable image. It shares code and data, and there is no isolation
  • Micro Kernel: Makes the division explicit (like processes between different users)

Example: From ELF to a Running Process

./h "Hello, 457!"
  • How does the pile of code (from objdump) become an executable/a running entity?


strace ./h
  • A tool for measuring the state of a process, particularly the interaction between user level processes and the kernel via the set of system calls
  • Allows us to observe that interface in action
  • Sees all the system calls of the life of a process

Output of strace from h Example File

  • Each line is a system call record (an event)
  • Lines are represented as such
(1) Name of system call
(2) Arguments
(3) Return value/result
  • First thing the ./h does is call execve("./h", ["h"], []) = 0


execve()

(1) "./h": Execute program -- takes the contents of the ELF file at the given location and writes it into a portion of memory
(2) ["h"]: Array of pointers to the command arguments
(3) []: Pointer to the environment (check using env on the command line)

  • The shell has forked (a system call) itself, creating an exact duplicate of itself
  • In that exact duplicate, it then called execve


fork(2)
  • Creates a new address space
  • execve(2) runs in the child and loads the new program into the new address space


brk(0)
  • Manipulates the process address space -- manipulates the memory available to a program
  • System calls are sensitive functionality. You do not want another program to be able to call execve on your process and load a new image; so the sensitive set of system calls is privileged functionality invoked in a controlled way
  • Various libraries get mapped into the process address space


write(1, "hello, 457!\n", 12hello, 457!)
exit_group(0) = ?
  • To understand write(), type in man 2 write in the command line instead of man write
  • Arguments:
(1) The file descriptor to write to (fd)
(2) Pointer to the message you want to output
(3) Number of bytes
The extra hello, 457! after the 12 is just the output repeated again
  • Implication: If this is the functionality the kernel is offering, then somewhere inside there has to be the code that satisfies the functionality (later we'll learn why it's special and privileged, etc.)
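For comparison, a user program reaches this same functionality through the C library's write wrapper; a minimal sketch (assuming any POSIX system):

#include <unistd.h>

int main(void)
{
  /* fd 1 is stdout; 12 is the byte count, including the newline */
  write(1, "hello, 457!\n", 12);
  return 0;
}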

Question & Answer

Q:

What does write(2) mean?

A:

It means the system call write, which is in section 2 of the UNIX manual
The notation is not a call function write with a parameter of 2; it's the system call write

Q:

How do you know the system call (2) is in the right section?

A:

For reference, see the Wikipedia page on the man command
Section 1: General commands, command line utilities
Section 2: System calls
Section 3: Library functions, covering the C standard library

Q:

What's a shell?

A:

The answer has been written above, but to reiterate:
It is nothing more than a program, which becomes a user level process and gets equal time as everything else


September 20: System Architecture

  • The relationship between the kernel and the system hardware environment

News

  • Homework 1 released, due October 7
  • Before you post on Piazza, please read previous posts or ensure there isn't already an ongoing discussion
  • Difficulty: intermediate
  • Homework 2 and 3 will be more difficult and be released before October 7
  • SVN, virtualbox
  • Everyone should have a CPSC account and SVN
  • USBs are available. You need to install the software on them

Looking Forward from September 18 lecture

  • How does a kernel enforce these divisions?
  • Divisions from imaginary (diagram) to something real
  • Architectural support that allows us to enforce the separation. If it didn't occur, every process would be as privileged as the kernel (bad situation) for a general purpose multi-user system
  • There is a cost for enforcing the division. On a Mars rover, it's unlikely that Linux is being run; but if you're on a desktop (isolation between users and kernel) it's a reasonable thing to do
  • What's the physical support from CPU to make it a physical reality?
  • What is the necessary architectural support for basic operating system functionality?
  • How do processes ask the operating system to do things?
  • The kernel, which is a pile of software mostly in C code and various forms of assembly code. It contains critical functionality, mostly for managing hardware
  • When you use the command | command, the kernel is involved: two processes can't communicate without the kernel (nature of separation)
  • The kernel is layered (which is a design consideration and makes it modular)
  • Interaction: the kernel isn't executing processes, it sets conditions for Process1 to get a little bit of time to run on the CPU
  • Process1's code gets run by the CPU and the OS says "you've had enough time"

Learning Objectives

  • Look at the IA-32 architecture, which is a bit different from the Tiny Guide to Programming in 32-bit x86 Assembly Language
  • Distinguish between execution environment for userland processes and the programmable resources available to the system supervisor (i.e. the kernel)
  • Understand that when the CPU is in protected mode, the privileged control infrastructure and instructions are available to a piece of system software running in 'ring 0'
  • Understand the role that interrupts and exceptions (managed via the IDT) play in managing context switching between userland and kernel space, and understand the role that the GDT and segment selectors play in supporting the isolation between the kernel and user space

The Kernel

Example: Viewing the kernel and Linux

  • The linux-2.6 tree is the actual kernel source, which was about 86 MB
  • Linux 2.6.32 is 62 MB in zipped format, and 365 MB otherwise
  • Different Linux versions change some parts of the kernel
  • There are a bunch of directories (this is the root of the Linux tree)
  • As you can see, a lot of it is in C code
  • arch: architecture (Ex. x86)
  • crypto: crypto api
  • kernel: main kernel. Has things like the scheduler
  • mm: memory management code - manages address spaces of process
  • fs: file system - lots of them
  • net: networking code - shared memory
  • virt: virtualization support
  • scripts: scripts for controlling to build the kernel (which is large)
  • drivers: devices, etc.
  • To view the immense code: more net/netfilter/nf_conntrack_<FILE_NAME>.c
  • LXR guide
  • Take the source code and dig through it; lxr is a nice tool
  • Navigable. It uses the source code as input, and resolves all types so they are clickable
  • Encouraged to use for homework

Refresh of x86 Execution Environment

  • What do programs think they have access to when they execute? We'll connect it to what the kernel has access to

Example: silly.c

#include<stdio.h>
int main(int argc, char* argv[])
{
   int i = 42;
   fprintf(stdout, "Hello, there\n");
   return 0;
}

makefile

all: silly
silly: silly.c
       gcc -Wall -g -o silly silly.c
clean:
       @/bin/rm silly *.o

More steps

gdb ./silly
disassemble

System Diagram 2.0


Segment Registers and GDT Diagram


Registers

General Registers, Index, Pointers and Indicator

  • eax, ebx, ecx, edi, esi, ebp, esp, eip, eflags
  • General purpose registers except for eip, ebp, esp
  • eax: return value register
  • esp: contains address that points to top of stack
  • ebp: keeps track of the frame pointer for the stack -- points to middle of the activation records that help structure the stack
  • eip: instruction pointer
  • User-level programs cannot access it directly
  • Ex. Must use call instruction or branch. Can't write directly into it from user space
  • eflags: implicit side operands. States of what's happening
  • Code generated from silly should use registers except ebp, esp, eip, eflags
  • Stack is not in the CPU. The CPU knows where it is with esp

Segment Registers

  • These registers store segment selectors (e.g., ss)
  • Use of these things is one of the mechanisms that supports the kernel and user space
  • You can see their contents -- they look different from general purpose registers (addresses vs small decimal selector values)
  • ss: holds the segment selectors of the stack
  • cs: segment selector of the code segment
  • ds: segment selector of data segment
  • gs, fs: general purpose

Global Descriptor Table

  • Content: offsets into a data structure in memory. In particular, the Global Descriptor Table (GDT)
  • The GDT holds segment descriptors
  • A segment is a portion of memory
  • Part of the CPU's job is to provide access (or interface)

Segments

  • Segments have important properties
  • Start (base)
  • Limit (how large is the segment)
  • Descriptor Privilege Level (DPL): 2 bits that define the essence of a segment
  • It is part of the information in the segment descriptor in the GDT
  • The descriptor carries the privilege level
  • It is the only thing standing between the kernel and the user level
  • Major part of the architecture to protect the kernel from user level programs
  • Recap: These two bits located in some table in memory, pointed to by some registers, are the only thing that separates the kernel and user-level processes (the physical reality)
  • If you want to write a small kernel, it has two jobs:
1. Set up the GDT
  • Provides different memory segments and address spaces to split off from the user level programs and kernel (enforces isolation/separation)
2. Set up a second table called Interrupt Descriptor Table (IDT)
  • Kernel services events from users and hardware
  • The kernel is bombarded by interrupts from the user and the hardware underneath, and must do something interesting about each

Interrupts

  • The CPU can service interrupts from both hardware devices and software
  • Pure interrupts come from devices that need to tell the kernel they have data to deliver to a process
  • Software interrupts: software-initiated interrupts, often called exceptions
  • Exceptions can happen because of programming errors (divide-by-zero) or important conditions (executing an x86 int3 instruction)
  • The network doesn't issue an interrupt to the kernel, but to the CPU (this is an interrupt)
  • The CPU doesn't know what to do -- it does the fetch, decode, execute cycle
  • So the OS gives it an answer: depending on the interrupt vector (the identifier of the interrupt), the CPU looks up the entry for that vector in kernel memory; the entry has the address of the function in the kernel that knows how to handle that particular interrupt
1. Can be invoked from userland (Ex. int3: software interrupt)
2. Can come from programs, the same process as before
  • 0x80: software interrupt, a pointer to a routine that handles system calls (more detail next session)
3. Interrupts that can happen because the executing program code has an error (Ex. divide-by-zero)
  • The CPU attempts to do it, then raises an interrupt which has to be serviced somehow
  • Allows the kernel to do interesting things in the hardware and software
  • Enforces some isolation between the kernel and user level processes
  • IDT and GDT are tables that live in kernel memory and contain entries that:
  • Are pointers to kernel function (IDT)
  • Are addresses of portions of memory (and size) and privilege level (GDT)

Descriptor Privilege Level (DPL)

  • Two DPL bits allow separation (four levels: recall the privilege rings)
  • 0: OS kernel
  • 1,2: OS services
  • 3: user level, applications
  • No rings: they don't exist, just the bits in the descriptors do
  • The kernel's address space (the kernel is just another program) has code and data; the memory segments that contain that code/data have descriptors with 00 in the DPL
  • User level is mapped to descriptors that have 11
  • So if a user level program attempts to run in the kernel: the memory containing the user's code is marked 11 (the processor observes this information because it knows where the GDT is stored)
  • Let's say it tries to access the kernel memory mapped to 00
  • As the CPU attempts to execute the instruction, it sees the difference in privilege level and does not allow the user program to execute in the kernel

Concluding Points

  • Interrupt Descriptor Table Registers (IDTR) holds the address in kernel memory of the IDT
  • The IDT contains 256 entries of 8 bytes each (2048 bytes)
  • Global Descriptor Table Registers (GDTR) holds the location of GDT
  • The Operating System's ability to function as a multi-user system rests on this (you may be tested on the diagram)
  • Understand the organization of IDT, GDT
  • The CPL bits in CPU, and DPL bits in segment descriptors separate the user and kernel mode

Continuation of silly.c

break *0x80483fc
run
info reg
  • The address chosen to break at is a mov instruction
  • info reg: current state of CPU (contents of user-visible registers) -- see the location of the stack


Tutorial 1: Introduction to VirtualBox

  • Robin's website
  • Find the Virtual Machine image in /home/courses/457/f13
  • Will receive a USB key and you have to download "CPSC Linux" on it (see Piazza)
  • Do not modify anything for the VM
  • "Read access" only VM


Tutorial 2: Introduction to SVN

  • Subversion revision control
  • Documentation for SVN from Apache
  • Purpose: keep versions of software in a repository, and be able to control the versions of software, documentation, or files
  • If you're in a team and several people are modifying the code, this is the perfect environment to control each version
  • Use SVN to submit the assignment inside the VM

Directory for the SVN

https://forge.cpsc.ucalgary.ca/svn/courses/c457/<YOUR_USERNAME>
  • If you need help, e-mail cpschelp[at]ucalgary[dot]ca

Most Used Commands

  • To check if SVN is installed on your computer, type in svn in the terminal and a message should appear
  • Create a folder (Ex. SVN), as it will be the mission control
svn status
  • Structure of the repository
svn info
  • Information about the repository
svn log https://forge.cpsc.ucalgary.ca/svn/courses/c457/<YOUR_USERNAME>
  • Shows your comment, the number of the revision control (important when you forget the number and you want to restore a version) and the time
svn checkout https://forge.cpsc.ucalgary.ca/svn/courses/c457/<YOUR_USERNAME>
  • Set up the SVN
  • Upon success, you will get Checked out revision #
  • You now have communication with the repository--remember the folder you created is the one communicating with the repository (Ex. SVN)
svn add <FILE_NAME_TO_ADD_TO_REPOSITORY>
  • Adds to the repository (usually use before commit)
  • You will get A <FILE_NAME> when added successfully
svn commit
  • Uploads the file into the SVN repository
  • If you get an error about an editor, type svn commit -m "<INSERT_TEXT_TO_EXPLAIN_ACTION>"
  • Upon success, you will get Committed revision #

Guide for SVN

Beginning Steps

svn mkdir https://forge.cpsc.ucalgary.ca/svn/courses/c457/<YOUR_USERNAME>/<DIRECTORY_NAME> 
  --username <YOUR_USERNAME> -m "<MESSAGE_TO_DISPLAY>"
  • Now, cd on your local drive to the folder you want to add to repository
svn import <DIRECTORY_TO_ADD_FROM_LOCAL_DRIVE> 
  https://forge.cpsc.ucalgary.ca/svn/courses/c457/<YOUR_USERNAME>/<DIRECTORY_NAME>/<DIRECTORY_TO_ADD_FROM_LOCAL_DRIVE> 
  --username <YOUR_USERNAME> -m "<MESSAGE_TO_DISPLAY>"
  • This is now uploaded to the repository
  • Now you need a "Working Copy," which is a copy on the local drive that is used to compare the version to the one stored online; and this is how commit knows that changes have been made
  • You should be in a directory that contains all your projects on SVN
svn checkout 
  https://forge.cpsc.ucalgary.ca/svn/courses/c457/<YOUR_USERNAME>/<DIRECTORY_NAME>/<DIRECTORY_TO_ADD_FROM_LOCAL_DRIVE>  
  --username <YOUR_USERNAME>
  • Now your "Working Copy" is downloaded into the repository

Additional Steps (most used)

svn add *
  • Any files not already under version control (i.e. added into the repository) will be added to the repository; after this, you can just use commit
svn commit -m "<MESSAGE_TO_DISPLAY>"
  • Any changes made to the files that already exist in the repository will be updated. If you have created new files, renamed or deleted them, then you have to add them.
  • If you work on "Working Copy" and commit from the folder from the local drive, then the changes will be made directly to your repository!
  • Alternatively, you can disregard all <DIRECTORY_TO_ADD_FROM_LOCAL_DRIVE> and when you checkout, you should be working in the folder that was transferred to your local drive


Week 3

September 23: System Calls

  • If system calls are privileged functionality, then HOW can applications invoke them?

Key Questions

  • How does a kernel enforce division between userland and kernel space?
  • Motivation: without a clear transition mechanism and line, any program would be free to do anything, which doesn't make for a safe and stable OS
  • What is the necessary architectural support for basic OS functionality?
  • How do processes ask the OS to do stuff?
  • Basic architecture support made available by a CPU helps isolate the kernel from processes by:
1. providing memory separation and
2. providing a mechanism for processes to invoke well-defined OS functionality in a safe and supervised fashion

System Call API: Roles

  • Processes cannot do useful work without invoking a system call
  • In protected mode, only ring zero code can access certain CPU functionality and state (Ex. I/O)
  • OS implements common functionality and makes it available via the syscall interface (application programs do not need to be privileged and they do not need to reimplement common functionality)
1. An API: stable set of well-known service endpoints
2. As a "privilege transition" mechanism via trapping

System Call API Diagram

  • System call API
  • A set of well defined service endpoints
  • Mechanism to transition across the invisible virtual line
  • Front half of API
  • Look like a set of function calls
  • Ex. write(2), fork(2), execve(2), brk(2)
  • Back half of API
  • User level programs aren't worried about what device they're writing to
  • Kernel has details of how to accomplish that for a specific file, in a specific file system, on a specific device
  • Written in "Pk", has to jump to kernel but now allowed to do it
  • Since memory page permissions is marked 11 and goal is to move to page marked 00
  • So how do you actually accomplish it?

The Set of System Calls: Functionality of System Calls

  • About 300 system calls on Linux. Actual definitions are sprinkled in kernel source code
/usr/include/asm/unistd_32.h

Review: Calling Functions on x86

Stack Diagram


Example: silly.c version 1

#include<stdio.h>

int foobar(int a, int b)
{ 
  return a+b;
}

int main (int argc, char* argv[])
{
  int i = 42;
  i = foobar(7, 8); // Function call
  fprintf(stdout, "Hello, there\n");
  return 0;
} 

Running the program

gcc -Wall -o silly silly.c
./silly
objdump -d silly
  • Where is main calling foobar(7, 8)?
  • Call instruction to address (0x8048412). It's a jump to that location
  • Look at instructions before the call. Agreed semantics for a caller to invoke callee
  • main sets up part of the activation record, foobar sets up the rest
  • To chain activation records, save previous ebp on stack
  • Get current value of esp and store in ebp
  • Arguments are pushed in reverse order, so the first one pushed is 8
  • Each 'box' on the stack is 4 bytes, so 12 bytes in total
  • Add 7, 8 and eax stores 15
  • Pop ebp from stack and return
  • Calling a function branches to that location and executes
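Putting these steps together, the generated code looks roughly like this (a sketch in AT&T syntax, since that is what objdump prints; exact instructions and addresses vary by compiler):

        pushl $8              # arguments pushed in reverse order: 8 first
        pushl $7
        call  foobar          # pushes the return address and branches
        ...
foobar:
        pushl %ebp            # save previous ebp to chain activation records
        movl  %esp, %ebp      # current esp becomes the frame pointer
        movl  8(%ebp), %eax   # load first argument (7)
        addl  12(%ebp), %eax  # add second argument (8); eax now holds 15
        popl  %ebp            # restore the caller's frame pointer
        ret                   # pop the return address and branch back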

Example: silly.c version 2

#include<stdio.h>
#include<unistd.h> // Used for syscall

int foobar(int a, int b)
{ 
  return a+b;
}

int main (int argc, char* argv[])
{
  int i = 42;
  i = foobar(7, 8); // Function call
  write(1, "hi from syscall\n", 16); // System call
  fprintf(stdout, "Hello, there, %d \n");
  return 0;
} 
  • Call to write in main in address 0x8048462
  • Not actually invoking it directly. We could, but we're only users and do not have a high enough privilege level
  • The code the compiler generated jumps to this PLT entry
  • It invokes the write(2) wrapper in the glibc library and redirects to it
  • If we strace, we would see a call to write(2)
  • Eventually glibc is C code and has to invoke the system call
  • Can't just call the kernel's sys_write. It has to do something else
  • Speak to machine (literally to CPU and OS without all the junk)

Example: write.asm

;; write a small assembly program that calls write(2)
;; write(1, "hello, 457\n", 11
;; eax/4 ebx/1	ecx/X edx/0xb
;; 4 for write, 1 for stdout, a pointer to the string, 0xb (11) for the byte count
;; Where to get string? Data section!

BITS 32
GLOBAL _start ;; declare where it starts. Need entry label to program as no main

SECTION .data ;; create for memory address
mesg	db		"hello, 457", 0xa	;; NASM doesn't tolerate \n	
					
SECTION .text ;; thinking back to ELF; this is the key part
_start:
  mov	eax, 	4 				;; holds number of system call
                                                ;; How to know what numbers to use? 
                                                ;; more /usr/include/asm/unistd_32.h
  mov 	ebx, 	1 ;; stdout
  mov 	ecx, 	mesg				;; memory address to String
  mov 	edx, 	0xb
  int 	0x80
  • How to invoke system call? Can't do call write
  • Convention: somehow pass arguments so OS can access, and then invoke control flow transition
  • Goal: need DPL 11 to become DPL 00
  • How it is accomplished: invoke an interrupt instruction
  • INT is in a code page at ring 3, but CPU is going to execute it
  • When CPU sees the vector number (INT N), CPU knows it's an interrupt
  • Upon bootup, kernel creates IDT with a list of registered interrupts (function pointer that deals with system calls, code inside kernel)
  • CPU services event and changes privilege to 0 and transfers control to kernel
  • Kernel runs on behalf of the process
  • No direct control transfer from program or glibc
  • Some exceptional condition happens, which is what CPU notices

Build and Run Program

nasm -f elf32 write.asm
file write.o
ld -o mywrite write.o
./mywrite
  • What system call should we invoke, and what arguments should we use? Put them into registers
  • Goal: make an executable, which can run through gcc but want to try without
  • From this, we get an error: Segmentation fault
  • We need to execute the right system call and also a way to exit a process

Add to End of Program

mov     ebx,    eax
mov 	eax, 	1 		;; exit system call
int 	0x80
  • int 0x80: nothing special
  • Form is int, then some arg (an interrupt vector)
  • Some are reserved for handling errors but this is an arbitrary number
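The same trap can be triggered from C using gcc inline assembly (a sketch, not shown in class; assumes a 32-bit x86 Linux target, e.g. gcc -m32):

int main(void)
{
  const char msg[] = "hello, 457\n";
  long ret;
  /* eax = system call number (4 = write); ebx, ecx, edx = its arguments */
  asm volatile ("int $0x80"
                : "=a" (ret)
                : "a" (4), "b" (1), "c" (msg), "d" (sizeof(msg) - 1));
  return 0;
}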

Assembly Program with System Calls

  • Run program more efficiently
  • Get direct control of machine - no need for libraries, etc. that normal programming gives you
  • Make small programs
  • Don't have a programming environment with programming infrastructure
  • Want to talk directly to machine
  • Answer: writing malware
  • Injecting code (system call) into somebody's process
  • Hopefully have program execute it
  • Since you're talking directly to machine you can do interesting things

Incorrect Answers

  • Write drivers
  • Drivers are nothing more than a set of functions in data in the kernel
  • Typically you write drivers in C: parts of them speak to the machine directly, but there are no system calls in a driver, as you're already in the kernel
  • Make interrupts
  • This is about userland code. Interrupts are kernel functionality (the interrupt service handlers) and are eventually expressed in x86

Example: mydel.asm

;; try to call unlink on ?
;; unlink("secret.txt") is our goal

BITS 32 ;; 32-bit program

GLOBAL _start ;; defines symbol

SECTION .data
filename	db	"secret.txt", 0	;; NUL-terminate the path for the kernel

SECTION .text
_start:
  mov		eax, 	0xA
  mov 		ebx, 	filename
  int 		0x80
  mov 		ebx, 	eax
  mov 		eax, 	1
  int 		0x80

echo "secret" > secret.txt
man 2 unlink
locate unistd.h 
more /usr/include/asm/unistd_32.h
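Assuming the same NASM toolchain as the write.asm example, building and testing would look like:

nasm -f elf32 mydel.asm
ld -o mydel mydel.o
./mydel
ls secret.txt        # should now report: No such file or directory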

Question & Answer

Q:

Isn't add a system call?

A:

It's not privileged functionality, nor does it change the memory available to the process
Instructions that use memory locations operate within the process address space, which can be used directly. Some instructions are privileged, but user space cannot have access to those


September 25: Process Creation and the Process Control Block

Main Observation

  • There needs to be some way for the OS to expose the CPU to a program (the process abstraction)
  • The OS exists to load and run user programs and software applications

Continuing Discussion Themes

  • Consider the completion of the procedure (how an ELF file becomes something the OS can run)
  • Look at process and observe how they are created and destroyed from the system call layer (recall destruction from last lecture)

Terminology

  • Task
  • Generic term referring to an instance of one of the other things
  • Commonly used to refer to a running process
  • Process
  • Program in execution, with program counter
  • CPU records what's happening at current instruction
  • Has instruction pointer (what instruction are you executing next), CPU context (state of CPU that talks about current computation)
  • Set of resources, such as open files, meta data about the process state
  • Files stored on hard disk; OS can help keep track of which file they're currently accessing
  • Memory space (i.e. process address space)
  • Thread
  • Does not have all the heavy weight stuff that process does
  • Some number of threads of control within a process
  • May have its own stack and instruction pointer
  • Lightweight process
  • Linux doesn't make a distinction between Process and Thread
  • Difference: lightweight processes may share access to the same process address space

Process Control Block

  • Meta data records state of execution, CPU context, memory data, etc. in kernel memory (number of data structures)
  • Task struct: C type in kernel
  • Essence of process control block
  • Set of metadata about a running process/program
  • Refer to LXR: sched.h (1215-1542)
  • Instantiation of struct task_struct is normal C syntax for declaring a complex data type
  • The struct holds a number of fields that are the meta data
  • Notable: it is deliberately not commented heavily, because comments would dirty the source code (usually comment the why and the what, not the how)
  • 1216: volatile long state;
  • Key pieces of meta data; needs to know whether this process is allowed to have the CPU
  • Physically impossible for all processes to use the CPU at once as it cannot do more than one thing at a time
  • Not all processes are capable of running or may be ready to execute
  • CPU might be waiting on a different process (inter-process communication)
  • Could be some form of I/O (disk or even slow user)
  • 1271: struct mm_struct *mm, *active_mm;
  • Pointer to set of memory regions, thus memory pages that define a process address space (i.e. where code is stored, dynamic data)
  • 1274: int exit_state;
  • Holds the value a process returns (e.g., -1 or 0) or passes to the exit system call
  • The value goes to the kernel through the system call layer, and it lands in one of these fields
  • 1288: pid_t pid;
  • Process identifier
  • 1296-1308: task_struct parents and list_head children and siblings
  • Relationships between processes
  • 1345: const struct cred *real_cred;
  • Ensures a user can't manipulate another user's processes
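As a study aid, here is a heavily simplified sketch of the fields discussed above (illustrative only -- the real task_struct in sched.h has hundreds of fields and different declarations):

struct task_struct {
  volatile long state;          /* TASK_RUNNING, TASK_INTERRUPTIBLE, ... */
  struct mm_struct *mm;         /* memory regions: the process address space */
  int exit_state;               /* exit status, kept for the parent */
  pid_t pid;                    /* process identifier */
  struct task_struct *parent;   /* process relationships */
  struct list_head children;    /* list of this task's children */
  const struct cred *real_cred; /* credentials: which user owns this task */
};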

Process States

  • Recall line 1216: volatile long state;
  • This field is a key piece of information that helps the OS decide whether a process should be given the CPU
  • Existence of this field implies that somehow there's a life cycle of a process (set of different states)
(a) TASK_RUNNING: process eligible to have CPU (i.e. run)
(b) TASK_INTERRUPTIBLE: process sleeping (Ex. waiting for I/O)
(c) TASK_UNINTERRUPTIBLE: same as above, ignores signals
(d) misc: traced, stopped, zombie
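You can observe these states from userland (assuming a Linux ps; in the STAT column, R is running/runnable, S is interruptible sleep, D is uninterruptible sleep, T is stopped/traced, Z is zombie):

ps -eo pid,stat,comm | head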

Process Diagram

  • A process is created through the fork system call
  • Shell is running with ID 1234
  • In its existence it calls fork(2), a duplicate of the parent process
  • Process ID 1235 is its child and bash runs
  • When the kernel code that creates the process completes, the result is a process in the runnable state
  • Semantics of state: process is eligible to get CPU, but doesn't necessarily have it
  • A list of runnable processes stores which processes are eligible; it is used to switch between processes
  • The kernel scheduler goes to the process control block (e.g., of pid 1235), takes the saved context and loads it into the CPU (registers and instruction pointer)
  • OS no longer in control and process is running on CPU
  • On death, the process enters a terminated state, giving up control
  • A process can voluntarily wait/sleep through system calls that do I/O; this transfers control to the kernel
  • Whenever a process asks the OS to do something, the decision of eligibility occurs
  • The process may be waiting for the call to return from the kernel

Process Relationships

  • Why does pstree exist? Why isn't it a flat list?
  • Process creation: system starts up and one of the results of start up is the init process
  • Process ID of 1, and roots process tree
  • Every other process is the child of init
  • init runs start up scripts (implication of ELF file)
  • It's some program that exists somewhere (Ex. /bin/silly)
  • If init launched silly with execve alone, init itself would no longer exist
  • execve overwrites the current address space with the new image
  • Convention for spawning processes: fork(), then in the child do an execve()
  • The duplicate process immediately replaces itself with the new code
  • One of init's children is {gnome-terminal}
  • bash session has super user shell and pstree
  • The parent of pstree is some bash shell, running inside that gnome-terminal, whose ancestor is init

Process Creation in Linux

  • Processes come from processes, via fork(2) (actually clone(2))
  • This system call creates a (near) copy of the "parent" process
  • (parent) /bin/bash -> fork(2)
  • -> (child) /bin/bash
  • Isn't it wasteful to duplicate, then replace with execve?
  • It needs to be quick, as fast process creation is important
  • Three mechanisms make process creation cheap:
  • Copy On Write (read same physical page)
  • When you read memory, you can share; but if you write you can get your own copy
  • Lightweight processes (share kernel data structures)
  • vfork: the child uses the parent's PAS directly, with the parent blocked meanwhile
  • fork is usually paired with execve (though sometimes not), so fork does the bare minimum
  • Duplicates the task_struct, then patches the control block (not the ID)
  • The mm_struct doesn't actually duplicate the memory pages; it keeps the same reference
  • To create a process/thread, just invoke fork and both the parent/child will share mm struct reference (set of memory data)
  • What happens when child dies, but parent needs to gather final info on it?
  • Case of diagram. Situations occur when parent spawns child and wants to wait for exit status
  • If kernel services a system call from the child and erases it, the parent never gets an answer to its question
  • A process that has called exit() but has not yet been reaped is in the zombie state (this keeps the bookkeeping around for the parent)
  • What about when the parent dies but the children live on?
  • Children get different parents -- re-parent the process to init
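A minimal sketch of this conventional fork-then-execve pattern, with the parent reaping the child so no zombie lingers (execlp and "ls" are illustrative choices; compare myfork.c below):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
  pid_t pid = fork();            /* duplicate this process */
  if (pid == 0) {
    /* child: replace the duplicated address space with a new image */
    execlp("ls", "ls", "-l", (char *)0);
    perror("execlp");            /* reached only if the exec fails */
    exit(1);
  } else if (pid > 0) {
    int status = 0;
    waitpid(pid, &status, 0);    /* reap the child, collecting its exit state */
    printf("child %d exited with status %d\n", pid, WEXITSTATUS(status));
  }
  return 0;
}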

Example: myfork.c

#include<stdio.h>
#include<unistd.h>

int main(int argc, char* argv[]) {
  pid_t pid = 0;
  pid = fork();

  if (0 == pid) {
    // Child
    fprintf(stdout, "Hey, I'm the child\n");
  } else if (pid > 0) {
    // Parent
    fprintf(stdout, "Hey, I'm the parent. Child's pid is %d\n", pid);
  } else {
    // fork failed
  }
  return 0;
}
  • pid gets result from fork
  • If it succeeds, it will have two versions running side by side
  • To distinguish the two, look at what it returns
  • 0 is returned to the child, and the child's pid to the parent (-1 on failure)

Output

Hey, I'm the parent, the child's pid is 4668
Hey, I'm the child
  • Two processes are running
  • The parent and child are both instances of a.out
  • Parent of a.out is bash
  • Maintain relationships through use of fork
  • Even with fork, it can do interesting computations


September 27: The Process Address Space (Memory Regions)

Thoughts

1. Much of what an OS does is:
  • Namespace management: allocate, protect, cleanup/recycle
  • CPU, memory, files, users
  • If we expect these services, then the kernel must contain code and data supporting it
2. What is a process?
  • PCB (task_struct)
  • CPU context (Ex. eip (not shared))
  • Virtual CPU
  • Miscellaneous (files, signals, user/id)
  • Memory space, mm_struct (possibly shared)
  • Virtual memory
3. What makes the lines real?
  • P1 | P2 | P3 | ... | Pn

Process Address Abstraction

Multiplex Memory Diagram

  • Processes have different sizes: a simplistic approach of handing out physical memory directly can't work, so virtualization supports this (the diagram is a fabrication)
  • Contract for user level programs - the kernel reserves the top-most gig of memory for its own code and data structures
  • User level code/data is able to map from 0 to 3 GB
  • Kernel data goes below the line
  • Above the line: process code
  • Address range - name space with 3 billion things in it
  • When a process is created through fork, and you want to execute:
  • The OS takes contents and writes into the space
  • Heap: dynamically allocated memory, vs Stack: automatic, function-local allocations

Bottom of Diagram (Figure 3.1 in slides)

  • For finding stuff in an address range
  • Left: segmentation
  • Causes CPU to talk to a particular segment
  • Linux says there are 4 segments; only difference is the DPL
  • Right: paging
  • Support for virtual memory
  • Input is a particular address or number in the range
  • Every program believes it has unique and independent access to the virtual range
int x = 0x1000;

vs

main(...)
{
  x = 0x300;
}
  • x can be overwritten
  • No conflict, but what makes it a different program? OS sets up distinct address spaces
  • Note: Know the difference between an ELF file on disk and its objdump listing
more one.c
objdump -t one | grep michael
  • Basically, the name michael stores 0xDEADBEEF
  • Symbol michael has an address; by copying one to two, the same variable of the same name is at the same address
  • Same address, but there's supposed to be no conflict

Example: Look in hex editor for one.c

  • Letters swapped for DEADBEEF (EF BE AD DE)
  • Replace with FECA01C0

Side note: Endianness

  • This is a side note I have added that I hope will be of assistance to you
  • Different systems store information in memory in different ways
  • The two ways would be Big-Endian and Little-Endian
  • Gulliver's Travels


Memory Offset    0x0   0x1   0x2   0x3
Big-Endian       DE    AD    BE    EF
Little-Endian    EF    BE    AD    DE
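A small illustrative program (not from lecture) to check your own machine's byte order:

#include <stdio.h>

int main(void)
{
  unsigned int x = 0xDEADBEEF;
  unsigned char *p = (unsigned char *)&x;
  /* a little-endian machine prints: ef be ad de */
  printf("%02x %02x %02x %02x\n", p[0], p[1], p[2], p[3]);
  return 0;
}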


  • We hacked the ELF and ran the two programs at the same time
  • They were at same addresses but now they're not
  • Recall the diagram. A translation process occurs
  • Processes are given their own space: the physical memory is different even though the virtual address may be the same
  • Internally, a virtual address is mapped to some physical location
  • The two programs don't conflict; they do not even need to know the other exists
  • OS has to allow this to happen
  • Imagine you have 4 gigs of RAM with 140 odd some processes
  • Consider that each process believes it can address 4 billion things
  • P1 believes it can do everything, but P2 would have a subset of it
  • Even if all processes are equal, they would overwrite data
  • So can you allocate memory? Not necessarily. What if address space is full of processes? Where are you going to get your new memory?

Why have a PAS abstraction?

  • Giving userland programs direct access to physical memory involves risk
  • If you give every process raw physical memory, they can read and write arbitrarily
  • Memory is multiplexed between multiple processes; we don't want them to have direct access to the shared resource
  • Hard-coding physical addresses in programs creates obstacles for concurrency
  • Without the PAS, giving programs direct access means hardware details leak into the code
  • This is problematic: what if you pull a RAM chip out? You'd have to change the program, and that isn't good
  • If you want to write a program and have it run for years, you don't want to keep changing it
  • Most programs have an insatiable appetite for memory, so statically partitioning the available physical memory still leads to overcrowding
  • Virtual memory size (from the demo): 240 GB
  • Even though the machine has only 8 GB of physical memory, it can support more

PAS Preliminaries

  • Physical memory is organized as page frames
  • A PAS is a set of pages
  • Pages are collections of bytes
  • MOS: "PAS is an abstraction of memory just like the process is an abstraction of the CPU"
  • Page translation mechanism and virtual memory enable a flexible mapping between pages and page frames

Process Address Space

  • PAS is a namespace
  • PAS is a collection of pages
  • PAS is really a set of linear address ranges
  • This set of virtual address ranges need not be "complete"
  • In Linux, a "linear address range" is called a memory region, and this is reflected in kernel code (vm_area_struct). Memory regions do not overlap

How to Multiplex Memory

  • The OS is using most of its memory most of the time

Example: two.c

pmap 12008
  • Prints the internal organization of a process' address space
  • Shows the elements inside the address space
cat /proc/self/maps
  • Gives access to the raw data
  • /proc is a virtual file system that provides views into kernel data; "self" is the current process' ID
  • Stack, heap, etc. are in relatively the same place across processes
  • Programs such as /bin/cat have their own internal code and data, but also depend on libraries
  • Two libraries: ld and libc
  • Why does ld appear 3 times, and libc 4 times? Look at the leftmost columns
  • 2nd column: permission markings for the particular entry/component of the process address space
  • The PAS is composed of memory regions, which map to a data structure in the kernel
  • The PAS is made up of regions, which have types and permissions (code vs. data)
  • libc
  • r-xp: code
  • r--p: read-only global data (constants)
  • rw-p: writable static data
  • Heap and stack match where you expect them to be: the leftmost column gives (start, end) addresses
  • Regions don't overlap: memory regions are distinct from each other
  • They can't bleed into each other, and the middle of the space is where dynamic libraries go
  • Text can't bleed into data
  • The values of those addresses run from a low number to a high number
  • Write 3 billion in hex and you can deduce the boundary
  • Max value of the start of the stack: 3 GB - 1 = 0xbfffffff
  • 0xc0000000 (one past that) is a value you can load into a pointer, read/write, etc.
  • But a user-level program is not allowed to touch those addresses
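
A tiny sketch (scribe's own) demonstrating the contract on 32-bit Linux with the 3/1 split: loading 0xc0000000 into a pointer is fine, but dereferencing it kills the process.

#include <stdio.h>

int main(void)
{
  unsigned int *p = (unsigned int *) 0xc0000000; /* first kernel address under the split */
  printf("about to read %p\n", (void *) p);      /* holding the value is legal */
  printf("%u\n", *p);                            /* expect SIGSEGV here */
  return 0;
}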

PAS Operations

  • fork(): create a PAS
  • execve(): replace PAS contents
  • _exit(): destroy PAS
  • brk(): modify heap region size
  • mmap(): grow the PAS; add a new region
  • Create or operate on memory regions (components of PAS)
  • munmap(): shrink PAS; remove region
  • mprotect(): set permissions

break.c

  • Uses mmap() to get a new memory area that is executable, readable, and private (not a file mapping, just raw memory)
  • pid = 12029
cat /proc/12029/maps
  • PAS: the program sleeps, then calls mmap(), which modifies the picture; mmap() returns a pointer to an address (the new memory region)
  • At first the region is not in the list; look again after the call
  • The new region is marked in the list but doesn't have a file mapping
  • May be on the homework: in particular, generating code at run time (this is how it's supported)
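
A sketch in the spirit of break.c (scribe's own reconstruction, not the actual file): add an anonymous, executable region to the PAS, pausing so it can be inspected via /proc/<pid>/maps.

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
  printf("pid = %d; check /proc/%d/maps, then press Enter\n",
         (int) getpid(), (int) getpid());
  getchar();

  /* New region: readable, writable, executable; anonymous (no file backing) */
  void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) { perror("mmap"); return 1; }

  printf("new region at %p; check maps again\n", p);
  getchar();
  return 0;
}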

lxr 1065-1097

  • struct mm_struct is declared here
  • It is the kernel's handle on a PAS - where all the data about it lives
  • The PAS is logically split up
  • Helps the OS keep track of where the PAS is and where all the memory regions are (kept in a list)
  • The rest of the struct records where the memory regions are located
  • The struct is a container for all memory regions
  • vm_area_struct *mmap: the first data member; points to the list of memory regions
  • Each region has a start, an end, and a set of flags, and the regions are maintained in a list
  • The kernel gets the data (that pmap/maps displayed) from here
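
A simplified, self-contained sketch of the two structures (not the real kernel headers; field names follow the 2.6-era declarations discussed above, most members omitted, and the example regions and flag values fabricated):

#include <stdio.h>

typedef unsigned long pgd_t;          /* stand-in for the kernel's type */

struct vm_area_struct {
  unsigned long vm_start;             /* first address of the region */
  unsigned long vm_end;               /* first address past the end */
  struct vm_area_struct *vm_next;     /* next region in the sorted list */
  unsigned long vm_flags;             /* permissions: read/write/exec/shared */
};

struct mm_struct {
  struct vm_area_struct *mmap;        /* head of the memory-region list */
  pgd_t *pgd;                         /* location of the page directory */
};

int main(void)
{
  /* Walk a tiny fabricated region list, shaped like the kernel's */
  struct vm_area_struct text = { 0x08048000, 0x08049000, NULL, 0x5 /* r-x */ };
  struct vm_area_struct heap = { 0x0804a000, 0x0806b000, NULL, 0x3 /* rw- */ };
  text.vm_next = &heap;
  struct mm_struct mm = { &text, NULL };

  for (struct vm_area_struct *v = mm.mmap; v; v = v->vm_next)
    printf("%08lx-%08lx flags=%lx\n", v->vm_start, v->vm_end, v->vm_flags);
  return 0;
}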

Question & Answer

Q:

(From the class example) The left side is 140 x 4 GB; the right side is 4 GB. So how do you access it all?

A:

Use the hard disk to expand capacity (pages you're not using live on the hard disk); that's how it's stored
How does the kernel store a virtual range? We'll get to this later

Q:

(From the r-xp in libc) What does the p stand for?

A:

p is private, and sometimes there's s, which means shared
Lightweight processes can share the same address space; such a process is what we call a thread
Note that /bin/cat is not sharing anything

Q:

Why is there a gap in the addresses between ld and libc?

A:

If you put regions directly next to each other, nothing prevents an access from running past one region into the next. Leaving a gap lets the OS place guard pages there to catch overruns (an attempt to read or write past a region's end, which C and assembly allow) before they interfere with other code or data
If you ask the OS to allocate memory regions right next to each other, they are in essence the same thing

Q:

libc has ---p which is not read, write or execute. What's the deal?

A:

Theorize! To find out for sure, look at how the OS loads glibc

Q:

Can you tell us more about the test?

A:

Read the textbook (mainly MOS)
No sample exam or previous exams are available (a review will be provided for the final)
You can ask people who took the course before, but the questions won't be the same
Style: similar to the survey at the beginning of the course -- short-answer questions
Likely no writing code (not fair without a compiler)
3-4 questions, as it's a 50-minute exam. It won't be a long, laborious test or have lots of questions
Conceptual questions
Monday's content is included

Back To Navigation

Tutorial 3: Introduction to C

  • C is a small procedural programming language, hence fast
  • Used for:
  • Embedded software
  • Open-source projects
  • Operating systems
  • C is not a low-level language (low-level means machine languages like x86 or ARM); it's a high-level language

Simple Data Types

  • char (1 byte)
  • short int (2 bytes)
  • int (either 2 or 4 bytes)
  • Depends on the compiler, but for most compilers it would be 4 bytes
  • long int (either 4 or 8 bytes)
  • Most compilers would be 8 bytes
  • float (4 bytes)
  • double (8 bytes)
  • The size of variables matters because you'll be allocating memory (next tutorial)

Arithmetic Operations

  • Similar to Java and C++
  • Arithmetic: + - * / = ++ -- %
  • Conditional (comparison): == != > >= < <=
  • Boolean: && || !
  • Bitwise: ~ & | ^ << >>
  • Use any IDE you feel comfortable with
  • Emacs, vim, nano, gedit, Sublime Text

Text Replacement Pre-processing

  • Define macros using #define; used as a form of simplification
#define MAX_SIZE 10
  • Other text-replacement pre-processing: header files (.h). The C library declares its standard functions in header files

Header files included

  • stdio.h: standard input and output
  • printf, scanf functions
  • cplusplus.com has a lot of references
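
A tiny example (scribe's own) tying these pieces together: a #define macro plus a function from stdio.h.

#include <stdio.h>

#define MAX_SIZE 10   /* the preprocessor replaces MAX_SIZE with 10 before compiling */

int main(void)
{
  int values[MAX_SIZE];
  for (int i = 0; i < MAX_SIZE; i++)
    values[i] = i * i;
  printf("last value: %d\n", values[MAX_SIZE - 1]);
  return 0;
}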

Compiler: GCC

  • Review manual
  • gcc -v or man gcc for compiler options

Makefile

  • Use to automate the compilation of programs
  • Made up of rules (note: recipe lines must begin with a tab character)
target: source
	(recipe to make target from source)

all: example1

example1: example1.c
	gcc -o example1 example1.c

Back To Navigation

Tutorial 4: Introduction to C

  • Create a makefile, and understand structures
  • Learn about pointers, what they mean and how you use them

Back To Navigation

Week 4

September 30: Memory Addressing in the Process Address Space

  • The concept of memory addresses
  • Very process-based material
  • Don't fixate on this particular mechanism as a possible test question (it's very technical)
  • The translation is important to know, as it informs our understanding of virtual memory

Recall

  • Process address space: the place where a process stores its code and data
  • An abstract notion; in Linux it is made up of memory areas ("memory regions" in Linux terminology)
  • Memory areas are collections of memory pages belonging to a particular address range
  • The PAS has dynamic memory areas not present in the ELF, and a process can dynamically grow and shrink
  • A page is the basic unit of (virtual) memory; a common page size is 4 KB
  • A small contiguous range of memory
  • Memory areas/regions are composed of pages, which hold the data
  • How does your program address a piece of memory?
  • Naturally, an instruction includes a memory expression (e.g., mov eax, [0x0904abcd]); the machine finds whatever is at 0x0904abcd, moves it into eax, and execution continues

Memory Addressing Diagram

[Image: 6CJ4Ddq.png]
  • When the CPU is fetching from the .text section (in RAM)
  • What mediates the communication between CPU and RAM is the Memory Management Unit (MMU)
  • The CPU asks RAM to "read these bytes at this address"
  • Tricky, because the OS space-multiplexes resources like RAM, permitting multiple processes to use the single resource
  • This causes problems:
1. If we had some magic substance that expanded RAM at zero cost, we wouldn't need any mapping -- just allocate new addresses
2. Physical memory is finite (clearly all the PASes aren't going to fit into physical memory at once)
  • Somehow the OS, cooperating with the CPU, needs to look up what's at a given location

Example: one.c

  • The x_michael variable lives at an address (0x8049858)
  • Apparently these variables are located at the same address in both copies
  • In particular: objdump -t one | grep michael
  • See Figure 3.1
  • This is part of the internals of the MMU
  • Its primary task is to translate the "address" into an actual physical location in RAM
  • This circuitry enables the OS to run different processes without collisions

Definitions

1. Logical address

  • Parts: [ss]:[offset]
  • Segment selector, 32-bit offset
  • Valid segment selectors are stored in the segment registers
  • ss is an index into the GDT (a data structure the OS sets up when it boots, which keeps track of all the memory segments)
  • An address is implicitly prefixed by a segment selector (e.g., [cs]:0x8049858 for code)
  • cs points to a segment descriptor, a piece of memory describing the segment's beginning/end and privilege level (e.g., user-level code)
  • The segment starts at x -> the limit is y (look for the address inside)
  • In Linux, x is 0 and the limit is the full 4 GB, so you effectively just look at the 32-bit offset
  • When you see an address, you can ignore the logical level and start thinking at the linear/virtual address level


2. Linear address/virtual address

  • The virtual address space is simply a range
  • It can be split up and placed in different physical locations
  • An address and the bytes next to it don't have to be physically contiguous in RAM, because there's a mapping process (they're only logically contiguous)
  • Not really an address -- it looks like an address (a 32-bit number) but isn't treated as an index into one contiguous space
  • You can treat it as a collection of offsets or indices (just a resource), so that the physical memory can be split up
  • Parts: Dir, Table, Offset
  • The relationship: a 32-bit number is mapped to a physical location (using pages)


3. Physical address

  • The actual location in real RAM
  • Expect it to get recycled and reused; it is not visible to the compiler
  • That's a good thing, as physical addresses shouldn't be hard-coded into processes
  • The compiler only sees logical and linear/virtual addresses
  • The MMU translates from 1 (logical address) -> 2 (linear/virtual address) -> 3 (physical address)

Example: Translate an address to a physical one

  • How does this process work?
  • Start with a linear address, 0x08049858, which is split up into three different parts
  • How many bits each part gets depends on the paging scheme in use
  • Pages are typically 4 KB (4096 bytes = 2^12, hence 12 offset bits)
  • Offset: 12 bits, Table: 10 bits, Dir: 10 bits
  • This split is a contract between the MMU and the hardware mechanism
  • The MMU does the translation and fetches the correct data at the correct place in memory (supported by the OS)
  • The OS needs to set up the Page Directory and the Page Tables, because the calculation indexes into these data structures
  • They live in kernel memory, and the kernel knows where the Page Table is
  • pgd_t * pgd; (lxr 214)

Pointer to where the page directory lives

  • This is how the kernel knows where the page directory is
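
The 10/10/12 split can be seen in a few lines of C (scribe's sketch; the address is the one from class):

#include <stdio.h>

int main(void)
{
  unsigned int vaddr  = 0x08049858;
  unsigned int dir    = (vaddr >> 22) & 0x3ff; /* top 10 bits: page directory index */
  unsigned int table  = (vaddr >> 12) & 0x3ff; /* next 10 bits: page table index */
  unsigned int offset = vaddr & 0xfff;         /* low 12 bits: offset within the page */
  printf("dir=%u table=%u offset=0x%x\n", dir, table, offset);
  return 0;   /* prints dir=32 table=73 offset=0x858 */
}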

Example Diagram

[Image: 0El19Gj.png]
  • How many entries are in it? 1024
  • Use the first 10 bits of the linear address to find the entry (which gives the location of a page table belonging to the process)
  • The next 10 bits index into that page table to find the actual page being referred to
  • The last 12 bits index into the page itself
  • This enables the MMU to translate at machine speed and resolve the virtual address
  • It's actually comparatively slow: it reads/writes memory in three data structures
  • Solution: instead of the MMU re-doing the translation every time, cache the translation
  • The translation lookaside buffer (TLB) maps virtual addresses to physical ones
  • It sits at the beginning of the procedure: the MMU checks it first to see whether it has seen this translation before
  • TLBs are small and can't cache all memory translations, but they speed up the procedure
  • Caching works because of locality: a program does 90% of its work in about 10% of its code
  • The numbers in the entries aren't virtual addresses: cr3 holds the physical location of the pgd, and each entry holds the physical location of the next-level structure

Question & Answer

Q:

The Page Directory is specific for every process, is that true for TLB as well?

A:

Yes, the page directory is unique to every process, and so effectively is the TLB's content. In particular, when you change processes, the context switch (which takes one process off the CPU and puts a new one on) partly involves invalidating the TLB: flushing out its contents so new translations can be cached
You don't want to switch too quickly, or the overhead dominates
Some TLBs have support for marking/tagging TLB lines on a per-process basis

Q:

Virtual address maps to a unique physical location. How do you actually get a unique location?

A:

Part 1 of that question is not entirely correct, as there is freedom to change the mapping. The virtual address is an illusion; it makes each process believe it has access to the full 4 GB range
The part of the OS that changes the Page Tables and the underlying mapping is paging, which is part of virtual memory
The OS changes the mapping when appropriate (to make room for processes)

Q:

What happens in a context switch?

A:

Data structures are maintained for every process -- they're kernel data structures, which are alive for the lifetime of a process
The CPU and MMU know which Page Table to use for the translation

Q:

Page Tables are for every set of processes. Are the Page Directory and Tables sitting in the physical memory?

A:

Yes. It's part of kernel memory; there's a design trade-off here: the kernel sets up an execution environment and lets the processes run
What if you don't want to spend half a gig of memory on tables? This multi-level procedure lets you have sparse address spaces, as not all processes use the full 4 GB of memory

Q:

You switch out processes enough to write RAM on disk. What about Page Tables?

A:

Some OSes do that, but Linux does not. In the Linux kernel, these structures are pinned into memory and do not leave it

Q:

In a disk swap, do you have to update the table?

A:

The OS has to update the affected mapping, and potentially update page tables and directories. You only swap out discrete chunks, so the rest of the mapping doesn't change

Q:

The kernel memory space gets a certain amount. When a new process is created, is a new page directory and table created as well? What if so many processes exist, wouldn't the kernel not exist as well?

A:

When a process is created, data structures have to be created. We know how they're created: fork(2), where the address space is copied. The process control block is replicated, and the child's mm_struct initially points at the parent's pages, which are then copied on demand (copy-on-write)
What happens when you create too many processes? In the old days, it hung the machine as it tried to allocate memory for every single process. Some kernels run out of memory and can't do anything
The response varies: a kernel could crash (early UNIX days) or be self-aware enough (like Linux) to notice it's running out of memory and try to kill processes. Also, resource limits prevent this from happening

Back To Navigation

October 2: Test Review Material

  • I wrote this section to lay out what I need to understand (note: it does not cover everything in the notes). I hope it is helpful for you regardless

Week 1

  • Fundamental Systems Principles: measurement, concurrency, resource management
  • Problems: time and space multiplexed resources, hardware, load programs and manage execution
  • Difference between hexdump and objdump

Week 2

  • Pipes: output from one command is input for another
  • gawk: parses input into records
  • pstree: displays relationships of processes
  • strace: records system calls and signals used in a program
  • AT&T style: source is to the left-hand side
  • OS definition: software (source code) in kernel; program running on hardware; resource manager; execution target; privilege supervisor; userland/kernel split
  • What is the kernel
  • Software: C and assembly code
  • Hardware interface: driver manager talks to hardware
  • Supervisor: manages important events
  • Abstraction: system calls, services export to processes
  • Protection: isolate processes to ensure a process isn't being incorrectly manipulated
  • Not in kernel: shell because it's a userland program
  • What to put in kernel
  • Complexity: kernel code is hard to debug, so avoid adding code there despite it being faster
  • Mistakes: affect the whole system, and happen no matter how good you are
  • Monolithic (all subsystems in one image, no isolation) vs. microkernel (subsystems divided into separate, isolated servers)
  • General Registers, Index, Pointers and Indicator (eax, ebx, ecx, edi, esi, ebp, esp, eip, eflags)
  • Segment registers (ss, cs, ds, gs, fs): support the kernel/user split; hold selector values rather than addresses like the general-purpose registers
  • Global Descriptor Table: offsets into a data structure in memory, has segment descriptors
  • Segments: portion of memory with start, limit, DPL (Descriptor Privilege Level)
  • DPL: 2-bit definition for segment, only thing protecting kernel from userland
  • To write a small kernel
  • GDT: enforce separation
  • IDT: services events from users and hardware
  • Interrupts from devices (data to deliver to process) and software (initiated interrupts, often exceptions)
  • Interrupt sent to CPU, OS looks up interrupt vector in kernel memory
  • Can be invoked from userland, come from programs, programming errors
  • IDT: pointers to kernel function
  • GDT: addresses of portions of memory and privilege level
  • CPU checks for the DPL of a program to ensure privilege level is appropriate

Week 3

  • A system call invokes a software interrupt, sent to the CPU, which transfers control to the kernel
  • System call API: well-defined entry points for transitioning across the invisible line
  • A user program's privilege level is not high enough, so e.g. write(2) jumps through the PLT into glibc, which invokes the system call
  • To get from CPL 11 (ring 3) to 00 (ring 0), invoke an interrupt:
  • The CPU services the event, changes the privilege to 0, and transfers control to the kernel
  • Why system calls matter in programs: efficient programs, environments with no programming runtime, talking to the machine directly, malware that injects code
  • Process: a running software application; userland processes are programs in execution
  • Task: an instance of any of these things (can refer to a running process)
  • Process: a program in execution, with a program counter and other elements
  • Thread: a thread of execution within a process
  • Lightweight process: a thread is a process, and vice versa. The difference is that lightweight processes may share access to some of a process address space
  • volatile long state: is the process allowed to have the CPU?
  • Process states (running, interruptible, uninterruptible)
  • struct mm_struct: pointer to the memory regions
  • Processes are created through fork(2)
  • Duplicates the parent process; the result is in the runnable state
  • init is the root of the process tree; it runs the start-up scripts
  • execve: overwrites the current address space with a new image
  • Spawn a process by fork(2), then execve(2) from the child
  • Copy-on-write (parent and child read the same pages until one writes)
  • struct mm: parent and child (through fork(2)) share the reference (the same sets of memory data)
  • If a child dies and the parent still needs information from it, the child waits around; conversely, if the parent dies, the child gets a new parent
  • The child is in the zombie state when the process is dead but information is still required of it
  • Process abstraction: OS exports view of virtual CPU (snapshot of executing state of a program)
  • Physical memory has page frames
  • Process Address Space: set of pages (bytes)
  • PAS abstraction of memory like process is abstraction of CPU
  • PAS is a set of linear address ranges (memory regions don't overlap)
  • Linear address space -> memory regions -> memory descriptor
  • Memory region: set of consecutive pages
  • Page: set of linear addresses and the data
  • fork(): the OS takes content and writes it into the space
  • Processes are given their own space: the physical memory differs (virtual address -> physical location)
  • PAS abstraction: do not give programs direct access to physical memory
  • Multiplexing memory: heap and stack don't bleed into each other
  • Use the hard disk to expand memory

Week 4

  • Paging: allows a process to see all of its virtual address space without all of it being backed by installed physical RAM
  • Memory Management Unit: transforms a virtual address into a physical address (where the bytes are actually read)
  • Logical address: [ss]:[offset]
  • ss indexes into the GDT (keeps track of memory segments)
  • cs: maps addresses implicitly; points to a descriptor giving base, limit, and privilege level (base 0, limit 4 GB in Linux, so look at the 32-bit offset)
  • Linear/virtual address: a range that can be split up and placed in different physical locations
  • A mapping process for the collection of offsets/addresses
  • Physical address: actual location in real RAM
  • The Page Directory is specific to every process
  • The TLB is also unique per process in effect (it sits at the front of the MMU to cache translations and speed up the procedure)
  • Context switch (switches processes on/off the CPU)
  • The Page Directory and Tables sit in physical memory

Back To Navigation

October 4: Virtual Memory

Recall

  • Process Address Space
  • Memory addressing (Eg. MMU's segmentation and paging support)
  • Virtual memory
  • Paging/page replacement algorithms

Example

  • objdump on an a.out file
  • See do_work and main
  • What is mov DWORD PTR ds:0x80496dc, 0x1?
  • It's a logical memory address, because it's prefixed with a segment selector register that points into the GDT
  • The virtual address (a series of bits): 0x80496dc
  • Where is it in physical memory? No idea
  • The physical location may change arbitrarily - we do not care where the variable winds up in the physical realm

Problem Origins

  • See Figure 3.1
  • Segmentation and paging break the link between physical locations and variables
  • They provide isolation between kernel and userspace programs (via segment descriptors, etc.)
  • Independent process spaces become real because each process gets its own virtual address space
  • Processes cannot access the same physical (or virtual) memory location without asking the kernel for help setting up IPC (threads, shared memory, sockets, or file I/O)

Focus Question

  • If all processes believe they have 4 GB of memory but there are only 2 GB on the machine, the OS can't keep all the address spaces in RAM
  • Every process believes it has its own virtual address space
  • The OS should support the illusion that you can run as many processes as you like on the machine
  • Implication: the math doesn't add up
(1) Only have a limited amount of RAM
(2) On average, NNN processes
(3) Processes don't know how much memory they will actually need
(4) Each believes they can access at least 4 GB

The Gap and Problems

  • Total memory demands outstrip physical supply
  • # of processes * 4GB (worst case on 32-bit)
  • 140 * 4GB (picture above) ~= 560 GB
  • But only 8 GB of physical RAM on the box
  • Internal fragmentation (segments)
  • External fragmentation (pages)
  • Heaps grow dynamically. How do we keep the new physical addresses associated with the contiguous virtual addresses?

Seeds of a Solution

Memory Hierarchy Diagram

[Image: YbIpqXV.png]
  • An array of storage devices with different speeds and capacities
  • The hierarchy of storage locations has properties that vary as you go up or down the stack
  • Registers: logically, every declared variable should live in a register, but that can't be true: CPUs have only a limited number of registers
  • There are few of them, but they're fast (next to the data path; access within 1-2 cycles)
  • CPU cache: frequently accessed instructions and data are kept close to the CPU
  • Primary memory: inexpensive and plentiful, but our needs are never satisfied
  • SSD: data has to travel over a set of wires; a bit slower
  • Rotational disk, optical disk: slower still, but larger
  • Logically we want a variable to live in a register, but it can actually live anywhere in this stack (we must get it to the right place at the right time)
  • This is the insight behind virtual memory
  • The program sees as many variables as it can address; the OS knows where the data actually is

History/Previous Approaches

  • Swapping
  • We only have so much physical memory, so the disk holds all of the processes; the OS takes one process at a time, loads it into the available physical memory, runs it, then stops it, swaps it out, and does the next job (i.e., on a context switch)
  • Space-multiplexes the resource
  • A limited-size box: put one program at a time in the box
  • Problematic for dynamic memory allocation
  • Suboptimal: swapping an entire process in and out of available physical memory is slow and has lots of overhead
  • Programs need to be specific about memory, and programmers don't like that restriction: "Why should I care or worry about how much memory to allocate?"
  • Overlays
  • Programmers split the program into fixed-size chunks so the OS knows how to deal with one unit at a time
  • Forces you to define the program as a linked list: run the first bit of code, swap, then the next, and so on
  • One overlay points to the next; the program occupies the same memory for its lifetime
  • Takes a small physical memory and runs many different programs, always knowing what the next chunk is
  • Programmers have to do the work of splitting
  • Not satisfactory: the programmer does the work, and it doesn't solve the problem of unpredictable memory use
  • But it does suggest we can move things around the hierarchy in a predictable fashion

Virtual Memory

  • A more elegant version of overlays
  • To the OS, memory can be addressed and manipulated in fixed-size chunks called pages
  • The OS has its conversation with memory in terms of pages
  • Memory references: talk to memory a page at a time
  • Allows the swapping overhead to be amortized
  • In particular, there's a gap, and a layer of indirection to insert
  • Pages have standard size, and maintain the collection of pages (subset of entire program) in memory as the "working set" of a process
  • If need to use physical page frame underlying any page, can swap that page out to disk and bring in the new, needed page
  • If working set is relatively stable, and most programs don't use their full virtual address space, virtual memory can multiplex primary memory enough to maintain the fiction implied by the PAS abstraction
  • Page address translation mapping between logical page and physical frame doesn't need to be consistent
  • Logical page can move around between physical frames, and the way the OS keeps track is by updating the corresponding page table entries
  • Adding this layer of indirection is fundamental for virtual memory. Logical pages that are absent from physical memory can spend time on some other storage medium (i.e. disk)

Virtual and Physical Memory Diagram

[Image: Bva0Wxw.png]
  • Primary memory can be organized into fixed-size chunks called page frames
  • There's a limited number of them. Using the memory circuitry, a particular virtual page is mapped to a physical frame
  • Any page from any process could occupy any physical frame at any point in time
  • Every process can have as many pages as it can address (to store variables, etc.), and now we juggle: when a process runs, the code and data it references must live in physical memory; if it isn't there, the OS will get it (this also happens at start-up)
  • Given an ELF, the OS reads it, creates a PAS with a set of virtual pages, and stores the contents somewhere in physical memory
  • 140 processes later, physical memory looks like a mish-mash of virtual pages from a bunch of different processes
  • This mostly works because not every process (even though it can address 4 GB or more) actually uses that much memory -- greedy, but not that greedy
  • Processes access a relatively stable set of pages - the notion of a working set
  • The OS helps keep those hot virtual pages in primary physical memory
  • The illusion: virtual memory contains all code and data, and is readily accessible
  • But when you need a virtual page that's not in physical memory, a Page Fault occurs
  • The MMU does the translation, indexing into the page tables, and the last step from page table entry to physical backing page doesn't resolve -- the entry is null
  • That's an error condition. The OS notices it, and has other storage where swapped-out pages live
  • The disk stores virtual pages that are not in primary memory, but now one has to be brought back into a physical frame
  • Swapping in a new page means servicing a page fault and reading the data in from disk (two costly procedures), but the alternatives are more expensive: full swapping or inflexible overlays


  • Consider an instruction (recall the mov at the beginning of class) landing in the CPU. The CPU executes it (fetch/decode cycle) and looks at the address: it needs to write the value 1 into this memory address. It takes the address and sends it to the MMU: "do you know where this is in physical memory? Where should I put it?" The MMU obliges and performs the translation process
  • Different outcomes:
(1) Virtual address is actually present in physical memory
  • Instruction completes: value 1 is sent over memory bus and written to some physical location (fast)
(2) Virtual address is not present in memory
  • The MMU does the lookup and the pointer is null. The CPU notices that it can't resolve the virtual address to a physical one: an exceptional condition
  • Exceptions are handled by the OS kernel. The CPU raises a memory exception through the IDT. The part of the OS (a function in the kernel) that deals with page faults is registered in the IDT
  • There are different types of page faults
  • The page may not be mapped to a physical frame yet. In that case, either (1) a free physical frame is available or (2) physical memory is used up
  • (1) The OS goes to disk, grabs the contents of the virtual page, maps it to the free physical frame, and is happy
  • (2) Memory is full, so the OS decides which physical page to evict - it writes that page's contents to disk and swaps the new virtual page into the just-evicted physical frame
  • An instruction like "load memory address n" goes through page table translation:
(1) There is no pointer to a physical page, so it traps to the OS (i.e., the CPU raises an exception)
(2) The OS gets involved: either the reference is illegal, in which case SIGSEGV (illegal memory reference) occurs, or the page is on the hard disk, in which case the OS reads it in
(3) Try to find a free frame. If found, write the page in, update the data structures that describe the mapping between virtual and physical, then re-execute the instruction in userland

Linux OOM Killer

  • The system may not have any memory left, which is a bad situation
  • The OS exists to allow programs to execute. If it can't, that's bad: no memory can be given back, and if everything is stuck, no progress can be made
  • Picking a victim is heuristic. Avoid:
  • Processes that might be sensitive (i.e., owned by root)
  • PID 0, 1, or kernel threads directly accessing hardware
  • init - killing it would make the system unstable
  • Processes that access hardware - killing them can cause damage, as hardware state is sensitive
  • Long-running jobs - can lose lots of progress despite their accumulated state
  • Pick:
  • A victim with lots of pages to reclaim (+ its children)
  • Possibly one with low static priority

Effective Access Time

  • How much do page faults degrade performance?
  • Can be slow: don't want to have a lot of page faults
  • We don't want faults to be the common case: even though the support is transparent (programs aren't aware their data lives on disk), disks are slow relative to registers and memory
  • We can calculate from this diagram. In particular, get an idea of how poorly a system performs by looking at the page fault rate and the cost of each fault
  • EAT is sum of two components:
  • Normal performance: (1-p) * access_time
  • Worst case/fault performance: p * pf_overhead
  • Where p is the rate of page faults [0..1]
  • Generally gets idea of how well system is working

EAT Example

  • The overhead is large relative to the access time - invoking the interrupt alone is costly
  • Diagram overhead: the disk read plus servicing the page fault (the initial interrupt, discovering there's no page table entry, potentially going to disk for the frame, and restarting) vs.
  • Access time: normal memory access time
  • Roughly 40x slower even at a low fault rate: avoid this. Even if only 1/1000 memory references winds up as a page fault, it is costly
  • You don't want to get into a situation where the system spends its time servicing page faults
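
A worked calculation may help; the numbers below are illustrative (scribe's own, not from the lecture). Suppose a normal memory access takes 100 ns and servicing a page fault costs 4 ms = 4,000,000 ns. With a fault rate p = 1/1000:

EAT = (1 - 0.001) * 100 + 0.001 * 4,000,000 ~= 99.9 + 4,000 ~= 4,100 ns

Even one fault per thousand references makes the average access roughly 40x slower, matching the "40x" remark above.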

Example: greedy.c

  • The original greedy.c mallocs forever, 1k at a time, and checks each result to see if it's NULL
  • It exits if the allocation fails; otherwise it keeps going forever (soaks up memory quickly)
  • Go to System Monitor
  • Memory + Swap History: 2 GB memory on VM, using less than 10% and swap has half a gig (file on disk somewhere) and not using any of it
  • Good state with no page fault services
  • How to make it use more? See code
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

int main(int argc, char* argv[])
{
  char * c = NULL;
  for(;;)
  {
    c = (char*) malloc(100 * 1024 * 1024);
    if (NULL == c)
    { exit (-1); }                        /* allocation failed: give up */
    else
    { fprintf(stdout, "Address of c is: %p\n", c); }
    memset(c, 'X', 100 * 1024 * 1024);    /* touch the memory so pages are really backed */
    sleep(1);
  }
  return 0;
}
  • Memory use goes up. What's going to happen? Swap will be used (the system is still in a usable state)
  • Eventually the terminal says "Killed", as the OS decides the process is a waste of its time (the OS rescues the system)

Question & Answer

Q:

If there are no free frames available, what happens?

A:

The OS has a decision to make: it looks at the occupied frames and picks a victim
It kicks the occupied one out and writes its contents to disk (saves it for later)

Q:

What if the hard disk (reserved for extension of physical memory) and the physical memory are full?

A:

Problem: the system can't do anything - it can't allocate more memory, create new processes, or make progress with current processes
The OS fails all requests for memory. System calls will return NULL, -1, or errors (not really a solution)
You have to make some space somehow. Pick a victim: a process or a few to kill, and reclaim the space they're using in swap or primary memory
E.g., with 150 processes, the OS makes a decision on some heuristic/criteria, kills a process, and reclaims its pages
If physical memory is full, not a lot of space gets made this way; it's a bad state for a computer to get into (and should be avoided)
Try to control this as a user: make swap reasonably big, so it's several gigs
Or buy more memory; beyond that there aren't a lot of other rational things to do
For heavily loaded systems, there is usually redundancy or load balancing: if one machine goes down, the others can still handle requests. A user's system should hopefully never get into that state

Q:

What if the OS dynamically increases the swap spaces?

A:

Swap usually has a fixed size
Potential downsides: if the OS is willing to increase it at run time, you'll just feed the addiction of the process (you give it more disk; eventually you run out again)

Q:

What happens if you write a page to a bad block?

A:

Physical devices aren't perfect. Disks sometimes have manufacturing flaws, so not all portions of them are valid
Modern OSes (and disk firmware) keep track of bad blocks that are no longer responsive and avoid writing data to them

Back To Navigation

Tutorial 5: Introduction to Writing Shell Code

  • Shell writing: a shell provides a text-only user interface for Unix-like OSes
  • Its primary function is to read commands typed into a terminal window and then execute them

Input/Output

  • Can use the FILE type for performing I/O operations
  • A FILE pointer refers to a particular open file

Error handling

  • Always check return values. Read man pages to know what a function returns (some return NULL, some -1)
  • Study the different signals when running into errors in programs

Writing UNIX Shell

  • Always use the manual for functions you're not familiar with
  • strtok(): divides a string into sub-strings according to a token/delimiter character
  • Treat each token as an entity (to parse user input like cd <directory>)
  • memset(): a way to fill memory at an address with a value
  • Use arrays
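
A minimal sketch of such a shell loop (scribe's own, not the tutorial's exact code), combining fgets, strtok(), fork, and execvp:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
  char line[1024];
  char *argv[64];

  for (;;) {
    fprintf(stdout, "myshell> ");
    if (fgets(line, sizeof(line), stdin) == NULL)
      break;                            /* EOF: exit the shell */

    /* Split the line into whitespace-separated tokens */
    int argc = 0;
    char *tok = strtok(line, " \t\n");
    while (tok != NULL && argc < 63) {
      argv[argc++] = tok;
      tok = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;
    if (argc == 0)
      continue;

    pid_t pid = fork();
    if (pid == 0) {                     /* child: run the command */
      execvp(argv[0], argv);
      perror("execvp");                 /* only reached on failure */
      exit(1);
    } else if (pid > 0) {
      waitpid(pid, NULL, 0);            /* parent: wait for the child */
    } else {
      perror("fork");
    }
  }
  return 0;
}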

Back To Navigation

Tutorial 6: Introduction to Writing Shell Code

  • Finish shell coding example

Back To Navigation

Week 5

October 7: Page Replacement

Recall

  • Talked about virtual memory and what it is (the illusion of having the entire PAS in memory at the same time)
  • Not literally true, because there's a limited physical amount of memory in the machine
  • Saw how pages fit into the memory hierarchy
  • If physical memory is full, choose a page to evict to swap

Memory Management Recap

  • Process Address Space: how does the kernel support the ABI memory abstraction?
  • A virtual construct that the OS makes available to every process on the machine
  • Provides a sense of isolation and protection between kernel and userland
  • Memory Addressing: why don't programs' manipulations of variables conflict/collide?
  • A look-up mechanism: where in physical memory is an instruction or variable stored?
  • Maintains independent process address spaces
  • Maps similar or identical virtual addresses to different physical ones
  • Virtual Memory: how does a kernel keep all processes in memory at once?
  • Each process gets access to a PAS, and apparently an unlimited amount of it
  • Figure 3.1: entries in the Page Directory
  • Paging: page tables are set up, and every entry has a special format
  • The information in an entry supports paging, but also efficient page replacement

MMU as a Function

  • Look at the pseudo-code in the slides
  • The CPU is executing an instruction
Ex. MOV 0x123, 0x1
  • The CPU gets a sequence of bits/bytes that activates certain parts of the CPU: store a value in a variable
  • The CPU begins to execute the instruction - it needs to know where to write into memory
  • On systems without virtual memory, that's a physical address
  • But here it's a logical address; the CPU must convert it to a physical location
  • It takes ds:0x123 and asks the MMU to translate it
  • logical_addr.segsel: look up in the register list to get the logical address (ss:offset)
  • The value in ds is a gdt_idx used to look up into the GDT (in kernel memory)
  • Fetch the segment descriptor from the GDT (this may itself talk to memory) and store it in segdesc
  • privcheck(segdesc.dpl, cpl): the DPL check - make sure the CPL is compatible with the DPL - decides whether this instruction may access the memory. Hopefully yes; if not, the CPU generates an exception: a segmentation violation
  • virtual_addr: if successful, the logical address translates to a virtual address (take the descriptor's base attribute, add the offset of the logical address -> virtual address)
  • physical_addr: take the virtual address, pass it through the page tables (the function that maps virtual addresses to physical locations), and get back a physical address
  • isvalid(physical_addr): the CPU checks whether it's valid. If not (it points to a null entry - no frame allocated for that address), a page fault may be thrown
  • Privilege checks, etc., are important in servicing the page faults that help us with page replacement
  • tlb.save(physical_addr, virtual_addr): if we succeed, we've derived a mapping from virtual to physical; ask the TLB part of the MMU to save/cache the relationship. This ensures future instructions don't have to talk to memory to resolve the physical address
  • memory.do(physical_addr, op, N): finally, put the request out on the memory bus - read/write N bytes to/from the physical address
  • Memory addressing supports the notion of separate process address spaces - the OS sets up the data structures the page-table walk relies on to execute successfully
  • MMU support is needed to provide the abstractions programmers believe they have on a modern OS
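
To make the sequence concrete, here is a toy, runnable C simulation of those steps (entirely the scribe's fabrication: the GDT contents, page tables, and frame numbers are made up; real hardware does this in circuitry):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES 1024

struct segdesc { uint32_t base; unsigned dpl; };

static struct segdesc gdt[4] = { { 0, 0 }, { 0, 3 } };  /* flat segments, like Linux */
static uint32_t page_table[ENTRIES];
static uint32_t *page_dir[ENTRIES];

static uint32_t translate(unsigned segsel, uint32_t offset, unsigned cpl)
{
  struct segdesc sd = gdt[segsel];
  if (cpl > sd.dpl) {                         /* privcheck(segdesc.dpl, cpl) */
    fprintf(stderr, "segmentation violation\n");
    exit(1);
  }
  uint32_t vaddr = sd.base + offset;          /* logical -> linear/virtual */
  uint32_t dir = (vaddr >> 22) & 0x3ff, tbl = (vaddr >> 12) & 0x3ff;
  if (!page_dir[dir] || !page_dir[dir][tbl]) {
    fprintf(stderr, "page fault\n");          /* isvalid() failed */
    exit(1);
  }
  return page_dir[dir][tbl] * 4096 + (vaddr & 0xfff);
}

int main(void)
{
  uint32_t vaddr = 0x08049858;
  page_dir[(vaddr >> 22) & 0x3ff] = page_table;   /* install one mapping */
  page_table[(vaddr >> 12) & 0x3ff] = 5;          /* -> pretend frame 5 */
  printf("physical: 0x%08x\n", translate(1, vaddr, 3));
  return 0;
}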

Bits: Contents of Page Table Entries

Paging Diagram

[Image: o57M9fc.png]
  • Translation amounts to looking up entries in the data structures
  • The 10/10/12 split indexes into each level of the page tables
  • pgd -> the page directory for a particular process
  • The first 10 bits index into the page directory (the entry points to a page table)
  • The second 10 bits index into that particular page table
  • The page table entry points to an actual page, and the last 12 bits index somewhere into the page (some byte)
  • The entries also keep track of metadata important to the OS for communicating the state of the page
  • Is the page valid? Recently accessed/written?
  • Hardware sets some of the bits, so the OS can make an informed decision about whether to kick a page out or not
  • 20 bits: the page frame index
  • Present bit: is this logical page in a frame?
  • Protection bits: R/W/X (sometimes R implies X), e.g. a code page may have the r/x bits set, etc.
  • Dirty bit: was this page recently written to?
  • Accessed bit: hardware sets this; the OS must unset it
  • Used to determine candidates for swapping
  • User/Supervisor: ring 3 or everything else
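
A small sketch of reading those bits in C (scribe's own; bit positions follow the IA-32 page-table entry format, and the example entry value is made up):

#include <stdio.h>
#include <stdint.h>

#define PTE_PRESENT  (1u << 0)   /* page is backed by a physical frame */
#define PTE_RW       (1u << 1)   /* writable */
#define PTE_USER     (1u << 2)   /* ring 3 may access */
#define PTE_ACCESSED (1u << 5)   /* hardware sets on access; OS clears */
#define PTE_DIRTY    (1u << 6)   /* hardware sets on write */

int main(void)
{
  uint32_t pte = 0x08049067;                    /* fabricated example entry */
  printf("frame index: 0x%05x\n", pte >> 12);   /* top 20 bits */
  printf("present=%d rw=%d user=%d accessed=%d dirty=%d\n",
         !!(pte & PTE_PRESENT), !!(pte & PTE_RW), !!(pte & PTE_USER),
         !!(pte & PTE_ACCESSED), !!(pte & PTE_DIRTY));
  return 0;
}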

Page Replacement

  • Virtual pages belong to different processes
  • Virtual memory allows us to divorce each virtual page from the actual physical frame
  • RAM has a limited number of physical page frames, and the idea of virtual memory is to manage the mapping
  • Virtual memory also lets memory expand onto other devices (hard disks, potentially the network)
  • If all spots are taken, should VP5 stay on disk? No, that does no good. We want it in memory so we can reference it
  • So which page do we take out of memory and store on disk?

Picking a Destination Frame

Free Frame vs No Free Frame

Free frame

  • The OS selects a free frame, fetches the data from disk, and writes the page into the frame, updating the indexing as needed

No free frame

  • The OS will:
(1) Select some frame (via a page replacement algorithm)
(2) Evict its contents (if dirty) to disk (and mark its indexing information in the page tables as 'absent')
(3) Write the new page contents into the frame

  • Select one of the frames, break the mapping, and replace its contents with the page to add
  • If the old contents are dirty, keep the data by writing it to disk
  • The eviction decision can be informed or uninformed
  • Informed: use the extra 12 bits of metadata to decide which page to evict (which page was recently written to? Dirty data can't just be overwritten; it must go to disk first)
  • Prefer to replace a page that was not recently written to
  • The hardware does the OS a favor by setting the bits; the OS reads them (and clears the accessed bit itself)
Approaches and Explanations (each approach below is listed with its method, pros, cons, and notes)

Simple Approach: Random

Select random frame to evict

Simple to implement and understand

May select a heavily-used frame (evicting a page that's about to be needed causes another page fault)

  • Simple, but the wrong decision is made 50% of the time
  • Servicing page faults to bring back code you want to reference is costly

Best Approach: Optimal

Select frame that will be referenced furthest in the future

Optimal; maximally delays each page fault

Fantasy, cannot happen

  • How can you know a frame isn't going to be referenced for a long time?
  • Useful as a benchmark: compare instances of other approaches against it
  • If you knew when each page will be referenced in the future, you could generate the minimal number of page faults and compare other algorithms to that

Simple Approach: FIFO (First In, First Out)

Evict frame with 'oldest' contents

Simple to implement and understand, low overhead

May select an important frame

  • Given a memory reference string, keep a linked list of pages and track the oldest one
  • The OS tracks when a page was brought in (the hardware has some of this information)
  • Evict the page at the front of the list (it may not be needed if it hasn't been referenced for a long time)
  • Sub-optimal: the oldest frame may be important and heavily referenced (being old doesn't imply unimportance)
  • Maintaining precise per-page timing for a priority queue is difficult and costly
  • When we look at scheduling, we'll see such information isn't actually kept (it's expensive, since processes run independently of the OS)

Second Chance Approach: Improve FIFO

Evict in FIFO order, but check if page is in use before evicting (page gets second chance)

Vastly improves FIFO

Degenerates to FIFO; shuffles pages on list

  • Mostly equivalent to "clock" (clock doesn't shuffle pages around a list); relies on hardware providing a "referenced" (R) bit
  • Using FIFO order, look at the first (oldest) page and check whether its R bit is 0 or 1
  • If it has been referenced (R = 1), give it a second chance
  • Set R to 0, move the page to the back of the list, and move on to the next one
  • R gets updated asynchronously
  • The hardware updates the bits independently of this procedure (it marks a page when accessed; only when a page fault is serviced does software look at the bits)
  • Problem: if all the R bits are 1, this is an overly inefficient way to degenerate into FIFO
  • Improvement: a circular buffer and a pointer (update the pointer; no need to move nodes)

Good Approach: Least Recently Used

Evict frame that has not been referenced for a long time

Semantics closely match Optimal Approach

Costly implementation; hardware requirements

  • Keep an ordered list of which page frames have been least recently used
  • Needs much more hardware support than a single bit. In reality, most algorithms are just approximations or models of LRU
  • Example: 4 physical page frames and a particular memory reference string (pages 0, 2, 1, 4, 3, 9, 1, 1, 2, 0)
  • The first 4 references are page faults - assume memory starts empty
  • The next memory reference is 3 - write it into Frame 1 (the least recently used)
  • Referencing 1 (already in Frame 3) is a hit, since page 1 is already in physical memory
  • When servicing 0, Frame 1 is again the least recently used
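
A small runnable sketch of the second-chance/clock idea described above (scribe's own; the frame contents and R bits are made up):

#include <stdio.h>

#define NFRAMES 4

static int page[NFRAMES] = {0, 2, 1, 4};  /* resident pages */
static int R[NFRAMES]    = {1, 0, 1, 0};  /* referenced bits (set by "hardware") */
static int hand = 0;                      /* clock hand */

static int pick_victim(void)
{
  for (;;) {
    if (R[hand] == 0)                     /* not recently used: evict */
      return hand;
    R[hand] = 0;                          /* second chance: clear R, move on */
    hand = (hand + 1) % NFRAMES;
  }
}

int main(void)
{
  int v = pick_victim();
  printf("evict frame %d (page %d) to make room\n", v, page[v]);
  return 0;
}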

Question & Answer

Q:

What is a CPL?

A:

CPL is the Current Privilege Level: two bits stored in the CPU that tell you at what privilege the currently executing code runs

Q:

What if another process accesses another virtual address? Would that caching still point to the same physical address and then skip it?

A:

It's a different process, so you don't want that mapping to be reused. On a context switch, the cached mappings from virtual addresses to physical ones are no longer valid, so the TLB is flushed. Some TLBs support marking whether an entry is valid (or tagging entries per process)

Q:

How many bits are in the metadata?

A:

Depends on the hardware and OS. Refer to the Bits slide.

Q:

Why leave 20 bits? If you're only accessing 4096 entries, can't we allocate more?

A:

The first 20 bits of the entry are the page address; that's all it takes to identify a page
In this scheme, each page table has 2^10 = 1024 entries, and we want every entry to be able to point to any physical page
This is the beauty of the mechanism - an address range is virtually contiguous, not physically contiguous

Q:

Every entry in the page table should identify a page, but with 4k pages, do we need the last 12 bits of the entry?
Entry is 32 bits, and it's supposed to point to a particular page, so do I need the last 12 bits? Or can they all be 0?

A:

They can all be 0: the top 20 bits uniquely identify a page, because pages start only at 4k boundaries, so the last 12 bits could be anything
The entry's page-address part holds the actual page being spoken about. The rest is used however we want (the metadata bits)

Q:

How do we get what was referenced last?

A:

You don't. This is the problem that real solutions run into. Somehow, you want an unlimited history of every page, and then efficiently compare; but good luck with that
One solution is to approximate the history using the available bits (some small number of them)

Back To Navigation

October 9: Recap

Test 1

  • Satisfied with the results, because it's rare to get a normal distribution
  • Points add up to 1000 -- don't panic about this grade
  • Stop by office hours for test results (in ~1 week we will talk about it)
  • Contact the instructors more; they can explain things in more depth
  • If Michael doesn't see an interrupt, he can't service it

Homework 2

  • Three programming problems (some have questions that you shouldn't neglect in the README file)
  • 66% of it is kernel-level programming
  • Problem 1: a user-level program that does something interesting
  • How to manipulate the PAS and memory regions
  • Generate code at run time and execute it
  • Fairly easy if you get the right system calls involved
  • Brush up on NASM, x86
  • Problem 2: a kernel module that does something interesting
  • The questions are there to get you looking in the right direction
  • You have access to the Linux source code -- a valuable tool you should use (the purpose is to look at kernel code)
  • The problem asks you to profile the location of the stack in every PAS on the machine (in different ways)
  • From userspace, write whatever it takes to collect that information (finding where the stack starts - hint in Q1)
  • Collect information from the kernel itself by writing code that is dynamically loaded into the kernel, then extract it
  • Summary: compare the two approaches: which is better, more reliable, more efficient; what is the best way to do it?
  • Problem 3: change the kernel
  • Extend the functionality of the kernel to implement a new mode that a process can be in
  • In particular, add a system call that marks a process in a certain way; if a process is marked like that, the kernel starts to fail its system calls
  • Build a user-level utility that uses the system call (adding a system call to the kernel is described in LDK)
  • Implement the user-level and in-kernel functionality (add the system call and its implementation) and assess what you did; show that it works

Assessment Exercise

(Q1) If we didn't have computers (devices that implement algorithms), would we still need operating systems?
  • Politics: resources (tax dollars) and a set of programs (social programs, rules, etc.); deciding how to multiplex tax dollars to maximize value => multiplexing resources
  • Users don't need to know the details of the subsystems (the superuser is the PM/president)
  • Industry: resources (people) and management (managing people) => abstraction
  • File folders of information on certain projects, where only certain people can look at certain information => data protection, files
  • Libraries: a multi-user book OS (the concept of multiple users of a resource, checking in/out, etc.) => scheduling
  • Point is that concepts and principles behind OS apply to other disciplines
(Q2) What is the difference between:
Protected mode
  • Recall System Architecture slide
  • An architecture mode (a collection of functionality of the x86 CPU)
  • Supports isolation between PASes
  • At start-up the CPU is in real mode - no virtual memory, address translation, etc.; flipping the right bits switches it into protected mode

Kernel Space Recap Diagram

[Image: DQMLYhm.png]
Kernel space
  • The space in which the kernel exists
  • Part of the virtual address space available on the CPU
  • The kernel resides in the same virtual address space (i.e., range) as user-level programs
  • It is separate and distinct from userland (because of DPL bits that don't allow userland to touch the kernel)
  • What's in the kernel: the data structure that is the task list (elements are task_structs <- the PCB)
  • The GDT and IDT are also in the kernel. IDT: interrupt vectors index the table, and function pointers point to handler code
  • The kernel is just an ELF with some number of functions; one of them implements syscall dispatch
  • Another table matches syscall numbers to function pointers that point to kernel code
  • Schedulers and page allocators (code)
Supervisor mode
  • Ring 0, have access to privileged instructions
Root user
  • User account (super user, highly privileged)
Do root's processes run in ring 0?
  • Trick question -- it's nonsensical. Processes don't normally run in ring 0 (that's supervisor mode, the kernel's mode); most user-level processes (including root's) make a system call, and the OS runs in ring 0 on their behalf
  • Kernel threads, which run as root, do actually run in ring 0

Question & Answer

Q:

What is the correlation between exit time and score on the test?

A:

No correlation. Lesson: don't rush and don't panic, it's okay to take a long time

Q:

How much of the assignment are we able to do now?

A:

Problem 1: everyone can probably do it now
Problem 2: you can do the user-level script and the LKM based on the tutorials, but the details of how to get the information are a learning exercise
Problem 3: a learning exercise. Writing system calls is straightforward; the user-level utility to invoke the system call will be in a tutorial or on the web; the hard part is what you have to do in the kernel to start filtering system calls on a per-process basis. It can be straightforward: there's a data structure, the system call table, that drives system call dispatch
No point in waiting. Learn by yourself. You'll still get pieces of the assignment in lectures (system calls, implementation), but lectures/tutorials are more a safety net, not something you should depend/wait on

Q:

Compared to the first one, how straightforward is this and how long will it take?

A:

Fairly straightforward. Harder than HW1, but this is an activity you're familiar with. Learning semantics and working in an unfamiliar environment will be harder

Q:

How thorough should our man pages be?

A:

Expect what you'd find on the web; it doesn't have to look glorious. Just be familiar with how things are documented, as programming involves: 1/3 writing, 1/3 maintaining, 1/3 documenting

Q:

Can you clarify on the differences between kernel space and paging, and each process which has 4 GB of memory? Does each process have 3 GB, and does the kernel live on each process?

A:

This is just a virtual space, and the model we're talking about
No, processes cannot access those virtual addresses, since they belong to the kernel (mismatch of CPL/DPL bits)
The kernel isn't replicated in each PAS; it simply is. Each process runs in user space, and the kernel keeps track of processes using a data structure (a list)
One data member -> mm_struct -> describes the layout (memory regions) of a PAS -> page tables (the index mechanism describing where in physical memory the parts of the address space are) -> including pages for the kernel itself
Every process has mappings, in/from its mm_struct, to the pages that belong to the kernel
When a process makes a system call, the kernel executes on behalf of that process
The kernel needs pages as well (its code and data need to be in physical memory)
Rather than replicating the kernel, its pages are embedded in the mm_struct of every process
If a process tries to access them directly, the access is translated by the MMU and the DPL check occurs (and fails)
The mappings are all there to allow the kernel to execute, but processes don't have access to them

Q:

Why does the process need to keep track of the 1 GB of kernel pointers?

A:

The kernel does this, not processes. The MMU knows how to reference them. The kernel mappings are implicitly put into everyone's mm_struct so a separate set isn't needed
Processes don't need them, and if one tries to access that virtual address range, the MMU doesn't allow it

Q:

Is supervisor in real mode?

A:

Supervisor mode is a subset of protected mode. Real mode is code that's executing with ultimate access to hardware

Q:

What is the root user and how does it run in ring 0?

A:

Root is just a user (like any other), but the kernel makes exceptions for it - it can invoke certain system calls, and checks are more relaxed
Root's programs still generally run in ring 3; as above, the kernel runs in ring 0 on their behalf

Back To Navigation

October 11: User-level Memory Management

  • The API
  • What most systems export to user-level programmers for growing/shrinking memory (i.e., dynamically allocating memory)
  • Involved in the creation/deletion of objects
  • Approaches/algorithms and data structures to support memory management

Recall

  • Processes have address space supported by memory regions
  • Supported by virtual memory address translation, then physical memory
  • Stack: (API) -> PAS -(memory regions)-> VM (memory address translation) -> Physical

Example: Modified Exercise

  • Ask for more memory using malloc(3)
  • Give memory back using free(3) (one free for every malloc)
  • calloc: provides zeroed memory (otherwise like malloc)
  • realloc: resizes a memory block
  • The OS is willing to adjust the properties of the PAS
  • Write a simple program that calls malloc(10KB)
  • Where does the 10k appear in the PAS? What part of the PAS is responsible for holding, by default, dynamically allocated memory?
  • The heap: drawn with 0 at the top, it grows downward in the picture (gets bigger)
  • It starts out at some position in memory (an address that defines the current end of the heap), and when more heap is needed, malloc is called for some number of bytes
  • malloc(1024) -> increases the heap by 1 KB
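
A small sketch (scribe's own) to watch the end of the heap move; sbrk(0) returns the current program break:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
  printf("break before: %p\n", sbrk(0));   /* current end of heap */
  char *p = malloc(10 * 1024);             /* the malloc(10KB) from class */
  printf("break after:  %p (block at %p)\n", sbrk(0), (void *) p);
  return 0;
}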

Example: llist.c

  • Defines a node structure and continuously builds a linked list (runs forever; every second it tacks a node onto the list), so memory use is unbounded
  • How many bytes is this structure? (see the sketch after this list)
  • 3 fields
  • Ask the machine itself (sizeof): 12
  • data: 1 byte
  • num: 4 bytes
  • link (a pointer): 4 bytes
  • The fields aren't simply packed end-to-end (data | num | link)
  • The compiler pads the struct with 3 bytes (so fields are nicely aligned and easy to index into)
  • Dynamically allocating this thing takes 12 bytes - so increase the heap by 12 bytes?
  • The heap doesn't actually grow that way, though. The OS deals with memory in pages (4 KB)
  • There's a disconnect between the implementation of malloc and what the OS does
  • In class, recall the commands:
  • pmap PID: gives the address space structure for a process
  • cat /proc/PID/maps: shows the heap
  • strace -o llist.strace -e trace=mmap2,brk ./llist
  • Look at the behaviour of the program as it executes and as it calls malloc
  • What does it do in terms of system calls? This shows the relationship between malloc and the OS
  • tail -f llist.strace
  • The program starts up and maps in its libraries
  • A few calls to brk, then nothing else
  • Suggests you can malloc as much as you want and the OS doesn't get involved
  • Beyond scheduling this process, the OS is not involved in the execution of llist
  • Where is malloc getting its memory? It's not increasing the heap
  • malloc doesn't directly manipulate the heap on every call
  • Occasionally it invokes brk or mmap to ask for more memory in terms of some number of pages, but the glibc malloc implementation does its own internal memory management (the program interacts with that)
  • It hasn't reached the point of needing more space (no need to increase the heap)
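
A sketch reconstructing the node structure discussed above (the field names are guesses from the lecture, not the actual llist.c source); on IA-32, sizeof reports 12 because of the padding after data.

#include <stdio.h>

struct node {
    char data;            /* 1 byte, followed by 3 padding bytes */
    int num;              /* 4 bytes                             */
    struct node *link;    /* a pointer is 4 bytes on IA-32       */
};

int main(void)
{
    printf("sizeof(struct node) = %zu\n", sizeof(struct node));  /* 12 on IA-32 */
    return 0;
}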

The Problem

  • Problem: support growing and shrinking memory allocations
(1) Disconnect between allocation and the OS: a program doesn't want to ask the OS for a 12-byte struct, and the OS shouldn't have to worry about it
  • Give malloc a few dozen pages, and it will do the rest
  • The OS passes the job to a user-level utility; it doesn't need to complicate itself with memory management algorithms
  • The system call is a clean interface
(2) Want to grow memory in variable-sized chunks and shrink it again
  • Let's assume this is what's going on inside the memory manager:
  • It has some number of bytes; the memory manager needs to give away 12 bytes starting from 0

Memory Allocation Diagram

vL4BHP2.png
  • Give 12 bytes, return address 0 (the node is at address 0)
  • Continue to give out more memory in 12-byte chunks
  • Ex. P1 asks for: 12, 12, 12, 12, 1024 (just give it) -> 1048
  • P1 then frees one of the 12-byte blocks, and another program wants 12 bytes
  • We can give it the next fresh 12 bytes, or reuse the freed spot

Solutions

  • Assume you have some pile of memory at time T, and you've already given away some blocks (a fragmented heap with some free spots)
  • When another request comes along, we can make a decision according to these strategies (see the sketch after this list):
  • First-fit: start at 0 and take the first open hole that satisfies the request
  • But you're always scanning from zero--not likely to find the right hole size at the beginning
  • Next-fit: keep track of where you left off last time
  • Next time a process wants memory, start at that pointer, as the following space probably has a hole
  • Best-fit: looks at how much memory is available in all the available holes
  • Checks what the request size is and whether it can fit in a hole
  • If it does fit, consider it a candidate (c1)
  • Collect all the holes that fit the request (all the c's), and pick the tightest fit
  • Worst-fit: doesn't care about the best fit; just find the largest hole and put the request there (greedy)
  • glibc might be doing this
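
A minimal sketch of the first-fit and best-fit scans described above, assuming a simple singly linked free list (struct hole and the function names are made up for illustration; this is not glibc's actual implementation).

#include <stddef.h>

struct hole {
    size_t size;          /* bytes available in this hole */
    struct hole *next;
};

/* First-fit: take the first hole big enough, scanning from the start. */
struct hole *first_fit(struct hole *list, size_t n)
{
    for (struct hole *h = list; h != NULL; h = h->next)
        if (h->size >= n)
            return h;
    return NULL;          /* nothing fits: the allocator must ask the OS (brk/mmap) */
}

/* Best-fit: scan every hole and keep the tightest one that still fits. */
struct hole *best_fit(struct hole *list, size_t n)
{
    struct hole *best = NULL;
    for (struct hole *h = list; h != NULL; h = h->next)
        if (h->size >= n && (best == NULL || h->size < best->size))
            best = h;     /* a new tightest candidate */
    return best;
}

Next-fit would be first_fit resumed from a saved pointer; worst-fit is best_fit with the comparison reversed.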

Fragmentation

  • Problem: wasting time running back and forth
  • Best/worst fit: takes time looking for holes, comparing, etc.
  • Problem: the allocator listens to the user, who asks for weird sizes of memory; we're arbitrarily carving memory into the chunks the user wants to deal with, not the OS
  • Solution: deal with memory in fixed-size chunks (like the OS does)
  • Problems:
(1) Internal fragmentation
(2) External fragmentation
  • Ex. a tremendously fragmented situation: if you collect all the holes, you find little free space (it might be tight, or smaller than likely requests are going to be)
  • There might be a GB of memory free, but the individual spaces are too small -> problem
  • If it were contiguous in memory, it could satisfy future requests
  • Most programs ask for small, medium or large chunks (a few tens of bytes, ~1 KB, or a few dozen MB)
  • The memory manager maintains indexes: a series of pointers to memory holes of each size, maintaining chunks of those sizes (buckets)
  • Problem: you may exhaust a certain bucket; you then have the choice to deny the request, or the harder choice to go collect and coalesce some free resources and hand those out (the OS does this internally)
  • What if the program requests a large size?
  • Internal fragmentation: across all the buckets there may be enough memory in total to satisfy the request, but it's already been carved up
  • Make the best trade-off: it is easier for memory managers to deal with fixed-size chunks, translating an external fragmentation problem into an internal one
  • malloc
  • All of llist's memory activity didn't touch or modify anything in the OS
  • What if it did? It would be rapidly talking to the OS about the heap

Example: Find the brk System Call in LXR

  • Where to find brk? Search for it
  • "brk": confusing, since these three characters can appear anywhere in the kernel
  • LXR says it found brk as a local variable, as a member of a structure, and as free text in several places
  • Not a local variable (since it's a system call)
  • System call implementations are usually named "sys_<NAME>"
  • "sys_brk": a function prototype or declaration, and free text (not that one)
  • Click the first one: a declaration with a signature, but where's the implementation? In the code!
  • The entry point for the system call (sys_brk is a symbol and refers to a location in kernel text)
  • The functionality is defined via the SYSCALL_DEFINEn macros (e.g. SYSCALL_DEFINE3 for write, as it has 3 args)
  • For brk: SYSCALL_DEFINE1(brk)
  • Which of these five hits are we looking for? We're not dealing with sparc or alpha; nommu is out (our machine has an MMU) -> thus mm/mmap.c (the memory management portion of the kernel)
  • A macro that expands to a function signature
  • You can read this code mostly without anyone's help, as it's C code
  • unsigned long: declaring local variables (among them the eventual return value)
  • unsigned long newbrk, oldbrk: the heap grows from oldbrk to newbrk
  • struct mm_struct *mm = current->mm: the structure that describes and maintains the memory regions of the current process
  • down_write(&mm->mmap_sem): we'll understand the semantics later, but basically a primitive ensuring exclusive access to the memory regions of this process (you may have to do this in HW2)
  • Pre-processor stuff:
  • #ifdef on a symbol: if defined, do this, otherwise do the other thing (the kernel can make config choices, and one of them selects this; either min_brk is set to end_code or to start_brk)
  • if (brk < min_brk): brk is the syscall argument, a PAS address; if the requested address is below the minimum, don't do anything
  • goto out: quite disciplined; minimizes code replication and is particularly useful for a common exit point

Question & Answer

Q:

Can you have a 'bucket' for every power of 2?

A:

Yes, but buckets are a design decision, so you have to consider where you cut them off

Q:

Can you be clever with virtual memory and have fragmented virtual memory that is physically contiguous? Perhaps it would be more compact

A:

RAM chips don't care where the bits are; they satisfy any address in a more-or-less consistent window of time
You don't have to worry about keeping things contiguous in the memory chips (the hardware isn't worried about fragmentation either)
Fragmentation -> we're worried about the time it takes the memory manager to scan the indexes it maintains

Q:

Does malloc always retrieve memory?

A:

It's a virtual lie: if we run out of memory (i.e. the OS is not willing to give us more), malloc will fail; so test the return value (NULL -> no memory given)

Back To Navigation

Tutorial 7: Introduction to Linux Kernel Modules

  • Base Kernel Code vs Modules
  • User level vs Kernel Level code
  • Exercise on Robin's website

Advantages

  • Can modify kernel without need to recompile base kernel
  • Debugging and maintenance is easier
  • A more constrained version of code; when adding a function, you only need to debug that section in isolation from the kernel
  • No performance penalties
  • Adding new functionality
  • Base kernel <- LKM (new functions added to the kernel, yet isolated from it)
  • Adds new functionality to our OS

Disadvantages

  • Can't be loaded until the system has booted
  • Might cause performance penalty
  • Could potentially break your system
  • Even though you're writing in userspace, you're adding code to the kernel (interaction with the OS)

LKMs Uses

  • Device drivers
  • Filesystem drivers
  • System calls (add new ones or override existing)
  • Network drivers
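
A minimal "hello world" LKM sketch, assuming the 2.6-era module API used in the tutorial exercise (the names hello_init/hello_exit are arbitrary); its printk output lands in the system log.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

static int __init hello_init(void)
{
    printk(KERN_INFO "hello: module loaded\n");   /* shows up in the system log */
    return 0;                                     /* non-zero would abort the load */
}

static void __exit hello_exit(void)
{
    printk(KERN_INFO "hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");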

Back To Navigation

Tutorial 8: Introduction to Linux Kernel Modules

  • Finish exercise

Back To Navigation

Week 6

October 16: Kernel Memory Management

  • Recall brk() system call in LXR from last lecture
  • Do not be intimidated by the code
  • 246: SYSCALL_DEFINE1(): macro that defines the entry point we can jump to
  • The entry point is held in the system call table, which is accessed by the system call dispatch routine, which is reached via the IDT
  • brk: the end of the dynamic data section (an address that keeps track of where valid allocated memory ends in the PAS) - oldbrk and newbrk -> to expand the heap we need to keep track of both
  • 250: mm_struct *mm = current->mm: gets the value of current (the current-process symbol, a macro that refers to the currently executing process)
  • 253: down_write(&mm->mmap_sem): on a member of mm_struct
  • Controls concurrency, as multiple tasks could be executing brk() at the same time, so you want a way to serialize access to these things
  • Controls concurrent access for the rest of the function
  • 297: up_write(&mm->mmap_sem): releases the lock around this critical section (which does some sensitive work)
  • 255: #ifdef: a configuration choice happens here - preprocessor language
  • If defined, the code does one thing, else the other; the two are basically semantically the same
  • min_brk is set to the end of code or the start of brk (basically the same value)
  • Recall the layout (code -> data -> heap -> stack)
  • 260: if (brk < min_brk)
  • brk here is the third argument of the SYSCALL_DEFINE1() macro: it refers to the argument of the syscall
  • man 2 brk: the argument to brk is a PAS address; the caller asks to expand the heap to the address given
  • Ask the heap to expand (the kernel is free to listen or not)
  • If brk is less than min_brk, you're asking the heap to "expand" to somewhere it already covers
  • Asking for something you already have
  • 269: rlim: local variable = current (the executing task_struct), which has a signal member containing an array of resource limits (indexed by what is probably a macro referring to a slot in the array), whose rlim_cur member says how much data the process may have
  • 270: (brk - mm->start_brk) + (mm->end_data - mm->start_data): the requested heap size plus the size of the data segment, checked against that resource limit
  • If either condition is true, goto out
  • 274: newbrk = PAGE_ALIGN(brk): the kernel deals in fixed-size chunks, so round the request to a page boundary
  • Macros are typically capitalized
  • 275: oldbrk = PAGE_ALIGN(mm->brk): page-align the current break
  • If they're equal, goto set_brk (a label)
  • You requested something you already have


Lines 279 - 299

  • 281: do_munmap for memory unmapping (the shrinking case)
  • The main work (291): do_brk; check whether the break changed - if the return value doesn't equal oldbrk -> failure
  • Look at do_brk by clicking through: Function -> mm/mmap.c
  • Allocates a new memory region or expands the current one
  • 1993: the arguments are an address and a length; it returns an unsigned long
  • 1998: struct rb_node: a data structure embedded in the memory region mappings
  • 2002: len = PAGE_ALIGN(len): adjusts the length to ensure a sane value
  • 2003: if (!len): if len = 0, return (if brk is 0, you get the current value of brk back)
  • 2063: vma = vma_merge(...)
  • It could be the case that the heap is expanding into or over an existing anonymous memory mapping, so you can just absorb it (merging virtual memory regions)
  • 2071: kmem_cache_zalloc(...): if not, the kernel allocates a fresh structure (using its own memory management)
  • Lines 2077-2083
  • Standard manipulation
  • On success, 2085: mm->total_vm is updated and the new address returned
  • 295: out: retval returns the new brk value
  • The kernel can't depend on user-level programmers getting it right, so lots of checks happen
  • A frequent pattern in the kernel: do a little work, then delegate to other functions
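
A simplified sketch of the flow walked through above, paraphrased from the 2.6.32 source with the rlimit check and the shrink (do_munmap) path trimmed away - so this is the shape of the code, not the verbatim kernel source.

SYSCALL_DEFINE1(brk, unsigned long, brk)
{
    unsigned long retval, newbrk, oldbrk;
    struct mm_struct *mm = current->mm;

    down_write(&mm->mmap_sem);          /* serialize access to this PAS */
    if (brk < mm->start_brk)            /* asking for something you already have */
        goto out;
    newbrk = PAGE_ALIGN(brk);           /* the kernel deals in page-sized chunks */
    oldbrk = PAGE_ALIGN(mm->brk);
    if (oldbrk == newbrk)
        goto set_brk;                   /* nothing to do at page granularity */
    if (do_brk(oldbrk, newbrk - oldbrk) != oldbrk)   /* delegate the real work */
        goto out;                       /* do_brk failed */
set_brk:
    mm->brk = brk;
out:
    retval = mm->brk;                   /* the (possibly unchanged) break */
    up_write(&mm->mmap_sem);
    return retval;
}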

Example: DEFINE3(read)

  • The system call dispatcher jumps here when read is invoked
  • Userspace: printf -> write (glibc) -> int 0x80 -> kernel (the CPU services it)
  • 374: struct file is used so the kernel can manipulate open files
  • Based on the file descriptor, get a handle on it
  • 379: if (file): if file is not null (a file structure corresponding to the fd), do the work for the user; else bail
  • 380: file_pos_read(file): first find what offset you are at inside the file
  • 381: vfs_read: in the kernel, start with sys_read -> vfs_read -> read -> fs -> device
  • If the file structure has an f_op table with a read member, go ahead and use the function pointer (get the address)
  • Click read() and you get a lot of results
  • The function pointer's value is only valid at run time - you can't be sure statically
  • Delegating work: sys_read -> vfs_read (generic) -> a function pointer that delegates to a particular implementation for a particular file system (there are many different file systems on a machine)
  • The file system must eventually reach the physical device
  • Lesson: don't get frustrated, just keep digging
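
A user-space sketch of the delegation pattern just described: a generic layer (in the spirit of vfs_read) dispatches through a table of function pointers that each "file system" fills in. All names here are illustrative, not the kernel's.

#include <stdio.h>

struct file_ops {
    long (*read)(const char *who);      /* filled in per "filesystem" */
};

static long concrete_read(const char *who)
{
    printf("%s: one particular filesystem's read\n", who);
    return 0;
}

static long generic_read(struct file_ops *ops, const char *who)
{
    if (ops && ops->read)               /* mirrors: if (file->f_op && file->f_op->read) */
        return ops->read(who);          /* pointer value only known at run time */
    return -1;                          /* bail, as the kernel does */
}

int main(void)
{
    struct file_ops ops = { .read = concrete_read };
    return (int)generic_read(&ops, "vfs");
}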

Example: grab.c

  • In main:
  • Defines struct aircraft, a data type with three fields
  • The fields of an aircraft are set
  • Prints various sizes
  • Where is the aircraft in memory?
  • Use glibc_test
  • It receives a pointer
  • malloc's the size of aircraft
  • memcpy's a variable, the parameter
  • Casts a into a char pointer c, and rewinds by 8 bytes (8 bytes before what malloc returned)
  • Starts incrementing the pointer byte by byte, printing what's there
  • Output:
  • Shows the various sizes
  • Address of a: a local variable on the stack (stack addresses usually look like 0xbf...)
  • glibc allocates memory off the heap; the address confirms it's on the heap
  • Rewound 8 bytes, so we're before the beginning of the dynamically allocated structure -> malloc returns a location just past its own book-keeping data (the args/variables)
  • Padding happens
  • Shows how glibc malloc works: it allocates a chunk of memory and returns a location inside it (the data just before refers to other things) - you can touch that memory and manipulate it
  • The data is embedded in a data structure that malloc uses for its own maintenance
  • Nothing prevents you from modifying it
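
A sketch in the spirit of grab.c: allocate a chunk, step back 8 bytes from what malloc returned, and dump the bytes, metadata first. This is undefined behaviour by the C standard; it "works" only because, as the lecture says, nothing stops us.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *p = malloc(12);
    if (p == NULL)
        return 1;
    memset(p, 0xAA, 12);                         /* make our bytes recognizable */
    unsigned char *c = (unsigned char *)p - 8;   /* rewind 8 bytes: into malloc's book-keeping */
    for (int i = 0; i < 20; i++)                 /* metadata, then our 0xAA bytes */
        printf("%p: %02x\n", (void *)(c + i), c[i]);
    free(p);
    return 0;
}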

Question & Answer

Q:

How to know if it's a macro or system call?

A:

Macros are typically capitalized; you also learn to recognize them over time. It doesn't really matter much: macros are ugly, but the effect is the same as a call (control is transferred to code)
The kernel often doesn't want the cost of a full function call, which is why code gets split into macros

Q:

With constant transferring of code, what's the bottom layer? Where do you end this delegation?

A:

You often find the bottom layer in a device driver. You can't delegate forever, but it's a nice pattern - it chunks up and separates out responsibilities in bite-sized pieces
If you had one function that implemented everything sys_read does, plus (...), it would be totally unreadable, full of bugs, unmaintainable
A nice software engineering path: sys_read is a clean API -> the VFS provides choices for whatever file system you'd like; it doesn't care how the data is organized (CD-ROM, memory, etc.)
The division of responsibility is nicely reflected in the code organization
Eventually you grab bits off the physical reader

October 18: Kernel Memory Management (cont.)

  • The kernel is just an ELF (a big program)

Kernel Diagram (with brk and mmap)

ecOSGST.png
  • The kernel's syscall interface: part of it has to do with memory management (brk, mmap)
  • Two pieces of functionality the kernel exposes to user-level programs (i.e. they can invoke syscalls to somehow get more memory)
  • brk and mmap are semantically different:
  • brk is responsible for increasing the heap size (a particular memory area in the PAS)
  • mmap is more generic - it maps files into the address space (and can create anonymous memory regions)
  • mmap manipulates any part of the PAS you want; brk is for the heap
  • This is the API the kernel exports for memory management
  • A simple interface - it gives away memory in terms of pages (want to increase the heap by 1 byte? you get a page instead), because the kernel prefers to deal with memory in page-sized chunks (simplifies the implementation of the system calls)
  • Internal implementation of brk: change the characteristics/properties of the heap memory region (adjust the start, and mostly the end, of the heap)
  • Look up the memory region in the current PAS, adjust some properties, etc.
  • A relatively simple job - a design decision to keep the kernel simple
  • Hand out pages and don't worry about memory management or what gets done with the pages
  • A simple interface, but not one a client naturally uses
  • You don't use brk directly because you already have malloc


  • malloc: a library routine
  • User-level programs call these
  • No special privileges are needed to invoke brk - hopefully the caller is doing it the right amount and in the appropriate fashion
  • brk returns the new location of the end of the heap (the valid address memory extends to)
  • You don't need to bother with brk and mmap - the user doesn't care what's happening in memory as long as it gets the end result - a further abstraction
  • You don't need to deal with looking at memory and error checking
  • One of malloc's virtues: you don't need to know the underlying system (malloc should in theory work on a Windows box) -> abstraction
  • In essence: you don't want to have to care where in memory you are
  • Signature: void *malloc(size_t n)
  • brk deals with memory in page-sized chunks, but user-level programs don't want to deal with pages; they deal with pieces of memory of particular types and sizes
  • malloc offers specialization - an interface to memory in arbitrarily sized chunks (fine-grained manipulation of memory)
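
A tiny sketch of talking to the kernel's interface directly via sbrk(2) (the libc wrapper around the brk machinery), which malloc normally hides from you entirely.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    void *before = sbrk(0);        /* current end of the heap */
    void *chunk  = sbrk(4096);     /* ask the kernel to grow the heap by one page */
    void *after  = sbrk(0);
    if (chunk == (void *)-1) {
        perror("sbrk");            /* the kernel is free to refuse */
        return 1;
    }
    printf("heap end moved: %p -> %p\n", before, after);
    return 0;
}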

Example: aircraft

  • What's the relationship between malloc and brk?
  • Does malloc ever call brk? When?
  • Running the program, the process loops 'forever', copying fields into a newly created instance of struct aircraft and iterating over memory byte by byte
  • If you let the loop run forever, the program crashes: it walks past the last valid byte, then tries to touch one that isn't there
  • The CPU adds 1 to the current value of c and goes to fetch the content at that address
  • The MMU looks it up in the page tables and finds no entry there; that could mean the page was swapped out to disk (but not this time)
  • The CPU services the page fault, but the address isn't even valid - it's not in the address range that defines the heap
  • SIGSEGV occurs


  • Lessons:
  • malloc would like you to think it's allocating an object for us, but it's not. It's sauce on top of brk
  • It gives us some number of bytes at some address a, and says that the n bytes beginning at a (the range a -> a+n) are valid and can be safely accessed. But a is just some pointer somewhere in the middle of the heap, and we can walk backward or forward from it if we want (there may also be other objects in that space)
  • From inside our own address space, we can attempt to access any address we want
  • Implication: we can access or manipulate the contents of any object in the heap that we want
  • There are interesting objects there, and the only thing that stops us is if we happen to walk outside the heap's address range
  • The CPU is not actively interfering
  • The OS isn't either, because at this level the distinctions and classifications don't exist. The compiler checks that you have well-formed source code and produces x86 assembly; after that, it's all just memory
  • The heap, like other anonymous memory mappings, is just a memory region (created via brk/mmap)
  • The only time part of the system won't let you do something is if you run outside the region's bounds, or you attempt an access the region's permission bits don't allow
  • void pointer: an address; a holds a 32-bit address in the heap memory region (because malloc allocates from the heap, the default memory area)
  • No explicit resource limit


  • If you invoke malloc enough that every byte in the heap is allocated, malloc notices it has no more free blocks and asks the OS for help
  • It invokes brk and gets more pages
  • The effect is that the heap's end increases
  • The OS doesn't have to worry about keeping track of where individual objects are in the heap; it just grows/shrinks it
  • It's up to your library to provide efficient dynamic memory allocation
  • The kernel doesn't want to implement glibc malloc inside itself. It would be buggy, messy, error-prone code

Example: give() and recycle() Pseudo-Code

  • Implementation of malloc: takes some number of bytes and returns a pointer to you
  • Imagine you have the task of implementing your own memory manager:
void *give(size_t n); // malloc
void recycle(void *addr); // free
  • Allocate bytes and hand them back (give and recycle are analogous to malloc and free)
void *give(size_t n)
  // Obtain memory (store pages so pointers can be handed out to programs)
  // Keep track of bounds (brk does this for you as well)
  // Need a data structure that tracks allocated blocks
  // Need a data structure (could be the same one) that tracks free blocks (holes)
  brk(); // If uninitialized, get memory, keep track of it, etc.
  // Heuristic code goes here, informing the choice of which memory to give away
  • Use the heap or your own memory management? No heap - more control, can change permissions, etc., and it would not interfere with other objects
a = give(4); give(100); give(5); ... recycle(a); give(3);
// Find 4 contiguous 'available' bytes
// Return the address if there are 'available' bytes
  • What happens when you give(4)? It gives away 4 bytes
  • Deciding what the heap should look like uses the heuristics we talked about (first, best, next, worst, fixed)
void recycle(void *a)
  // Error checking, etc...
  add_to_freelist(a, n); // This address, for some n bytes, is added to a list - perhaps an index with (address, size) pairs shared with give()
  • Perhaps a homework question

Bucket Allocation

  • Deals with free memory chunks
How does the kernel allocate memory for itself?
  • Not malloc, since that allocates bytes in a PAS, not in the kernel
  • The kernel has its own internal memory management
  • Ex. the user fires up a new process
  • Running a new program (fork and execve) allocates space for a PCB, which the kernel does on its own behalf
  • The kernel has its own implementations of this functionality: kmalloc
  • Refer to the Kernel Diagram
  • Zone allocator - deals with the physical organization of memory
  • It talks to the buddy algorithm, which maintains the bucket approach to allocating memory
  • A particular memory allocator sits on top: SLAB, SLOB or SLUB (more details on this blog post)
  • Buckets contain pages, so any time the kernel needs small objects it takes them from the matching size; otherwise it takes whole chunks
  • It can borrow from another level to get memory
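
A sketch of kernel-side allocation with kmalloc/kfree, usable from an LKM (the function names here are arbitrary); contrast this with user-level malloc, which hands out bytes inside a PAS.

#include <linux/slab.h>
#include <linux/errno.h>

static void *buf;

static int grab_kernel_memory(void)
{
    buf = kmalloc(128, GFP_KERNEL);   /* served from the slab layer's size buckets */
    if (buf == NULL)
        return -ENOMEM;               /* the kernel may refuse; always check */
    return 0;
}

static void release_kernel_memory(void)
{
    kfree(buf);                       /* return the chunk to its bucket */
}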

Question & Answer

Q:

You're using data structures give and recycle. The implication is that not a lot of data structures are used. How about malloc data structures?

A:

There are static data structures, but that's not a general solution
There is an API for dynamic memory allocation, and the information about which blocks are free/allocated is embedded in the heap itself
Print out 'around' a (as in the grab.c example): a little before and a little after a are other numbers glibc uses to find the next block

Q:

The kernel's memory is set in physical memory. What happens when all dynamic memory is used up?

A:

When it's all gone, you have to pick somebody (to kill), because there's nothing left to reclaim
Memory exhaustion is usually forced by user space (which can, in effect, try to allocate toward infinity). The kernel does as much as it can; if a request is ridiculous, it fails

Back To Navigation

Tutorial 9: Building a Kernel from Source

  • Kernel vs Distribution
  • Distribution is built on top of the kernel
  • The more stable the kernel version, the less likely your distribution is to crash
  • Find it on kernel.org
  • Latest version is 3.11 and we are using 2.6.32
  • Download source code -> configure -> compile base kernel code -> compile modules -> [install modules -> install kernel files - do at home] -> modify bootloader (GRUB)
  • Finish before next tutorial as you're going to add more functionality to it by next tutorial
  • sudo yum install <PACKAGE>: installs anything you're missing
  • make menuconfig: configures the kernel with an interface
  • make <SPECIFIC SECTION> (like modules_install): minimizes the work make does

Back To Navigation

Tutorial 10: Building a Kernel from Source

  • Take snapshots frequently (in case you mess anything up)
  • Continue the exercise; don't worry about the logic, just follow the steps

Back To Navigation

Week 7

October 21: System Startup

Recall

  • What processes are, how they come into being, how they are given an address space
  • The utility of a modern OS is predicated on doing many things at once
  • How do you enable multiple processes to execute concurrently on the same hardware, on the same CPU?

Test 1 Comments

  • Question 1: the system call API and semantics; how a system call operates compared to a regular function call
  • Question 2: 5 fields of process control block, the metadata
  • Question 3: process life cycle
  • Question 4: fork important to system call API and process life cycle

Homework 2

  • How the OS handles a system call - when you make one, it goes through the system call table - does it jump directly to the code, or is there an intermediary function?
  • The interrupt happens, the system call dispatch routine looks it up in the table, then control goes to the entry point for that particular system call
  • You can't add one in UNIX unless you have a modified kernel
  • The point of Question 1 is not to test x86 assembly, but how system calls are invoked (the particular way it happens)

System Startup

  • In bootup: starting the VM exposes a virtual CPU; the screen blinks, a logo appears, then GNU GRUB starts up
  • GRUB: a small program that runs and gives you a menu to select from
  • You can select any OS you want, as long as GRUB knows where to find it
  • A hand-off between the hardware initializing itself and the OS actually being able to run and take over
  • Pick a kernel - the screen goes blank, then you are watching the details of boot
  • When you see the output, the OS is already running - it's initializing services (i.e. creating and running user-level processes)
  • Not booting; it has already booted and is running a number of programs (setting up various services)
  • What happens between selecting the virtual machine and this point? Why does it happen?
  • How do you take hardware from an unpredictable state and get something useful out of it?
  • Between selecting the kernel and the running processes (pages 3 and 4 of the Slides) is a lot of work
  • Startup is interesting:
(1) Transition from sequential to concurrent execution
  • The machine is likely not doing two things at once - code does one thing at a time
  • The hardware does one thing at a time; it literally cannot do more than one thing at a time
(2) Bootstrapping
  • If you're executing instructions one after another, how do you interleave them to do different things or switch between them?
  • The CPU is a multiplexed resource - sharing an inherently sequential resource among many constructs/virtual CPUs (processes)
(3) The fundamental OS job of hardware initialization (servicing interrupts and exceptions)
  • Multiplex state so it's useful
  • Bridges startup to interrupts, scheduling, concurrency, processes
  • Low-level hardware events - the OS is an event handler
  • How to take the state of the hardware and get into a position to run user-level processes
  • Startup takes hardware in an unknown state and imposes enough order on it that the kernel can run (and spawn userspace processes)
  • Before thinking about memory management (PAS), the first job is to turn the hardware on and get it into a useful, known, predictable state
  • We want to run in protected mode, but first need to get the hardware into that state
  • Bootstrapping: the relationship between ROM, BIOS, bootloader and kernel startup code
  • There is known code in read-only memory, and when the machine powers on, control is transferred to that code (small, and it may know a little about the hardware it's dealing with)
  • Minimal code to full-blown multi-user OS - a chain of hand-offs

Startup Overview

  • Initial startup program in ROM: hardware reset, load BIOS
  • The BIOS comes up and its job is to make sure the attached devices are in a good state (make sure everything works) and then hand off control
  • BIOS program: initialize devices, transfer control to the bootloader
  • A chain of trust, except the OS doesn't believe what the BIOS did, since BIOS code was written a long time ago
  • Linux reinitializes all devices
  • Bootloader (LILO, GRUB): choose OS and boot options, transfer control to the loaded kernel
  • GRUB: choose an OS and maybe pass parameters; from there the kernel is on its own
  • Each hand-off does the minimal work to get to the next stage
  • Main idea:
  • Load OS kernel into memory
  • Initialize kernel data structures
  • Create tasks
  • Transfer control to them

Setting up the Execution Environment

  • Initialize hardware (IRQ lines, etc.), doesn't trust BIOS results
  • Prepare IDT and GDT
  • IDT: intimately related to state of hardware (once you initialize it, you need to map code that handles it to the interrupts)
  • GDT: describes internal memory organization (if you start computing without memory management, not going to do well)
  • Create OS data structures (task list and whatnot) for kernel operations
  • Initializes core OS modules
  • Scheduler, memory manager, etc.
  • Enable interrupts
  • The implication is that we've been executing without interrupts until now (a good thing, because you don't want to be interrupted while setting up key data structures)
  • We don't have concurrency until this point - no ability to do multiple things at once
  • Now willing to receive signals from hardware, and software interrupts from running processes, to deal with concurrency
  • Create the null process (i.e. swapper), create the init process
  • Details differ between OSes, but a few processes are created
  • init: process 1; null (swapper): process 0 (does nothing)
  • Null: be present and be prepared for scheduling
  • If the OS can do nothing else because all processes are sleeping, it can switch to this one
  • Two processes now exist, as well as the scheduler, which can switch between them
  • Switching between the two is how you get concurrency out of a sequential procedure
  • The system can do a little bit of one job, then stop, and so forth - quickly enough that it looks like it does multiple things at once

Timeline (for Linux Boot)

  • The setup function and control flow
  • Start with something sequential in assembly code; kernel_thread (which creates a kernel process) then executes init first
  • BIOS (no OS, random RAM)
-> CPU.RESET pin (cs, eip = 0xfffffff0)(ROM)
-> BIOS is here
-> Real address mode
-> no GDT, no IDT
  • Bootloader (LILO, GRUB)
-> MBR holds bootloader loader
-> Main bootloader reads map of available OS kernels
  • Assembly language: setup() "function"
  • Assembly language: startup_32() (decompress)
  • Assembly language: startup_32() (setup PID 0)
  • The start_kernel() function
-> ...kernel_thread() (create PID 1 /sbin/init)
  • Main job: locate and retrieve kernel image
  • Other jobs: POST, copy 1st sector to 0x00007c00

Linux Startup Code Chain

Setup()
startup_32()
startup_32()
start_kernel()
+sched_init()
+kernel_thread()
/sbin/init
/etc/rc.d/init.d/...


  • If you can come away from today (+ the reading) understanding all of this, it's a good sign
  • The chain above describes the control flow and phases of what's happening
  • In the details of that chain is the relationship between hardware and OS as the system comes into being
  • The early half of the procedure is intimately tied to details of the architecture


  • LXR
  • 301: calll main: the entry point of the kernel setup code
  • 128: void main(void)
  • The code you execute at the beginning has to be specific to the hardware (you're running x86)
  • 176: go_to_protected_mode()
  • Tied to the hardware: an x86 CPU can be put into different modes. It starts up in real address mode (no memory protection); protected mode means you've got protection and paging
  • From real to protected -> separate user and kernel code
  • 104: go_to_protected_mode(void) sets up the IDT and GDT
  • 95: static void setup_idt(void) loads the IDT
  • startup_32: the procedure that uncompresses the kernel
  • start_kernel: by its end, the scheduler is created and most subsystems exist; the kernel is in a state where it can do things
  • __init_refok rest_init(void): creates kernel thread flags, and then kernel_thread(kernel_init)
  • kernel_init: invokes init_post()
  • kernel_execve: invokes a system call from kernel space
  • kernel_thread: implemented via do_fork, which creates the bootstrap context for a single process to exist
  • sys_execve: basically replaces whatever the process is executing

Conclusion

  • The kernel boot code creates a thread of control, and that process is responsible for running the kernel init function, which calls this whole procedure
  • We now have a process living on the machine. kernel_init tries to run programs that create other processes, so the OS can switch between them
  • Try to get the hardware into a state where it can execute more complicated code
  • Once it sets up its data structures (if we stopped here, the OS would not be very useful, as it wouldn't be doing anything), the kernel creates threads of control via calls to kernel_thread (equivalent to a fork), giving us two processes
  • It uses those to invoke the existing startup scripts on the machine; their job is to look at all the startup programs
  • init's job is to look into the directory (/etc/rc.d/init.d/) and execute what it finds via fork/exec
  • User-level functionality - the kernel doesn't care about or need it, but it makes the machine useful to us
  • Now it's their job to start other processes on the machine
  • Concurrency is created out of nothing by 'manually' constructing two special tasks after environment setup (and loading of kernel code) has been done
  • The transition happens when (0) these tasks exist, (1) interrupts are turned on, and (2) the scheduler is invoked
  • Memorize the major phases of what happens

Question & Answer

Q:

Even without the process, do we have concurrency because we have interrupts?

A:

Between enabling interrupts and creating processes - no: you could switch away to something else, but what do you come back to?
By creating processes, we have options by default. In particular, if we're interrupted during the sequential startup procedure, we can only return safely once everything is initialized

Q:

What happens if the OS isn't found?

A:

The BIOS starts running and transfers to the bootloader. GRUB believes it should jump to somewhere, but there's nothing there, so the machine won't start (you get no functionality). It can't do anything because there's no code there

Back To Navigation

October 23: HW1 Review and HW2 Hints

  • Perception: somehow the material in lectures, tests, assignments and tutorials is not related
  • Put yourself in Michael's shoes: he has taught college-level courses for 10 years and has a clue about how to teach. If you accept that as a precondition, what is Michael thinking?
  • This is not an OS course that assigns theory-only homework. Draw the connections
  • What major concepts have we talked about?
  • What is OS
  • What is process
  • Kernel roles
  • Illusion of unlimited CPU/memory
  • Memory management (userlevel, kernel)
  • Internal, external fragmentation
  • Virtual memory
  • Paging
  • System calls
  • Protection
  • Isolation
  • System architecture
  • Wiki, man pages, professors, textbooks, LXR, Google, etc. for resources
  • Learning process: reflect on what you did, and what Michael expects you to do

Homework 1

  • Problem 1: where to find authoritative information? Important skill to have
  • Does not care if you memorized facts, cares that you know where to find stuff (man pages)
  • Form of this exercise is to get you to look and search
  • Use man pages and the "see also" section
  • Problem 2: learning by observing what the system does
  • /var/log stores important messages
  • System log: what the system is doing; in HW2, output from your kernel module goes to this file
  • Manage device interaction
  • strace prints out system calls
  • Problem 3: what critical components of an OS are there?
  • OS supports concurrent processes
  • What's important, what's not; what belongs in the kernel
  • Distinction of OpenBSD (default processes) and Linux
  • Problem 4: give real experience with what kinds of functionality there are
  • Mystery: given data, tell what's going on
  • You're used to writing code, but that's not everyone's thing - use your prowess to analyze data
  • Assignments are a tool to assess how much you've absorbed

Homework 2

Problem 1

  • A user-level program; consult the mmap man page
  • Should not take that long. mmap is what you want
  • Understand how to create a memory area at run time where you can put code (the implication is that the memory area has a set of permissions - r-x [read-execute])
  • Invoke 2 system calls
  • Demonstrate understanding of the system call calling convention for x86 Linux
  • System calls are different from function calls; the difference is related to protection
  • The implementation should be straightforward - a file with a sequence of bytes (10-20 bytes) that represent x86 machine code (not ELF); just a file of raw bytes
  • Produce it in a hex editor (under the Programming tab, Bless is installed) or on the command line (check the Piazza post by Michael)
  • Echo hex bytes to a file; if you know the hex values of x86 instructions, you can fire up NASM and copy a program as a basis
  • getpid is wonderful
  • exit doesn't work here, but there are other system calls
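
A minimal sketch of the mmap(2) piece of this problem: create an anonymous memory region at run time that can hold code. It is created rwx here so bytes can be copied in; mprotect(2) could then drop it to r-x. The raw x86 bytes themselves (and the jump into them) are left to you.

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    void *region = mmap(NULL, 4096,
                        PROT_READ | PROT_WRITE | PROT_EXEC,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("executable region at %p\n", region);
    /* ...copy your raw machine-code bytes here, then transfer control to them... */
    munmap(region, 4096);
    return 0;
}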

Problem 2

  • The OS does ASLR, but the point is knowing how the OS keeps track of memory regions, virtual memory and process memory on a machine
  • Related to 'where do you look for info about processes and the PAS?'
  • LXR (code); the file for memory regions is /proc/PID/maps
  • It looks like a file system, but it's an image of various pieces of kernel memory, provided so the user doesn't have to modify kernel source code
  • Question 0:
  • Write a program that forks, and look at the processes' memory regions to see if their addresses are different
  • Run it in a terminal as a background process and examine the pid
  • Question 1:
  • The intent is to get eyeballs on LXR
  • Hint: name the directories in the base directory of the Linux tree (mm, usr, arch, Documentation, fs - file system code)
  • One of these directories may talk about the process list
  • Code that accesses the data structures created for memory regions
  • Need to do something? Has the kernel done it already? Look at that code
  • Question 2:
  • Memory regions as Linux defines them: the data structures that keep track of memory regions, namely mm_struct and, more precisely, vm_area_struct
  • Does the data structure actually contain code?
  • Question 3:
  • /proc/PID/maps: a user-level script - posted on Piazza
  • Collect info using the script and do analysis on it
  • Question 4:
  • LKM: a practical skill, and a contrast to modifying kernel source code
  • Extend kernel functionality: write an LKM that walks the process list - the task_structs, the PCBs the kernel keeps for all running processes (see the sketch after this problem)
  • Collect attributes of the PAS - to prove you can read kernel source code and understand how to access these structures (concurrency is hard)
  • Hint: observe how other parts of the kernel walk the list (the OOM killer, /proc/PID/maps)
  • LKM tutorials are on the Wiki
  • Question 5 (6 in the handout):
  • From userspace, observe the information, then compare the approaches - both access the same data and data structures. Which is the better approach?
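
A sketch of the LKM side of Question 4: walk the process list and print something per task. for_each_process and task_struct are real kernel symbols; which attributes you print (and how you synchronize) is the actual exercise, so take this only as a starting shape.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched.h>

static int __init walker_init(void)
{
    struct task_struct *task;

    rcu_read_lock();                     /* concurrency is hard: hold a read lock */
    for_each_process(task)
        printk(KERN_INFO "pid %d: %s\n", task->pid, task->comm);
    rcu_read_unlock();
    return 0;
}

static void __exit walker_exit(void)
{
}

module_init(walker_init);
module_exit(walker_exit);
MODULE_LICENSE("GPL");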

Problem 3

  • Question A:
  • Add a system call to the kernel (you'll do this in tutorial) with a particular signature and a particular effect. What's that effect? It simply marks a process in a certain way (as being in a POD)
  • Add a field to task struct that keeps track of how long you're in POD
  • Question B:
  • System calls - write a value into a new attribute of the task_struct (the PCB) and have another part of the kernel interpret that state (modify behaviour based on this mark)
  • Demonstrates that you're able to access documentation and code related to the PCB, the general OS concept that keeps track of processes

Question & Answer

Q:

Are there any similar system calls to look at for the last question?

A:

Use reeducate(2), and there are other system calls that fetch or set values (one way to find them is to look at the syscall list in unistd.h and read the man pages); in tutorials, you will implement part of reeducate
The system call itself is simple - write a few fields

Back To Navigation

October 25: Process Scheduling

  • Links to various pieces of source code on Wiki
  • Chapter 4 more detailed (add to readings)

System Startup

  • The OS can transition from sequential code execution to concurrent execution. It:
  • After the sequential procedure of setting itself up, initializes the scheduler
  • Gives the scheduler two processes to operate on, via do_fork and sys_execve (swapper, process 0, and init, process 1)
  • Does this through the standard mechanisms that allow a PAS to be replicated
  • Enables interrupts
  • Allows the newly created scheduler to begin choosing between these two processes - the scheduler now has work to do
  • What happens immediately is that process 1 forks and execs some number of processes
  • One of the processes (init) creates children via do_fork() and sys_execve()

Problem of Scheduling

  • As more processes are created, and some hundreds or thousands of processes come to exist, how does the OS choose which process should be given to the CPU?
  • Ex. five graduate students all want Michael's attention. They show up at the door at the same time; whom to pick first?
  • How to make the decision? How to fit unlimited tasks into a limited amount of resources?
  • Pick one at random
  • First come first serve
  • Priority / Urgency / Favorites
  • Estimate of workload, most work last
  • Alphabetically
  • Save history of service, note progress
  • Work on newly complete
  • Progress on incomplete
  • Equal slots
  • Round robin
  • Clone self and deal with others (physically impossible)
  • Scheduling is a mechanism that creates the illusion of simultaneous execution of multiple processes
  • Important to remember: at a human time scale, say you have 50 minutes free and give each student 10 minutes, starting alphabetically. Then all resources are exhausted
  • You can't service them all at once - but give them 3 seconds each and they feel like they're being serviced at once
  • A modern CPU operates quickly - it can execute roughly a billion instructions per second (gigahertz)
  • Instructions that use a single clock cycle

Considerations for Scheduling

  • Scheduling metrics:
  • Time
  • Work type
  • Responsive (delay)
  • Consistent (delay variance)
  • man 1 nice, man 2 nice


Progress is #1
  • A process that is running should actually progress through its code (i.e. do stuff and get things done)
  • The goal of the scheduler is to take that collection of work and ensure it makes progress
Fairness
Priority
  • One process is more important than the other (ranking)
  • How to balance fairness and priority?
  • Tightly linked is the notion of efficiency of the scheduler
Efficiency (of scheduler)
  • The scheduler should be fast at deciding which process to pick
  • There is overhead in making this decision and keeping track of all the data necessary
  • How to balance the time spent deciding against the time allocated to getting work done?
Quantum
  • Time scales, slice of time you want to give a process to make progress (should be reasonable)
Pre-emption

Quantum

  • Default amount of time given to processes to accomplish progress
  • Designing a scheduler, you have to pick a value for quantum
  • Implications of too small/short: context switch overhead is high
  • Implications of too long: destroys the illusion the scheduler wants to provide (concurrency)
  • From any program's perspective:
  • my_stopped_time = N * quantum_length
  • where N is the number of processes on the machine
  • What else might affect the selection of the quantum?
  • Short quanta and very long quanta are bad, but a relatively long quantum can be good: it does not make interactive applications unresponsive, because they have relatively high priority and can pre-empt other (batch) processes
  • A relatively long quantum does not make processes feel slow, because we can fix that elsewhere
  • We have control over other features that distinguish background/non-interactive tasks from interactive ones (whose priority should be boosted)

Pre-emption

  • Ability for OS to stop a program from running
  • Kernel can interrupt a running process at any point in time or when its quantum expires
  • Enables: "Do something else more useful"
  • Pre-emption does not imply suspension
  • Pre-emption in Linux 2.6 can happen either in user mode or kernel mode execution
  • The CPU's algorithm: fetch, decode, execute, driven by the value of the instruction pointer
  • Most CPUs fetch, execute, then check for interrupts
  • Ex. the clock: how much time (a quantum) has passed since the last time the scheduler was invoked?
  • The OS steps in, stops the current process, invokes the scheduling policy, picks a task, and allows it to continue executing
  • Pre-emption: the OS is allowed to pre-empt the currently executing process

Process List vs Run Queue

  • Process state, life cycle, etc.
  • There's a difference between the logical set of scheduling candidates and all the processes actually on the machine
  • The list of all processes is logically distinct from the data structure used to organize processes into groups ready to be scheduled
  • The concepts are different, but implementations may share a data structure or keep two completely separate lists
  • The indexing is separated in Linux: the run queue contains only those that are runnable (actual candidates)
  • There's no point scanning the entire task list (including non-runnable tasks) when making the decision
  • You make a quicker decision by considering only the ones capable of using the CPU

Priority

  • Most scheduling approaches/algorithms can be boiled down to the generic concept of 'priority'
  • Different approaches calculate different absolute or relative priorities for the collection of runnable processes
  • We have control over this number by nice(1) and nice(2)
  • First come first serve - reasonable
  • Round robin - fair: give attention to everyone
  • Estimate of workload - if you knew how much time each job would take, you could take care of short/long ones first and throughput increases (get rid of processes in minimal time), but the work required is unknown and unbounded
  • How do you know what work is required?
  • It boils down to priority selection - who is most important to run now?
  • 'Most important' may be any of the above
  • The scheduler uses some image of round robin, but it's quite hard to arrive at a priority value
  • Establish a dynamically calculated priority for every runnable task on the system
  • In the priority calculation: what kind of task is it (interactive - more responsive, batch, what it's bound on, etc.)? How much attention has been paid to this task already? (hard to know; the OS plays a guessing game)
A parameter that lets you affect priority - nice(1)
  • Nice: be not greedy. The intent is that if a background job can be done over two weeks, you lower its priority, so low-level long-term computing doesn't interfere with other users' shell programs
  • Increasing priority is restricted to root, not ordinary users
  • Nice affects scheduling behavior

Types of Processes

  • Traditional classification
  • CPU-bound
  • Many CPU and memory operations: scientific computing, financial software, rendering
  • Is the procedure primarily using the CPU and nothing else? If you're doing math and calling functions, you're CPU-bound
  • I/O-bound
  • Much use of devices; may spend time waiting for input
  • Is process interactive? Is it a background procedure?
  • Linux: no deadline scheduling, but real-time work is labelled as such and gets higher priority
  • In a true real-time system, every task comes with tight bounds and a set of deadlines, but that's not what a multi-user OS does
  • Alternative classification
  • Interactive
  • Batch
  • Real-time
  • A batch process can be I/O-bound or CPU-bound (database, rendering)
  • Real-time programs are explicitly labelled as such

Scheduling Policy and Algorithms

Conflicting Objectives

  • Fast response time (processes seem responsive)
  • Good throughput for background processes
  • Avoid 'starvation'
  • Ensure all processes are getting some attention
  • The interpretation of a set of rules governs this behavior
  • Scheduling policy: the rules for balancing these needs and helping decide which task gets the CPU

Scheduling algorithms

  • Ex. 'the one with least attention goes first': high throughput, but a process with a complex problem doesn't get serviced - it starves and dies
  • Round robin: everyone gets equal share of time

Round Robin Scheduling Algorithms Diagram

qTTSVHz.png
  • Ex. R, A, B
  • Students have an arrival time and a workload estimate (amount to do)
  • The time quantum could be 5 units, and the context switch overhead is 1
  • They arrive all at the same time, with different units of work
  • Timeline with first come first serve: whom to pick now? Pick an arbitrary ordering
  • A, B, R - tiebreak order (alphabetical)
  • A runs for 3 units of time (even though the quantum is 5)
  • FCFS: A only has 3 units of work to do. Typically we associate the quantum only with round robin
  • When A is done, it's done, and it invokes exit()
  • exit() ends up calling the schedule function (executing inside the exiting task, so the scheduler can pick the next one)
  • What about the context switch? The quantum only means something in the round robin approach, and here the context switch can be ignored
  • FCFS: run one task at a time; it completes its work before the next starts
  • Context switch time - on a task switch, the context switch includes calling the scheduler and swapping state
  • Compare the structure of different schedulers, because they're radically different approaches

Question & Answer

Q:

Is there a value that looks for priority? If I have a window in the foreground, will this get serviced more?

A:

Yes, Windows uses this approach. The process that owns the screen gets relatively more time slices (distinguishing between a task that's interactive and a background task)

Q:

What's the difference between nice and priority?

A:

Nice: data that goes into the calculation of overall priority. By default a process has a nice value of 0, so it's not taken into account

Q:

Can you have two processes that come at the same time? Is this possible because of the bus?

A:

On a single CPU you can't, because the CPU would be executing fork and it can't do two things at once
On a multiprocessor system, there is no reason you can't: one part of the OS can be forking while another part does the same

Q:

Does the scheduler look at the exact arrival time, or 'since that last time it ran'?

A:

No; processes come and go all the time. Basically you have a set of processes, and it doesn't matter when they arrived; it's just a collection
What does matter, within a window, is how much attention you've paid to a process
If A has received 50% of the recent progress, it's likely not to be scheduled again right away

Q:

Is there a context switch time in the FCFS scenario? Since you only remember the context when you remember the last process

A:

To load the next process, there still needs to be some context switch
In the diagram, the ticks are calls to the schedule function, and part of that is the context switch

Q:

Do you need a context switch from time 0 to 1?

A:

No - we could count one, but on the test it will explicitly say whether or not to count context switches

Back To Navigation

Tutorial 11 & 12

  • Work on exercise that helps start Assignment 2: Problem 3

Back To Navigation

Week 8

October 28: Clocks, Timing, Interrupts, and Scheduling

  • Policy vs. mechanism
  • The structure of a scheduler
  • It's easy to look at a memory manager/scheduler and conflate what it's doing with how it's doing it
  • The distinction shows up in both the 2.4 scheduler (earlier) and 2.6
  • Linux version 2.6:
  • Data structures and internal organization; how it picks the next task
  • Code that manages what should run next
  • Assigning priority vs. picking the next task
  • Mechanism: the machinery answering 'which process should run next?'
  • Policy: what drives or controls that decision
  • Different approaches for driving priority
  • The mechanism enforces the decision the policy has made
  • Process life cycle
  • ready <-> running -> sleep/wait -> ready

Popular Scheduling Approaches

  • FCFS: nice, but not much responsiveness, because the ordering is arbitrary and long jobs clog up the queue; short jobs may not run for a while
  • Run to completion
  • Shortest job next: high responsiveness; long processes get shuffled to the end of the list
  • Round robin: fair approach, every process gets equal time slice and if we pick quantum well enough there is equal footing in responsiveness

Round Robin Scheduling Algorithms Diagram

MSv3iU5.png
  • Context switch happens in black box
  • How to know it's a decent approach? How to compare?
  • Metrics: empirical (simulation)
  • This is an artificial way of emulating scheduling decisions - it doesn't actually matter when tasks arrive
  • The actual decisions an OS makes differ from contrived examples, but there are a few metrics we can use
(1) Wait time: note when a process arrives, and when it first gets serviced
  • Keep this metric low - when a process arrives, we want to service it quickly
(2) Turnaround time: the ability to make progress, and to finish the work you have
  • From arrival to completion, how long does it take the process to get its work done?
  • Turnaround is the difference between when it arrives and when it finishes
  • In general, low values are good

Linux Support for Scheduling

  • Flags for scheduling policy
  • Has mechanisms that support multiple approaches to scheduling
  • Normal scheduling and real-time scheduling
  • Linux can use both by defining flags
  • Lines 35-38: SCHED_NORMAL is normal policy, SCHED_FIFO and SCHED_RR are real-time variants
  • Linux fakes real-time scheduling
  • Processes marked SCHED_FIFO or SCHED_RR are given a drastic priority boost, and the OS scheduler schedules them first
  • However, most multiuser OSes don't use these real-time approaches; Linux does something else for normal tasks
  • Gut feelings when designing a function to prioritize processes:
  • FIFO
  • Sort an array efficiently somehow (a heap)
  • How to calculate priority?
  • Nice is one input, but how do you calculate relative priority?
  • Just look at everything -- but how?
  • Think about for loops

Code: sched.c

  • With n tasks runnable, which one goes next? Write a for loop and iterate over them
  • Not in LXR; it's on the Wiki
  • The schedule function - prev and next are the important processes to keep in mind
  • Checks whether to use real-time handling
  • repeat_schedule:
  • next is a pointer to a task_struct
  • idle_task(this_cpu): if all processes are sleeping, who gets to run? Have a process that's always runnable
  • The swapper process from system startup - process 0; if the OS can't decide who should run, it runs this
  • Implication: next starts out as the idle process
  • c = -1000: a counter (the best 'goodness' seen so far, initialized very low)
  • list_for_each(tmp, &runqueue_head): Linux has internal list data structures and macros
  • tmp points into the list; runqueue_head: every CPU has a runqueue associated with it (i.e. its set of runnable processes), and it will be iterated over, storing the task under consideration in tmp
  • p: the standard iteration pattern (recover the task_struct from the list node)
  • if (can_schedule(p, this_cpu)): test whether p can be scheduled on this CPU
  • int weight: how much 'goodness' task p has
  • goodness: is passed the task_struct
  • weight is the number returned. If the scheduler iterates over every other process and the current one is still the most schedulable, it will keep running it
  • goodness(p, this_cpu, prev->active_mm): metadata; the history and priority of the process
  • if (p->mm == this_mm || !p->mm): the meat of the code
  • Don't force a page-table switch when the address space is shared, and take into account weight += 20 - p->nice, then return
  • If p is a real-time task, boost its weight by 1000 so it's selected for real-time purposes
  • Iterating over the tasks, we eventually pick the one with the highest priority (see the condensed loop below)
  • This order-n scheduler considers every task on the machine
  • Ex. a server with a thread per client: the runqueue is long, so every call to schedule does a lot of work (and you don't want picking the next task to take long)
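
The selection loop just walked through, condensed (a fragment of the 2.4-style schedule()/goodness() logic, lightly paraphrased rather than quoted):

/* next starts as the idle task; c tracks the best goodness seen so far */
next = idle_task(this_cpu);
c = -1000;
list_for_each(tmp, &runqueue_head) {
    p = list_entry(tmp, struct task_struct, run_list);
    if (can_schedule(p, this_cpu)) {
        int weight = goodness(p, this_cpu, prev->active_mm);
        if (weight > c) {
            c = weight;          /* remember the best candidate so far */
            next = p;
        }
    }
}
/* O(n): every runnable task is examined on every call to schedule() */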

2.4 vs 2.6 Scheduler

  • Linux 2.4 is okay, but how to do better than O(n)?
  • Just pick the next task
  • How do you avoid iterating over n processes to pick the best one?
  • You need some sort of priority queue
  • Why use an algorithm when we can use a data structure?
  • 2.6 scheduler
  • An O(1) procedure for answering who runs next - it runs in constant time
  • I.e. no matter how many processes you have, the answer takes a particular length of time - a known constant
  • The heart is pick_next_task: literally fetch the pointer to the next task
  • It maintains multilevel feedback queues
  • Picking the next task is straightforward. What makes it more complicated is that it sometimes considers other scheduling classes (e.g. real-time processes on the machine)
  • Basically it delegates to another function that picks the next task out of that class's data structure

Run Queue

  • Every CPU has a run queue
  • There are fields, which point to a couple of arrays
  • One array is active and another is expired
  • Active: processes that haven't had a chance to run in this time slot
  • Expired: processes that have already run
  • Run queue has these two lists. Who should run next? What is 0th entry of the active list?
  • As a task finishes its work or makes a syscall, accounting is done (it drops off the active list to some position in the expired list)
  • Array is 140 priority levels long. Easy to ask who is next - if no one is at position 0, go to the next one
  • For loop: why not O(N)? Because there are only 140! Bounded by a constant
  • Which position a task falls into is governed by policy
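  • A minimal user-space sketch of the active/expired idea (the real 2.6 scheduler uses a priority bitmap and a find-first-bit instruction rather than this naive scan; names here are invented):

#include <stdio.h>

#define NPRIO 140   /* 140 priority levels, as in the 2.6 scheduler */

/* Each priority level has a count of ready tasks. Picking next is a
   scan bounded by the constant 140: O(1). */
struct prio_array { int count[NPRIO]; };

static int pick_next(struct prio_array *active, struct prio_array *expired) {
    for (int p = 0; p < NPRIO; p++)        /* bounded by constant 140 */
        if (active->count[p] > 0)
            return p;
    /* active is empty: swap the roles of the two arrays and rescan */
    struct prio_array tmp = *active; *active = *expired; *expired = tmp;
    for (int p = 0; p < NPRIO; p++)
        if (active->count[p] > 0)
            return p;
    return -1;  /* nothing runnable: run the idle task */
}

int main(void) {
    struct prio_array active = {{0}}, expired = {{0}};
    expired.count[120] = 1;   /* one task that already used its timeslice */
    printf("next priority: %d\n", pick_next(&active, &expired));
    return 0;
}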

Completely fair scheduler

  • Heuristics that determine which slot to fall into
  • Scheduler takes the round robin idea and flips it on its head: in RR you pick a quantum and that's how long a process runs
  • RR gives the same time to every process. What happens if you own 25 processes and I own the last one? You're getting more service than I am. A similar problem happens between processes

Completely fair scheduler relaxes the quantum

  • Squeezes some progress for every task into a well-defined period, based on the number of processes on the machine and their relationship to time
  • With more processes, each one's share of CPU shrinks toward a minimum level - and the cost of the scheduler itself running must be accounted for
  • Requires some fraction of processing time
  • More processes: set a floor on the minimum amount of time that each process gets to run
  • Go to Scribe 2's notes for more quantum notes

Back To Navigation

October 30: Concurrency, Communication and Synchronization

  • Readings for today are the most important in entire course
  • Concurrency as a problem, not a solution
  • Ex. Michael asks a question and everyone answers at once - can't discern an answer
  • Efficient, but concurrency means all processes apparently execute at the same time - not all it's cracked up to be. Must do it safely
  • Problem: trying to access the same thing

Seeds of a Problem

  • OS wants to convince users that machine is doing as much useful work as possible and that all applications are making reasonable progress and have fair access to the underlying system resources
  • Problem of concurrency is Interprocess Communication (IPC) problem: applications want to communicate with each other and share state. IPC is killer app for techniques that manage concurrent execution
  • If you have two processes, and are communicating through some buffer, then there are problems if synchronization doesn't match up between the three elements

Focus Question

  • How to make sure this works safely and correctly?
  • How can two otherwise independent tasks coordinate their access to a shared resource?
  • Once we leave sequential, serial execution behind, we open up an opportunity for incoherent access and updates to data

Concepts

  • Race conditions: situations where two or more tasks compete to update some shared or global state; such situations cause the final output to depend on the happenstance ordering of the tasks running rather than a semantically correct computation
  • Essence of the interaction is a race between process1 and process2 to access the buffer. The result is not predicated on the algorithm; it's predicated on the interleaving of the two jobs involved
  • If the result relies on the arbitrary ordering of the processes' race, that's a bad thing
  • Critical sections
  • Code regions or slices that contain instructions that access a shared resource (typically sharing some type of memory or variable)
  • Multiple tasks or threads entering the same critical region without some control or coordination mechanism will interleave with each other in ways that produce race conditions and (probably) incorrect output
  • In some sense a critical section is: (1) a code location or sequence of instructions (2) an environment and set of conditions
  • Concurrency problems: shared state i.e. variable, data structure, some resource that two or more processes are interested in reading/writing or checking state of
  • Instructions in processes that access the shared state make up a critical section
  • Set of instructions that operate on some shared state (e.g., mov [var], ... in p1 and mov [var], ... in p2)
  • Tempted to think of it as code. That's fine, but a critical section is defined not so much by the instructions involved as by the shared state
  • The code that manipulates the state is part of the critical section
  • "Is this a critical section?" Can't tell by sequence of instructions; is data involved in that sequence potentially manipulated by multiple threads?
  • Mutual exclusion
  • Only a single task should have access to a critical region at a time
  • Enforcing mutual exclusion helps serialize and normalize access to critical regions and the memory, data or state that the code of a critical region reads and writes
  • Don't let simultaneous manipulation of the shared state happen
  • Doesn't let multiple threads manipulate shared state at same time
  • If you allow multiple processes to access one critical section, implication is that they are all at different offsets (executed at different instructions in critical sections)
  • Bad, so need a way to enforce one task
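  • A small pthread sketch of such a race over shared state (compile with -pthread; the final count usually comes out below 2,000,000 because counter++ is not atomic):

#include <pthread.h>
#include <stdio.h>

/* Two threads increment a shared counter without mutual exclusion.
   counter++ compiles to load/add/store, so updates get lost. */
static long counter = 0;    /* the shared state */

static void *work(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;          /* unprotected critical section */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, work, NULL);
    pthread_create(&b, NULL, work, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}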

Classic IPC Problems

  • Producer-consumer
  • Some shared buffer
  • You have something (program, device, etc.) producing signals, noise or data
  • Do something useful - you are capturing it and it's efficient to buffer the event
  • When process has events to generate, store in buffer big enough for purposes
  • Something else observing the events and reading out of the buffer
  • Two threads p and c (in a global shared int array)
  • p generates random values, writes to b, and c reading
  • If p attempts to write to buffer that is full, it goes to sleep
  • c wakes up and reads from the buffer. Hope that it's able to tell when it's empty (test condition)
  • Testing another shared state - both p and c are reading/writing that value so we need something to do the coordination
  • Dining philosophers
  • n-1 chopsticks for n philosophers, so someone is left without
  • Readers - writers
  • Producer-consumer is a variation of this
  • Some threads want to read state, some of them want to write state; many threads can read at once but when you want to write, how to give it priority or not?
  • Many parts of the kernel can touch the shared state at once: readers can do their jobs, writers have to wait until the readers drop out

Practical Example

  • OS has done things right. Two processes are executing at once - history and grep
  • Both are sharing a state
  • The pipe: magical utility that is a shared buffer
  • Whatever history produces, it is put into the shared buffer while grep attempts to read from that buffer
  • How does it work?
  • The OS is designed for this, and sets up and controls access to buffer
  • history runs for a bit, then is taken off the CPU; grep runs for a little and reads, then the CPU pre-empts grep; sometime later history runs again and produces more output
  • Example of successful concurrency
  • If using OS to communicate, it is hopefully implementing correctly and enforces access to a shared state in appropriate fashion
  • history (write(2)) and grep (read(2)): ask OS to manage access state
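  • A minimal sketch of the same pattern with pipe(2): the kernel owns the shared buffer and serializes access to it, just as it does for history | grep:

#include <stdio.h>
#include <unistd.h>

/* The kernel-managed shared buffer behind a pipe: the child writes
   (like history), the parent reads (like grep). */
int main(void) {
    int fd[2];
    char buf[64];
    if (pipe(fd) == -1) return 1;
    if (fork() == 0) {               /* child: the producer */
        close(fd[0]);
        write(fd[1], "hello\n", 6);
        close(fd[1]);
        _exit(0);
    }
    close(fd[1]);                    /* parent: the consumer */
    ssize_t n = read(fd[0], buf, sizeof buf - 1);
    if (n > 0) { buf[n] = '\0'; printf("got: %s", buf); }
    close(fd[0]);
    return 0;
}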

Example: object.c

  • Simple program defines a data structure, with 3 fields
  • Job of program is to manipulate these fields and print something about the data structure
  • Sequential single-threaded program
  • Look at code manipulating an instance of this thing
  • do_work: manipulates a few fields of the data structure instance
  • What if you add more to do two threads? Thread 1 and thread 2 executes do_work at same time
  • At user level, hopefully kernel doesn't get affected
  • Executing the same tmp->fields is interesting
  • Look at the assembly code (Ex. X.field1 = 100)
  • Not atomic: one statement cannot be accomplished in the scope of one machine instruction
  • Potential exists for the CPU to start executing a line of code, be interrupted, suspend current processes and executes another sequence of instructions (e.g. like those from the child) that reads or writes the shared state
  • mov the value into memory address that is calculated by [ebp-4] and treating that as address: several cycles
  • Single instruction, kernel can decide that thread1 is no longer permitted to use the CPU; someone might be more important
  • Pointer dereferences, manipulating a field - not one atomic assembly instruction. Translates to 2-3. Uncontrolled interleaving of instructions
  • No guarantee of computational results which is a bad thing
  • Essence of critical section (multiple threads manipulating these memory locations) so there is a risk of inconsistent state or incorrect answers
  • Risking answers depending on the ordering of when things get to run
  • Heart of issue: when many things happen at once and the assembly doesn't execute atomically, you get chaotic results
  • This is wrong!!!!! Concurrency introduces a problem, somehow we have to control/fix it (else you have incorrect program)

Simple Code Example: Family Squabble

  • Say the parent forks off child and they both proceed to try to update contents of 'X' with different and distinct values, each in an infinite loop
  • Imagine you have a parent and child (thread1, thread2) and access shared resource struct foo. How would you actually create two threads and execute code at same time?
  • fork() creates child process and we know that the second process executes code right after fork() returns. Boom, you have concurrency and you run into a critical section
  • You get two threads after fork executes
  • Two processes execute the same code: is there an IPC problem? Concurrency problem? No
  • Situation 1: they both try to update contents of foo with different and distinct values, each in an infinite loop.
  • Semantics of creating two processes is such that there are different PAS (COW - copy on write)
  • Execute the same instructions, but when they attempt to write into a memory address the OS says 'distinct address space, creating it separately' - by default, they're writing into different memory locations
  • fork() has no conflict: no critical section, because they're not writing to the same memory
  • Situation 2: clone() with set of flags has concurrency issue
  • clone creates another thread of control that executes in the same PAS
  • Critical section manipulates shared space
  • fork() has no shared space even if they're executing the same instructions (not in the same state), but clone() has shared state
  • Lack mechanism for coordinating atomic access to a shared state
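  • A minimal sketch of situation 1: after fork(), each process writes its own copy-on-write copy of x, so there is no critical section:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork(), parent and child have separate (copy-on-write)
   address spaces, so writes to x do not conflict. */
int x = 0;

int main(void) {
    if (fork() == 0) {
        x = 1;                           /* writes the child's copy */
        printf("child:  x = %d\n", x);   /* prints 1 */
        _exit(0);
    }
    wait(NULL);
    printf("parent: x = %d\n", x);       /* still 0: separate PAS */
    return 0;
}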

Potential Solutions

  • Lock variables (recursive problem)
  • Introduce some global shared flag that processes can use (indicating who is reading/writing and stopping the others)
  • Global flag variable is a piece of shared space that they're attempting to access - recursive problem
  • Disabling interrupts (not trustworthy, multi-CPU systems unaffected)
  • Don't let machine do more than one thing at once
  • Enter critical section, disable interrupts, do instructions, then enable interrupts
  • One problem: impolite. What if something else not accessing the PAS is doing a system call? It won't be able to if interrupts are disabled
  • Don't trust process to disable interrupts
  • Typically the architecture only has a way to disable interrupts on one CPU; a second CPU can go in anyway
  • Spinlocks/busy waiting (feasible but wasteful since CPU is busy doing 'nothing')
  • Solution if done correctly
  • Spinlock: e.g. while (!allowed) ;
  • Test a variable - in shape, it's the right solution. The implementation may be wrong as it still depends on a shared piece of state
  • How to get semantics of spinlock with tools on OS?
  • Problem: the test and the jump are separate instructions - the CPU can be interrupted between them
  • Also, a process that spins waiting takes up CPU time

Synchronization Primitive: Atomic Instructions

  • CPUs can embed some operations that support the basis of atomic operations, e.g. single-instruction-cycle reads or writes cannot be interrupted, and the CPU can support some serializing or locking primitives (such as instructions)
  • Peterson's Solution
  • Test-and-Set (MOS: pg 124)
  • Compare-Exchange (x86)
  • All the above rely on spinlocks/busy-waiting
  • Spinlocks have the potential for priority inversion (i.e. a higher-priority process is spinning while a lower priority process is executing with a lock held)
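  • A hedged user-space sketch of a test-and-set spinlock built on a GCC atomic builtin (the kernel uses architecture-specific instructions instead):

#include <stdio.h>

/* Test-and-set spinlock: atomically set the flag and examine its
   old value; spin (busy-wait) while someone else already held it. */
static volatile int lock = 0;

static void spin_lock(void) {
    while (__sync_lock_test_and_set(&lock, 1))
        ;   /* burn CPU: this is the wasteful part */
}

static void spin_unlock(void) {
    __sync_lock_release(&lock);   /* atomically store 0 */
}

int main(void) {
    spin_lock();
    puts("in the critical section");
    spin_unlock();
    return 0;
}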

Concurrency Controls

  • Methods/approaches/techniques to enforce serialized or safe access to critical sections: locks
  • Mandatory locking
  • Advisory locking
  • See e.g. file-level locking, which we will discuss later in the semester ('man 2 flock')
  • Advisory locks allow cooperating processes to perform consistent operations on files, but do not guarantee consistency (i.e. processes may still access files without using advisory locks, possibly resulting in inconsistencies)

Semaphores

  • Up/down
  • P/V history (Proberen/try) and (Verhogen/raise)
  • Nice abstract idea, but needs to be atomic in a real system
  • Possible implementations:
  • System calls up and down
  • Disable interrupts
  • Protect each semaphore critical section with a lock / CMP-EXCHANGE (so other CPUs don't modify even if interrupts disabled)
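  • A usage sketch with POSIX semaphores from the user-level API (not the kernel implementation discussed above), where sem_wait is 'down'/P and sem_post is 'up'/V; compile with -pthread:

#include <semaphore.h>
#include <stdio.h>

/* A semaphore initialized to 1 behaves like a mutex; the system
   makes each up/down operation atomic. */
int main(void) {
    sem_t s;
    sem_init(&s, 0, 1);   /* count of 1 */
    sem_wait(&s);         /* P/down: decrement or sleep */
    puts("inside critical section");
    sem_post(&s);         /* V/up: increment, wake a sleeper */
    sem_destroy(&s);
    return 0;
}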

Mutexes

  • "Mutual exclusion": special case of semaphore (no need to maintain a count)
  • Example operations (from pthread library)
  • pthread_mutex_lock
  • pthread_mutex_unlock
  • pthread_mutex_trylock
  • What if multiple threads are blocked on a mutex? How to decide who gets into the critical section next?
  • Random wakeup, FCFS, perhaps a race condition for wakeup

Example Mutex Implementation

  • Code from MOS: pg 131
  • Not a bad idea to have a shared piece of state that indicates whether someone else is in critical section (mutual exclusion)
  • Not only want to check it, but if it's available, you want to seize it - how to do both things in an atomic, uninterruptible fashion?
  • Can't unless Intel has done it for you
  • No piece of code you can write that can do two things at once
  • Need architecture (supports separation) to have physical reality (instructions like test, set) in a single, atomic, uninterruptible fashion
  • Intel implements it where an instruction tests it
  • Slice of code that implements correct entry/exit from a critical section (i.e. mutex enforces mutual exclusion)
  • Copy mutex into a register and set it at the same instant (one instruction)
  • What was mutex? Compare the register to 0 - if it was 0, jump to a label ok
  • ok returns to wherever you were before - the first instruction of the critical section
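  • A C rendition of the same idea (not the book's assembly): test-and-set the flag atomically, and yield the CPU instead of spinning if it was already held:

#include <sched.h>
#include <stdio.h>

/* Test-and-set the flag in one atomic step; if someone else held it,
   yield rather than busy-wait (like the MOS mutex_lock shape). */
static volatile int mutex = 0;

static void mutex_lock(void) {
    while (__sync_lock_test_and_set(&mutex, 1))  /* copy+set at once */
        sched_yield();   /* give up the CPU rather than spin */
}

static void mutex_unlock(void) {
    __sync_lock_release(&mutex);
}

int main(void) {
    mutex_lock();
    puts("critical section");
    mutex_unlock();
    return 0;
}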

Question & Answer

Q:

Wouldn't a mutex produce an ordering problem if some processes need to complete before others, but a process that hasn't run yet finds another process already in the critical section?

A:

Not problem of ordering, but there is another problem: if you have a 'which is more important' situation in the critical section, it's a different problem. Ordering problem - statements intervene

Back To Navigation

November 1: Deadlock

  • Careful programming is the solution to the seeds of this meta-problem
  • See Slide 3 for diagram
  • Deadlock arises from the order in which resources get acquired
  • p1 holds r1, r2 and p2 wants r1
  • Doesn't get r1, and it tries to enter mutual exclusion and keeps testing which is fine
  • p1 finishes the job and releases r1 and r2; p2 gets access to r1 and can proceed into its critical section
  • Problem: p1 gets access to r1, scheduler pre-empts p1, takes off the CPU and p2 gets a chance to run; gets r2 because it is free, but p1 comes back and attempts to get r2 but p2 has it...
  • Situation: multiple processes wait on each other, and none will give up the resources it holds, so no progress happens
  • Deadlock definition: set of processes waiting for an event that only another process in the set can cause
  • Two or more processes are blocked waiting for access to the same set of resources in such a way that none can make progress

Seeds of the (Meta)-Problem

  • Safe concurrent access requires mutual exclusion
  • Processes typically need more than a single resource
  • Key issue: imposing logical order on a sequence of locking operations

Conditions Necessary for Deadlock

  • Read textbook and Wiki and memorize
(1) Mutual exclusion: each resource is only available via a protected critical section (single task)
  • Precondition - if you don't have this, you don't have deadlock
(2) Hold and wait: tasks holding resources can request others
  • Tasks can acquire a resource (or a subset) and hold it while asking for more
(3) Circular waiting: must have two or more tasks depending on each other
  • Many threads and many resources
(4) No pre-emption: resources cannot be forcibly taken away; tasks must release them
  • Tasks are responsible for releasing the resources
  • No third party to force to give up a resource
  • All four must hold for deadlock to be possible; that doesn't mean that if all are present you're going to get deadlock (just that it's likely)
  • Depending when these are scheduled, you may not see it

Deadlock is a Manufactured Problem

  • Try to fix one problem, and another arises
  • Resources are usually locks in this context
  • Textbook discusses resources as physical devices, but in real systems the resources are like locks/etc. (constructs)
  • Resources that cause deadlock are the locks and semaphores protecting entry to the critical regions permitting access to these devices

Solution

  • For analyzing a system that is prone to deadlock
  • Resource graphs (read in textbook) - perhaps on test
  • A modelling approach to understanding deadlock
  • Processes are circles, resources are squares, directed graph and edges between process and resource -> process requesting resource
  • Edge between resource to process -> process holds the resource
  • Sequence of events: you can draw a resource graph
  • Detecting whether or not there's deadlock is easy - if there is a cycle in the graph, you have deadlock (recall Inception slides)

Approaches to Handling Deadlock

(1) Carry out a sequence of resource requests
(2) Update the resource graph with the corresponding edge
(3) Check if graph contains a cycle
  • Avoidance
  • Ignore it: who cares
  • OK in short term, but later hard to understand a bug (short-term gain, long-term bug)
  • Diagnosing this is hard because you need to reproduce the exact interleaving
  • Implement a dynamic avoidance (e.g., speculative execution, resource graphs) approach to inform resource allocation sequencing
  • Takes resource graph and checks each request
  • Let deadlock happen; detect this condition and attempt to recover
  • Action
  • Real systems are large, overhead to track requests and do DFS, so let deadlock happen then draw a resource graph and do something to recover
  • Prevent it from happening in the first place (usually by scrupulous coding); attempt to make one of the four deadlock conditions impossible or untrue
  • Ensure it doesn't happen in the first place. The real solution: don't let deadlock happen; make sure the preconditions can't all hold

Detect Deadlock

  • Detecting cycles
  • Done in many ways, such as if you have a linked list, send a pointer iterating over the list and compare with current node under consideration. If they are equal, there is a loop
  • Detecting cycles in graphs typically done through some form of depth-first search
  • Algorithm in the book describes depth-first approach to exploring the graph to see if any outgoing edges lead back to the "root" node
  • Assume a list of nodes
  • For each node N
  • Let root = N
  • Add root to temp list L
  • if(root is already in L) cycle();
  • Any unexplored outgoing edges?
  • Yes: pick edge. DST node becomes new root
  • No: back to previous node
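  • A small C sketch of cycle detection by depth-first search over a hypothetical resource graph (a real detector would also keep a visited set for efficiency; the graph below is invented):

#include <stdio.h>

#define N 4   /* nodes: processes and resources interleaved */

/* Adjacency matrix: edge[i][j] = 1 means i requests/holds j.
   A cycle in the resource graph means deadlock. */
static int edge[N][N] = {
    {0,1,0,0},   /* p1 -> r1 (p1 requests r1) */
    {0,0,1,0},   /* r1 -> p2 (p2 holds r1) */
    {0,0,0,1},   /* p2 -> r2 */
    {1,0,0,0},   /* r2 -> p1: closes the cycle */
};

static int on_path[N];   /* nodes on the current DFS path */

static int dfs(int u) {
    if (on_path[u]) return 1;   /* reached a node on the path: cycle */
    on_path[u] = 1;
    for (int v = 0; v < N; v++)
        if (edge[u][v] && dfs(v))
            return 1;
    on_path[u] = 0;             /* backtrack */
    return 0;
}

int main(void) {
    printf(dfs(0) ? "deadlock (cycle found)\n" : "no cycle\n");
    return 0;
}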

Recover from Deadlock

  • Recovery tactics
  • Pre-emption: suspend and steal. Depends on resource semantics
  • Suspend tasks and take away some resources they hold - whether this is possible depends on the resource's semantics
  • Rollback: checkpoint and restart. Select a task to rollback, thereby releasing its held resource(s)
  • Periodic check point. Rollback to point of system before deadlock occurs then let it run again. Hopefully another process happens and you avoid deadlock that way
  • Cost: requires a robust (and often inefficient) checkpoint/restart mechanism
  • Terminate: (1) break cycle (2) make extra resources available
  • Terminate a few processes (involved in cycle), but resources become available to other processes (blunt approach)
  • Terminated process' work is wasted

Deadlock Avoidance

  • Banker's Algorithm
  • Dijkstra 1965
  • Art of abstraction hinders application to reality
  • Avoids deadlock - elegant, but doesn't work in practice
  • Need to know the maximum resource requests in advance, but you can't
  • Physical devices break
  • Number of new tasks hard to predict
  • Not tested
  • Safe states: there exists an ordering of resource allocations such that all tasks run to completion even if maximal demand made on the system instantly
  • Unsafe != deadlock, it's risky
  • Giving out credit: but delivered credit is unevenly distributed, and remaining credit may not allow the system to reach a safe path (i.e. allow some sequence of tasks to complete)

Deadlock Prevention

  • Break at least 1 of those 4 conditions
  • Program Structure
  • Introduce sequential processing (to avoid mutual exclusion)
  • Without mutual exclusion, we have unsafe concurrency or sequential processing (safe, no deadlocks but lose the benefits of concurrency)
  • Tough to break hold and wait (prediction again)
  • Can't know if we've done the right thing (taking away resources is bad)
  • No pre-emption: tough to kill or pre-empt because semantics and data correctness may rely on this
  • Wipe out work or put system in inconsistent state (what if they were in the middle of printing? The printer driver state gets affected)
  • Circular wait: sequentially enumerate resources; impose (code) rule of sequential access
  • May be able to break this one. If there's a dependency (the essence of deadlock), you can impose a strict ordering of requests to avoid the possibility of a circular wait across multiple locks
  • Requires good programming

Code: Pthreads to Cause Deadlocks

  • In userland, may be more complicated
  • Userlevel threading (pthreads)
  • Use a library that provides the illusion of threading at user level (no system calls)
  • Example code
  • Starts two threads, and they access two variables with shared resources (do it safely with mutual exclusion)
  • Access resources, then exit
  • Run program as many times as we want
  • Run ./tie 1
  • Forces the first thread to count to 1 billion before acquiring the second lock - and nothing makes progress
  • After first thread counts, it should attempt to get mutex-2, but it's stuck (constantly does a mutex lock)
  • Second thread grabbed the second mutex and tried to get mutex-1, but it's already held by the first thread
  • Have all four conditions of deadlock
  • Minimal delay (0.5) no problem even though conditions exist
  • No deadlock? VM is single CPU system, and pthreads is user level library (no real threads)
  • The threads created are not real processes, they are simulated inside (created anonymous regions that pthreads simulate)
  • Threads share the rest of the PAS
  • lib pthread is a simulated thread, with its own scheduler inside, not relying on OS scheduler
  • Without a delay, the pthread scheduler picks each job quickly
  • Poking the OS scheduler and the pthread scheduler: a delay in the first thread causes the CPU to take the process off and put it to sleep; other processes run; later the OS runs the process again, the pthread wakes up, and then deadlock happens
  • Just because conditions are there, doesn't mean deadlock happens - want to avoid this
  • What does it take to write a threading program?
  • Wouldn't it be nice to have the ability to spawn threads and deal with slightly independent subtasks?
  • If you want real concurrency, depend on clone

Example: tie.c

  • Code available on Wiki
  • Buffers and pthread_t
  • Handles for threads (job1, job2)
  • Mutexes, synchronizing primitives that jobs will use to have safe concurrent access (box1_lock, box2_lock) - no one global lock
  • Associated a particular mutex with a particular resource
  • Appropriate programming paradigm in kernel
  • Many data structures have a mutex or lock embedded in them
  • mm_struct has mmap_sem - controls access so that control paths must acquire it first
  • Responsibility rests on the program to wait on it
  • Note that there is nothing mandatory in the lock - you can create a third thread that pays no attention to the locks
  • If you're writing in a team, and a noob needs a variable and doesn't use the lock (nothing prevents them from doing this), you may have issues
  • Retrieves the delay from the command line, multiplying by 1 billion to get the count
  • Initialize locks so code can use it appropriately
  • Create threads (order doesn't matter)
  • pthread library calls mmap to keep track of stack and context of the thread - CPU context and stack (record of execution) is important
  • Critical section - data rather than sequence of instructions
  • pthread_join collects the threads
  • Once pthread_create is called, the scheduler inside pthread is free to begin execution of code
  • do_some_job
  • Signature follows the pthread_create syntax
  • mutex_lock, address of mutexes (lock them)
  • do_report prints status whether you locked it
  • lock box1 and now in critical section
  • Difference between job1 and job2 is do_job and do_some_job
  • do_job
  • Delay counter
  • Tries to acquire the second lock after locking the first
  • Not entirely unrealistic - lots of kernel code locks something, calls a routine with work, then locks something else
  • Often do not lock sequentially
  • Hit critical section - box2 and box1 change
  • Where does it go wrong?
  • locked boxes in a different order
  • You can only control the order in which you do things
  • Lock box1, then box2; but in the other do_some_job it is reversed (potential for a circle)
  • Solution: lock in the same order (see the sketch after this list). Must follow this discipline or else the kernel will deadlock
  • Download the code, then try to fix it
  • pthread code useful for homework 3
  • MUST know conditions of deadlock
  • Should be able to draw resource graphs if asked
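  • A sketch of the lock-ordering discipline that fixes tie.c: every thread acquires box1_lock before box2_lock, so no circular wait can form (the lock names follow the lecture's tie.c; the bodies are invented; compile with -pthread):

#include <pthread.h>
#include <stdio.h>

/* Both threads follow the same lock order, breaking condition (3),
   circular wait. */
static pthread_mutex_t box1_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t box2_lock = PTHREAD_MUTEX_INITIALIZER;
static int box1, box2;

static void *job(void *arg) {
    (void)arg;
    pthread_mutex_lock(&box1_lock);   /* always box1 first... */
    pthread_mutex_lock(&box2_lock);   /* ...then box2, in every thread */
    box1++; box2--;                   /* critical section */
    pthread_mutex_unlock(&box2_lock);
    pthread_mutex_unlock(&box1_lock);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, job, NULL);
    pthread_create(&t2, NULL, job, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("box1=%d box2=%d\n", box1, box2);
    return 0;
}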

Back To Navigation

Tutorial 13: Patches

  • A patch records changes to source code - here, the modifications made to the custom kernel
  • Use this file to update the original kernel so that the changes apply to the custom kernel

Example: File1

hello
world
robin

Example: File2

hello
world
robin
bye
  • Patch applied from file1 to file2 - patching file1 should add "bye"
  • Execute diff <FILE_TO_BE_PATCHED> <FILE_THAT_WAS_CHANGED>
  • diff File1 File2
  • + signs are additions to the source code
  • - signs are things taken away from the source code
  • Today we create a patch for the custom kernel: follow the exercise
  • Send the diff file to SVN

Back To Navigation

Week 9

November 4: Kernel Synchronization Primitives

Announcements

  • HW3 is released
  • Range of due dates
  • Turn it in earlier, gives you your mark before final exam
  • 5 problems listed, 3 problems to do
  • Choose problem 1 (more work) or 2 (if you do both, there's a bonus), and you should do problem 5
  • SVN time stamp to monitor submission
  • Submit when you're confident, so as not to confuse the TA
  • Problem 1 is easy, but lots of user-level programming
  • Implement a page replacement algorithm

Pthreads

  • POSIX threads, i.e. user-level concurrency
  • Recall pthread library, deadlock and part of API to get mutual exclusion
  • Deadlock: stick head in sand, ignore potential for damage in critical sections
  • Benefit: performance as you're not busy locking, checking mutex and so forth

Example: urace.c

  • One thread doing something
  • main has time, creates pthread which points to do_job and has a handle to job1, then prints statuses
  • do_job has for loop that increments a variable (substantial work)
  • See how fast it is

Example: srace.c

  • Imposes some locks, particularly in do_job before modifying global state, it locks the state
  • urace took barely any time (0 seconds)
  • srace took more time (4 seconds) - more overhead
  • Price is probably worth it
  • Design decision: what is a critical section, how large should it be?
  • Locking is "expensive" (4x cost)
  • If you're going to define a critical section, make it as small as possible and minimize number of sections in code
  • Design challenge: defining what's in the critical section and where/when to have them
  • Safe way to do things, but there's performance cost
  • As a designer, you have to decide how much parallelism and concurrency there is; may be dependencies between threads (critical section) so you have to protect those
  • If you're going to incur cost of critical section (modify state) you might as well do as much as you can
  • Critical section is slower than unsafe code, but unsafe code is unsafe!

Correctly Servicing Requests

  • Kernels are naturally concurrent - servicing requests in system calls from processes and interrupts from hardware; does it basically at same time
  • Requests not completely independent. If they were, there would be no need for synchronization
  • Kernel is naturally asynchronous
  • Startup - creates processes, enables interrupts, and kernel is fully concurrent
  • Sequential - well-defined order, easy to understand
  • Event handling system - don't have this. Any function or subset of them can run at any given time
  • Interleave, multiple paths - requires synchronization

Main Ideas

  • Atomic operations in x86
  • When kernel has to do something important, needs to rely on hardware platform to provide primitives
  • Ex. Servicing interrupts
  • Kernel locking/synchronization primitives
  • Mutexes
  • Kernel pre-emption
  • Kernel can pre-empt process - one CPU, n processes, stop any one of them and schedule another
  • Kernel interrupting user level processes, but Linux kernel can pre-empt itself (can stop its own threads)
  • Can have multiple concurrent kernel control paths
  • "Big kernel lock"
  • In early days of kernel, there was one mutex
  • Makes concurrency correct, but is inelegant and lets people get lazy about defining critical sections
  • Use the semaphore or mutex appropriate to the task at hand (embedded in the data structure)
  • If there was one big lock, you wouldn't get ideas like kernel pre-emption

Synchronization Primitives

  • Atomic operations
  • CPU can do things in one cycle, and ways to enforce it
  • Linux provides the ability to update atomically
  • Disable interrupts (cli/sti modify IF of eflags)
  • Assembly instructions, but on a per-case basis (not best solution)
  • Lock memory bus (x86 lock prefix)
  • x86 has a prefix for certain instructions that locks the memory bus - enforces synchronization
  • Spin locks
  • Semaphores
  • Graceful version of spin locks (see differences later which is an important skill)
  • Sequence locks
  • Read-copy-update (RCU) (lock free)
  • Barriers
  • Spinlocks or semaphores make sure one task or thread at a time controls a critical section, whereas a barrier gathers multiple threads
  • Machine instructions
  • Barriers are serializing operations that 'gather' and make operations sequential

Motivating Example: Using Semaphores in Kernel

  • Code that operates mm_struct
  • Uses part of kernel synchronization API (manipulates semaphores)
  • struct mm_struct: fields pgd, vm areas, etc. - be intimately familiar with these
  • Fields to help kernel maintain meta data about memory space and pages
  • Other data instances that help kernel code access memory in structure in safe way
  • Slides pg. 8 ->

  • mm_users: reference counting; how many threads are sharing the memory mapping
  • mm_count: how many kernel control paths are accessing it
  • down_read, up_read
  • Definition or boundaries of critical section
  • Marks entry of critical section (down_read(&current->mm->mmap_sem) : do something important to this memory, like read operation)
  • Shouldn't be writing or modifying state because critical section is reading it
  • Signal that you're going to write to a critical structure
  • Indication of kernel control path that you're a reader of the state
  • Enforces concurrency, and is voluntary (most locking is advisory; have to agree to participate in the scheme)
  • Without these, you live with consequences - uncomfortable thing
  • After manipulating these data structures, you understand more
  • If you want to touch the data structure, use proper locking/unlocking: down_read/up_read or down_write/up_write for semaphores
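  • A user-space analogue of this pattern using a pthread readers-writer lock in place of down_read/up_read on mmap_sem (the data and names are invented; compile with -pthread):

#include <pthread.h>
#include <stdio.h>

/* Readers mark the boundaries of a read-only critical section;
   a writer takes the lock exclusively. */
static pthread_rwlock_t maps_lock = PTHREAD_RWLOCK_INITIALIZER;
static int n_mappings = 3;   /* stand-in for mm_struct state */

static void reader(void) {
    pthread_rwlock_rdlock(&maps_lock);   /* like down_read: only reading */
    printf("mappings: %d\n", n_mappings);
    pthread_rwlock_unlock(&maps_lock);   /* like up_read */
}

static void writer(void) {
    pthread_rwlock_wrlock(&maps_lock);   /* like down_write: exclusive */
    n_mappings++;
    pthread_rwlock_unlock(&maps_lock);   /* like up_write */
}

int main(void) { writer(); reader(); return 0; }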

Primitive One: Atomic Type and Operations

  • On x86, know that some things are atomic
  • Simple asm instructions that involve 0 or 1 aligned memory access (1 cycle)
  • read-modify-write in 1 clock cycle (e.g. inc, dec)
  • Anything prefixed by IA-32 'lock' prefix (happens in atomic fashion)
  • Kernel can define atomic_t
  • Definition in Slides pg. 13
  • static inline...: adds to atomic type
  • Translates to machine code, LOCK_PREFIX is macro defined to lock prefix for assembly instruction
  • Force memory bus to be locked, to export atomic manipulation to counter or some other variable
  • Declaring the type alone doesn't make operations atomic; the code has to invoke the hardware primitive to enforce synchronization
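  • A hedged sketch of the shape of such an operation on x86 (GCC inline assembly, x86 only; this mirrors the idea behind the kernel's atomic_inc, not its exact code, and the names are invented):

#include <stdio.h>

/* The lock prefix makes the read-modify-write increment one
   indivisible bus transaction. */
typedef struct { volatile int counter; } my_atomic_t;

static inline void my_atomic_inc(my_atomic_t *v) {
    __asm__ __volatile__("lock; incl %0" : "+m" (v->counter));
}

int main(void) {
    my_atomic_t a = { 41 };
    my_atomic_inc(&a);
    printf("%d\n", a.counter);   /* 42 */
    return 0;
}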

Example: Reference Counters

  • Refcounts: atomic_t associated with resources, but keeps count of kernel control paths accessing the resource
  • atomic_t - avoids mutex/spinlock/up_write/etc., which have more overhead
  • atomic_t updates a single variable instance (little overhead, elegant)

Primitive Two: Spinlocks

  • Exports a mutex (check state of mutex constantly)
  • Look at multiple wrappings and layers of definition
  • May be frustrating, but good because there is a generic locking API and semantics eventually have to be transformed for platform
  • spinlock_t: raw_spinlock type with rlock
  • raw_spinlock_t: arch_spinlock_t raw_lock
  • Indicates whether you're a reader or writer
  • Mutex flag indicating whether you go in critical section is wrapped by these things
  • Pass in spin_lock: raw_spin_lock calls and then calls another one and so forth
  • Combination of inline functions and macro defines
  • Not incurring function calls (locking to be as cheap as possible so you can delegate to an appropriate primitive)
  • Slide 18, Line 137: __raw_spin_lock(lock): Pass in lock variable and do three things
(1) Kernel is fully pre-emptable so want to disable it: critical, as you're in middle of acquiring mutex and don't want interruptions
  • Macro with a do-while loop
  • Barrier, as has been discussed before
(2) Kernel draws resource graphs to detect deadlock
(3) Slide 19, Line 144: LOCK_CONTENDED: macro, and locks
  • Passes first argument as reference to lock, and function pointer do_raw_spin_trylock and do_raw_spin_lock
  • Slide 19, Line 393: Call try to acquire lock and semantics of it; if you don't get it, try again
  • Line 394: lock_contended(): completes book-keeping and requests
  • Line 395: lock(_lock): pointer to do_raw_spin_lock and passes in
  • Slide 20, Line 136: do_raw_spin_trylock
  • arch_spin_trylock(): delegates again
  • 7-8 layers down, it finally delegates to __ticket_spin_trylock() (Slide 21), which does the assembly code (closing the loop)
  • Generic API to acquire spin lock, specific to architecture being run on (architecture only thing that is foundation for concurrency)
  • Physically need something that completes the operation in one cycle
  • Slide 21, Line 88: cmpxchgw is like testing (same type of thing)
  • Bottom line: know the semantics - basically the same as the textbook (try to acquire the mutex; if you get it, enter the critical section; otherwise stay at the top and keep trying to acquire)

Primitive Three: Semaphores

  • Primitive 2 is a bit of a waste: not elegant as you don't know difference between readers/writers (may have multiple readers, but do not want to let in a writer)
  • It's wasteful to constantly spin and check - better to schedule another process, and let the task that can't acquire the lock sleep
  • Spinning wastes CPU
  • Solution is semaphores. The difference is the semaphore keeps a list of the processes asleep on it, and doesn't waste CPU time

Important Caveats about Kernel Semaphores

  • Semaphores are not like spinlocks in the sense that the invoking process is put to sleep rather than busy waits
  • As a result, kernel semaphores should only be used by functions that can safely sleep (i.e. not interrupt handlers)
  • Slide 25, Line 79: rw_semaphore
  • Line 80: rw_sem_count_t count: semaphore
  • Semaphores were introduced by Dijkstra and are basically a counting mutex - counting how many threads are currently in the critical section
  • Semantics is try and release
  • A counting mutex - count of things available/how many threads can enter critical section
  • Line 81: spinlock_t wait_lock: has nothing to do with the semaphore count itself; it protects access to the wait list (a spinlock, because the list is shared state)
  • Many processes may arrive to this wait_list, waiting to get it (critical section, so acquire a lock to put self on list)
  • Line 82: struct list_head wait_list: list of processes/task that wish to acquire semaphore so kernel can refer to it
  • Effect of down/up_write is to register yourself to the list
  • Data structure that tells difference between spinlocks, semaphores and what kernel is doing
  • Acquire semaphore for mmap_sem and calls down_read (read a few variables out of the stack)
  • Basically same thing - down_read and up_read has same layers of concurrency
  • Slide 25, Line 21: might_sleep calls schedule and might be put to sleep (kernel knows you may/may not get it)
  • Boils down to assembly code: get to the machine
  • Slide 28: a task leaves the critical section, then calls call_rwsem_wake (line 111), which does book-keeping and calls rwsem_wake
  • Who gets to enter critical section next?
  • May have a collection of readers and writers waiting on the wait list - same pattern, where the kernel knows someone left the critical section
  • If there's a writer at the front of the queue, it is let in (FIFO); otherwise the readers race for it (rwsem_do_wake)
  • If writer on queue, get to go first; if not, wake everybody up and let them make progress

Advanced Techniques

  • Sequence locks
  • Solution to the multiple-readers/one-writer problem in that a writer is permitted to advance even if readers are in the critical section
  • Readers must check both an entry and exit flag to see if data has been modified underneath
  • Read-Copy-Update: synchronization primitive
  • Designed to protect data structures accessed by multiple CPUs; allows many readers and writers
  • Readers access data structure via a pointer; writers initially act as readers and create a copy to modify. 'Writing' is just a matter of updating the pointer
  • RCU initially treats writers as readers, particularly in circumstances where a pointer leads to the data collection
  • Enters critical section that manipulates complex data, and if someone indicates writer - enter critical section with no problem, but copies the data section to update the copy, then update pointer (atomic operation) - lockless way to get readers/writers to manipulate
  • Only for kernel control paths; disables pre-emption
  • Used to protect data structures accessed through a pointer
  • By adding a layer of indirection, we can reduce wholesale writes/updates to a single atomic write/update
  • Heavy restrictions: RCU tasks cannot sleep
  • Readers do little work
  • Writers act as readers, make a copy, then update copy. Finally they rewrite pointer
  • Cleanup is correspondingly complicated
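  • A hedged user-space sketch of the RCU idea only: readers follow a pointer, while a writer copies the object, modifies the copy, then publishes it with one atomic pointer update (grace periods and reclamation, the complicated cleanup, are omitted; names invented):

#include <stdio.h>
#include <stdlib.h>

/* Readers dereference 'shared'; writers never modify the object a
   reader might be looking at. */
struct config { int value; };

static struct config *shared;

static void writer_update(int v) {
    struct config *oldp = shared;
    struct config *newp = malloc(sizeof *newp);
    *newp = *oldp;              /* act as a reader: take a copy */
    newp->value = v;            /* modify the private copy */
    __atomic_store_n(&shared, newp, __ATOMIC_RELEASE);  /* publish */
    /* real RCU defers free(oldp) until all readers are done */
}

int main(void) {
    shared = malloc(sizeof *shared);
    shared->value = 1;
    writer_update(2);
    printf("%d\n", __atomic_load_n(&shared, __ATOMIC_ACQUIRE)->value);
    return 0;
}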

RCU Example: getppid(2)

  • Kernel needs to do synchronization safely so it needs own primitives
  • Semantically no different than locks and semaphores; the foundation is what the machine architecture exposes

Question & Answer

Q:

Wait list vs FIFO queue. Why isn't this FIFO queue?

A:

The wait list is the kernel's generic doubly linked list structure.
We will see: as someone exits the critical section and calls up_read, what does the kernel do? It chooses either to implement FIFO or to wake them all at once and let them race to acquire the mutex. More of a scheduling problem

Back To Navigation

November 6: Test Review Material

Back To Navigation

November 8: Threads (User-level Concurrency)

GDB

  • Powerful tool that may be daunting; lets you look deeper than what the system calls are doing
  • Lets you recover the state of your program at the point where it crashes

Example: loader code

  • Successful run, but let's look at gdb
gdb -q ./<program_name>
  • Start gdb:
run
  • Run the program
  • Where to see when program crashes? GDB intercepts segmentation faults and stop program from there
  • Where to set a breakpoint? Line numbers suck
disassemble 
  • Ignore source for now, but this assembly code is what's really going on
  • Look at call *%eax; you can find the address there
break *0x80486a3
  • When program is run in gdb, stops at this point
  • At this point, new memory region should be set up; let's examine valid memory address
x/FMT ADDRESS
x/16i $eip
  • Looks at 16 instructions starting from eip
x/16x $eip
  • This is in hex
stepi
  • Execute the next instruction

Pthreads

  • May be useful for HW3
  • User-level threading libraries are easier to use than calling kernels and stuff

Example pc.c

  • Code available on Wiki
  • Encouraged to download it
  • Shares access to global states: buffer and position
  • int buffer[BUF_SIZE]: Integer buffer of some spots long
  • Ex. 5 spots in buffer
  • void* produce(void* arg): Producer generates an integer at a time to place into end of buffer list
  • void* consume(void* arg): Consumer reads integer from beginning of buffer list and deletes that item
  • Must protect this shared state (pos); must control/synchronize access
  • One thread touching x, no concurrent access - need not protect it with lock
  • Inside critical section, calling a few functions
  • Too much work being done?
  • No need to lock it if you don't need to do it in the next few instructions
  • Manipulating shared state: print_buffer which iterates over the whole thing
  • Even though it's just reading, must ensure values in pos and buffer remain stable
  • Producers and Consumers use this method
  • insert should be in critical section because of all the changes it makes to pos
  • Just because it's a function call, doesn't mean it's the wrong choice for making critical section size
  • As a code designer: does this piece of work belong in the critical section (i.e. reading/writing shared state)?
  • What about the if statement?
  • Not shared, no need to put it in critical section
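  • A minimal sketch in the spirit of pc.c's insert path: pos and buffer are the shared state, and the test and update happen under one lock (details invented; compile with -pthread):

#include <pthread.h>
#include <stdio.h>

#define BUF_SIZE 5   /* matches the lecture's example size */

/* The mutex keeps pos and buffer consistent: the fullness test and
   the update are one atomic unit. */
static int buffer[BUF_SIZE];
static int pos = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static int insert(int value) {
    int ok = 0;
    pthread_mutex_lock(&lock);       /* enter critical section */
    if (pos < BUF_SIZE) {
        buffer[pos++] = value;
        ok = 1;
    }                                /* full: drop the value */
    pthread_mutex_unlock(&lock);     /* leave critical section */
    return ok;
}

int main(void) {
    for (int i = 0; i < 7; i++)
        printf("insert %d: %s\n", i, insert(i) ? "ok" : "dropped");
    return 0;
}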

Output

  • Produced/removing elements
  • Insert failed, dropping value #
  • Buffer is size 5
  • Producer runs first
  • Overwriting elements may not be best; design choice is to drop an item when buffer is full
  • Happens in packets - can't force something physically impossible
  • Producer, Consumer execute at different rates; buffer is controlled
  • What would happen if we took the reins off? No sleep
  • Fails to remove any items from the list
  • Execution as fast as possible
  • Remove pthread_mutex_lock and pthread_mutex_unlock in Producer and Consumer
  • Consumer runs first, same pattern
  • Concurrency problems hard to find and reproduce

Challenges

  • Producer, Consumer inserting/removing one element at a time in controlled conditions
  • Try to get them to insert/remove random different elements at the same time
  • Unless you introduce critical sections, things run in parallel unsafely

Back To Navigation

Tutorial 14: Time Measurement

  • Many functions to measure computation - time(1), time(2), gettimeofday(2)
  • Programs from UNIX (go to terminal and type time followed by any command you want to measure time of)

How to Use time(2)

  • Returns wall-clock time
  • Used as time_t start = time(NULL)
  • Execute whatever you want to measure
  • To get the starting time and then: time_t end = time(NULL)
  • To get the number of seconds, subtract the two values
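  • A minimal sketch of the recipe:

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* time(2) recipe from the tutorial: one-second resolution. */
int main(void) {
    time_t start = time(NULL);
    sleep(2);                    /* the work being measured */
    time_t end = time(NULL);
    printf("elapsed: %ld seconds\n", (long)(end - start));
    return 0;
}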

How to use gettimeofday(2)

  • Another way to return wall-clock time
  • Used as:
struct timeval start, end;
gettimeofday(&start, NULL);
  • That gets the starting time; afterwards: gettimeofday(&end, NULL);
  • More precise, can get measurement in microseconds, etc.
  • Can measure any operation, need to provide start and end point for getting the time (subtract it)
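  • A minimal sketch computing a microsecond delta:

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

/* gettimeofday(2): microsecond-resolution wall-clock delta. */
int main(void) {
    struct timeval start, end;
    gettimeofday(&start, NULL);
    usleep(1500);                /* the work being measured */
    gettimeofday(&end, NULL);
    long usec = (end.tv_sec - start.tv_sec) * 1000000L
              + (end.tv_usec - start.tv_usec);
    printf("elapsed: %ld microseconds\n", usec);
    return 0;
}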

Back To Navigation

Week 10

November 13: Signals

  • Signals offer a way to send short, pre-defined 'message' to processes
  • OS supports generating and delivering signals
  • Simple API for sending messages
  • Easy to know semantics because it's small
  • Number delivered by OS between processes
  • Source of signal: user, another process, kernel, kernel on behalf of hardware
  • Can implement reasonable response to most signals
  • Signals provide a way to:
  • Notify a process that an event has occurred
  • Allow the process to run specific code in response to that event

Example: kill signal

  • If you didn't know what signal to send, look at man pages
  • Tempted to be root for SIGTERM, SIGHUP
  • SIGUSR1
  • Most processes don't have custom definition for signal handlers from both signals
  • Signals are exceptional conditions - wouldn't build a network server from these
  • Signals can be sent, see a list, and can catch some signals

Basic Signal Workflow

  • Signal s is generated (may be in response to an illegal memory reference, or processes agreeing to send signals as a way of process communication)
  • Generated through system calls which involves the kernel
  • Looks up the target involved and delivers the signal
  • Attempts signal delivery to current process - may be current process generating signal
  • If target is not current process, the signal is pending
  • Once OS looks at these filters, it delivers signal (writes small number in list to task struct, then when process can execute the OS forcibly jumps to event handler)
  • Else kernel delivers signal to process
  • But process may have blocked this signal
  • Process may also be masking this signal
  • If delivered, execute default action or sig handler


  • Signal generation
  • Kernel data structure is updated to register the presence of this new signal instance
  • Each generated signal is delivered at most once
  • Generated signals may be delivered with delay, or not at all
  • Signal delivery
  • A generated signal may be forcibly delivered by the kernel by either changing the task's state (e.g. STOPPED) or executing a specific signal handler
  • Delivery may be delayed; such a signal is termed 'pending.' Additional instances of an already-pending signal are discarded


  • Blocking/masking signals
  • A process may temporarily block or mask a certain set of signals
  • sigprocmask(2): manipulate current signal mask
  • sigpending(2): get pending signals


  • Actions on signal delivery
  • A process can ignore the signal
  • The default action occurs (terminate; dump core and terminate; ignore the signal; stop the process; continue the process)
  • Catch the signal via a registered signal handler
  • Ignoring != blocking


  • Catching a signal
  • A signal is an asynchronous interrupt from the point of view of the user code
  • User process has an opportunity to run a signal handler
  • Signal handler invocation (if present) is prepared by the kernel

Diagram: Figure 11-2

  • Slide pg 15
  • Normal program flow (user mode) -> do_signal / handle_signal() / setup_frame() (kernel mode) -> signal handler -> return code on the stack -> system_call() / sys_sigreturn() / restore_sigcontext() -> normal program flow
  • OS notices initiation of event, which is a call do_signal() - job is to look at target of signal and who to deliver it to; will there be mask of signals to receive?
  • Jumps asynchronously to signal handler, which does small amount of work (it's just a function) and control flow jumps to end position of program
  • Entering, leaving signal handler is complex - OS needs to ensure context is correct and restore as much of original context it can
  • Issue: signal is not a normal message, it's an exceptional condition
  • Default intuition for this mode of computing: on weird conditions, terminate and restart the program. But you lose lots of accumulated state
  • What to put in signal handler?
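  • A minimal sketch of catching a signal with sigaction(2); the handler does the least work possible (it just sets a flag), since it runs asynchronously:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Register a handler for SIGUSR1 and wait for it to arrive. */
static volatile sig_atomic_t got_signal = 0;

static void handler(int sig) { (void)sig; got_signal = 1; }

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
    printf("send me SIGUSR1: kill -USR1 %d\n", (int)getpid());
    while (!got_signal)
        pause();                 /* sleep until a signal arrives */
    puts("caught SIGUSR1");
    return 0;
}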

Using ptrace(2)

  • Enables one process to trace another (examine its state, control its execution, read/write its memory)
  • Relies mostly on signals for STOPPING and CONTINUING the traced child's execution
  • Bad system call because it does a lot of stuff
  • Exceptional because it purposefully breaks the isolation between processes, particularly ptrace'd programs give access (can rewrite return values, change things, etc.)
  • Versatile, but weird and unlike other kernel functions
  • Normal interface has powerful requests - e.g. read data from the process address space (PTRACE_PEEKTEXT, PTRACE_PEEKDATA)
  • Same request: Linux uses flat segmentation model, so text and data is basically mapped to same segment
  • Programming using ptrace is a pain - you are responsible for driving the execution of the process; when you stop, examine or change the process, you are responsible for continuing its execution
  • It then continues until some other signal happens. To avoid that, you need PTRACE_SYSCALL or PTRACE_SINGLESTEP, which let the process go from one system call (or one instruction) to the next

Example: synfer.c

  • Source
  • Mimics strace(1), built on ptrace(2)
  • init_attach(char *tpid)
  • Convert char to long and do ptrace(PTRACE_ATTACH, tr_pid)
  • Step to system call - stops for entry and exit
  • Value of eax is system call write
  • Checks flag if entering or exiting
  • Entering: prints data
  • Exiting: check return value
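  • A hedged sketch of the PTRACE_SYSCALL loop described above (Linux register names assumed; error handling mostly omitted; not the actual synfer.c source):

#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach to a PID given on the command line, then stop it at a few
   syscall entry/exit boundaries and print the syscall register. */
int main(int argc, char **argv) {
    if (argc < 2) return 1;
    pid_t pid = (pid_t)atol(argv[1]);            /* like init_attach */
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) return 1;
    waitpid(pid, NULL, 0);
    for (int i = 0; i < 4; i++) {
        ptrace(PTRACE_SYSCALL, pid, NULL, NULL); /* run to next stop */
        waitpid(pid, NULL, 0);
        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, pid, NULL, &regs);
#ifdef __i386__
        printf("syscall (eax): %ld\n", (long)regs.orig_eax);
#else
        printf("syscall (rax): %lld\n", (long long)regs.orig_rax);
#endif
    }
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return 0;
}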

Question & Answer

Q:

Can't you use ptrace for malware? How can you prevent abuse?

A:

Defense of ptrace: it compares the IDs of the issuing and target processes - notably, if malware takes over Firefox it can then ptrace that user's other processes (once I'm you, I can do whatever I want)
  • Used for malware, debugging and security models

Back To Navigation

November 15: Files and File I/O

Announcements

  • USRI on Dec. 2
  • Final exam on Dec. 18 at 8 AM

Perspective

  • Traverse layers of abstraction, and begin with familiar place: files
  • User-level tools for manipulating files (content, metadata)
  • See how files are managed and manipulated - instead of clicking them, see how OS supports the data
  • C library functions and structures that represent and manipulate a file stream as a buffered sequence of bytes
  • Use System Call API to manipulate a file, illustrating three ways to interact with file content:
  • Have our experience mediated by software applications
  • Have our experience mediated by the C library
  • Ask the kernel directly for service
  • Familiar with write(2)
  • Interface: Write to file descriptor, address, how many bytes to write to
  • Generic, doesn't say how OS is locating bytes and putting in the file descriptor - magically appears in standard out
  • Question: how does kernel provide support for managing persistent data?
  • We've seen how kernel supports storage of volatile data in primary memory, how it's addressed, organized in pages, and extended across memory hierarchy as virtual memory
  • Manage non-volatile data
  • Answer: 'file' abstraction, generic wisdom that everything in UNIX is a file
  • Powerful analogy: managing, manipulating most resources (rather than just files alone)

Class Experiment

  • Everyone writes favorite sentence (from MOS!)
  • Individuals pass it to the center
  • Those in the center, pass it down to the front
  • Who encrypted or compressed their message?
  • Piles of paper
  • Find one of the sentence, where is it?
  • Somewhere in one of the piles
  • Look at each paper, trying to find the contents in the paper
  • Take paper out of the pile to look at it
  • Find another sentence: what to do with the current paper?
  • Put it back into the pile - anywhere you want
  • Not easy to find file - no identifiers (different color, etc.) and no names associated to it
  • Time consuming
  • Need for disciplined approach for managing data
  • If we left for today, where would papers go?
  • Challenge for file system
  • Hard disk fail
  • Tower of Backup - do you backup your backups?
  • Need for effective way to manage files
  • If we had to find paper and label it, chaos ensues - not really a solution
  • Label, but how to find specific contents? Must still open each file and sift through it

Questions

  • Some questions were covered above
(1) Tell me how to find your piece of paper
  • Look at contents
  • Need for naming, indexing
(2) Tell me how to retrieve the 3rd word of your sentence. The 75th
  • Find file (perhaps linearly)
  • Look up some byte in the file (may/may not be there)
  • Scan linearly byte by byte? Wouldn't it be nice if there was an operation where it jumps to that byte?
  • Solving this problem is the entire genesis of why file systems exist
(3) How do I handle duplicate sentences?
  • Compress files into one
  • How to manage representation or storage?
  • Access data at once?
  • Chaos, OS needs to synchronize or serialize some access to files
(7) Who can see your paper? Your sentence? Who can modify it/them? Who can delete them?
  • Concurrency issues
(9) What if we have 1450 students in the class? 14500? 1.5 million?
  • What if data were physically distributed?
  • Questions are system design questions
  • Half of our discipline isn't just inventing the wheel, you look at existing solutions/components and see if they meet requirements of problem
  • See what existing file systems do for you - what guarantees do they provide?
  • Write behind it, maybe data distribution happens, maybe a file system that talks to SSD, RAM, etc.
  • Understand top to bottom what guarantees the file system provides

Problem of Files

  • Why files?
  • Virtual memory isn't sufficient - it's volatile when you power down a machine
  • Not big enough - size and scale of data increased rapidly, need to manipulate in a reasonably efficient fashion
  • Understand the problem of persistent storage by analogy with non-persistent memory
  • Useful data needs to persist beyond power cycle
  • How to find what file you need? Indexing into virtual addresses, memory addressing
  • How to assert ownership and control of files? Protection issue: segmentation DPL, page protections
  • How to dynamically grow a file's content and not litter copies of old data? How to give resources back to OS? Memory allocation
  • How to keep 'hot' data close to CPU/registers? Virtual memory
  • How to support performance in efficient fashion? To keep data you're reading/writing a working set of data?
  • Linux tries to use physical memory - not just for processes, but for file contents
  • Linux reads off disk and stores it in memory so access to it is fast

File Content

  • What is a file?
  • Relationship between bytes worth capturing by delineating bytes from each other
  • Data, bytes grouped together under some context/associated with other data
  • Container
  • Manipulated in its own right. Some set of operations on the contents
  • What is in a file?
  • Student answers:
  • Distinctly different information
  • Bytes (hopefully)
  • Metadata
  • Header and list of allocation blocks
  • Address to pointer/Identifier
  • How to find file, point to bytes inside the file
  • Physical location
  • May find bytes of contents
  • All bytes of file is not beside each other - why shouldn't they be contiguous?
  • Virtual memory issue: notion to expand, and need to allocate space, which implies a need to handle collection of bytes in less than file-sized chunks
  • Collection of blocks, which are the bytes of the contents
  • Pointing to a file - point to physical location of fix sized subsets of the file

File Type

  • How to manipulate bytes? Is there a value to see it as a different type?
  • Don't want to just see all files as a sequence of raw bytes - not useful as we manipulate persistent storage meant to interact in certain ways (images, ELF files..)
  • Lots of file types. If given A, B, C, how to know what file it is?
  • Bill Gates added .123 to say what the file is - pointless but keeps track of file type which is useful
  • Users expected to know which program could read and write a particular file, extensions not used
  • File extensions not necessary, but file type is necessary (what program to run it on)
  • Many developing file formats keep track internally of what the file type is: the first few bytes (along with some other structure) often describe the file format (like the ELF)
  • Don't mess with file name, do something internally in file itself
  • man 1 file: determines file type
  • Ex. File A with 4 MB - what file type is it?
  • To determine it, can look at the sequence of bytes
  • To check if it's an image - parse it and if it works, great! Can try an interpreter
  • Seemingly simple problem is a fundamentally hard problem - problem of language recognition
  • Here is a stream of bytes, recognize the language for me - build a parser that exactly recognizes it
  • Magic bytes tell you what type of file it is
  • Ex. ELF magic at the beginning - the OS believes it is an ELF because of the beginning of the file
  • Binary programs still contain readable ASCII content
  • UNIX files contain pure data, not markup
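A minimal sketch of the magic-byte sniffing described above (illustrative user-space code, not the actual file(1) implementation); the ELF magic number 0x7f 'E' 'L' 'F' is fixed by the ELF specification:

#include <stdio.h>
#include <string.h>

/* Sniff the first four bytes of a file and compare against the ELF magic. */
int main(int argc, char *argv[])
{
    unsigned char magic[4];
    FILE *f;

    if (argc < 2 || !(f = fopen(argv[1], "rb")))
        return 1;
    if (fread(magic, 1, 4, f) == 4 && memcmp(magic, "\x7f" "ELF", 4) == 0)
        printf("%s looks like an ELF file\n", argv[1]);
    else
        printf("%s: not ELF - keep trying other parsers\n", argv[1]);
    fclose(f);
    return 0;
}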

Manipulating Files as a User

  • Any program that understands the file format plus commands you should recognize:
  • ls, cd, chmod, lsof, pwd, chown, chattr, lsattr, cat, namei, more, hexdump, touch, df, ln, file, stat, rm, cp, lsusb, unlink, lsmod, lspci, mv, mkdir, dd, chgrp, basename, grep, rmdir and so forth

File Attributes

  • UNIX files typically do not embed metadata about the file ('metadata' is different than file format structure)
  • Metadata is expressed via the type of file system the file resides on, or via the VFS's delegation to device- or file-system-specific kernel functions that actually implement open, read, write, seek...
  • To write programs, the C library has a facility built on system calls that allows a user to manipulate a file and the attributes associated with it
  • Metadata can be manipulated, as UNIX files do have a small amount of metadata associated with them
  • chattr(1): changes the standard set of attributes with which a file can be marked
  • Even though the space for metadata is there, it requires the OS to understand the attributes
  • lsattr(1): most files don't use attributes - they're either not useful or rarely set
  • Can change attributes to be immutable - good to prevent accidental deletion

File Permissions and Ownership

  • Files, like memory, contain data. Users expect to be able to express ownership and access control rules about these collections of data
  • Secure, isolate, protect data in the file. UNIX has two major approaches:
  • Mandatory Access Control (MAC)
  • Discretionary Access Control (DAC) - use for this course
  • The owner of a file has discretion to set permissions
  • Where do these permissions get stored?
  • Can use chmod(1), chgrp(1), chown(1)
  • C Library: File-related glibc calls
  • FILE* stdin/stdout/stderr;
  • Physical I/O characteristics concealed - man stdio(3)

Example: ll

  • Should be ls -l
  • The first column describes the policy for accessing a particular file
  • The 3rd/4th columns describe ownership of the file - the primary user, who belongs to a group
  • Collection of letters r-w-x (read, write, execute)
  • d (directory) - more letters like s (sticky bit), l (link between files)
  • Ability to manipulate bits - important metadata of machine
  • chmod changes permission
  • Collection of permissions is a number
  • First group for eye, second is cpsc, third is everyone else
  • rwx rwx rwx (user group other)
  • If each of these is a bit, each triple can represent 2^3 = 8 different permission combinations
  • A 0 bit means the permission is absent, a 1 means it is granted
  • For r-x: 4+1 = 5 - chmod 555 <file_name>
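A small sketch of the "permissions are just a number" point (illustrative code, not from the lecture): the rwxrwxrwx string is the low nine bits of st_mode as reported by stat(2).

#include <stdio.h>
#include <sys/stat.h>

/* Decode the low 9 permission bits of st_mode into rwxrwxrwx form. */
int main(int argc, char *argv[])
{
    struct stat sb;
    const char *sym = "rwxrwxrwx";
    char out[10] = "---------";

    if (argc < 2 || stat(argv[1], &sb) != 0)
        return 1;
    for (int i = 0; i < 9; i++)
        if (sb.st_mode & (1 << (8 - i)))  /* bit set => permission granted */
            out[i] = sym[i];
    printf("%s: %04o -> %s\n", argv[1], (unsigned)(sb.st_mode & 07777), out);
    return 0;
}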

Directories

  • Files are themselves a way of grouping or associating a collection of data. The internal semantics of a file can be almost arbitrary (indeed, you can 'hide' an entire file system or even virtual machine inside a single file)
  • Collecting and grouping files can be done via directories
  • Directories are a way of providing structure to a namespace, and directory names are the elements of that namespace
  • In UNIX, directories are files

Paths

  • A path expresses one way to traverse the directory namespace. A path is composed of all the directories 'along the way' to a file or directory
  • Relative and absolute file pathnames
  • 'Special' directories and path elements: '.' and '..'
  • Two different paths can refer to the same file: after ln -s /home/michael/foo /tmp/bar, both /home/michael/foo and /tmp/bar name the same content

Conclusion

  • Files are an abstraction that allows programmers, users and the system to manipulate arbitrary collections of persistent data
  • Files go beyond this use case to provide a convenient and flexible interface to manipulating many different types of data
  • You can manipulate files with command-line tools, with C library functions and 'directly' with system calls

Back To Navigation

Week 11

November 18: File Systems

Recall

  • Files are a virtual construct that defines a namespace in terms of a sequence over a range of bytes
  • Parts of that file are scattered across a physical medium
  • A programmer sees a file as a name, a path, a sequence of characters, but the name is really just an attribute of the file
  • We're used to thinking of a file as a largely contiguous collection of bytes, but it is physically scattered across the machine
  • If we can access files via system calls or glibc, the machine does a lot of work to give us the illusion of contiguous storage
  • We know size, data, permissions, etc.
  • File system keeps track of files and their content. In Linux, a deeply layered approach is taken to defining a hierarchy of implementation of various services


Function calls fopen("...")
-> open ("...")
-> SYSCALL layer (sys_open(...))
-> (VFS -> file system -> buffer/block I/O -> device driver)
  • File system is logical arrangement of data; kernel has other layers to talk directly to hardware
  • Performance layer to keep file data as close to CPU as possible (buffer)
-> I/O controller


  • Ideal storage unit: unlimited
  • Within it, might have another construct: file F from position k -> j
  • Can access any byte efficiently and in constant time
  • How large should we allow file F to get? Should we allow it to grow and shrink?

Motivation

  • As OS or file system developer, responsible for managing file content: storing actual parts of the file onto some medium and providing lookup and indexing facilities to support read(2) and write(2) of that file
  • What sort of flexibility and power does a file system need? What are the types of files it is likely to deal with? What are the design requirements?
  • Around 448549 files on a typical desktop box
  • Relatively many small files, only a few large files
  • Most files range from 1KB-5KB
  • Where are the files? (Content not name)
  • How can you keep track of space needed for growing these files?
  • OS system call API (e.g. open, read, write) hides such detail from us
  • Question: how does kernel provide support for a similar translation between some byte offset in a file and actual content on a physical device like a hard disk, compact flash, optical media, or other storage?
  • Answer: file systems. To design and build an effective one, you should try to understand the likely workload and typical environment - these open-ended design requirements lead to quite a number of different file systems

Example: seek.c

  • Open a file, write a message, seek 1 GB into the file and write again
  • Disk space doesn't change
  • Content of new file (moralcodecastinsteel.txt*) - where is it?
  • A hole in the middle
  • When reading some number of bytes, the middle is just a bunch of zeroes - OS generates them on the fly
  • The virtual space in the file (its address range) spans over 1 GB, but the physical space it takes up is smaller
  • This implies the OS is doing some sort of mapping/lookup between the virtual space and a particular byte number
  • Design of mapping is heart of a file system - whether or not it's efficient or appropriate
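A minimal reconstruction of what seek.c likely does (the course file itself isn't reproduced here, and the output name sparse.txt is invented): after the second write, ls -l reports a size over 1 GB while du shows far fewer blocks actually allocated.

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write, seek ~1 GB forward, write again: everything between is a hole. */
int main(void)
{
    const char *msg = "hello\n";
    int fd = open("sparse.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return 1;
    write(fd, msg, strlen(msg));
    lseek(fd, 1024L * 1024 * 1024, SEEK_CUR);  /* jump ~1 GB ahead */
    write(fd, msg, strlen(msg));               /* file size is now > 1 GB */
    close(fd);
    return 0;                                  /* compare ls -l with du */
}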

Types of File Systems

  • How to implement lookup mechanism? Each different answer is a different file system
  • Blue directory names are different file systems
  • Directory structure
  • Often think about directory structure. In Linux systems, you have 'root' of tree denoted by "/"
  • "usr", "bin", "etc", "opt", "local", "home"...
  • Tree can be as deep as you want
  • Tree defining set of relative lookup operations
  • Logical lookup, but nothing about arrangement on disk
  • open("/home/eye/457/lectures/files/seek.c", O_RDONLY) might be an absolute path -- uniquely defines this seek.c. Useful human level tool
  • Given this, seek.c is a collection of bytes. OS has task of associating data structures that point to a physical space
  • File system is NOT a directory

Perceptions

  • Files (and their names and the glibc and syscalls that manipulate them) may be the most visible interface to your persistent data (and thus define your experience and intuition)
  • Directories (and their names) may be the most visible facet of a file system, but these concepts are far from the same thing
  • Directory tree namespace is a hierarchy of names that can help: organize your data into related groups and help impose some kind of access control on that data
  • In UNIX, since most resources can be represented as a file and the directory tree offers a unified naming scheme and semantics, it can be difficult to tell what role a file system (or several of them) play behind the scenes


Mounted File Systems

  • Directory tree name space can have many file systems attached to it at the same time
  • Experience: Windows exposes different devices with C:, D:, E:, etc. drive specifier. In a sense, it's multiple directory trees. Can mount a remote disk drive or directory tree via network (NFS, CIFS/Samba). Plugging in a USB disk causes another drive designator to appear
  • UNIX hides this kind of thing: no need to create a separately rooted directory tree for each new device. Instead, physical devices (and file systems present on them) can be joined to almost arbitrary points in the existing directory tree naming structure
  • Can attach file systems to directories - entirely seamless to indicate what file system your directory is stored on
  • Code in kernel designates what the right set of routines and data structures are for a respective collection of files in virtual name space
  • mount: mount a file system
  • Can mount a file system to a (sub)directory of another file system
  • Can mount multiple file systems to a single mount point, effectively stacking multiple file systems on to one point. Previous file system is partly hidden
  • Each process can have its own namespace (i.e. its own directory tree)

What is a File System?

(1) Collection of code in kernel
  • Collection of data structures and functions that specify how the kernel should perform file-related operations
  • Supporting the 2nd point
(2) File format or network protocol specification
  • Specifies the details of the lookup mechanism
  • Defines structure for distributing files over a physical medium
  • Choosing how you allocate this data, how you index particular bytes in a file, and what optimizations apply implies the choice of this format
  • The format is the essence of the file system; functions such as allocation, indexing, pre-fetching, pre-allocation, defragmentation, etc.
  • are a foregone conclusion of how the files themselves are placed and how the indexing information is represented
  • Same responsibilities as memory management - allocating storage, and talking to actual physical media (same activities)

Roles of a File System

  • Most important job of a file system is keeping track of data
  • A hard job, since the 'virtual' file space abstraction needs to be mapped down to a physical device - particularly when asking the device for more free space
  • Roles of file system:
  • Managing the need to allocate more space (and all that goes with this, including performance management, deferring writes, etc.)
  • Understanding and talking to physical media (usually through a device driver)

What is in a File System?

  • While files contain (better word: name) data, file systems contain the indexing structure that point to files and directories (file system data structures also typically contain files' associated meta-data)
  • For many (but not all) file system formats, the in-kernel memory version of these data structures is a direct reflection of the information stored on the disk
  • Much like page tables are stored in physical memory, this metadata must physically reside somewhere
  • File systems typically have more than one superblock, although only one is active at a time

Where is File Metadata Stored?

  • The metadata and attributes in the stat output are split across several things, such as the directory and the data structures of the file system. The inode (index node) is the solution: there is a one-to-one mapping between an inode (data structure, with its inode number) and a file
  • Ability to keep track of size and permissions - kernel storing these attributes
  • stat <file_name>: provides information
  • Storage is combination of places -- all directories are also files
  • On a Unix-like system, a data structure remembers the metadata alongside the file itself:
  • inode (index node)


  • Inodes have a unique identifier and remember that file F starts at some offset k
  • Lesson: given a key/identifier in directory structure (virtual namespace of keys), how does OS take this info and begin to look up information? The answer is inodes
  • Problem: pathname lookup
  • Starting from current->fs->root or from current->fs->pwd, for each directory entry in the path, get that inode and extract its info; the entry matching the next path component is used to derive the next inode
  • Must check permissions along the way at each directory, expand symbolic links along the way, traverse file systems (if path component is a mount point), handle namespaces (same file path string in two different namespaces may resolve to different files)
  • Goal: wind up with file's inode
  • If you misspell it, OS checks if identifier is valid
  • ID valid but unmounted - paths are split into directories and OS may have to seamlessly mount it all
  • Similar to memory management - given a logical address, it does a bunch of things to find physical space
  • Gives data structure to inode of that particular file
  • Doesn't hand it to you, but hands it to struct file that keeps track of metadata
  • Inodes are a useful solution to a number of challenges:
  • Convenient place to store metadata
  • Collections of them are smaller than FAT tables
  • Sharing files (i.e. setting up hard or soft links) is easier, as is supporting updates
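A quick user-space illustration (assumed, not lecture code) that the inode is where this metadata lives: stat(2) surfaces the inode number, link count, and size.

#include <stdio.h>
#include <sys/stat.h>

/* Print the inode-resident metadata stat(2) exposes for a path. */
int main(int argc, char *argv[])
{
    struct stat sb;

    if (argc < 2 || stat(argv[1], &sb) != 0)
        return 1;
    printf("inode: %lu  links: %lu  size: %ld bytes\n",
           (unsigned long)sb.st_ino, (unsigned long)sb.st_nlink,
           (long)sb.st_size);
    return 0;
}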

Problem: Which Disk Blocks Belong to File F?

  • Once we have the inode, how to find which disk blocks contain the actual file content? How does the OS go about associating a particular set of blocks with a particular range of bytes for an arbitrary file?
  • Considerations: performance, seek time, simplicity of implementation
  • Most algorithms assume a fixed-size block of data. Reminiscent of paging vs segmentation
  • Contiguous allocation
  • Start at position 0, continuously allocate space - doesn't allow you to do graceful expansion - shuffle/copy whole thing
  • Pros: easy to implement; reads are efficient - a single seek to beginning
  • Con: fragmentation (files grow and shrink and require shuffling around)
  • Linked list allocation
  • At the end of every block of file F is a pointer to the next block of F - blocks can be anywhere on the storage medium
  • What if you have to index half a gigabyte in? That should be a constant-time operation, but a linked list requires you to follow it
  • Pros: conceptually simple: a file is composed of blocks that point to each other
  • Con: random access is slow because we must traverse the list
  • FAT Allocation
  • A table that is a collection of the pointers - a tweaked 'linked list' approach that factors out all the 'list' pointers embedded in each block and keeps them in a single coherent table
  • Can look up block number in table - random access easier as whole block can be dedicated to data
  • Need to spend more RAM to hold the entire table
  • Need to have entire table in memory to be constant time operation; for large disks, you have a large table even if entries in tables aren't used
  • Flat inode
  • The collection of inodes at beginning of the disk each hold a number of pointers to the disk locations of the blocks containing their file data
  • Multi level index scheme
  • At beginning of disk, you have collection of inodes that are pointers of blocks of each file. You have one inode per file, and inside the inode is an array that does the indexing similar to page tables (small amount of memory that indexes a large amount of space)
  • Exploits the notion of common file sizes - the first few pointers map directly to data blocks

Questions & Answers

Q:

Do you do this analysis (of allocation approaches) before building a system? How do you go about it?

A:

The real world is concerned about what we did yesterday. Measure in a high-fidelity way, and see if there's a disconnect between the way we are doing things and how we should be
Measure how people are using files today to write a file system

Q:

When you do a 'hard link' (a way to reference a file from another directory), is there only one inode?

A:

Want to make /data/seek.c be a shortcut to /home/eye/457/lectures/files/seek.c
The OS makes links - hard links and soft links
Do you need two different inodes to refer to the same file?
For a hard link, the identifier is duplicated; another file is not created
Soft link: an entry in the directory that stores the path of the file you want to duplicate
An inode points to the file name, because you're creating another file (allocating another inode that points to the file name, rather than adding another entry to whatever directory it is)
Different inode
Hard link: ln <file_to_duplicate> <target_name>: links files
The name is duplicated; same inode
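The answer as a runnable sketch, assuming orig.txt already exists (all file names here are invented): lstat(2) shows the hard link sharing orig.txt's inode while the soft link gets its own.

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat sb;
    const char *names[] = { "orig.txt", "hard.txt", "soft.txt" };

    link("orig.txt", "hard.txt");     /* ln orig.txt hard.txt */
    symlink("orig.txt", "soft.txt");  /* ln -s orig.txt soft.txt */
    for (int i = 0; i < 3; i++)
        if (lstat(names[i], &sb) == 0)  /* lstat: don't follow symlinks */
            printf("%s -> inode %lu\n", names[i], (unsigned long)sb.st_ino);
    return 0;  /* orig.txt and hard.txt report the same inode */
}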

Back To Navigation

November 20: The Linux VFS

  • Specification of timing measurements in Assignment description. For full marks, do it
  • In the kernel source code there are many file systems to choose from - some quite specialized (network file systems, particular on-disk formats)
  • There's a disconnect: file systems are specific, yet a single system call interface opens/reads/writes any file we want without worrying about the physical medium or which file system it is
  • If writing a file on an NFS mount, you just call open/read/write - you don't have to care which particular file system is mounted
  • The amount of effort spent hiding those details is huge
  • Part of it is Virtual File System

Command Line Utilities

  • echo "Portlandia" > file.txt
  • Part of the shell opens the file name and calls write(2) to put the contents in file.txt
  • cat file.txt
  • Simple program that opens the file, then reads and writes to stdout
  • C code
-> fopen()
-> open("path",flags,mode) = fd ... read(fd, buffer, num_bytes);
  • Location, r/w, permission bit
  • The string "path" would be the "file.txt" name in the current directory, but could have provided an absolute path name like "/home/eye/file.txt"
  • Kernel knows to take file string into rest of what kernel can work
  • As a user, stop caring what happens at open()
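A minimal cat along the lines of the walkthrough above (illustrative, not the real implementation): the whole program is the open/read/write calls the shell tools boil down to.

#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    char buf[4096];
    ssize_t n;
    int fd = (argc > 1) ? open(argv[1], O_RDONLY) : 0;  /* default: stdin */

    if (fd < 0)
        return 1;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        write(1, buf, n);  /* fd 1 is stdout */
    return 0;
}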
System call layer: open, read, write ...
  • Is the VFS the system call layer? No
VFS: below system call API for file-related system calls


  • These two APIs sit right beside each other, and this works well because the system call interface is quite stable
  • No need to change code to read/write a file, which is great
  • Simple, stable API with a stacking mechanism
  • Without the VFS underneath the system call layer, you'd need code that worries about how to locate file names/paths on different media in different file systems
  • That would make the API buggy and error-prone, since packing that code into open/read/write -> lots of error-prone code
  • Design need for factoring out different file systems -- it's an internal API for kernel
  • The read(2) system call delegates to the vfs_read() function
  • Kernel looks at a data structure that represents the file path and all metadata with it; also will know what file system it's stored in
  • Invokes particular file system -> delegates to a generic I/O layer that caches so file access is fast -> piece of hardware (solid state storage, USB, network)
  • Seven layers of calls that happen, and each layer deals with a specific abstraction that allows us to expose a nice, consistent interface for user level programs and flexibility to access different devices and media at once, in addition to many ways of storing information

Inode Data Structure

  • struct inode
  • How the system supports opening/reading bytes from a file
  • inode is a generic concept and specific kernel data structure
  • Index node that references metadata about a file, allowing an OS kernel to contain metadata and reference the specific locations of it
  • Unifies file system to VFS
  • Couple of versions of inode
  • Generic (looking at here) which is an in-memory representation for general purpose inode information, or when you open a file an inode instance is created in kernel memory
  • ext4_inode: bytes on disk belonging to ext4 file system
  • Fields, mainly, but Line 461: array of single/double/triple indirect indexing blocks that can talk about large/sparsely populated files
  • Lines 256-260: the size is EXT4_N_BLOCKS, recursively defined by symbols, which matches 12 directly mapped + 1 (from Line 258) + 1 (from Line 259) + 1 (from Line 260) -- single, double, triple

open(2) System Call

  • open(2) system call
  • Line 1053: entry point of table for open(2) system call
  • Some familiar arguments (the name of the system call, then the type and name of each argument)
  • Implementation of open(2) system call is short - almost nothing there
  • Line 1060: ret is most important - needs to delegate job to someone else
  • Entry point to system call that has a helper function (do_sys_open())
  • Line 1031: pass in familiar options
  • Line 1034: turning file name into a fd so that kernel doesn't have to deal with arbitrary strings
  • Resolve fd by opening it
  • Associate string temporarily to a small integer for the rest of the system calls
  • Line 1039: do_filp_open(): internal kernel service routine for loading metadata about the file, i.e. inode and referencing to it as file *f
  • Create mapping of struct file and fd; if it's successful, can call fd_install (Line 1045)
  • Line 1018: fd_install
  • Local variables that set it to current (currently executed process)
  • Line 1022: spin_lock protects data
  • Turns a linked list into something that can be treated as an array (Lines 1023-1024), then associates the struct file created from the file name with the fd (Line 1025)
  • Work of actually opening files - is the path legal, does it exist, does it need to be created, does it have permissions?

read(2) System Call

  • Dealing with specific fd, pointer to userspace buffer to read into and number of bytes to return
  • Line 374: the workhorse in this function is struct file, the data structure standing behind the fd
  • Line 378: call routine fget_light() to look up fd and return data structure associated to it (to file)
  • Line 379: if successful and no null pointer, do work of vfs_read()
  • Line 381: find the file offset; the semantics are to start reading at the current position for some number of bytes into the buffer, referring to that structure
  • Argument checking: can we access, are we allowed, etc.
  • Lines 289-302: the meat; if you know what file (keeps track of metadata) is not null, then do a ret =..., else do another statement
  • Line 291: file->f_op a member of that is a table of function pointers
  • Line 292: if function pointer not null, do read routine for file - represents delegation to specific file system of where it's stored (where data rests)
  • If file is on ext2 or ext4, that step is where vfs delegates read routine to go to that file system
  • Line 294: else do_sync_read a general purpose read
  • Line 252: leave VFS, and reach block I/O layer (does I/O on logical arrangement of blocks not yet at device driver)
  • Line 262: for(;;) accesses f_op->aio_read - the kernel does asynchronous I/O, which makes sense as calls to read/write are slow compared to the CPU
  • Usually these are put to sleep because scheduler can't know when data might be delivered. Kernel is providing generic operation which eventually goes to a block layer that talks to memory - files are in primary memory/RAM for speed
  • file_operations structures initialize these operations
  • Function pointers are assigned the addresses of kernel functions
  • Implementation of read is do_sync_read
  • VFS gives ability to invoke particular piece of functionality
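A user-space sketch of the delegation idea behind file->f_op (struct and function names are illustrative, not the kernel's actual types): generic code calls through a table of function pointers without knowing which file system's implementation will run.

#include <stdio.h>

struct file_ops {
    long (*read)(void *buf, long n);  /* like the kernel's f_op->read */
};

static long ext_read(void *buf, long n) { (void)buf; printf("ext-style read\n"); return n; }
static long fat_read(void *buf, long n) { (void)buf; printf("FAT-style read\n"); return n; }

static const struct file_ops ext_ops = { .read = ext_read };
static const struct file_ops fat_ops = { .read = fat_read };

/* The "VFS layer": delegate if the pointer is set, as vfs_read() does. */
static long vfs_read_sketch(const struct file_ops *f_op, void *buf, long n)
{
    return (f_op && f_op->read) ? f_op->read(buf, n) : -1;
}

int main(void)
{
    char buf[16];
    vfs_read_sketch(&ext_ops, buf, sizeof(buf));
    vfs_read_sketch(&fat_ops, buf, sizeof(buf));
    return 0;
}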

Back To Navigation

November 22: Extended Attributes in Ext4

  • Feature of many modern OS
  • File systems are a data storage mechanism
  • Major data it stores is files
  • Other attributes of files kept for bookkeeping and other purposes (file permissions, etc.)
  • Opportunity to store important metadata about files and file system
  • Today's topic: ability to store metadata and what that metadata might be used for
  • Hint: Slide deck probably won't be too helpful or useful (in Michael's words)

Goal

  • Permission bits: chmod to modify them
  • Other metadata: extended attributes, access control lists (built on extended attributes)
  • A finer-grained notion of who can modify/access than you, your group, and everybody else

Access Control Lists

  • Traditional security mechanism
  • Security world: access control matrix, which is a table
  • Columns: resources in system - files, anything (way to identify them)
  • Rows: principals, the subjects involved, i.e. actors
  • Table describes who can do what to each resource
  • Interesting because it can be read in two ways. The act of reading it implies how the system is set up, because the matrix enforces restricted access to resources
  • Expresses a policy (rule that should be true about the system)
  • Subject's perspective vs object's/resource's perspective (row-wise vs column-wise)
  • Row-wise: capability, i.e. every user has a set of capabilities
  • E.g. Alice can rw to /etc/x, etc.
  • Column-wise: access control list, i.e. /var/r has access control list where Mike has n permissions, etc.
  • We take perspective of access control list - need to store the ACL with the resource, because upon opening/reading/writing it, you want to look at the permissions. In practice, take ACL approach to security
  • Is the approach better/worse than Linux permission bits?
  • Better as you have more refined permissions
  • Specific users and permissions vs groups and everyone
  • A more general mechanism. It allows the totality of permissions - you can still create groups as collections of subjects
  • The data needs to be kept somewhere, so the original designers split it into 'you', 'your group' and 'everyone else', but the table is general enough to capture all permissions

Example: Setting Access Control Lists

  • Locasto logs into ssh. Goes to /tmp and types password into a file
  • cat locasto.password.txt -> returns secret7
  • Cannot read it - UNIX permissions do not allow
  • chmod 644 locasto.password.txt
  • Retains own permissions and everyone gets read access
  • Want only a few people knowing the password: could let them into the 'profs' group, but that allows them to gain access to everything else the group can read
  • How to share files with teammates? Do not use Dropbox!
  • Make a new group, add them to the group, then properly set UNIX permissions
  • groupadd foo
  • Normal users can't create groups
  • UNIX permission bit is lame - doesn't let you do what you want to do, which is composing an arbitrary set of people to get a set of permissions
  • mkdir michael
chmod 400 locasto.password.txt
  • Want to share with group. Who are you?
  • Ex. kkajorin
  • getfacl locasto.password.txt
  • See standard UNIX permissions - how these things are implemented
  • By default, every file has a default set of access control lists, and have user/group/other
  • setfacl -m "u:kkajorin:r" locasto.pasword.txt
  • Added user when looking at getfacl, but unless you're in that directory, you cannot see the file
  • Permissions on directory is rwx
  • To allow a user to get into the directory, it needs execute access
  • chmod 701 michael
  • Semantics of r/w/x on directories
  • Directories are files and lists of other files; changing into it executes it, listing just reads it
  • User can read file. Cannot list anything else in the directory though
  • setfacl -m "u:kkajorin:rw" locasto.password.txt


  • How is the kernel doing it? Where is this data stored?
  • File system, as data can't just appear out of nowhere in memory; it is stored alongside files
  • Extended attributes, or ability to associate arbitrary key-value pairs with every file
  • getfattr locasto.password.txt examines this stuff
  • Nothing happened. Need to dump information
  • getfattr -d locasto.password.txt
  • Results are still disappointing. What about the -m pattern? The default pattern is user, but everything we want is specified with "-"
  • getfattr -e "hex" -m "-" -d locasto.password.txt
  • Encoded to see how they map to r/w/x for arbitrary groups and users
  • Extended attributes are useful for own purposes
  • How to associate a particular arbitrary key-value pair?
  • setfattr -n "457" -v "is taking too long" locasto.password.txt
  • Permission denied
  • rm locasto.password.txt, echo "secret" > foo.txt
setfattr -n "457" -v "is taking too long" foo.txt
  • Operation not supported. Can go to 'see also' in manual
  • Attribute support for arbitrary key-value pairs is defined in terms of a set of namespaces for keys (ACLs, security, user, etc.) - different categories for these namespaces
  • What's wrong with asking OS? Not giving valid key namespace
  • setfattr -n "security.457" -v "..." locasto.password.txt
  • Operation not permitted
  • setfattr -n "user.457" -v "..." locasto.password.txt
  • Valid
  • How to retrieve attributes of this file?
  • getfacl only gets access control
  • getfattr -d foo.txt
  • Shows everything
  • Gets you associated arbitrary metadata with any file you want
  • Mark up permissions or key words or indexing words
  • Track field types or default permissions of opening file
  • Powerful, customizable mechanism
  • getfattr -e "Text" -m "-" -d foo.txt
  • Displays everything
  • Can have arbitrary values and name-value pairs that have to do with other systems
  • Metadata hiding around in these file systems - point is, file system must support this. Not only is it implementing getattr, setattr system calls but must store this metadata
  • Wanted to add attribute to file
  • setfattr -n "user.something" -v "457" hello.txt and now the "user.something="457" appears


  • cp hello.txt /media/exthw3
  • Everyone can read "hello.txt"
  • getfattr -m "-" -d hello.txt
  • Where is the label? Extra attributes not automatically copied over; metadata is not in the file, it's part of the file system
  • Not preserved, just recreated
  • Similar difficulty when copying files between Windows, Linux, and Mac - trying to write out a file system to CD
  • When transferring data from one file system to another; not just one place to another, it's one format to another
  • It may not be preserving attributes, and that will bite you later. E.g. if you transfer files and directories that are case sensitive to a directory that isn't
  • Bottom line: useful sharing mechanism to temporarily give access to files for some people

Question & Answer

Q:

If you copy to the file system, would it be different?

A:

cp hello.txt /home/eye/h.txt
getfattr -m "-" -d h.txt
No; cp can preserve attributes (e.g. -p, or --preserve for extended attributes). It's not the default since cp has no idea what you're asking it to do; it doesn't know ahead of time that you're just copying to the next directory over
Stuff that isn't a basic attribute of a file won't be copied. When file systems have trouble with the case or length of a file name, preserving the extra stuff isn't guaranteed

Q:

Where are timestamps kept?

A:

Three timestamps: Access, Modify, Change. Where is info contained?
Check the LXR. Both inode structures contain timespec fields (recall last lecture)
stat accesses basic metadata of a file - name, timestamps, user, owner, group, how many links, inode number, etc. which is associated to the inode of the file
struct inode and ext4 inode are different - each file system has a specific inode struct that contains metadata of a file system
Fields are replicated in other children of inode - literally written on disk in the inode

Q:

Will copying something to a new file system maintain metadata?

A:

Copying something creates a new file of the destination
Timestamps would not be copied over - new ones would be created. Depending on tools, you can preserve old information that overwrites what would have been there ('now')

Back To Navigation

Week 12

November 25: Example File Systems

  • Lesson from HW3: seemingly simple problems like writing one big file or many small files are not as straightforward as they appear
  • The exercise may be deceptively simple, but people inquire whether they're getting the performance they should be getting
  • Mounted three different file systems in the lectures folder (/home/eye/457/lectures/filesystems/[ext2]/[ext3]/[fat])
  • OS unifies operations in VFS
  • When you stretch system to its limits, how is it possible to write 1 GB of data? Or a million files?
  • How do we design a minimally viable file system?

Minimum Viable File System

  • Treat disk as an ideal virtual storage medium. File systems can mostly ignore specific hardware properties, but take some common properties into account: virtual model is the common design target
  • Assume you have a well-behaved storage system (rotational disk)
  • Care to provide some number of units of space, 0->M
  • Stick a file system to it - but what is its job?
  • To store files and retrieve chunks of the file content
  • What are we supporting?
  • Operations: open/create, read, write, seek, close (also: unlink but don't consider this yet)
  • Recall that a file names a collection of bytes (block of bytes)
  • API that should be supported: traditional UNIX file I/O syscalls
  • open("path")
  • Implies a way to translate path names to 'files' or 'inodes'
  • open()
  • Lookup file in directory, which may take long for large directories
  • seek(offset)
  • Implies a way to perform 'random access'; O(1) access to any byte offset in a file
  • Note that support for seek implies or requires 'seeking' within something: a virtual file space made up of a range of bytes from zero to max file size
  • seek()
  • This may take a long time for large files; we don't wish to visit every byte along the way
  • write("msg")
  • Lazy is efficient, but may lose data on a power loss; requires journaling
  • read()
  • Cache content in primary RAM
  • close()
  • What data structures?
  • A list of open files per process (belongs at VFS level)
  • For each open file, a current offset
  • A way to map files from where they are in the directory tree to where they might reside on a physical device
  • Key mechanism: map(byte offset) -> logical block number
  • Example:
byte_offset = 1253
block_size = 1024
lbn = byte_offset / block_size = 1253 / 1024 = 1
  • This means that the byte offset 1253 is in logical block 1 of the file
  • In ext2, finding this logical block is a matter of following the second direct-mapped entry in the multi-level index array
  • This logical block could be anywhere on the disk
  • A file system really keeps track of which blocks belong to which files
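The same mapping as a tiny C function (illustrative; a real file system would go on to translate the logical block to a physical one):

#include <stdio.h>

#define BLOCK_SIZE 1024

/* Split a byte offset into a logical block number plus an offset in it. */
static void map_offset(long byte_offset)
{
    long lbn    = byte_offset / BLOCK_SIZE;
    long within = byte_offset % BLOCK_SIZE;
    printf("byte %ld -> logical block %ld, offset %ld within the block\n",
           byte_offset, lbn, within);
}

int main(void)
{
    map_offset(1253);  /* -> logical block 1, offset 229 */
    return 0;
}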

Minimum Viable System Diagram

4PCnVvL.png
  • One is on the disk
  • The File System: start at beginning of hardware, write whatever you wish, close the file and repeat
  • These files occupy space up to some location k (not added to the diagram)
  • Add a new thing to keep track of where the next sequence of bytes are
  • Problem: positions i->j are used for file Two, so where do we stick the next portion?
  • Solution: stick it somewhere else, at p - it may have different entries like file Two: a position to record and a pointer to somewhere else
  • How to know where file is? Maybe dedicate some portion of beginning of disk to this meta information
  • Need meta data for things like where the starting location of each existing file is
  • Not all files are contiguous; where are all the corresponding parts? File Two and Two' needs a way of indexing to find the next set of bytes of the same file
  • File is a way to name bytes (no need for it to be contiguous)
  • Treat files not as arbitrary-length runs of bytes; instead, constrain the contents of a file in terms of fixed-size units (blocks)
  • Important so that they are predictable (but doesn't have to be)
  • Pick some KB; file system can manage files in terms of sets of blocks rather than bytes
  • If file Two needs a byte, can take n bytes to store an index for next location of file Two'
  • Or take the entire block, copy to another location then add another byte
  • Adding a byte to a file should not require copying, mangling, or adding metadata to it. If you're doing a seek to a specific position in a file, you don't want to account for scatterings of metadata intermingled with the actual data
  • Talked about indexing allocation, FAT allocation which is the foundation of what file systems are
  • How to get data on a disk? File systems do it efficiently; but it also looks up files and tries to find a particular byte in a file
  • How it does this job is what file system it is
(1) Need to know where the start of the file is
(2) Need to know where the subsequent bytes of the file are located
  • Design requirements: Delete, expand existing file

File Allocation Table (FAT)

  • Need to solve problem of finding the first byte or position of a file/block of a file, and an offset within it - need ability to do random access
  • Table at beginning of disk that describes this
  • Table allows you to take a particular file and the file system translates the file to a logical position on the disk

FAT Table Diagram

SsI5mxN.png
  • File One starts at position one
  • Have Next "block" and end of file (EOF)
  • Notice that you can map pretty much any file to the start of a position/offset in table/array and follow a set of pointers to find the next location
  • Easy, flexible system
  • Need to store the table in memory, but memory is cheap
  • FAT 12, 16, 32 versions
  • 32 takes 32 bits to an address -- what's the most you can address? 4 GB of data
  • This limits file size. 16 is even worse
  • Table gives nice ability to know where files are (required)
  • Choice of size impacts max file size to have, and number of files you can have
  • Why impact number of files? With no space in memory (don't want to go to disk), there's no flexibility
  • The size of the table creates a hard limit K, constrained by memory (RAM)
  • Issues: the table is complete - every entry exists (it may be EOF, a next-block pointer, or unused)
  • Must store the whole thing in memory
  • Solves problem of start of file, allows to gracefully move from one portion of file to next; but have hard limit of file size and number of files
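A toy model of the FAT idea (table contents invented for illustration): fat[i] holds the block that follows block i, so reaching logical block n means walking n links - which is why the whole table wants to live in RAM.

#include <stdio.h>

#define EOF_MARK -1  /* end of a file's chain; -2 marks a free block */

static int fat[8] = { 3, EOF_MARK, 1, 6, -2, -2, 2, -2 };

/* Follow the chain n hops from a file's starting block. */
static int nth_block(int start, int n)
{
    int b = start;
    while (n-- > 0 && b != EOF_MARK)
        b = fat[b];
    return b;
}

int main(void)
{
    /* file One starts at block 0: chain is 0 -> 3 -> 6 -> 2 -> 1 -> EOF */
    for (int i = 0; i < 5; i++)
        printf("logical block %d lives in physical block %d\n",
               i, nth_block(0, i));
    return 0;
}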

Ext2/Ext3/Ext4

  • Distinction of 2 and 3: 3 has journalling, 4 is a revision of 3
  • Ext2 is the default file system in Linux
  • How is it different than FAT or the minimum viable file system? What design choices were made?
  • Have to pick sizes when it comes down to it - number of files still matter
  • In ext2, a file is a semi-contiguous series of blocks, with descriptive metadata contained in an inode (index node)
  • Allocation strategy is to give 8 blocks to new files in anticipation of future growth/writes
  • Block size can vary, but is usually 1 KB, 4 KB or 8 KB

Ext2/Ext3/Ext4 Diagram

vvhfjqx.png
  • An ideal storage medium is split up (typically when partitioning the disk)
  • swap area helps virtual memory, rest of it is about storage
  • Split up disk into Block Groups
  • Any particular block can belong anywhere on the disk
  • Inside a Block Group, the format of ext2:
  • Physically on the storage unit
  • Not infinite in size
  • In a Block Group:
  • The start is a Superblock
  • Main/primary superblock copy; other copies are scattered over disk
  • Describes properties of a file system stored on the particular location of the disk
  • File system meta data, version, etc.
  • Number of inodes and disk blocks
  • Start of free blocks
  • Group descriptors
  • Has metadata about other parts of the block group
  • Number of directories
  • Pointer to block bitmap, and inode bitmap
  • Bitmaps
  • Indexing structures purely for performance - instead of linearly scanning though inodes, can use these
  • Bitmap block: 1 block in length (e.g. 4 KB), a bitmap of free blocks
  • Bitmap free inode: 1 block in length (e.g. 4 KB), a bitmap of free inodes
  • Inode table
  • A few blocks are dedicated to list of inodes
  • Set of data structures that allow you to find files - one inode for every file
  • Note these are set at file system creation time, thus limiting the default number of files
  • Data blocks
  • All the data files
  • Inodes contain an array that serves to index into the Data Blocks portion of the disk
  • Index Array
  • 0-11: direct map/contain block number
  • 12: single indirect index (points to another block containing pointers to data blocks)
  • 13: double indirect
  • 14: triple indirect
  • These point to specific disk blocks, but are not 'in' the inode itself; the inode contains metadata, not data
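A sketch of the lookup arithmetic under simplifying assumptions (12 direct pointers; 1024 pointers per indirect block, e.g. a 4 KB block of 4-byte pointers): only the classification step is shown, since a real implementation would also read the indirect blocks from disk.

#include <stdio.h>

#define NDIRECT        12
#define PTRS_PER_BLOCK 1024

/* Decide which part of the inode's index array covers a logical block. */
static void classify(long lbn)
{
    long single = NDIRECT + PTRS_PER_BLOCK;
    long dbl    = single + (long)PTRS_PER_BLOCK * PTRS_PER_BLOCK;

    if (lbn < NDIRECT)
        printf("block %ld: direct entry %ld\n", lbn, lbn);
    else if (lbn < single)
        printf("block %ld: single indirect, slot %ld\n", lbn, lbn - NDIRECT);
    else if (lbn < dbl)
        printf("block %ld: double indirect\n", lbn);
    else
        printf("block %ld: triple indirect\n", lbn);
}

int main(void)
{
    classify(5);       /* direct */
    classify(500);     /* single indirect */
    classify(100000);  /* double indirect */
    return 0;
}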


  • When creating the file system, have to dedicate a certain number of blocks to give space for the inode
  • Max number of inodes, max number of files
  • Real limitation: once you create a file system, the number of files you can store is fixed
  • Implication: all the inodes you will ever have are already on the disk
  • Ext2/3/4 does this well: given an inode, it recovers the location of the rest of file
  • In an inode structure, there is an array of indexes (a bunch of pointers to actual disk blocks) so that the file doesn't need to use the whole disk block
  • Each inode holds pointers into the logical positions on the disk, which allows the file system to traverse array quickly to locate file data
  • If the file data is larger than some k size, can look up in single or double indirect indexes
  • Ext2 gets around some restrictions
  • Can address large files and find them efficiently - finding a particular block is not a linear scan
  • It requires looking up the inode, looking into the array and when given the offset, indexing into the array (multiple pointers may be referenced)
  • Not unlimited: things like ext2 are limited in size of files they can manipulate
  • Metadata sits at the beginning of each block group - not just data
  • Having it means you are dedicating a portion of disk space to metadata, so don't have all of it free
  • Although can efficiently locate inodes and data blocks, still have hard limits on number of inodes and files
  • Shortcomings: all files, at minimum, have a block size. Even if you have a small file, it will still take up a block
  • Small price to pay as you're still able to select in fixed size blocks (despite small files taking up an entire block when not needed)


  • Example: opening a file and writing bytes to it, say writing 10 bytes into a 512-byte block
  • How many blocks to allocate? At the moment only need 1
  • But consider behaviour of systems - what do they tend to do? Expand! So the file system will grab a free inode, fill its data structure with the location of free block then write some data to it
  • Files want to expand. Even though most files are small, you will start using up entries - you don't get better performance by randomly selecting a free block
  • When you create a file, the ext file system preallocates 8 blocks for you (8 blocks x 512 bytes = 4 KB), which makes sense because files are relatively small
  • A good strategy, as most underlying hardware is more efficient reading/writing contiguous blocks - a trade-off between 'spend 7 unused blocks' and 'expanding files gracefully'
  • Managing piles of blocks: when files are allocated, 8 blocks are written - if grown above 8 blocks what happens?
  • There is a pool of data blocks: it can be highly fragmented, particularly on systems that have been running for years
  • Is fragmentation a problem for ext2/3/4?
  • If you read to the end of what's allocated and need more, you need to do more inode indexing. But if you're reading sequentially from start to end, it's fine
  • Doesn't matter for read/write. If you need to do it for next chunk of file (next block) is there a performance hit in terms of locating the next block?
  • Issues: underlying file storage and performance of file system and locating the next one
  • Logically speaking, fragmentation is not an issue for indexing as at most you're doing 3 pointer references - physical address to logical block is not an issue
  • Can have highly fragmented blocks. Not concerned about doing a write, and running out of space. Just go back to the free block list, fill it in, and so forth - suboptimal but it works
  • Problem: when you get a full file system with many files, or when you've used a large number of blocks, several levels of index blocks may have to be searched, which can be expensive
  • Inode table has pointers, like a linked list (table -> block1 -> block2 -> block3)

Ext2 vs Ext3

  • Ext2 tries to keep files and directories relatively close, but it's a fairly simple and elegant file system that allows you to find data quickly and have large files
  • Gives us the basic. If computers are perfect, can walk away
  • Ext3: sometimes computers crash and when they're crashing, CPUs are doing something and some of the code executed is writing data to a file (pending writes)
  • A program is trying to write to some fd, but what if the machine dies? Writes are half-complete; writes in the queue (that user programs believe have happened) are not flushed
  • Ideally, power cycling the machine would pick up from where it failed - machines like IBM mainframes checkpoint everything, but mostly we just hope for this
  • Hope the crashed state can be started from
  • The problem is not that the data at a particular block/file is gone or corrupted - that's a user problem they can deal with (no appropriate integrity checking or checksum); the problem is, e.g., a half-deallocated inode
  • Kernel data structures that help manage data I/O might be inconsistent or corrupted -- not just the user data
  • Because these operations are non-atomic, file system metadata gets corrupted. With multiple CPUs there are lots of issues - multiple transactions happening. If the metadata is messed up then the file system is messed up and programs may not work; thus the machine may not boot
  • Answer: take a database class and learn what they do to ensure operations are successful
  • Keep a 'journal'

Ext3: Journalling File System

  • Ext3 adds this
  • Gives you different options, boiling down to keeping track of log of operations that the system is about to do
  • One option - journal: all file system data and metadata operations are logged
  • Before the system does anything, there is a fixed-size buffer that you write transactions onto; you can write both data and metadata, but this impacts performance as you capture everything about the data before you do anything
  • Another option - ordered: metadata operations are logged, but real data is written first. Reduces chances of file content corruption (ext3 default)
  • File system looks at state of journal and sees if they're committed, pending, or about to happen
  • Can recover, commit ones that are appropriate and stop and point of corruption
  • Last option - writeback: only metadata is logged, which is fast (e.g. XFS)
  • Keep track of the metadata, and write the data to disk first; then record what you did - when you recover, you replay up to that point (or the last successful point), and if the transaction of writing to disk didn't complete, it is simply discarded


  • Journal is a small buffer of file system operations/transactions
  • The journal is a small set of disk blocks (e.g. 64K) ordered as a circular buffer of recent disk write operations
  • Journalling: record of file system meta data
  • Goal: avoid constantly running consistency checks on entire FS. The point is to avoid having minutes-long procedure occur when the system comes back up
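A conceptual sketch of the write-ahead idea (not ext3's actual code; every name below is illustrative): records reach the journal before the in-place update, so recovery can replay committed transactions and discard uncommitted ones.

#include <stdio.h>

enum record { TXN_BEGIN, TXN_DATA, TXN_COMMIT };  /* journal record kinds */

static void journalled_write(void)
{
    printf("1. append TXN_BEGIN plus intended metadata to the journal\n");
    printf("   (in data=journal mode, append the file data too)\n");
    printf("2. force the journal records to disk\n");
    printf("3. append TXN_COMMIT and force it to disk\n");
    printf("4. perform the real in-place writes\n");
    /* crash before step 3: recovery discards the transaction;
       crash after step 3:  recovery replays it */
}

int main(void) { journalled_write(); return 0; }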

Writeback: XFS

  • Design goals for this file system (from Silicon Graphics) centered on supporting intense I/O performance demands, large (media) files, and file systems with many files and many large files
  • Terabytes of disk space (so many files and directories)
  • Huge files
  • Hundreds of MB/s of I/O bandwidth
  • XFS was a clean-slate file system design effort that supports 64-bit file address space
  • Internally manages groups of disk blocks and inodes in Allocation Groups (AG)
  • Also manages groups of blocks called extents rather than individual blocks
  • Part of what makes it efficient is the use of B+ trees for many of its internal data structures rather than the bitmap approach of ext2
  • Example: it uses two B+ trees per AG for keeping track of groups of free blocks (or extents)
  • Entries in the tree are (extent start, length) pairs; one B+ tree is indexed by the start block of free extents and the other is indexed by the length of free extents
  • This pair of data structures makes it quick to find an extent of the 'right' size or to find an extent at the right 'position' (or intersect two sets of possibilities)
  • XFS can easily and dynamically scale the number of files it supports because pools of inodes are created on demand rather than fixed at file system creation time
  • XFS supports 'large' directories (i.e. directories with many files in them)
  • Many previous file systems performed a linear search through file names in a directory. Some previous approaches used a hashing approach to speed up this lookup
  • XFS also provided support for journalling metadata to improve crash recovery times
  • XFS improves I/O performance by using multiple read-ahead buffers (i.e. if byte b in a file has been read, it is likely that bytes b+1...b+k will be read as well)
  • It bypasses the system buffer caches, and it is multi-threaded to allow many simultaneous readers

Back To Navigation

November 27: Devices and Device I/O

Overview

  • Don't interact directly with system chip
  • Experience computer with keyboards, controllers, media types, etc.
  • A qualitatively different way of interacting with computers - a major role of the OS is to make devices useful
  • Ability for OS to deal with vast array of devices is interesting task

Main Design Challenge

  • OS has to offer a consistent way to access such devices and control them, but the wide variety of internal organizations and physical properties of these devices offer a challenging target
  • Device management exemplifies all roles of an OS: multiplexing resources; protection and separation of resources; offering a consistent API to user applications; servicing asynchronous events, and managing concurrent access
  • The OS has to close the gap between what user level expects and what low-level hardware exports
  • Bridging gap between low level physical reality (devices) and user level programs
  • OS has to offer consistent and sufficiently powerful interface to user level applications. On the other hand, it has to deal with a wide variety of hardware

Topics

  • Expose how OS perceive/exposes devices to users
  • Character, block devices
  • Major, minor numbers
  • mknod(1), mknod(2)
  • mkdir, mkfifo
  • Special devices (pipes, FIFOs, etc.)
  • Pseudo devices (/dev/null, /dev/zero)
  • Garbage can: sink where you don't care what goes into it
  • Operationally a sink for output that you do not care to maintain
  • /dev/null: when written to, takes it and throws it away
  • echo "bye" > /dev/null
  • Program writes to stdout and redirects to file
  • 4 = open("/dev/null") then write(4, "bye", 3) probably happens
  • Can observe: looks like a file, operated on like a file, but does not behave like a file
  • Pseudo devices can do whatever you want - LKM
  • udev; dynamic device registration in /dev
  • I/O transfer / BUS
  • Memory mapped I/O
  • DMA
  • Interrupts vs polling
  • man 3 makedev
  • Devices are nothing more than files
  • Exposes to users to manipulate device through standard system calls
  • Powerful design
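The /dev/null bullet above, spelled out as a sketch: the pseudo device is named and opened like any file, but the driver behind it discards whatever is written.

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/null", O_WRONLY);

    if (fd < 0)
        return 1;
    write(fd, "bye\n", 4);  /* succeeds, but the bytes go nowhere */
    close(fd);
    return 0;
}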

The Device Landscape

Device Controller Diagram

idMt4Ix.png
  • A part of the kernel is code dedicated to talking to a particular device (device driver)
  • Typically implemented as LKM
  • Nothing more than a pile of code
  • Purpose: to hold conversations with an actual device (anything you can think of)
  • Device makes physical reality happen - printing device, monitor, etc. - has some physical effect in the real world
  • How does the OS know how to manage that particular device?
  • No native code in kernel to talk to device
  • The public half of the device - the kernel needs to help manage the discussion or communication, so devices include hardware such as processing registers/memory buffers on the public side (the device controller)
  • Device manufacturers hide their secrets and export minimal control interface in data storage (device)
  • Device controller accepts commands and data from device driver and does whatever
  • Ex. Hard disk: commands/requests passed from kernel to device driver (buffer layer) eventually asks for block number
  • Not actual location on physical disk, but the request passes through device driver, translated to some request (at particular cylinder/block, given some bytes) and device controller can ignore/rewrite request
  • The internal organization of the device is not necessarily known, nor does it necessarily match the public API
  • Have kernel driven by process requests, translated through file I/O layer of direction, and goes through code that knows how to talk to specific device; then go through some hardware communication and hopefully get some data back
  • Interesting details, how is communication achieved?
  • CPU's role:
  • Executing kernel code to execute statements to device driver
  • Interrupts
  • Occur when, e.g., a device reads some number of blocks; devices are slow, so while the request is being serviced by the controller, the initiating process gets taken off the CPU while something else executes
  • If device controller gathers bytes of communication, device might signal CPU and say 'hey I got this data you asked for'
  • What happens when an interrupt comes from the device to the CPU? Recall the algorithm: check, fetch, execute...
  • How does transfer of control happen? CPU is doing some arbitrary calculation. Asynchronously an interrupt happens, and it looks up in IDT
  • Given a particular device, an interrupt vector given, look up in IDT what happens next
  • The CPU does its job, then goes to the receive routine, which reads the control registers/state of the buffer (a fixed number of bytes that become available)
  • Device driver is responsible for reading this data out into some buffer in its own memory space. Acknowledges interrupts, gets data, and leaves data in buffer to call schedule or returns from interrupt handler to pick up
  • In the case of receiving a network packet, the process doesn't immediately get the data - the handler runs from hardware, stores it in memory, and resumes; eventually the program doing I/O will be rescheduled, the data gets extracted back onto its stack and copied from kernel space to user space (control flow)
  • Asynchronous, event-driven, unpredictable. Dance between hardware device, CPU and OS

Device Files

  • /dev lots of files - create device representatives
  • Every device installed on machine has a handler in this directory
  • OS can export devices as named file systems
  • Can mount a device (mount /dev/sdb /media/sspace) at any particular point in the file system
  • No need to worry about device; can take context of device and attach anywhere in directory tree - don't care about underlying hardware

Device I/O API

  • Devices usually treated in the same way as file: have a name that appears in the file system and standard system calls like read(2) and write(2) can send and receive data to and from device
  • A thing is not what it is named, but what is done to it
  • Recall type of a program, instance of it is not that type; it's what happens in that sequence of bytes
  • Devices typically viewed, accessed, and controlled as either:
  • Block device - data is accessed a block at a time; random access
  • Character device - a stream of characters (or data values)
  • Generally, OS can view device as either of these: two different semantics
  • Block device: I/O treated as transactions of random access at any particular block (fixed sized collection of bytes)
  • Hard disks (get a whole whop of bytes)
  • Random access, get block number
  • Character device: semantics of sequence of byte values
  • One value, one byte
  • Mouse, many devices in here
  • Categorizations is useful in abstract sense, may not match reality
  • Rotational media - a stream of bits is read serially, and the controller collects them into a block of data that it exposes to the device driver afterward
  • Devices don't intrinsically need to be either - just a convenience of how controller exposes to device

Major/Minor Numbers

  • User or OS creates files that represent each device
  • For kernel to know which driver should aid communication with the device, the kernel needs to map drivers to device files
  • Does this by using major/minor numbers, which define a name space of device types and subtypes
  • These are listed somewhere, and even for system calls like mknod(2) that create device files, you have to give the major/minor numbers (see the sketch below)
  • The numbers help the OS decide what to do when, say, a write system call is invoked on the device file
  • The 4th column of cat /proc/self/maps shows major:minor numbers
  • e.g., files resident on major 08, minor 3 (hard disk)
  • This is how the OS knows which device driver - which set of routines - talks to which device, even though everything is exposed through the file interface: the mapping is based on the major/minor numbers, not on file names and locations
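
A sketch of creating a device file by hand with mknod(2). The path /tmp/mynull is invented for this example; major 1, minor 3 is the conventional pair behind /dev/null on Linux, and creating device nodes requires root.

 #include <stdio.h>
 #include <sys/stat.h>
 #include <sys/sysmacros.h>   /* makedev() */
 
 int main(void)
 {
     /* Character device (S_IFCHR), rw for owner. The kernel routes I/O
        on the new file by the major/minor pair, not by its name or
        location, so this behaves exactly like /dev/null. */
     if (mknod("/tmp/mynull", S_IFCHR | 0600, makedev(1, 3)) < 0) {
         perror("mknod");
         return 1;
     }
     return 0;
 }

The equivalent shell command is mknod /tmp/mynull c 1 3.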

Example Driver

  • Shows how grungy things get when you drop out of the bottom of the kernel - drivers are drawn halfway inside the kernel because much of their code is not like the rest of the kernel: it must know about the hardware
  • drivers/net/3c501.c
  • See the comments: the reality is that hardware is complex and buggy, and manufacturing it correctly is a tall order
  • Producing these physical devices while making sure other pieces of software can drive them is a hard job
  • People who can write device drivers are strange: they must know what's going on at the hardware level and how to talk to the kernel


  • Line 683: what happens when the device receives data from the network
  • Observation: the low-level I/O is hiding behind things like inw and outw (wrappers around the x86 port I/O instructions that move a sequence of bytes between the CPU and the device's controller)
  • A lot of the low-level interface is documented only in device drivers - it's never a straightforward thing; you must be aware of the idiosyncrasies of the actual device
  • Line 508: the tulip driver
  • Its interrupt routine - servicing the interrupt is what ultimately backs some system call
  • ioread32: right next to the hardware
  • Acknowledging the interrupt literally means issuing privileged instructions (so this has to run in kernel mode) - sending magic values to some control register on the device
  • Unlike other kernel code


  • One model: port I/O - special instructions (the in/out family just mentioned) move data between the CPU and device registers
  • Another: map device buffers to physical portions of memory, so retrieving data uses standard assembly instructions to move data to the CPU
  • DMA: communication happens memory to memory, without the CPU copying each byte - what trade-offs do you take?
  • Chip complexity, speed, simplicity of architecture
  • How communication happens: a medium (bus) between devices and the CPU, and requests go across that medium

Question & Answer

Q:

Does the IDT have unique entries for each device? It doesn't seem plausible, because every time you added a new device you would be recompiling the kernel.

A:

The IDT is a table, and entries can be written into it dynamically. There are some well-known entries like 0x80, and some reserved ones that handle software exceptions - where an entry is empty, the kernel can write in any handler.

Q:

Don't we handle address translation in the CPU?

A:

That's memory addressing. Block numbers, used for block devices, are a different kind of address.
One way to view a file is as a sequence of blocks, but the blocks also define an address space of bytes. If you want to find byte offset b, divide by the block size and you get a block number.
This has nothing to do with memory translation. The device is given a block number, and the CPU doesn't know where that block physically is - devices are not perfect and blocks may be bad, so the device keeps track of that itself and doesn't expose it to the CPU.
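
A tiny sketch of that arithmetic (the 4096-byte block size is just an assumption for illustration):

 /* For byte offset b in a file viewed as an array of fixed-size blocks: */
 unsigned long block_size = 4096;                 /* assumed for illustration */
 unsigned long b = 10000;                         /* some byte offset */
 unsigned long block_number    = b / block_size;  /* -> block 2 */
 unsigned long offset_in_block = b % block_size;  /* -> byte 1808 within it */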

Q:

How is the IDT created?

A:

The IDT is an x86-specific thing. It's been around since the beginning - a traditional data structure. It's the x86 way of handling interrupts; digital devices have to handle them somehow, and how that gets done is the job of system software.

Back To Navigation

November 29: Disk and I/O Scheduling

Overview of Mass Storage Structure

  • Magnetic disks provide bulk of secondary storage of modern computers
  • Drives rotate at 60-200 times per second
  • Transfer rate is the rate at which data flows between the drive and the computer
  • Positioning time (random-access time) is time to move disk arm to desired cylinder (seek time) and time for desired sector to rotate under the disk head (rotational latency)
  • Head crash results from disk head making contact with the disk surface - bad
  • Disks can be removable
  • Drive attached to computer via I/O bus
  • Host controller in computer uses bus to talk to disk controller built into drive or storage array
  • Magnetic tape
  • Early secondary-storage medium
  • Relatively permanent and holds large quantities of data
  • Access time slow
  • Random access ~1000 times slower than disk
  • Mainly used for backup, storage of infrequently-used data, transfer medium between systems
  • Kept on a spool and wound or rewound past the read-write head
  • Once data under head, transfer rates comparable to disk
  • 20-200 GB typical storage

Moving-head Disk Mechanism

  • Check diagram on slides
  • Two important factors in getting to the right sector to read the content
  • Get to sector s
  • Two different motions: the head must seek to the right cylinder c, and the platter must rotate to bring the right sector s under the head

Disk Structure

  • Disk drives addressed as large 1-D arrays of logical blocks, where the logical block is the smallest unit of transfer
  • The 1-D array of logical blocks is mapped into the sectors of the disk sequentially
  • Sector 0 is the first sector of the first track on the outermost cylinder
  • Mapping proceeds in order through that track, then the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost

Disk Scheduling

  • OS responsible for using hardware efficiently. For disk drives, this means having a fast access time and disk bandwidth
  • Two components
  • Seek time: going backward/forward
  • Time for disk to move heads to cylinder containing the desired sector
  • Rotational latency: rotation
  • Additional time waiting for the disk to rotate the desired sector to disk head
  • Scheduling algorithms are about the seek time - the back-and-forth movement of the head should be small so you get faster access to the disk
  • Minimize seek time, which is approximately proportional to seek distance
  • Disk bandwidth is the amount of information you get over the period between requesting it and receiving it
  • Total number of bytes transferred, divided by total time between the first request for service and the completion of the last transfer
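  • For example (numbers invented for illustration): if 10 MB are transferred and the last transfer completes 2 seconds after the first request was issued, the bandwidth over that interval is 10 MB / 2 s = 5 MB/s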

Example

  • Several algorithms exist to schedule the servicing of disk I/O requests
  • Request queue (the range of the arm) goes from 0 to 199
  • Head pointer: 53
  • Initial state of the seek arm
  • Queue: 98, 183, 37, 122, 14, 124, 65, 67


First Come First Serve (FCFS)

  • Start at the beginning of the queue and go to the very next element each time
  • The worst of these algorithms
  • Total head movement: 640 cylinders

Shortest Seek Time First (SSTF)

  • Selects the request with the minimum seek time from the current head position
  • Wherever the head is, goes to closest cylinder
  • SSTF scheduling is form of Shortest Job First (SJF) scheduling
  • Downside of SJF-style scheduling: it might starve some accesses
  • The queue here is pre-defined, but in reality it is dynamic
  • May never give access to far cylinders (starvation)
  • Total head movement: 236 cylinders
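
Those totals are easy to check with a short simulation. The following is a sketch written for these notes (not code from the slides): it replays the example queue from head position 53 under FCFS and SSTF and prints 640 and 236.

 #include <stdio.h>
 #include <stdlib.h>
 
 #define N 8
 
 /* Total head movement if requests are served in queue order (FCFS). */
 static int fcfs(int head, const int q[N])
 {
     int total = 0;
     for (int i = 0; i < N; i++) {
         total += abs(q[i] - head);
         head = q[i];
     }
     return total;
 }
 
 /* Total head movement if the closest pending request goes next (SSTF). */
 static int sstf(int head, const int q[N])
 {
     int pending[N], total = 0;
     for (int i = 0; i < N; i++) pending[i] = q[i];
     for (int served = 0; served < N; served++) {
         int best = -1;
         for (int i = 0; i < N; i++)
             if (pending[i] >= 0 &&
                 (best < 0 || abs(pending[i] - head) < abs(pending[best] - head)))
                 best = i;
         total += abs(pending[best] - head);
         head = pending[best];
         pending[best] = -1;          /* mark as served */
     }
     return total;
 }
 
 int main(void)
 {
     int q[N] = {98, 183, 37, 122, 14, 124, 65, 67};
     printf("FCFS: %d cylinders\n", fcfs(53, q));  /* 640 */
     printf("SSTF: %d cylinders\n", sstf(53, q));  /* 236 */
     return 0;
 }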

SCAN/Elevator

  • Disk arm starts at one end of disk, moves towards the other end and servicing requests until it gets to the other end of the disk. At this point the head movement is reversed and servicing continues
  • Scans the entire area, going from 0-199 and comes back (like a sweep)
  • Downside: consider this as a dynamic system - access to the different cylinders is not uniform (no fair access to the disk)
  • Total head movement: 208 cylinders

Circular Scan (C-SCAN)

  • More fair, more uniform wait time than SCAN
  • Scans the entire disk
  • Head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, it immediately returns to the beginning of the disk, without servicing any requests on the return trip
  • Treats cylinders as circular list that wraps around from the last cylinder to the first

Circular Look (C-LOOK)

  • Version of C-SCAN
  • Arm only goes as far as the last request in each direction, then reverses direction immediately without first going all the way to the end of the disk
  • Doesn't go all the way left (0) or right (199), saving head travel

Selecting a Disk-Scheduling Algorithm

  • Depends on what you're doing - a system may even switch between algorithms
  • SSTF is common and has natural appeal
  • SCAN and C-SCAN perform better for systems that place a heavy load on the disk
  • Performance depends on the number and types of requests
  • Requests for disk service can be influenced by the file-allocation method
  • The disk-scheduling algorithm should be written as a separate module of the OS, allowing it to be replaced with a different algorithm if necessary
  • Either SSTF or LOOK is a reasonable choice for the default algorithm

Disk Management

  • Low-level formatting or physical formatting - dividing a disk into sectors that the disk controller can read and write
  • To use a disk to hold files, the OS still needs to record its own data structures on the disk
  • Partition the disk into one or more groups of cylinders
  • Logical formatting or 'making a file system'
  • Boot block initializes system
  • Bootstrap is stored in ROM
  • Bootstrap loader program
  • Methods such as sector sparing used to handle bad blocks
  • The numbers above assume uniform access, but you can't predict how processes will access the disk. Hundreds of programs may be running, each of which might want to access the disk; you don't know how they will behave
  • Seek time is not the only thing that affects disk bandwidth
  • In addition, there is the rotation of the disk, which happens at the same time
  • Reading/writing the content itself is another factor which can add more time
  • Even the content of a single file matters - assume the file system adds content to different sectors; its distribution affects access time

Question & Answer

Q:

What kind of assumptions can we make about distributions?

A:

It's hard to predict how they behave; consider that they're a dynamic part of the system. There may be some cases, like a good (brand-new) disk whose contents are sequential, where you can make a better prediction.
If the disk is old and its content is spread everywhere, things are different. In some cases you might make a better guess, but in general assume random access.

Q:

What does OS manage? Where does scheduling happen?

A:

It should be separate from the core OS - the driver does this. It should not be fixed, since the scheduling algorithm may need to switch; it might be the case that the OS sees the workload change and switches. There may also be multiple OSes sharing the disk.
The controller manages the way things are physically put on the disk - the size of sectors and things like that. But the access pattern depends on the processes running in the system; it's not a fixed thing, it's dynamic and depends on behavior. Whatever is inside the disk, there is flexibility to decide at run time, depending on how the system is used, to get better seek times.
The controller considers how data is stored on the disk. Scheduling - the way seeks are ordered - should be in the driver.

Back To Navigation

Week 13

December 2: User-level Networking

  • Management of I/O, management of physical devices, timers, interrupts, ability to transfer data to processes, system call layer, etc.
  • Network is a group of machines/nodes that speak the same language
  • The network exists at their pleasure and only as a set of distributed state among their OS network stacks: the data structures and functions that make up the part of the kernel responsible for creating, delivering, and receiving packets
  • Information flows between nodes because the nodes hold the same delusion in the same distributed fashion
  • Buzzwords: TCP, sockets, MAC address, ports, IP address, HTTP, routers, switches, hubs, servers, latency, bandwidth, caches, OSI, ARQ, DNS, packets, ISP, NAT
  • Protocols are a way to organize information
  • Complex, lots of languages involved


  • OSI network stack
  • Heart: stacking of information
  • Bottom: physical layer, where signals of wires, etc. happen
  • Above: link layer
  • Above: network layer
  • Above: transport layer
  • Above: session layer
  • More layers on top embedding/nesting protocols (HTTP, DNS)
  • On top is how human perceives the messages
  • Layers mean there is a series of encodings applied to information (which comes from a user, device, or program) as it is communicated across the network to another machine (like SSH, or going to a web browser)
  • A piece of information gets wrapped up as the payload of a sequence of packets
  • Each layer has a header, or metadata, appended to the beginning of the packet; at the bottom you get a stream of physical 1s and 0s
  • Handling an abstract model of communication
  • A lie! There is no such thing as a layer - just a sequence of bits; we draw arbitrary boundaries in the bit stream and give the regions names
  • Sequence of 1s and 0s
  • Each layer accomplishes a job - maintains a certain type of maintainability and connectivity
  • Set of protocols
  • Link layer: direct link between two machines
  • Network layer: gets packets from one node to another
  • PHY: actual transmission
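
A small sketch of the 'layers are just agreed-upon offsets in a byte stream' point: given a raw Ethernet frame in a buffer, each 'layer' is nothing more than a struct overlaid at a known offset. The struct ethhdr and struct iphdr definitions come from the standard Linux headers; the frame built in main() is fabricated for the example.

 #include <stdio.h>
 #include <string.h>
 #include <arpa/inet.h>         /* htons(), ntohs() */
 #include <netinet/ip.h>        /* struct iphdr */
 #include <linux/if_ether.h>    /* struct ethhdr, ETH_P_IP */
 
 /* "Parsing" a frame is just interpreting regions of the bit stream. */
 static void parse(const unsigned char *frame, size_t len)
 {
     const struct ethhdr *eth = (const struct ethhdr *)frame;   /* link layer */
     if (len < sizeof(*eth) || ntohs(eth->h_proto) != ETH_P_IP)
         return;
     const struct iphdr *ip =                                   /* network layer */
         (const struct iphdr *)(frame + sizeof(*eth));
     printf("IP packet, ttl=%d, proto=%d\n", ip->ttl, ip->protocol);
     /* A transport header would start at frame + sizeof(*eth) + ip->ihl*4. */
 }
 
 int main(void)
 {
     unsigned char frame[sizeof(struct ethhdr) + sizeof(struct iphdr)];
     memset(frame, 0, sizeof(frame));
     struct ethhdr *eth = (struct ethhdr *)frame;
     struct iphdr  *ip  = (struct iphdr *)(frame + sizeof(*eth));
     eth->h_proto = htons(ETH_P_IP);
     ip->ttl = 64;
     ip->protocol = 17;            /* UDP, for the sake of the example */
     parse(frame, sizeof(frame));
     return 0;
 }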

Example: User Level Utilities

  • Know where the machine is on the network. Identifiers: IP and MAC addresses
  • IP - valid in network layer, MAC - valid in link layer
  • Can go to whatsmyipaddress.com
  • ifconfig: tool that enumerates/listing valid network interfaces on machine
  • Are these pieces of hardware? Some might be. This machine has a wireless interface
  • en0 doesn't exist as hardware - it's a piece of software that eventually asks the Mac OS kernel to throw bits out
  • Lists out the interfaces - an interface is a virtual construct that can represent a piece of network interface card, but doesn't have to and often doesn't
  • Ethernet interface; the hardware address is the hexadecimal MAC address (a 48-bit number supposed to uniquely identify this piece of hardware)
  • The OS stores this number - if things work the way they're supposed to, it's burned into the hardware and the OS reads it when it creates the data structure
  • You need hardware that cooperates - the software stops at the link layer
  • IP addresses are meant to uniquely identify a host, and they're 32 bits
  • How many hosts? 2^32 ≈ 4.3 billion, and then we run out of addresses. NAT is useful because it makes many addresses look like one
  • inet addr: 10.x.x.x - the 10.0.0.0/8 range is reserved for non-routable (private) addresses - not an IP address you can use to communicate over the internet
  • This information helps identify the machine
  • Bcast: 10.0.2.255 - the broadcast address: data written to it reaches every host on the local network
  • Average number of machines between you and a destination: traceroute -n www.yahoo.com sends a particular type of packet, and the machines along the way see the packets and respond in a certain manner
  • 16 - internet is about 16-18 hops wide
  • Need cooperation of other machines to communicate with anyone else - if someone in the middle goes down, communications can go down
  • Traffic usually picks another path, but major routing (layer 3) disruptions do occur when one machine goes down
  • No security: the internet is a set of organizations that agreed to cooperate with each other - a big bubble of a fairly small number of organizations, and nothing prevents you from being one of them. They sign contracts to share connections and forward traffic. No one node actually knows whether or not traffic should be coming from a peer. Real cases happen - e.g., Pakistan advertised Yahoo's IP space and all the traffic went there. Nothing prevents people from messing with the algorithm
  • If you provide wrong info, there is no global authority saying it's wrong; you're just advertising information


  • ping www.yahoo.com: sends ICMP packets to exchange control information with other nodes
  • Asks the OS kernel to formulate a packet with a certain TTL (time to live) and send it toward the destination address, across whatever topology separates you from it
  • A number of things interpose on the communication
  • ICMP is a different protocol from the one traceroute uses
  • Running in a VM on the CPSC network, which filters heavily and probably rewrites TTLs so as not to expose information
  • The TTL is higher than it should be (63 in the example)


  • Wireshark: capture at eth0
  • Nothing is happening at first; once you ping www.yahoo.com you can see the traffic generated
  • Wireshark pulls apart each packet
  • Record of each packet sent out/coming in
  • Breakdown of all fields of packets
  • Raw information - the truth of what's being delivered (a sequence of bytes); programs make use of it once the OS breaks it down
  • Example of type of information OS contends with

Back To Navigation

December 4: Kernel Network Support

  • The OS supports a few major networking activities
  • The network is a data source
  • Kernel-mediated reads/writes to and from user space
  • Networking support in OS
  • System call layer and kernel network filtering architecture
  • Communications endpoints, addressing
  • Reading from the network (packets)
  • Writing to the network (to some node either locally or across internet)
  • Kernel has a way to mediate how packets flow through the network - most interesting piece

Netfilter Architecture

  • Parts of OS kernel that deal with and manage how packets traverse kernel
  • Virtual machine: 10.0.15.2 has one defined network interface (eth0)
  • Traffic comes in and out over this interface
  • Other side of interface: network interface card receiving/sending packets to aid kernel
  • Kernel has code to manage mechanism of receiving packets and doing a few different things
  • Does the packet have a destination IP address of 10.0.15.2? If it does, deliver it to some process (Pn) that's running/expecting data
  • Packet comes in, kernel decides if it's for me, if it is then deliver to waiting application (web server, etc.)
  • If it's not for me, this machine may be a gateway or forwarder - it may sit in front of a larger network with many other nodes
  • The packet traverses the kernel's interfaces and is put out on a different interface
  • Spot 3: the routing decision has been made, and the kernel may send the packet out interface A vs. B
  • Spot 5: outgoing packets from local processes
  • All this code is tied together: every UNIX box can do all of this, even though you only routinely exercise one or two of the paths
  • Ability to intercept packets (like bad requests)
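
A hedged sketch of how kernel code attaches to one of those spots via a netfilter hook. The hook signature below matches the 3.x-era kernels in use around this course and has changed in later versions, so treat it as illustrative rather than copy-paste ready; the names (watch_hook, watch_ops) are invented. The module just watches IPv4 packets at the pre-routing spot and lets everything through - returning NF_DROP instead would intercept the packet.

 #include <linux/module.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/skbuff.h>
 
 /* Hook function: called for every IPv4 packet at the pre-routing spot. */
 static unsigned int watch_hook(unsigned int hooknum, struct sk_buff *skb,
                                const struct net_device *in,
                                const struct net_device *out,
                                int (*okfn)(struct sk_buff *))
 {
     /* Could inspect skb here and return NF_DROP to intercept bad requests. */
     return NF_ACCEPT;   /* let the packet continue through the kernel */
 }
 
 static struct nf_hook_ops watch_ops = {
     .hook     = watch_hook,
     .pf       = PF_INET,                 /* IPv4 */
     .hooknum  = NF_INET_PRE_ROUTING,     /* spot where packets first arrive */
     .priority = NF_IP_PRI_FIRST,
 };
 
 static int __init watch_init(void)
 {
     return nf_register_hook(&watch_ops);
 }
 
 static void __exit watch_exit(void)
 {
     nf_unregister_hook(&watch_ops);
 }
 
 module_init(watch_init);
 module_exit(watch_exit);
 MODULE_LICENSE("GPL");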

Example: User Tools Communications

  • ifconfig
  • IP address: identifies the machine to other computers - can have multiple addresses associated with a single interface
  • Identifies the machine - here, a non-routable IP address
  • On the real Internet, e.g.: ssh locasto@cl.cpsc.ucalgary.ca
  • What does this interface have? A real IP address - a machine on the internet can contact this machine (136.159.5.22)
  • A public address: anyone can send a packet to that destination IP address, and routers take the packet and shuffle it along until it eventually reaches the csl machine
  • netstat
  • Shows sockets, including local ones that behave like files
  • Heavily used - this information tells you who the server is communicating with
  • Most columns deal with the TCP protocol
  • Who am I talking to? The local address is an IP address (0.0.0.0 means any IP), and :902 is the port
  • Port: a destination within the machine - the port number is how the OS kernel decides which process should be given data arriving at that IP address
  • LISTEN - services being run. You can find out what a port means in /etc/services
  • ESTABLISHED - a client connected to port 22 (ssh) - the foreign address is the source IP address of where that client came from
  • The OS is willing to maintain the binding and state of these connections
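
A minimal sketch of how a server ends up in that LISTEN state: ask the kernel for a TCP socket, bind it to a port, and listen. Port 9090 is an arbitrary choice for the example.

 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
 #include <netinet/in.h>
 #include <sys/socket.h>
 
 int main(void)
 {
     int s = socket(AF_INET, SOCK_STREAM, 0);   /* TCP socket */
     if (s < 0) { perror("socket"); return 1; }
 
     struct sockaddr_in addr;
     memset(&addr, 0, sizeof(addr));
     addr.sin_family      = AF_INET;
     addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* 0.0.0.0: any local IP */
     addr.sin_port        = htons(9090);        /* arbitrary example port */
 
     if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
         perror("bind");
         return 1;
     }
     listen(s, 5);
 
     int client = accept(s, NULL, NULL);  /* blocks until a client connects */
     write(client, "hello\n", 6);
     close(client);
     close(s);
     return 0;
 }

While this runs, netstat -an would show 0.0.0.0:9090 in state LISTEN, and each accepted client gets its own socket even though there is only one bound port.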

Example: User Tools Reading

  • Wireshark is GUI
  • Can just read data that's there
  • sudo tcpdump -i eth0
  • Tries to resolve various DNS names
  • CadmusCo source, packet #9
  • Range of bytes that have arrived to network card
  • Network card should be interested in this message
  • Who sent this message?
  • Layer 2 protocol - MAC addresses
  • tcpdump -i en0 -v -X -A
  • 'Give me everything': verbose output, dumping packet contents in hex and ASCII; you can also see what driver is controlling these pieces of hardware
  • If you're not using cryptography, can see what you're typing
  • The reality is that if you're using a wired interface, there is nothing inherently private about it
  • People like to pretend it's not a public activity
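
tcpdump is built on exactly this kind of ability. A hedged sketch of 'just reading the data that's there' with a Linux packet socket (Linux-specific, and it requires root, which is itself part of the point about privacy):

 #include <stdio.h>
 #include <unistd.h>
 #include <sys/socket.h>
 #include <arpa/inet.h>        /* htons() */
 #include <linux/if_ether.h>   /* ETH_P_ALL */
 
 int main(void)
 {
     /* A packet socket hands us raw frames from the interface. */
     int s = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
     if (s < 0) { perror("socket (are you root?)"); return 1; }
 
     unsigned char frame[2048];
     for (;;) {
         ssize_t n = recv(s, frame, sizeof(frame), 0);   /* one frame */
         if (n < 14) break;
         /* Bytes 0-5: destination MAC; bytes 6-11: source MAC. */
         printf("%zd bytes, src MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
                n, frame[6], frame[7], frame[8], frame[9],
                frame[10], frame[11]);
     }
     close(s);
     return 0;
 }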

Example: User Tools Writing

  • netcat (nc(1)): lets you open TCP connections
  • Connects to a machine - what happens first is a lookup between the name and an IP address
  • The VM does not know where that name lives - the resolver, via the kernel, looks up the IP address of who to communicate with using a DNS request
  • Once the connection is established, you can speak HTTP over it
  • Wireshark reassembled the web protocol and displayed the packets
  • Note:
  • Initiating a network connection and communicating over it involves a few protocols - implicitly it often involves DNS requests, and may involve finding the gateway
  • Making a connection and diagnosing network problems is rarely a one-problem activity. You may want to ping, but you may not even have a valid IP address; maybe DNS is down
  • From the user's perspective the symptom is 'can't surf the web', but the cause may be any of several network problems
  • more nc.out: the system calls related to this activity
  • Establishing the connection, putting a few bytes together, sending them over the wire
  • Much of the set-up has to do with binding a socket locally, and binding a port
  • A number of sockets are used to satisfy the DNS request - nc first has to contact a known DNS server
  • Once data is received from the socket, you can see the socket treated the same as a file
  • Received a reply and can connect to the machine, then send bytes
  • Once it gets a reply, the connection terminates
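
The same sequence nc performs can be sketched directly with system calls. The IP address below (example.com's published address) stands in for the result of the DNS lookup, which a real run would perform first:

 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
 #include <arpa/inet.h>        /* inet_pton(), htons() */
 #include <sys/socket.h>
 
 int main(void)
 {
     int s = socket(AF_INET, SOCK_STREAM, 0);
     if (s < 0) { perror("socket"); return 1; }
 
     struct sockaddr_in dst;
     memset(&dst, 0, sizeof(dst));
     dst.sin_family = AF_INET;
     dst.sin_port   = htons(80);                          /* HTTP */
     inet_pton(AF_INET, "93.184.216.34", &dst.sin_addr);  /* placeholder IP */
 
     if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
         perror("connect");
         return 1;
     }
 
     /* "Speak HTTP": a few bytes out, read the reply, done. */
     const char *req = "GET / HTTP/1.0\r\n\r\n";
     write(s, req, strlen(req));
 
     char buf[512];
     ssize_t n = read(s, buf, sizeof(buf) - 1);
     if (n > 0) { buf[n] = '\0'; fputs(buf, stdout); }
     close(s);
     return 0;
 }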

Question & Answer

Q:

Are ports PIDs?

A:

No. Much like a process can have a number of open files, a port identifier is like that: one process can open many ports, and there is no need for a port number to equal the PID. Like the mapping of file descriptors, the kernel keeps a map of open connections.

Q:

Can two processes belong to one port?

A:

No. Only one process can bind a given port for listening; a web server accepts many clients and needs a socket for each of them. Any information that comes back has to go to a particular program - the OS needs the port to decide which process to deliver the data to.

There are well-known port numbers for certain protocols, but you need not memorize them.

Back To Navigation

December 6: Final Exam Review

  • Concept map for topics, structure in three main themes
  • What an OS is, what its responsibilities are, how it turns programs into processes (ELF, binary formats)
  • How system architecture supports OS jobs (protection, isolation) and system call API layer (how it operates)
  • When processes are created, how is concurrency created and managed by OS? What are fundamental blocks of concurrency? What kind of synchronization problems does concurrency introduce?
  • Scheduling algorithms to choose which process to run, ways to control access to resources, deadlock, etc.
  • Management of data: volatile memory - primary memory as a resource: how to allocate and reclaim it, how to address/find parts of it
  • Persistent storage, same issues allocating/deallocating names and structures
  • Page replacement algorithms, disk scheduling algorithms

Structure

  • 20% of your grade, and 1 sheet of notes (8.5 x 11, front and back, any font size)
  • Allowed a calculator -- some math questions
  • Multiple choice - intentional, because the late exam date means grading needs to happen quickly
  • Variety of problems, some recall/trivia-type questions, others involve work (take a concept and apply it, running scheduling algorithm to completion)
  • You're encouraged to answer quickly and not get hung up over-analyzing questions - there are no trick questions


  • Two major portions
  • 22 T/F
  • Variety of topics; if you apply common sense, you should get them right
  • A few are esoteric - intended to separate the A from the A+
  • Apply logic, get answer
  • 22 MC
  • File systems (reading BOTH textbooks should help)
  • 5 questions on this
  • Concurrency, know scheduling algorithms and how to simulate them
  • Know what turnaround time is
  • Know what problems might arise in deadlock, and how you might break that
  • Be familiar with page replacement algorithms
  • Understand what the internal organization of synchronization primitives is (all differ drastically in semantics and structure)
  • Understand how a synch primitive may/may not apply to a problem
  • Ex. here's synch 1, synch 2, select which one is most appropriate
  • Familiar enough with procedure/mechanism of memory addressing and translation
  • Understand the separation between the MMU's/hardware's/CPU's role and the OS's role
  • Ex. here's an operation, who's doing it, the CPU or OS?
  • Know how to do memory page replacement
  • Disk scheduling - plenty of resources, descriptions in OS textbook - know them!
  • Fun question for the last question. Here's a modification of a scheduler, what does it do?
  • The idea: not only asking for your knowledge of schedulers, but also the reality: if you make this modification to a real OS scheduler, what are the implications?


  • 5 questions on file systems, 6-7 including the T/F ones
  • Know it in depth since it's important to know how persistent data is managed. Concentrate on the ones we've actively talked about (ext2/3, XFS)
  • Textbook helps, but can be bogged down by details - intersection of textbook and lectures
  • No content from 1.1-10.1, but emphasis on the system call interface
  • Asks about: main OS concepts - page replacement, file systems, scheduling algorithms, etc.; basic skills, the types of problems in a typical OS course


  • Contents of the assignments will sort of be on the test, insofar as they relate to stuff we talked about
  • Not going to ask for a system call trace, kernel code details, or 'use the mmap system call to ...'
  • Flavored toward general OS concepts; the T/F has trivia, but you should be able to answer from experience
  • One assembly instruction on the exam (just one), and a snippet of C code, asking about its semantics - straightforward if you understand the machine's computational model
  • What is OS role of this piece of code?
  • No translation for hex, binary, sizes, etc.
  • Calculator usage: tracking how far a disk head has moved and adding up the numbers by hand is tedious
  • 4-function calculator should suffice (TI)
  • Something short of a cellphone
  • Even distribution of topics - a good chunk of file systems; stuff that appears in the first half is stuff that wasn't really asked before
  • Hardest thing: the T/F may be intimidating at first ('wow, what a random fact') - but apply logic to it and it should make sense
  • Hardest problem: probably last one since it's conceptual. Here's a change in the OS, tell me what it means
  • A few may be labor intensive (disk scheduling, track numbers)
  • All questions worth 5 points that add up to 200
  • Won't ask you to cut and paste LXR content and its location, but you should understand how certain constructs are expressed in kernel source code. You don't necessarily have to remember all the fields of task_struct, but... :)


  • After class, HW2, HW3 rubrics will be posted and can get idea of mark
  • General advice for the rest of your degree: if you have skills, go get paid. More for CS students: you're going to forget all this stuff (unless you teach or go into research)
  • You should be able to refresh your memory easily - go to the right resources
  • Hope is that you leave the course with some ability to tell a good resource from a bad one - Stack Overflow is great, but so are good textbooks, reference manuals, etc.
  • Also: it's not the particular knowledge, but learning not to be afraid of digging into a system - we're used to being users, but if the system isn't behaving, look at how to fix the problem - you should understand how something works
  • Designing interfaces requires understanding what the machine is doing - biologists study living organisms; we study the computer - if we don't know what's inside this thing, we're probably not very good computer scientists

Back To Navigation