From Programs to Processes
In this session we will observe some of the details of how an ELF binary becomes a running process. We will consider the rationale for the process abstraction and how it is implemented in the Linux kernel.
Today we will also consider a couple of "big picture" perspectives.
- OS/Systems Concept Map
- What is an operating system?
- Properties of a process
- task_struct (as defined in Linux 2.6.37.6)
We will also see how to trace one aspect of a program's behavior (Measurement) by looking at the sequence of system calls it generates (and one tool for capturing this information).
Focus question: how does a program become a process?
Notes
We deliberately bounced around a number of topics this morning to illustrate just how interconnected operating systems concepts are. We moved across several abstraction boundaries, first considering how we might formulate a decent general definition of the term "operating system". We arrived at the following: the OS kernel (i.e., the privileged piece of software that actually drives the hardware and manages software applications) is a (typically) large piece of software that fulfills some of the roles mentioned last class and in the homework reading, such as managing hardware and other resources and abstracting system resources so that user-level applications carry less responsibility and complexity. Because the OS is responsible for at least a few major pieces of functionality, the kernel typically contains a few major components, such as a process scheduler, a memory manager, and a file system.
The OS provides generic services for important functionality (in fact, for every action that has a real-world physical effect, such as I/O) via the system call API. In Unix, this is a fairly stable set of primitive services with stable contracts and interfaces. System calls look very much like function calls until we consider their actual implementation (next class).
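To make the "looks like a function call" point concrete, here is a minimal Python sketch. It assumes Linux and CPython's ctypes module, and the helper name pid_via_libc is ours, not something from the lecture. It calls the C library's wrapper for getpid(2) directly and compares the result with what Python's own os.getpid() reports.

```python
import ctypes
import os

# Passing None loads the symbols already linked into this process,
# which on Linux includes the C library and its system call wrappers.
libc = ctypes.CDLL(None, use_errno=True)


def pid_via_libc() -> int:
    """Call the C library wrapper for the getpid(2) system call."""
    return libc.getpid()


if __name__ == "__main__":
    # Both values come from the same kernel service, reached two ways.
    print(pid_via_libc(), os.getpid())
```

From the caller's side nothing distinguishes this from an ordinary library function; the difference lies in what the wrapper does underneath.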
One of the most important jobs of the operating system is to manage the execution of programs by loading programs (i.e., compiled binaries) and transforming them into processes. We considered a structural definition of the term "process": a "program in execution" that has a few major properties, such as the illusion of its own memory space (the Process Address Space) and an associated data structure generally called a Process Control Block (PCB). In Linux, the PCB lives in kernel memory; each one is an instance of the "struct task_struct" data type defined in include/linux/sched.h. The PCB (hereafter referred to by the specific term task_struct) contains metadata about the process, which the kernel uses to manage all processes on the machine.
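The lecture did not walk through code for this step, but the conventional Unix route from program to process is a fork followed by an exec. The sketch below is one hedged illustration, assuming a POSIX system; run_program is a hypothetical helper name. The parent asks the kernel for a new process, the child asks the kernel to load a new program image into itself, and the parent then collects the exit status.

```python
import os
import sys


def run_program(argv):
    """Run argv[0] in a new process and return its exit status."""
    pid = os.fork()                  # kernel creates a copy of this process
    if pid == 0:
        # Child: replace this process's image with the requested program.
        try:
            os.execv(argv[0], argv)
        finally:
            os._exit(127)            # reached only if execv failed
    # Parent: block until the child terminates, then decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)      # child was terminated by a signal


if __name__ == "__main__":
    # Use the running interpreter as the "program" so the path is known to exist.
    print(run_program([sys.executable, "-c", "raise SystemExit(7)"]))
```

Note that argv[0] must be a full path here, because execv does not search PATH.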
We looked at some user-level utilities for observing all running processes, including
$ ps aux
$ pstree
We saw that these utilities extract and print to stdout data about each process, including its process identifier (PID), the owner of the process, how much CPU it has used, what program name is associated with it, and its state or status. Most of this information is also exposed through the /proc file system interface on Linux.
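As a rough illustration of the /proc interface, the following sketch reads /proc/<pid>/status, which holds one "Key: value" pair per line. It assumes a Linux system with /proc mounted; read_proc_status is a hypothetical helper name, not a standard API.

```python
import os


def read_proc_status(pid):
    """Return the fields of /proc/<pid>/status as a dict of strings."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            # Each line looks like "Name:\tpython3"; split at the first colon.
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    info = read_proc_status(os.getpid())
    # A few of the fields that ps also reports.
    for key in ("Name", "State", "Pid", "PPid", "Uid"):
        print(f"{key}: {info[key]}")
```

The kernel generates this file on each read from the process's task_struct, so the values are a snapshot rather than stored data.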
We then used the 'yes' program and observed how we might communicate with it (via the OS) by using the 'kill' command.
$ ps aux | grep yes
$ ...
$ kill -9 PID
We saw that signals are one primitive form of inter-process communication (IPC). Many others exist (e.g., files, pipes, sockets, shared memory).
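The same experiment can be sketched programmatically. The example below assumes a POSIX system with the yes program on the PATH; kill_yes is a hypothetical helper name. It starts yes as a child, sends it SIGKILL (signal number 9 on Linux) through os.kill, and reports how the child ended.

```python
import os
import signal
import subprocess


def kill_yes():
    """Start `yes`, send it SIGKILL, and return the child's return code."""
    # Discard the endless stream of "y" lines that yes prints.
    child = subprocess.Popen(["yes"], stdout=subprocess.DEVNULL)
    os.kill(child.pid, signal.SIGKILL)   # same request as `kill -9 <pid>`
    # subprocess reports death by signal N as a return code of -N.
    return child.wait()


if __name__ == "__main__":
    print(kill_yes())  # prints -9
```

The negative return code is how Python's subprocess module distinguishes "killed by a signal" from a normal exit status.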
The kill command is simply another user-level command-line program. To actually accomplish something, it must ask the OS to do the "killing" on its behalf. So what specific OS functionality was invoked, and how? To find out, we can use the strace tool (yet another user-level command-line program) to observe all system calls that a particular process makes.
$ strace kill -9 PID
$ ...
Here we see the OS playing its role as an isolation mechanism; all sensitive operations (such as delivering a signal to another process) must pass through the OS. How? Via a system call. That system call happens to be kill(2); see:
$ man 2 kill
$ man 7 signal
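One detail from the kill(2) manual page is worth a small sketch: if the signal number is 0, no signal is sent, but the kernel still performs its existence and permission checks. The example below is a hedged illustration of that behavior, assuming a POSIX system; process_exists is a hypothetical helper name and expects a positive PID.

```python
import os


def process_exists(pid):
    """Probe a positive PID with signal 0; nothing is delivered to the target."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # errno ESRCH: the kernel found no process with this PID.
        return False
    except PermissionError:
        # errno EPERM: the process exists, but we may not signal it.
        return True
    return True


if __name__ == "__main__":
    print(process_exists(os.getpid()))
```

The EPERM case shows the isolation role directly: the kernel, not the caller, decides whether a signal may be delivered.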
Readings
The reading for tonight should help lay the groundwork for our discussion of the x86 machine environment, assembly programming, and the invocation of system calls.
- A Tiny Guide to Programming in 32-bit x86 Assembly Language
- A Whirlwind Tour on Creating Really Teensy ELF Executables for Linux
- MOS: 1.3: Computer Hardware Review
- MOS: 1.6: System Calls
- MOS: 1.7: Operating System Structure