Concurrency, Communication, and Synchronization
In this section, we find ourselves stuck on a thorny issue arising from the ability to achieve the illusion of concurrent execution.
Focus Question
How can two otherwise independent tasks coordinate their access to a shared resource? Once we leave sequential, serial execution behind, we open up an opportunity for incoherent access and updates to data. Yet, concurrent access and parallel execution hold a great deal of promise in terms of speeding up certain kinds of partially or wholly independent tasks.
The MOS book casts this topic in terms of Interprocess Communication. In some sense, IPC is the killer app for methods and techniques that manage concurrent execution. Even if you have thousands of concurrent processes, if they do not talk to each other, you don't need mechanisms for helping them coordinate and (quite literally) achieve correct execution! (Note that you would still need synchronization in the kernel, because the kernel is coordinating the sharing of hardware resources --- but that coordination is largely invisible to user-level processes.)
Agenda
We begin with a definition of the problem, or at least its symptoms: race conditions and their ill effects on data.
The PDF slides capture most of the material below.
Concepts
- Race Conditions are situations where two or more tasks compete to update some shared or global state; the final output then depends on the happenstance ordering in which the tasks run rather than on a semantically correct computation (a minimal sketch appears after this list)
- Critical Sections are a name for code regions or slices that contain instructions which access a shared resource (typically some shared memory or variable). Multiple tasks or threads entering the same critical region without some control or coordination mechanism will interleave with each other in ways that produce race conditions and (probably) incorrect output. A critical section is, in some sense, both (1) a code location or sequence of instructions and (2) an environment and set of conditions.
- Mutual exclusion is a concept that states only a single task should have access to a critical region at a time; enforcing mutual exclusion helps serialize and normalize access to critical regions and the memory, data, or state that the code of a critical region reads and writes.
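A minimal sketch of a race condition (not part of the original notes; the counter variable, NITER constant, and worker function are illustrative names). Two threads each perform NITER unsynchronized increments of a shared counter; because counter++ is a non-atomic read-modify-write, the interleavings lose updates and the final total is almost always less than 2 * NITER:

#include <pthread.h>
#include <stdio.h>

#define NITER 1000000

static long counter = 0;            /* shared/global state both tasks update */

static void* worker(void* arg)
{
    for (int i = 0; i < NITER; i++)
        counter++;                  /* read-modify-write: not atomic, so updates can be lost */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected %d)\n", counter, 2 * NITER);
    return 0;
}

Compile with gcc -Wall -pthread; the printed total depends on the happenstance scheduling of the two threads, which is exactly the symptom described above.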
Classic IPC Problems
- Producer-Consumer
- Dining Philosophers
- Readers/Writers
A Command Line View of Communicating Processes
- shell I/O redirection
- command line pipes
- named pipes (special files called FIFOs)
- Courses/Computer_Science/CPSC_457.W2012/FIFO Example
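Shell pipelines are built on the pipe(2) system call. The following is a minimal sketch (not from the course materials; the message text and variable names are made up) of a parent and child process communicating through a pipe:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];                          /* fd[0] = read end, fd[1] = write end */
    char buf[64];

    if (pipe(fd) == -1) {
        perror("pipe");
        return 1;
    }

    if (fork() == 0) {                  /* child: the reading end */
        close(fd[1]);
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child read: %s\n", buf);
        }
        close(fd[0]);
        return 0;
    }

    close(fd[0]);                       /* parent: the writing end */
    const char* msg = "hello over a pipe";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);
    wait(NULL);
    return 0;
}

A named pipe (FIFO) works the same way, except the endpoint is a special file created with mkfifo and opened by otherwise unrelated processes.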
Family Squabble
- Demonstration of a parent process and a child process updating a shared resource
- Situation 1: using fork()... discussion: why doesn't this work? (see the sketch after this list)
- Situation 2: using clone()... discussion: why not vfork()?
- introduces the concept of critical sections
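A minimal sketch of Situation 1 (not the actual in-class demo; the shared variable name is illustrative). After fork(), the parent and child each have their own copy of the address space, so the child's update never becomes visible to the parent:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int shared = 0;                 /* looks "shared", but fork() duplicates the address space */

int main(void)
{
    if (fork() == 0) {          /* child */
        shared = 42;            /* updates the child's private (copy-on-write) copy only */
        return 0;
    }
    wait(NULL);                 /* parent waits for the child to finish */
    printf("parent sees shared = %d\n", shared);   /* still prints 0 */
    return 0;
}

Situation 2 gets around this by creating the child with clone() and the CLONE_VM flag so that both tasks run in the same address space; once they genuinely share memory, their updates can interleave, and the critical-section problem appears.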
Non-atomic operations: even the single C statement f->field1 = 100; below compiles to more than one machine instruction (load the pointer into eax, then store 0x64 = 100 through it), so an assignment that looks atomic at the source level is not atomic at the machine level:
(eye@mordor non-atomic)$ cat x.c
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

struct foo {
    int field1;
    int field2;
    int field3;
    char field4;
};

int main(int argc, char* argv[])
{
    struct foo* f;
    f = malloc(sizeof(struct foo));
    if (!f)
        abort();
    f->field1 = 100;
    return 0;
}
(eye@mordor non-atomic)$ gcc -Wall -g -o xx x.c
(eye@mordor non-atomic)$ objdump -d -M intel xx
...
 8048419:  8b 44 24 1c           mov    eax,DWORD PTR [esp+0x1c]
 804841d:  c7 00 64 00 00 00     mov    DWORD PTR [eax],0x64
...
Potential Solutions
- spinlocks / busy waiting (feasible, but usually wasteful because the CPU is busy doing "nothing" while it spins)
- lock variables (a recursive problem: testing and then setting the lock variable is itself a non-atomic read-modify-write, so the "lock" has the same race it is meant to prevent; see the sketch after this list)
- disabling interrupts (not trustworthy in the hands of user processes, and it does not help on multi-CPU systems, where the other CPUs keep running)
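To see the recursive problem with plain lock variables, here is a deliberately broken sketch (my own illustration, not from the notes): the gap between testing the lock variable and setting it lets both threads conclude the lock is free, so the lock itself races:

#include <pthread.h>
#include <stdio.h>

static volatile int lock_var = 0;   /* 0 = free, 1 = taken */
static long counter = 0;

static void broken_enter(void)
{
    while (lock_var != 0)           /* (1) test: the lock looks free...              */
        ;                           /*     ...but the other thread can run here      */
                                    /*     and also observe lock_var == 0            */
    lock_var = 1;                   /* (2) set: both threads may now "own" the lock  */
}

static void broken_leave(void)
{
    lock_var = 0;
}

static void* worker(void* arg)
{
    for (int i = 0; i < 1000000; i++) {
        broken_enter();
        counter++;                  /* still racy, because the lock itself is racy */
        broken_leave();
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   /* usually not 2000000 */
    return 0;
}

Protecting lock_var would require yet another lock, hence the recursion; the way out is an atomic instruction that tests and sets in one indivisible step, which is the next topic.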
Synchronization Primitive: Atomic Instructions
- Peterson's Solution
- Test-and-Set (see MOS: pg 124)
- Compare-Exchange (x86). From the Intel Developer's Manual, section 7.3.1.2 (PDF page 160); a compare-and-swap spinlock sketch follows this list:
The CMPXCHG (compare and exchange) and CMPXCHG8B (compare and exchange 8 bytes) instructions are used to synchronize operations in systems that use multiple processors. The CMPXCHG instruction requires three operands: a source operand in a register, another source operand in the EAX register, and a destination operand. If the values contained in the destination operand and the EAX register are equal, the destination operand is replaced with the value of the other source operand (the value not in the EAX register). Otherwise, the original value of the destination operand is loaded in the EAX register. The status flags in the EFLAGS register reflect the result that would have been obtained by subtracting the destination operand from the value in the EAX register.
The CMPXCHG instruction is commonly used for testing and modifying semaphores. It checks to see if a semaphore is free. If the semaphore is free, it is marked allocated; otherwise it gets the ID of the current owner. This is all done in one uninterruptible operation. In a single-processor system, the CMPXCHG instruction eliminates the need to switch to protection level 0 (to disable interrupts) before executing multiple instructions to test and modify a semaphore.
For multiple processor systems, CMPXCHG can be combined with the LOCK prefix to perform the compare and exchange operation atomically. (See “Locked Atomic Operations” in Chapter 8, “Multiple-Processor Management,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information on atomic operations.)
- Spinlocks have the potential for priority inversion (i.e., a higher-priority process spins waiting for a lock held by a lower-priority process, which may never be scheduled to release it)
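To make the compare-exchange description above concrete, here is a minimal spinlock sketch (assumed, not from the course) built on GCC's __sync_val_compare_and_swap builtin, which the compiler emits as a LOCK CMPXCHG on x86. The spin_lock and spin_unlock names are made up for illustration:

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static volatile int lock = 0;       /* 0 = free, 1 = held */
static long counter = 0;

static void spin_lock(volatile int* l)
{
    /* atomically: if *l == 0, store 1 and return the old value (0);
       otherwise return the old value (1) and keep spinning */
    while (__sync_val_compare_and_swap(l, 0, 1) != 0)
        sched_yield();              /* be polite: let another thread run */
}

static void spin_unlock(volatile int* l)
{
    __sync_lock_release(l);         /* store 0, releasing the lock */
}

static void* worker(void* arg)
{
    for (int i = 0; i < 1000000; i++) {
        spin_lock(&lock);
        counter++;                  /* critical section: now a serialized update */
        spin_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   /* reliably 2000000 */
    return 0;
}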
Concurrency Controls
- Methods/approaches/techniques that enforce serialized or otherwise safe access to critical sections
- locks
- mandatory locking
- advisory locking
- see, e.g., file-level locking
- man 2 flock
"Advisory locks allow cooperating processes to perform consistent opera- tions on files, but do not guarantee consistency (i.e., processes may still access files without using advisory locks possibly resulting in inconsistencies)."
Semaphores
- up/down
- P/V history: P from the Dutch proberen ("to test/try") and V from verhogen ("to raise/increment")
- the up and down operations must themselves be atomic
- Possible implementation
- system calls up and down
- disable interrupts
- protect each semaphore critical section with a lock / CMP-EXCHANGE (so other CPUs don't modify even if interrupts disabled)
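As a sketch of up/down in practice, here is a small Producer-Consumer (bounded buffer) example using POSIX semaphores, where sem_wait plays the role of down (P) and sem_post plays the role of up (V). This is my own illustration rather than course code; the buffer size and item counts are arbitrary:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 4                             /* bounded buffer size */

static int buffer[N];
static int in = 0, out = 0;

static sem_t empty;                     /* counts free slots   (starts at N) */
static sem_t full;                      /* counts filled slots (starts at 0) */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

static void* producer(void* arg)
{
    for (int i = 0; i < 20; i++) {
        sem_wait(&empty);               /* down(empty): wait for a free slot */
        pthread_mutex_lock(&m);
        buffer[in] = i;                 /* critical section: insert the item */
        in = (in + 1) % N;
        pthread_mutex_unlock(&m);
        sem_post(&full);                /* up(full): signal that an item is ready */
    }
    return NULL;
}

static void* consumer(void* arg)
{
    for (int i = 0; i < 20; i++) {
        sem_wait(&full);                /* down(full): wait for an item */
        pthread_mutex_lock(&m);
        int item = buffer[out];         /* critical section: remove the item */
        out = (out + 1) % N;
        pthread_mutex_unlock(&m);
        sem_post(&empty);               /* up(empty): a slot is free again */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    sem_init(&empty, 0, N);
    sem_init(&full, 0, 0);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}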
Mutexes
- "mutual exclusion"
- special case of a semaphore (no need to maintain a count/list)
- operations
- pthread_mutex_lock
- pthread_mutex_unlock
- pthread_mutex_trylock // don't block; furthermore, gives the caller the option to skip the critical section entirely (see the trylock sketch after this list)
- ... see pthread manual page for more examples.
- What happens if multiple threads are blocked on a mutex?
- random wakeup
- FCFS
- race condition for the wakeup?
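A minimal sketch of the trylock idea mentioned above (my own illustration): if the mutex is currently held, pthread_mutex_trylock returns EBUSY immediately and the thread skips the critical section for that iteration instead of blocking:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static long shared_count = 0;

static void* worker(void* arg)
{
    for (int i = 0; i < 100000; i++) {
        if (pthread_mutex_trylock(&m) == 0) {
            shared_count++;             /* got the lock: enter the critical section */
            pthread_mutex_unlock(&m);
        } else {
            /* EBUSY: the mutex is held by the other thread;
               skip the critical section entirely this time around */
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared_count = %ld\n", shared_count);
    return 0;
}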
Code from MOS, page 131:
mutex_lock:
        TSL R, MUTEX            ; copy mutex to R, set mutex=1
        CMP R, 0                ; was mutex zero?
        JZE OK                  ; if mutex was zero, it was unlocked; proceed
        CALL thread_yield       ; mutex was busy, schedule another thread
        JMP mutex_lock          ; back to spinning
OK:     RET                     ; return to calling thread, can enter critical region

mutex_unlock:
        MOV MUTEX, 0            ; clear mutex
        RET                     ; return to calling thread, exit critical region
Note the use of the thread_yield routine: a spin-waiting thread would block others if it did not voluntarily yield the CPU.
Takeaway Lessons and deferred topics
- Locking is still typically advisory; non-cooperative tasks or threads can defeat it
- ordering of locks is very important; they must be acquired and released in a consistent order across all threads
- kernels often disable interrupts (because the kernel trusts itself), but this is not a viable approach to concurrent execution for user-level programs
- condition variables
- Dekker's Algorithm
- Monitors (one way to look at this: an increase in the atomicity boundary)
- Message passing
- Barriers: a barrier accumulates all tasks until the whole group is ready to proceed, then releases them together. Good for joint serialization (a short barrier sketch follows this list).
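Since barriers are easy to show concretely, here is a minimal sketch (mine, not from the notes) using pthread_barrier_wait: every thread blocks at the barrier until all of them have arrived, and only then does the group proceed to the next phase:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_barrier_t barrier;

static void* worker(void* arg)
{
    long id = (long)arg;
    printf("thread %ld: finished phase 1\n", id);
    pthread_barrier_wait(&barrier);     /* block until all NTHREADS threads arrive */
    printf("thread %ld: starting phase 2\n", id);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void*)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}

Every "finished phase 1" line is printed before any "starting phase 2" line, regardless of how the threads happen to be scheduled.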
Research Directions
- Mandatory Locking
- Disciplined Threads (name and memory space separation and enforcement)
- Challenge: Sharing + Isolation + Performance
- Secure Locking (prove credentials in order to enter critical section)
- Challenge: test security property atomically
Next
Next session we will consider the problem of deadlock.
Next Next
- In a future session we will consider these concurrency concepts more fully in a user-level threaded environment.
- This will provide a C-language level view of concurrency issues using the pthread library and its locking and synchronization facilities.
Notes
Voluntarily yielding the CPU seems like a friendly, neighborly, altruistic thing to do, particularly if you want to clear a bottleneck by allowing other tasks to run. Excessive yielding, however, can lead to performance degradation.
The manual page for sched_yield(2) warns:
"Strategic calls to sched_yield() can improve performance by giving other threads or processes a chance to run when (heavily) contended resources (e.g., mutexes) have been released by the caller. Avoid calling sched_yield() unnecessarily or inappropriately (e.g., when resources needed by other schedulable threads are still held by the caller), since doing so will result in unnecessary context switches, which will degrade system performance." -- man 2 sched_yield
This comment hints at the need for careful design of threading and locking structures and placement.
Scribe Notes
Readings
How do processes executing concurrently (rather, processes that the OS is fooling us into believing are executing concurrently on the CPU) communicate safely on shared resources? This is a lot of reading, but it is quite important. Probably the most important topic of the semester, and the essence of what an "operating system" --- in the most general sense possible --- does.
- LKD: Chapter 9: An Introduction to Kernel Synchronization
- MOS: 2.2 "Threads"
- MOS: 2.3 "Interprocess Communication"
- MOS: 2.5 "Classical IPC Problems"
- Beware typos in Figure 2-23, pg 122 (semicolon placement)