Virtual Machine Fingerprinting

Pink Pill: Fingerprinting Virtual Machine Environments
We seek to develop a dictionary or database of virtual machine environment signatures. Each signature is a list of tokens (conceptually, a bit vector) representing runtime behaviors or data artifacts of a particular virtual machine environment. The signatures will not rely on static or easily changed data such as the installation directory of the VM; rather, our intent is to create a set of features that reliably distinguishes between virtual machine types based on how each environment behaves.
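To make the "bit vector" idea concrete, here is a minimal C sketch of a signature in which bit i records the outcome of pink pill number i. The names (vm_signature, sig_set, sig_get) and the pill count of 64 are assumptions made for illustration; nothing here is a fixed format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PILL_COUNT 64                       /* hypothetical number of pink pills */
#define SIG_BYTES  ((PILL_COUNT + 7) / 8)

/* One signature: bit i holds the outcome of pink pill number i. */
typedef struct {
    uint8_t bits[SIG_BYTES];
} vm_signature;

/* Reset every pill result to 0. */
void sig_clear(vm_signature *s) {
    assert(s != NULL);
    memset(s->bits, 0, sizeof s->bits);
}

/* Record the result (0 or nonzero) of one pill; out-of-range pills are ignored. */
void sig_set(vm_signature *s, unsigned pill, int value) {
    assert(s != NULL);
    if (pill >= PILL_COUNT)
        return;
    uint8_t mask = (uint8_t)(1u << (pill % 8));
    if (value)
        s->bits[pill / 8] |= mask;
    else
        s->bits[pill / 8] &= (uint8_t)~mask;
}

/* Read back one pill result; out-of-range pills read as 0. */
int sig_get(const vm_signature *s, unsigned pill) {
    assert(s != NULL);
    if (pill >= PILL_COUNT)
        return 0;
    return (s->bits[pill / 8] >> (pill % 8)) & 1;
}
```

A byte array rather than a single integer keeps the sketch usable if the number of pills grows beyond 64.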

Fingerprint Archive
Once we have collected the data, we will document, for each run, the bit vector (i.e., the results of the pink pills) and the corresponding machine configuration (hardware, host OS, VM platform, guest OS).
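One possible shape for an archive entry is sketched below in C: the four configuration fields plus the observed bit vector. The lookup by smallest Hamming distance is an assumption about how the archive might later be queried, not something this plan prescribes, and the names archive_entry, sig_distance, and archive_closest are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One archive row: the configuration tuple plus the observed pill results. */
typedef struct {
    const char *hardware;
    const char *host_os;
    const char *vm_platform;
    const char *guest_os;
    uint64_t    signature;   /* bit i = result of pink pill i (assumes <= 64 pills) */
} archive_entry;

/* Number of pills on which two signatures disagree (Hamming distance). */
unsigned sig_distance(uint64_t a, uint64_t b) {
    uint64_t diff = a ^ b;
    unsigned n = 0;
    while (diff) {
        diff &= diff - 1;    /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Return the entry whose signature is closest to `observed`,
 * or NULL if the archive is empty. Ties go to the earliest entry. */
const archive_entry *archive_closest(const archive_entry *archive,
                                     size_t count, uint64_t observed) {
    assert(archive != NULL || count == 0);
    const archive_entry *best = NULL;
    unsigned best_dist = 65;             /* larger than any possible distance */
    for (size_t i = 0; i < count; i++) {
        unsigned d = sig_distance(archive[i].signature, observed);
        if (d < best_dist) {
            best_dist = d;
            best = &archive[i];
        }
    }
    return best;
}
```

A nearest-match lookup tolerates a pill or two behaving differently between runs; an exact-match lookup would be simpler if the pills turn out to be fully deterministic.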

Methodology
We will build a number of test configurations, each combining a hardware platform, a host OS, a VM platform, and a guest OS.

On each configuration we will run a number of small pieces of assembly code (shellcode). The intent behind each piece is to extract some meaningful signal about how the combination of host OS and VM platform has modified the way "guest"-level machine code views a world it believes to be bare metal. Our feature extraction shellcode might execute at either user or supervisor privilege level (i.e., ring 3 or ring 0 from the perspective of the guest OS).

We will approach the construction of these "pink pills" by:

 * testing how various privileged, semi-privileged, and normal x86 assembly instructions affect the architectural state of the machine from the viewpoint of the guest OS (e.g., issuing instructions that read or modify system registers like LDTR, IDTR, GDTR, TR...)
 * measuring timing differences for instructions like CPUID
 * performing non-standard (but legal) tasks like constructing and releasing LDTs and segment descriptors within them
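The timing approach can be sketched in C as follows. CPUID is a natural candidate because it typically traps to the hypervisor under hardware-assisted virtualization, which makes it much slower than on bare metal. The sketch assumes GCC or Clang (for the `__cpuid` macro from `<cpuid.h>` and `__rdtsc` from `<x86intrin.h>`); the sample count and the caller-supplied threshold are hypothetical and would need calibration on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __cpuid (GCC/Clang) */
#include <x86intrin.h>  /* __rdtsc (GCC/Clang) */
#define PILL_HAVE_X86 1
#endif

/* Time-stamp-counter ticks spent on one CPUID; 0 on non-x86 builds. */
uint64_t cpuid_cycles_once(void) {
#ifdef PILL_HAVE_X86
    unsigned eax, ebx, ecx, edx;
    uint64_t start = __rdtsc();
    __cpuid(0, eax, ebx, ecx, edx);
    uint64_t end = __rdtsc();
    (void)eax; (void)ebx; (void)ecx; (void)edx;
    return end - start;
#else
    return 0;
#endif
}

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n samples (upper median for even n); sorts the array in place.
 * The median damps spikes caused by interrupts or scheduling. */
uint64_t median_u64(uint64_t *samples, size_t n) {
    assert(samples != NULL && n > 0);
    qsort(samples, n, sizeof *samples, cmp_u64);
    return samples[n / 2];
}

/* Feature bit: 1 if the median CPUID cost exceeds threshold_cycles. */
int cpuid_timing_bit(uint64_t threshold_cycles) {
    enum { SAMPLES = 101 };              /* hypothetical sample count */
    uint64_t samples[SAMPLES];
    for (size_t i = 0; i < SAMPLES; i++)
        samples[i] = cpuid_cycles_once();
    return median_u64(samples, SAMPLES) > threshold_cycles;
}
```

A hypervisor can virtualize or offset the time-stamp counter, so a result of 0 from this bit does not by itself indicate bare metal.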

Experimental Setup
This section describes the degrees of freedom (the axes along which our configurations vary) and our template for each feature extraction shellcode.

Configurations

 * underlying hardware:
   * x86
   * SPARC
   * ARM (TODO)
   * PowerPC (TODO)
 * "host" OS environment (or VMM/hypervisor)
   * VMware ESX
   * Mac OS X
   * Windows XP
   * Windows 7
   * Fedora Linux
   * Ubuntu Linux
   * OpenSolaris
 * Virtual Machine layer
   * VMware Fusion
   * VMware Workstation
   * VirtualPC
   * Parallels Desktop
   * QEMU
   * Q (Mac OS X version of QEMU)
   * VirtualBox
   * Xen
   * Bochs
 * Guest OS
   * OpenSolaris
   * Fedora Linux
   * OpenBSD
   * FreeBSD
   * Ubuntu Linux
   * Windows XP
   * Windows 7
   * Snow Leopard (if it installs)

Feature Extraction
The template for each feature extraction shellcode will:


 * define a section of code for exhibiting the feature (this varies depending on the feature)
 * define a section of code for observing / storing the feature
 * define a section of code for extracting / exporting the feature into a bit vector
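The three-part template can be pictured as a small C harness in which each pill supplies one function per stage. This is only a sketch: the type and function names (pill_state, pink_pill, run_pills) are hypothetical, the demo stages use a placeholder constant rather than a real probe, and the actual pills would be assembly rather than C callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scratch space shared by the three stages of one pill. */
typedef struct {
    uint32_t raw;     /* state left behind by the exhibit stage */
    uint32_t value;   /* what the observe stage read back */
} pill_state;

/* One pink pill: a bit index plus one function per template stage. */
typedef struct {
    unsigned id;                               /* index of this pill's bit (< 64) */
    void (*exhibit)(pill_state *st);           /* provoke the feature */
    void (*observe)(pill_state *st);           /* read and store the effect */
    int  (*extract)(const pill_state *st);     /* reduce to 0 or 1 */
} pink_pill;

/* Run all pills in order and pack their results into one bit vector. */
uint64_t run_pills(const pink_pill *pills, size_t count) {
    assert(pills != NULL || count == 0);
    uint64_t vector = 0;
    for (size_t i = 0; i < count; i++) {
        assert(pills[i].id < 64);
        pill_state st = {0, 0};
        pills[i].exhibit(&st);
        pills[i].observe(&st);
        if (pills[i].extract(&st))
            vector |= (uint64_t)1 << pills[i].id;
    }
    return vector;
}

/* Placeholder stages: stand-ins for real probe code so the flow can be exercised. */
void demo_exhibit(pill_state *st) { st->raw = 0xff000000u; }   /* made-up value */
void demo_observe(pill_state *st) { st->value = st->raw >> 24; }
int  demo_extract(const pill_state *st) { return st->value > 0xd0; }
```

Keeping the stages separate means a pill whose probe has a delayed side effect only needs a different observe function; the extraction and packing logic stays the same.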

Probe Behavior
Our shellcode will issue a series of assembly instructions that probe the system to exhibit a certain feature or behavior. For example, in the classic Red Pill work, a single SIDT instruction retrieves the contents of the IDTR, which holds the address of the IDT. The IDT holds descriptors that specify how to handle different types of interrupts. Most "bare metal" executives (i.e., OS kernels) set up only 1 IDT, since a valid IDT must be in place before the processor can service interrupts in protected mode. Under virtualization, both the guest OS and the VM environment will receive interrupts and need to handle them differently. The VM environment needs to act on interrupts first, so it will likely maintain another IDT for each guest OS.

In any event, this means a "different" location of the IDT from the point of view of the guest OS. Because SIDT is not a privileged instruction on classic x86, guest code can read that location even from ring 3. It is like waking up one morning and finding out that your home address is 2 blocks away (hence the reference to the Matrix via the "Red Pill" moniker): a piece of information you took for granted your whole life is false.

Observe Behavior
We may need to observe some piece of architectural state after forcing the expression of a particular characteristic or property of the execution environment. For example, we may issue an instruction whose side effect is not immediately visible from the execution of that instruction itself. In that case we need to issue a separate read of the memory location or register that we expect to hold the "visible" effect of our probe.

Extraction / Reporting
One of the easiest ways to extract information is to issue a write(2) system call in assembly to a file descriptor (e.g., stdout), reporting the test number and the information. The consumer of this information will know how to interpret the resulting bytes. Depending on the type of test performed by the feature extraction shellcode, these bytes may be:

 * just a single bit (0 or 1)
 * a memory address or other 32-bit value
 * a series of data values
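As a sketch of what such a report might look like, the C code below frames each result as a small record and emits it with write(2). The record layout (2-byte little-endian test number, 1-byte kind, 1-byte payload length, then the payload) is purely an assumption for illustration, as are the names report_encode and report_write; the real pills would issue the system call directly from assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Payload kinds, matching the three result types a pill can report. */
enum { REPORT_BIT = 0, REPORT_WORD32 = 1, REPORT_SERIES = 2 };

/* Encode one record into buf:
 *   test id (2 bytes, little-endian), kind, payload length, payload bytes.
 * The payload is copied as-is, so multi-byte values keep the caller's byte order.
 * Returns the total record size, or 0 if buf is too small. */
size_t report_encode(uint8_t *buf, size_t cap, uint16_t test_id,
                     uint8_t kind, const void *payload, uint8_t len) {
    assert(buf != NULL && (payload != NULL || len == 0));
    size_t total = 4u + len;
    if (cap < total)
        return 0;
    buf[0] = (uint8_t)(test_id & 0xff);
    buf[1] = (uint8_t)(test_id >> 8);
    buf[2] = kind;
    buf[3] = len;
    if (len)
        memcpy(buf + 4, payload, len);
    return total;
}

/* Emit one record on fd with write(2), retrying on short writes.
 * Returns 0 on success, -1 on error. */
int report_write(int fd, uint16_t test_id, uint8_t kind,
                 const void *payload, uint8_t len) {
    uint8_t buf[4 + 255];
    size_t total = report_encode(buf, sizeof buf, test_id, kind, payload, len);
    size_t done = 0;
    while (done < total) {
        ssize_t n = write(fd, buf + done, total - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}
```

Including the kind and length in every record lets the consumer skip tests it does not recognize instead of losing synchronization with the stream.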

Related Work

 * http://invisiblethings.org/papers/redpill.html (includes links to other work; its USENIX 2000 link is incorrect, so use the next list item for that paper)
 * http://www.usenix.org/events/sec00/robin.html
 * http://www.usenix.org/event/hotos07/tech/full_papers/garfinkel/garfinkel.pdf
 * http://hamsterswheel.com/techblog/?p=25
 * http://www.trapkit.de/research/vmm/index.html