Libpcap tutorial

A libpcap Tutorial
The PCAP library is a C language library (and has variants and translations for other languages and execution environments) that allows both customizable packet capture from the network (or a pre-recorded trace file) and packet injection into the network (or output trace file). It is mainly a library for managing the reading and writing process of packets to and from a data source. It is not particularly well-suited for arbitrary packet formulation.

Other libpcap tutorials exist; for example, see:


 * http://www.tcpdump.org/pcap.htm
 * http://yuba.stanford.edu/~casado/pcap/section1.html
 * http://netscale.cse.nd.edu/twiki/bin/view/Main/LibpcapTutorial

and pcap comes with a Unix manual page describing the various functions and flags provided by the library:


 * http://www.tcpdump.org/pcap3_man.html

NB: This tutorial is NOT a cut-and-paste style tutorial. It is an annotated guide.

The Code
The code in the tutorial below is in the form of snippets woven into the text. The full code and build environment are available at:

http://pages.cpsc.ucalgary.ca/~locasto/research/libvei/

Tutorial Task Description
This tutorial will show how to use libpcap to transcribe packets from one data source to another (in a fashion similar to the effect of tcpreplay) while injecting packet content from other sources (such as a program, another trace file, or a network interface). It is assumed that this other source has already formulated the packets it wishes to appear in the output.

Here is a picture of what we'll build:



We will build a C library that has two sources of input and one output target. The first input source is a "background" PCAP file trace. The second source of input is an asynchronously-invoked function call `inject_event'. The purpose of the library is to weave these two input sources into a single output file.

The library will have an API containing three functions: initialization, start of transcription, and event injection. In this tutorial, we will build both the library and an example client of the library.

The Delta: What this tutorial has that others do not
This tutorial exists because I had to find out some things the "hard" way (e.g., reading the documentation). This section isn't meant to be boastful; other tutorials focus on what they focus on because that served the purposes of their authors. I had a different set of requirements when I approached the task of using libpcap. Here are the things I had to learn:


 * how to read and write to dumpfiles
 * the format of the PCAP savefile (particularly the record format of timestamp value plus packet structure)
 * This was important to understand the structure type returned by libpcap and where it got its data. It was also useful to open up a hex editor and map the structure to the header portion of the file and see the bytes in network order in individual fields, including the mapping of MAC and IP addresses.
 * http://www.manpagez.com/man/5/pcap-savefile/
 * the explanations for the data link types (needed for some of the API functions, but buried down in the man page)
 * at a certain point, I needed to use pcap_open_dead to get a proper pcap_t handle to use with pcap_dump_fopen, and one argument to pcap_open_dead is an `int linktype', but the description of link types is buried in the man page. I eventually found that I needed to specify DLT_EN10MB (which seems obvious in retrospect, but for someone without any knowledge of the pcap API, this didn't jump out until after a thorough reading of the man page).
 * http://www.manpagez.com/man/7/pcap-linktype/
 * how to structure a program to use pcap for multiple purposes at once (i.e., using pthreads to do multiple things at once, not just open an interface and sniff)
 * potential bugs in releasing dump file handles
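To make the savefile layout mentioned above concrete, here is a minimal sketch of the two fixed-size headers described in the pcap-savefile man page: a 24-byte file header followed by one 16-byte record header in front of each packet. The `sketch_` names are hypothetical and not part of libpcap or libvei, and the parser assumes the file was written in the reader's own byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PCAP_MAGIC 0xa1b2c3d4u

/* The 24-byte header at the very start of a classic pcap savefile. */
struct sketch_file_header {
  uint32_t magic;          /* SKETCH_PCAP_MAGIC when byte orders match */
  uint16_t version_major;
  uint16_t version_minor;
  int32_t  thiszone;
  uint32_t sigfigs;
  uint32_t snaplen;        /* maximum bytes saved per packet */
  uint32_t linktype;       /* 1 is DLT_EN10MB (Ethernet) */
};

/* The 16-byte header that precedes every packet record. */
struct sketch_record_header {
  uint32_t ts_sec;         /* timestamp, seconds */
  uint32_t ts_usec;        /* timestamp, microseconds */
  uint32_t caplen;         /* bytes of packet data that follow in the file */
  uint32_t len;            /* original length of the packet on the wire */
};

/* Copy the file header out of a raw buffer.
 * Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
int sketch_parse_file_header(const unsigned char* buf,
                             size_t n,
                             struct sketch_file_header* out)
{
  assert(NULL != buf && NULL != out);
  if (n < sizeof(*out)) {
    return -1;
  }
  memcpy(out, buf, sizeof(*out));
  if (SKETCH_PCAP_MAGIC != out->magic) {
    return -1;  /* not a pcap file, or written with the opposite byte order */
  }
  return 0;
}
```

Reading the first 24 bytes of a trace into a buffer and passing them to `sketch_parse_file_header' is a quick way to confirm the link type before deciding what to pass to pcap_open_dead.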

What You'll Need
You will need the following environment set up for this tutorial:


 * A Unix-style platform. This tutorial assumes Linux, specifically Fedora Core 10.
 * A text editor. I use emacs, but you can use vi, nano, pico, or something else.
 * A C compiler. I use gcc.
 * A threads package. I use pthreads.
 * A version of the libpcap library and its development package, e.g.:

[michael@proton docs]$ yum list installed | grep pcap
jpcap.i386                         0.7-6.fc10                         @updates
libpcap.i386                       14:0.9.8-3.fc10                    installed
libpcap-devel.i386                 14:0.9.8-3.fc10                    @fedora
pcapdiff.noarch                    0.1-3.fc9                          @fedora
pcapy.i386                         0.10.5-3.fc9                       @fedora
[michael@proton docs]$

The Tutorial
The following steps describe a sequence of tasks, starting with setting up the development environment, then writing simple packet replay code, and finally adding some advanced features.

Step 0: Controlling Compilation and the Build Process
This tutorial uses a directory structure and Makefile to ease the repeated process of compiling.

$ mkdir bin
$ mkdir include
$ mkdir src
$ mkdir lib

Create a file called `Makefile' in the src/ directory. You may choose to use [this one]

We will put the C source files in src/, the VEI library header file in include/, the compiled sample client in bin/, and the compiled library archive file in lib/

Step 1: The Library API
Before diving into libpcap code, we first want to define the services that our packet transcription/replay library will provide.

int initialize_vei_library(FILE* background_trace,
                           FILE* output_trace);
int start_vei_transcription(void);
int inject_event(VEI_EVENT_TYPE event);
void vei_finish(void);

We'll need to place these in a header file and define a data type for the injected event. Because we use the data type FILE, our header file will need to include the stdio.h file. Our header file also defines a bunch of error codes that could be returned from these functions.

After we take a look at our sample client, we'll come back to the implementation of these functions in the vei.c file; this is where the actual libpcap code gets used.

Step 2: A Sample Client
Our example client will be called "nech0" (for Network Echo) and (for now) simply transcribes one PCAP file to another. It is responsible for a few things, including opening the files involved via the C library, calling the library initialization routine, and calling the library transcription routine. This client also calls the library's shutdown function.

We need to use the large file features of Linux because we may be asked to transcribe very large (i.e., >2GB) network trace files.

//this is the preferred way of signaling to the compiler to use large file support
#define _FILE_OFFSET_BITS  64
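As a quick sanity check on the large file setting, the following sketch reports the width of off_t and measures a file with the off_t-based fseeko/ftello calls rather than the long-based fseek/ftell. The helper names are hypothetical, and the extra _POSIX_C_SOURCE define is my own addition so the sketch also builds under strict -std modes.

```c
#define _FILE_OFFSET_BITS 64      /* must come before any #include */
#define _POSIX_C_SOURCE 200809L   /* exposes fseeko/ftello under strict -std modes */

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Width of off_t in bits; 64 means offsets past 2GB are representable. */
int sketch_offset_bits(void)
{
  return (int)(sizeof(off_t) * 8);
}

/* Size of an open file in bytes, or -1 on error.
 * The current file position is restored before returning. */
off_t sketch_file_size(FILE* fp)
{
  off_t here;
  off_t end;

  assert(NULL != fp);
  here = ftello(fp);
  if (here < 0) {
    return -1;
  }
  if (0 != fseeko(fp, 0, SEEK_END)) {
    return -1;
  }
  end = ftello(fp);
  if (0 != fseeko(fp, here, SEEK_SET)) {
    return -1;
  }
  return end;
}
```

On a 64-bit Linux system off_t is already 64 bits wide, so the define mostly matters for 32-bit builds such as the i386 packages listed earlier.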

Next we need to include the appropriate header files that our nech0 client will use:
#include <stdio.h>
#include <pcap.h>
#include <...>
#include <...>
#include "vei.h"

Our main function is pretty simple. It prints out some diagnostic information and passes two parameters to the actual "work" function. Our program expects two arguments: the background file to replay and the name of the target output file.

int main(int argc,
         char* argv[])
{
  if(3==argc)
  {
    const char* ver = pcap_lib_version();
    fprintf(stdout,
            "nech0 is using %s\n",
            ver);
    fprintf(stdout,
            "echoing [%s] to [%s]\n",
            argv[1],
            argv[2]);
    do_echo(argv[1], argv[2]);
    fprintf(stdout, "nech0 calling shutdown...\n");
    shutdown();
  }else{
    fprintf(stderr,
            "nech0 srcfile.cap dstfile.cap\n");
    return -1;
  }
  return 0;
}

Let's provide implementations of the do_echo and shutdown functions. `shutdown' is pretty simple:

void shutdown(void)
{
  vei_finish();
  return;
}

The do_echo function is a bit lengthier but still pretty straightforward.

static void do_echo(char* src,
                    char* dst)
{
  int init_result = 0;
  int kickoff_result = 0;
  FILE* btrace = NULL; //background trace file handle
  FILE* otrace = NULL; //target output trace file handle

  btrace = fopen(src, "r");
  otrace = fopen(dst, "wb");
  init_result = initialize_vei_library(btrace, otrace);
  kickoff_result = start_vei_transcription();
  return;
}

Note that I have elided any error checking or error handling at this level. It should be there in your code.
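As one possible shape for that missing error handling, here is a minimal sketch of opening both trace files so that the caller gets either two valid handles or none. The function name `sketch_open_files' and its out-parameter style are assumptions for illustration; the real nech0 client may structure this differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Open the background trace for reading and the output trace for writing.
 * Returns 0 with both handles set, or -1 with both handles NULL. */
int sketch_open_files(const char* src,
                      const char* dst,
                      FILE** btrace,
                      FILE** otrace)
{
  assert(NULL != src && NULL != dst && NULL != btrace && NULL != otrace);
  *btrace = NULL;
  *otrace = NULL;

  *btrace = fopen(src, "rb");
  if (NULL == *btrace) {
    fprintf(stderr, "cannot open [%s]: %s\n", src, strerror(errno));
    return -1;
  }

  *otrace = fopen(dst, "wb");
  if (NULL == *otrace) {
    fprintf(stderr, "cannot open [%s]: %s\n", dst, strerror(errno));
    fclose(*btrace);   /* don't leak the handle we did manage to open */
    *btrace = NULL;
    return -1;
  }
  return 0;
}
```

A client would call this before `initialize_vei_library' and skip initialization entirely when it returns -1; the return values of the two library calls deserve the same treatment.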

Step 3: Packet Record Transcription
Now we get to the meat of the tutorial. How do we tell libpcap to open a trace file and start replaying it to an output sink? This section discusses the implementation of the VEI API. We need to include a number of header files to support the types of things we're going to do in this implementation, including libpcap (pcap.h), threading (pthread.h), timestamp manipulation (time.h), yielding to the OS scheduler (sched.h), and error checking (assert.h) and reporting (string.h). Of course, we also need to reference our local VEI header: vei.h

We need a few data structures and variables to help with the control of the library, including some flags for recording the state of the library (i.e., whether it is initialized or not, and whether it is transcribing or not). We also declare a number of `long long' variables for keeping track of some counters about how many packets were replayed and how long we slept (to deal with "realistic" playback timing, see below).

Playback Source and Destination
The main data structures, however, are those that provide handles to the dump files we are working with. We first declare two global references to the file handles of the source and destination files:

static FILE* btracefile = NULL;
static FILE* otracefile = NULL;

The VEI library implementation assumes that the caller or client has successfully opened two files and passes their references to us via `initialize_vei_library'. We then declare the libpcap handles built on top of those files:

static pcap_t* btrace_handle = NULL;        //handle to background trace
static pcap_t* o_handle = NULL;             //output trace
static pcap_dumper_t* otrace_handle = NULL; //output trace dump handle
static char m_errbuf[PCAP_ERRBUF_SIZE];

We need two handles to deal with the output file because of the way the libpcap API is structured; we first need to open a handle offline (to get a valid pcap_t reference) and then use that reference to obtain a pcap_dumper_t for use in writing to the actual output PCAP file.

After some basic argument checking and clearing the m_errbuf via a call to memset(3), we obtain a valid reference to the two trace files:

btrace_handle = pcap_fopen_offline(btracefile, m_errbuf);
o_handle = pcap_open_dead(DLT_EN10MB, 65535);
otrace_handle = pcap_dump_fopen(o_handle, otracefile);

I have not shown the error handling code. See the source for details.

After a runtime check for thread availability:

configured_for_pthreads = sysconf(_SC_THREADS);
if(-1==configured_for_pthreads)
{
  fprintf(stderr, "libpthread unavailable, clients should exit(2)...\n");
  return -BAD_THREAD_INIT;
}

we mark the library as initialized and return. We now await the command to start transcription.

Starting Transcription
The VEI API provides the `start_vei_transcription' function mainly to split the task of initializing the library from the task of actual packet replay (since clients might wish these events to occur at different times). A thread that invokes `start_vei_transcription' will not return to its original control flow until start_vei_transcription has finished executing (which may be never). Therefore, clients of libvei should have at least two threads: one for initialization and kicking off transcription and the other for calling `inject_event' asynchronously.

After checking whether the library is initialized, start_vei_transcription creates a pthread for managing the writing of an internal injection buffer (filled via `inject_event') and marks the library as transcribing:

err = pthread_create(&injecter,
                     NULL,
                     empty_buffer,
                     NULL);
m_transcribing = 1;

It then asks libpcap to start delivery of the packets to the callback function (discussed next).

rval = pcap_loop(btrace_handle,      //pcap handle for read source
                 -1,                 //#packets to loop, < 0 means "until error"
                 bg_packet_handler,  //name/address of callback function
                 NULL);              //args to pass to callback

The pcap_loop function does not return until it runs out of a data source, it is interrupted, or some error occurs. We test the various return values (see the man page for an explanation, particularly the difference between -1, -2, and 0). If this function returns 0, it means that the data source has no more packets to deliver; in the case of a PCAP dump file, we have reached the EOF. In this case, we report a number of metrics and return to the calling thread.
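The distinction between the three return values can be captured in a small helper like the one below. This is only a sketch: the function name and the message strings are hypothetical, and the literal values follow the man page (0 when the source is exhausted or the requested count is reached, -1 on error, -2 when pcap_breakloop stopped the loop).

```c
#include <assert.h>
#include <stddef.h>

/* Translate a pcap_loop return value into a short description.
 * Sets *is_error to 1 when the caller should treat the result as a failure
 * (and, for -1, consult pcap_geterr on the read handle). */
const char* sketch_describe_loop_result(int rval, int* is_error)
{
  assert(NULL != is_error);
  *is_error = 0;
  switch (rval) {
  case 0:
    return "no more packets (end of savefile or count reached)";
  case -2:
    return "loop stopped by pcap_breakloop";
  case -1:
    *is_error = 1;
    return "read error";
  default:
    *is_error = 1;
    return "unexpected return value";
  }
}
```

In `start_vei_transcription' this would be called right after pcap_loop returns, before the metrics are reported.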

Handling Packet Delivery: The Callback
libpcap reads the background dump file and delivers a packet to libvei. The way it does so is via a callback function. libvei uses the callback function to then write a modified version of the packet to the output dump file.

void bg_packet_handler(u_char* args,
                       const struct pcap_pkthdr* header,
                       const u_char* packet);

This is a function we declare and implement within libvei, but its signature must follow the contract declared by libpcap (so that libpcap can safely deliver information about packets it sees to us).

After checking some timing conditions (see below), this function invokes the pcap_dump function to write out the packet delivered in the parameters to the output file:

pcap_dump(((u_char*)otrace_handle),
          header,
          packet);

An important point to note here is that the first argument, which is usually NULL in normal cases of a user supplied packet handler (like bg_packet_handler itself is), must actually be a reference to otrace_handle, cast to a u_char pointer.

The `header' is the PCAP per-record header, not any network-level packet header. It contains information about the size of the record's packet and the number of bytes actually captured. It also contains a timestamp value in `struct timeval' format.

The `packet' is a pointer to the actual packet content, including its own headers and payloads. It is referred to as an undifferentiated pile of bytes, and any manipulation of its contents should use some standard structure reference to get access to the fields (and layers) therein. Since we're just (dumbly) replaying here, we don't peer inside the packet, although in many situations, depending on the type of replay you want to control or the type of network interface you are replaying to, you would want to update various fields in the layer 2, layer 3, and layer 4 headers and (possibly) payload.

After some record keeping for number of packets copied and output progress reporting, we update the time trackers (see below) and return. We have finished handling the packet that libpcap gave us, and we will wait for the next delivery.

Timing Replay: Introducing Delays
One subtle issue is that of timing; we want this transcription to be faithful to the timing of the original trace. For example, the trace may have been collected over an hour, but libpcap is playing back the PCAP dump file contents as fast as the OS will allow libpcap to read it (which is potentially pretty fast in comparison). Therefore, we would like to insert pauses into the packet transcription routines to add in delays to smooth out the replay. Thankfully, the PCAP format saves timing information, and libpcap provides it to the registered packet callback handler inside the 'header' structure. We first need another global variable to keep track of the "last" time of a saved packet.

static struct timeval m_last_time;

This structure type is defined in sys/time.h. See the gettimeofday(2) manual page for more information on its structure.

In bg_packet_handler, we check the last time global variable against the time in the packet being replayed. At second boundaries, we see if a difference of more than a second has occurred; if it has, we wait that number of seconds. This allows the library to replay all packets happening within a second "epoch" at speed, but then sleep to let the proper timeline re-synchronize with the capture timeline.
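The delay decision itself is simple arithmetic on the two timestamps. The sketch below is one way to express it; the function name is hypothetical, and two details are my own assumptions rather than something the text specifies: an all-zero "last" time is treated as "no previous packet", and a gap of at least one whole second triggers a wait.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Number of whole seconds to pause before replaying a packet stamped `now',
 * given the timestamp `last' of the previously replayed packet.
 * Returns 0 when no pause is needed. */
long sketch_replay_delay(const struct timeval* last,
                         const struct timeval* now)
{
  long gap = 0;

  assert(NULL != last && NULL != now);
  if (0 == last->tv_sec && 0 == last->tv_usec) {
    return 0;   /* first packet: nothing to compare against */
  }
  gap = (long)(now->tv_sec - last->tv_sec);
  if (gap < 1) {
    return 0;   /* same second, or timestamps went backwards */
  }
  return gap;
}
```

The callback would pass the result to the thread-level sleep routine and then copy the packet's timestamp into m_last_time.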

One more subtle issue here occurs as we get into using multiple threads. There are conditions under which we want some subset of the threads to sleep or give up the CPU. One tempting way to do this is to invoke the Unix C library sleep(3) function. POSIX specifies that sleep(3) suspends only the calling thread, but on some older or user-level thread implementations it can put the entire process (i.e., all threads) to sleep, and it may be implemented with SIGALRM, which interacts poorly with other timers and signal handlers in a threaded program. The sched_yield(2) system call gives up the CPU without waiting on a timeout, but it offers no control over how long the caller stays off the CPU, and its exact behavior depends on the scheduling policy and on how the underlying OS maps threads to scheduling entities within the kernel. So, we want a solution at the level of the pthread library in which we can get the calling thread to yield the CPU for a known amount of time. We found a solution from:

http://somethingswhichidintknow.blogspot.com/2009/09/sleep-in-pthread.html

This solution uses a couple of fake constructs to ask the pthread_cond_timedwait function to cause the calling thread to wait until the timing condition is true before resuming execution.
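A minimal sketch of that technique follows. It is not the code from the linked post or from libvei (the name `sketch_thread_sleep' is hypothetical); it only illustrates the idea: wait on a condition variable that nobody ever signals, so the wait can only end when the absolute deadline passes.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std modes */

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Block only the calling thread for `seconds' seconds.
 * Returns 0 once the deadline has passed, -1 on any other failure. */
int sketch_thread_sleep(unsigned int seconds)
{
  pthread_mutex_t fake_mutex;
  pthread_cond_t fake_cond;
  struct timespec deadline;
  int rc = 0;

  /* pthread_cond_timedwait takes an absolute time, not a duration */
  if (0 != clock_gettime(CLOCK_REALTIME, &deadline)) {
    return -1;
  }
  deadline.tv_sec += seconds;

  if (0 != pthread_mutex_init(&fake_mutex, NULL)) {
    return -1;
  }
  if (0 != pthread_cond_init(&fake_cond, NULL)) {
    pthread_mutex_destroy(&fake_mutex);
    return -1;
  }

  rc = pthread_mutex_lock(&fake_mutex);
  assert(0 == rc);
  /* nobody signals fake_cond; the loop absorbs spurious wakeups */
  while (0 == rc) {
    rc = pthread_cond_timedwait(&fake_cond, &fake_mutex, &deadline);
  }
  pthread_mutex_unlock(&fake_mutex);

  pthread_cond_destroy(&fake_cond);
  pthread_mutex_destroy(&fake_mutex);
  return (ETIMEDOUT == rc) ? 0 : -1;
}
```

Because the mutex and condition variable are local to each call, concurrent callers never contend with one another. On current Linux systems nanosleep(2) would serve the same purpose with less ceremony.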

Locking
Looking to the future, the playback thread (i.e., the one used from libpcap's delivery of a packet via the registered callback function) will need to coordinate with another thread that performs injection asynchronously into the output dump file. In order to avoid conflicting writes to this file, we need to govern access to the critical section with a mutex:

static pthread_mutex_t ofile_lock = PTHREAD_MUTEX_INITIALIZER;

So, in bg_packet_handler, we must surround the actual write with acquiring and releasing the mutex:

pthread_mutex_lock(&ofile_lock);
pcap_dump(((u_char*)otrace_handle),
          header,
          packet);
pthread_mutex_unlock(&ofile_lock);

Step 4: Adding in Injection
Our goal is not just to replay packets from one PCAP file to another, but to add in other packets and flows in an asynchronous, unpredictable manner.

The other data source could be another network capture file, a series of such files, the network itself, specially crafted bytes (e.g., by hand or via sendip or dnet(8)) or a program.

This program or program capturing such a data source would periodically invoke the VEI API `inject_event' function, supplying it with the appropriate flags to indicate the type of data to inject. Right now, these are pre-determined, pre-formulated packet flows. In the future, we intend to add more dynamic construction, customization, and filtering.

In the VEI lib header file, we define an enumeration type for the types of events we wish to inject:

/**
 * Clients should use these constants with the `inject_event' function, e.g.,
 *
 *  int result = inject_event(VEI_PING);
 *
 */
typedef enum _vei_event_type
{
  VEI_HTTP_HEAD=EVENT_HTTP_HEAD_CONVERSATION,
  VEI_DNS_LOOKUP=EVENT_DNS_LOOKUP_CONVERSATION,
  ...
  VEI_PING=EVENT_PING_ICMP
} VEI_EVENT_TYPE;

We can't immediately inject raw bytes into the output stream; we have to coordinate with the playback thread to make sure that no data conflicts occur. So, we need two data structures: (1) a mutex to coordinate access to the output stream and (2) a buffer that holds packets between the `inject_event' call and the moment the buffer is emptied and the injected packets are written to the output stream. We get (1) from the previously declared ofile_lock mutex. We get (2) from a private structure type definition:

typedef struct _vei_packet_entry
{
  VEI_EVENT_TYPE ve_type;
  u_char* packet;
  //struct pcap_pkthdr contains:
  //struct timeval ts;
  //bpf_u_int32 caplen; /* length of portion present */
  //bpf_u_int32 len;   /* length of this packet (off wire) */
  struct pcap_pkthdr header;
} VEI_PACKET_ENTRY;

static VEI_PACKET_ENTRY* m_packet_buffer[VEI_PACKET_BUF_LENGTH];
static long int m_pbuffer_length = 0;

This queue is managed through a couple of helper routines, particularly `pbuf_insert'. To keep things simple, the buffer is a statically allocated global array of VEI_PACKET_ENTRY pointers, and `pbuf_insert' just places new entries into the first open slot:

static int pbuf_insert(VEI_PACKET_ENTRY* entry)
{
  if(m_pbuffer_length<VEI_PACKET_BUF_LENGTH)
  {
    m_packet_buffer[m_pbuffer_length] = entry;
    m_pbuffer_length++;
    return 0;
  }else{
    return -1;
  }
}

This function does no argument or error checking. Your code probably should, because you may not trust whoever is calling `inject_event' (or `inject_event's implementation) to hand you correctly formulated packets. The implementation of this function also brings up an important point: I lied -- we actually need a third data structure: a mutex to control access to the buffer itself (to coordinate between the thread calling `inject_event' and the internal service thread that periodically empties the buffer).

/**
 * `inject_event' calling thread must own this (transparently) before
 * asking for bytes to be put into the buffer.
 *
 * injecter thread must own this before reading from injection buffer
 * and removing successfully injected packets
 *
 * replayer thread has nothing to do with this lock
 */
static pthread_mutex_t buffer_lock = PTHREAD_MUTEX_INITIALIZER;
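The producer/consumer relationship around that lock can be sketched in isolation as follows. This is a simplified stand-in, not the libvei code: the names, the four-slot capacity, and the `void*' entries are hypothetical, and unlike the tutorial (where `inject_event' takes the lock before calling `pbuf_insert') the lock is taken inside the helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_BUF_LENGTH 4   /* tiny capacity, chosen for illustration */

static void* sketch_buffer[SKETCH_BUF_LENGTH];
static long sketch_length = 0;
static pthread_mutex_t sketch_buffer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Producer side: append one entry.
 * Returns 0 on success, -1 when the buffer is full. */
int sketch_locked_insert(void* entry)
{
  int rc = -1;

  assert(NULL != entry);
  pthread_mutex_lock(&sketch_buffer_lock);
  if (sketch_length < SKETCH_BUF_LENGTH) {
    sketch_buffer[sketch_length] = entry;
    sketch_length++;
    rc = 0;
  }
  pthread_mutex_unlock(&sketch_buffer_lock);
  return rc;
}

/* Consumer side: hand every buffered entry to `consume' (if not NULL) in
 * insertion order, then mark the buffer empty.
 * Returns the number of entries drained. */
long sketch_locked_drain(void (*consume)(void*))
{
  long i = 0;
  long n = 0;

  pthread_mutex_lock(&sketch_buffer_lock);
  n = sketch_length;
  for (i = 0; i < n; i++) {
    if (NULL != consume) {
      consume(sketch_buffer[i]);
    }
    sketch_buffer[i] = NULL;
  }
  sketch_length = 0;
  pthread_mutex_unlock(&sketch_buffer_lock);
  return n;
}
```

Holding the lock for the whole drain keeps the insert and drain operations atomic with respect to each other, at the cost of blocking producers while the consumer works through the buffer.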

We also need a thread to perform the actual injection for us (i.e., emptying the buffer and delivering the packets to libpcap).

/** inject new packets by reading m_packet_buffer and sending to pcap_dump */
static pthread_t injecter;

Recall that we set this thread up already and told it to run the `empty_buffer' function. We'll see the implementation of empty_buffer further down in this section.

The actual event injection is split into 2 parts:


 * the implementation of the `inject_event' function call and;
 * the implementation of the injecter thread service routine

The `inject_event' implementation keeps track of the number of packets injected for this invocation, the result of the injection (into the buffer), and two handles to the bytes (i.e., packet content) and the VEI_PACKET_ENTRY actually placed into the buffer. It first checks if the library is initialized; if not, it returns with a -LIB_NOT_INITIALIZED code (declared in vei.h).

An important point to note is that inject_event does not check the state of the transcribing flag; we want to be able to buffer injected packets even if the library is not actually replaying. The buffer emptying code run by the injector thread will not actually write any packets until the transcribing flag is set, so that's OK.

The implementation first needs to acquire the buffer lock. It then switches based on the injected event type, which affects what is injected and how it is injected into the buffer. The code below demonstrates how two different types of events are handled. The first (VEI_POST_JPG) actually invokes a function that sets up a callback routine with libpcap to replay packets from another capture file. The second (VEI_PING) actually formulates the packets and entries "manually" and copies a hand-built ICMP packet (stored in the e_ping_icmp global variable) to the injected entry. Both methods, however, eventually call pbuf_insert. Note that the lock on buffer_lock remains held even through child function invocations.

pthread_mutex_lock(&buffer_lock);
switch(event)
{
  ...
  case VEI_POST_JPG:
    fprintf(stdout, "[vei] VEI_POST_JPG\n");
    num_packets_injected = replay_jpeg_capture();
    m_num_scripted_packets_injected = 0;
    break;
  case ...
  case VEI_PING:
    fprintf(stdout, "[vei] VEI_PING\n");
    v = (VEI_PACKET_ENTRY*)calloc(1, sizeof(VEI_PACKET_ENTRY));
    bytes = (u_char*)calloc(VEI_PING_SIZE, sizeof(u_char));
    if(NULL==bytes || NULL==v)
    {
      num_packets_injected = -FAILED_TO_BUFFER;
      free(bytes);
      free(v);
      bytes = NULL;
      v = NULL;
    }else{
      memcpy(bytes, e_ping_icmp, VEI_PING_SIZE);
      v->ve_type = VEI_PING; // or `event'
      v->header.caplen = VEI_PING_SIZE;
      v->header.len = VEI_PING_SIZE;
      v->header.ts.tv_sec = 0;
      v->header.ts.tv_usec = 0;
      v->packet = bytes;
      iresult = pbuf_insert(v);
      if(0==iresult)
      {
        num_packets_injected = VEI_PING_PACKETS; // to wit, one
      }else{
        num_packets_injected = -FAILED_TO_BUFFER;
        free(bytes); //the entry was not stored, so release it here
        free(v);
        bytes = NULL;
        v = NULL;
      }
    }
    break;
  default:
    pthread_mutex_unlock(&buffer_lock); //don't return with the lock held
    return -BAD_EVENT_TYPE;
};
//...keep track of some packet accounting
pthread_mutex_unlock(&buffer_lock);
return num_packets_injected;

Note that we don't set real timestamp values here; we just zero them. We let the injecter thread take care of that.

The injecter thread uses the pcap_dump function to write to an output file. If we wanted to write to an interface, we would use the pcap_inject function or pcap_sendpacket function. The function signature matches the expected signature for a pthread service routine. This routine loops forever. It checks whether the library is initialized and transcribing; if either check fails, the injecter thread sleeps, using our home-brewed thread sleeping solution so that the rest of the process can actually make progress (hopefully initializing or transcribing or filling the buffer...).

Note also how we need to obtain both the lock on the internal buffer and the lock on the output PCAP stream.

void* empty_buffer(void *arg)
{
  int num_slots = 0;
  int i = 0;
  VEI_PACKET_ENTRY* v = NULL;
  struct pcap_pkthdr* header = NULL;
  u_char* packet = NULL;

  for(;;)
  {
    if(1!=m_initialized){ vei_pthread_sleep(2); continue;}
    if(1!=m_transcribing){ vei_pthread_sleep(2); continue;}
    pthread_mutex_lock(&buffer_lock);
    //read m_packet_buffer, empty 0 to m_pbuffer_length
    //(pbuf_insert lets the length reach VEI_PACKET_BUF_LENGTH when full)
    assert(m_pbuffer_length <= VEI_PACKET_BUF_LENGTH);
    num_slots = m_pbuffer_length;
    assert(0<=num_slots);
    i = 0;
    for(i=0;i<num_slots;i++)
    {
      v = m_packet_buffer[i];
      header = &(v->header);
      packet = v->packet;
      //set "now" (which we are luckily saving in m_last_time)
      v->header.ts.tv_sec = m_last_time.tv_sec;
      v->header.ts.tv_usec = m_last_time.tv_usec;
      pthread_mutex_lock(&ofile_lock);
      pcap_dump(((u_char*)otrace_handle), header, packet);
      pthread_mutex_unlock(&ofile_lock);
      free(v->packet);
      v->packet = NULL;
      free(v);
      v=NULL;
    }
    m_pbuffer_length = 0;
    pthread_mutex_unlock(&buffer_lock);
    sched_yield();
  }
}

Cleaning Up and Shutting Down
Upon invocation of `vei_finish', we immediately set the transcribing and initialized state flags to false. We report how many packets are left in the injection buffer and then attempt to use libpcap to close the output handle:

if(NULL!=otrace_handle)
{
  pcap_dump_flush(otrace_handle);
  pcap_dump_close(otrace_handle);
  otrace_handle = NULL;
  otracefile = NULL;
}

It appears that pcap_dump_close actually closes the underlying FILE handle, and attempting to call:

fclose(otracefile);

immediately after calling pcap_dump_flush and pcap_dump_close causes a message from glibc: "double free or corruption (!prev): 0x083b0170"

Interesting error condition.

We then also close the background trace file:

if(NULL!=btrace_handle)
{
  pcap_close(btrace_handle);
  btracefile = NULL;
}

and return to the caller.

Scripting Injection
More here later.

Contributions
See the wiki history for this page.