Courses/Computer Science/CPSC 601.29.ISSA.W2011


Information Systems Security Analysis wiki: Winter 2011

This wiki serves the Information Systems Security Analysis (i.e., "Ethical Hacking") graduate seminar course started in the Winter of 2011. Students will post their vulnerability identification assignments and their literature reviews here.

The course Web site is located at: [1]

Class Notes

12 Jan 2011: Course Introduction

14 Jan 2011: Security Mindset

  • Discussion of papers from 12 Jan
  • Please join the class mailing list
  • Please sign up for an account on the course wiki
  • Please obtain a recent Linux environment (Fedora, BackTrack, Ubuntu) by dual booting, using a LiveCD, or running a virtual machine (VMware, VMware Player, VirtualBox, QEMU)
  • Readings for next class:

19 Jan 2011: IA-32 Architecture, Basic Process and Binary Inspection

The agenda for this class includes:

  • Theme Topic for today: the Hacker Curriculum Principle of Inspection (or Visibility): what does this piece of code actually contain, and what does it do at runtime?
  • discussed the literature review project
    • 2-4 pages of analysis on a systems security topic, drawing on 5-10 papers in that area
    • check topics list of major security conferences for area/topic ideas
    • email the class mailing list with questions or to ask for pointers to papers
  • reviewing important points from the readings
    • We will cover calling conventions in more detail in a later class
    • IA-32 has implicit operands
    • AT&T vs. Intel syntax
    • variable length instructions
    • CISC-style architecture (although instructions are internally translated to RISC-like micro-ops)
    • complicated superscalar pipeline
    • set of general purpose registers & roles (EAX...)
  • some hands-on work with system calls, assembly language, and Unix cmd-line tools:
    • strace
    • gawk
    • hexdump
    • ghex2
    • objdump
    • file
    • sort
    • uniq
    • gcc -S
  • Profiling the set of commonly-used x86 instructions (we did this for a "Hello, World" program -- a minimal sketch appears after this list -- but I challenge you to do it for the libc shared object, e.g., /lib/libc.so.6, on your system):
    • awk 'BEGIN { FS = "\t" } ; {print $3}' instructions.out | awk '{print $1}' | sort | uniq -c | sort -nr
    • the file instructions.out above was produced from `objdump -d a.out'
  • Resources
  • Readings for Friday
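
For reference, the "Hello, World" program we profiled was presumably no more elaborate than this sketch (a hypothetical hello.c); gcc -S hello.c emits its assembly, and objdump -d a.out disassembles the linked binary:

#include <stdio.h>

int main(void)
{
  /* A single library call; most of the interesting instructions come
     from the compiler's prologue/epilogue and from linked-in libc code. */
  printf("Hello, World\n");
  return 0;
}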

21 Jan 2011: Process and Binary Inspection (ELF toolchain)

  • We will dissect a C program at many different levels: source, assembly, and ELF binary image. Along the way, we will discover some new tools and various specifications.
    • We started off class discussing the "big picture": the major difference (and relationship) between static analysis of the properties of binary images and dynamic analysis of running programs (i.e., processes). The production and processing of each artifact involves a number of other systems, including IDEs, the lexer/parser/compiler, assembler, linker, and the OS loader. Each of these introduces a location for instrumenting the system.
    • ELF Specification: http://www.muppetlabs.com/~breadbox/software/ELF.txt
    • using ghex2 to edit binaries directly (e.g., character array constants, numerical values)
    • Intel is little-endian: data is stored in memory least significant byte first, so the least significant byte sits at the lowest address, and in a hexdump the most significant byte of a multi-byte value appears rightmost (see the short C demonstration after the printf transcript below)
    • When looking at hex dumps of a binary image (i.e., an ELF file) or at disassembly, you're likely looking at the hexadecimal (base 16) representation of data values. For quick conversion, use bash's 'printf' command. For example:
    [michael@xorenduex 2011]$ printf "%x\n" 65
    41
    [michael@xorenduex 2011]$ printf "%d\n" 0x41
    65
    [michael@xorenduex 2011]$ printf "%d\n" 41
    41
    [michael@xorenduex 2011]$ printf "%x\n" 0x41
    41
    [michael@xorenduex 2011]$ 
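
Little-endian byte order is also easy to confirm for yourself. A minimal sketch (a hypothetical endian.c) that prints the bytes of an integer in address order:

#include <stdio.h>

int main(void)
{
  unsigned int value = 0x41424344;   /* bytes 'A' 'B' 'C' 'D' */
  unsigned char *bytes = (unsigned char *) &value;
  unsigned int i;

  /* Print each byte of the integer, lowest address first. */
  for (i = 0; i < sizeof(value); i++)
    printf("%02x ", (unsigned int) bytes[i]);
  printf("\n");
  return 0;
}

On IA-32 this prints 44 43 42 41: the least significant byte occupies the lowest address. Compare with what ghex2 shows for multi-byte constants in a binary.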

26 Jan 2011: Disclosure Discussion and ELF Toolchain

28 Jan 2011: Shellcode Disassembly

  • In this class, we will discuss tools and techniques for examining shellcode. Shellcode is essentially a self-contained assembly program (like the one we have been working with). By convention, the term "shellcode" suggests some nefarious purpose, but shellcode is really just a small, self-contained program that depends only on the smallest set of OS and library functionality it can get away with. A minimal C harness for running shellcode bytes appears at the end of this section. Along the way, we will discuss:
    • the udcli tool
    • calling conventions for functions vs. system calls
    • writing simple system call shellcode
    • analyzing a shellcode example
  • Class Activities: Courses/Computer Science/CPSC 601.29.ISSA/20110128CodeSession
    • wrote a simple ASM file for writing "hello" to stdout via write(2)
      • noticed problem (i.e., SIGSEGV) with storing a low address (i.e., 65) into ebx
      • noticed problem (i.e., SIGSEGV) with not calling _exit()
      • used strace to trace the a.out to get info about where it failed
      • introduced a simple Makefile
    • analyzed a shellcode example from http://www.shell-storm.org/shellcode/files/shellcode-606.php
    • used udcli for disassembly (nasm also ships a tool for this, ndisasm)
    • hand-executed the assembly listing by simulating the CPU and keeping track of the registers and stack
    • patched the shellcode
    • worked around the non-executable stack with execstack -s
    • used chown and chmod to set the setuid bit (chmod 4777) or setgid bit (chmod 2777) on the patched binary to ensure the -p flag has its intended effect
  • Reading for Wednesday
    • Intel Software Developers Manual, Volume 3A: System Programming Guide, Part 1
      • Sections 2.1, 2.2, 2.3, 2.4, 2.5 and 2.7
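
As promised above, here is a minimal C test harness for running shellcode bytes. This is a sketch under stated assumptions: the _exit(0) shellcode shown is illustrative (not the shell-storm example from class), and a 32-bit Linux build (e.g., gcc -m32) is assumed because it uses int 0x80:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* _exit(0) shellcode for 32-bit Linux:
   xor eax,eax ; mov al,1 ; xor ebx,ebx ; int 0x80 */
static unsigned char code[] = "\x31\xc0\xb0\x01\x31\xdb\xcd\x80";

int main(void)
{
  /* Copy the bytes into a page we explicitly mark executable, so a
     non-executable stack or heap does not get in the way. */
  void *page = mmap(NULL, sizeof(code), PROT_READ | PROT_WRITE | PROT_EXEC,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED) { perror("mmap"); return 1; }
  memcpy(page, code, sizeof(code));
  ((void (*)(void)) page)();   /* jump to the shellcode */
  return 0;                    /* not reached: the shellcode calls _exit(0) */
}

Using mmap with PROT_EXEC sidesteps the non-executable stack, the same obstacle that execstack -s worked around in class.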

2 Feb 2011: Hardware Support for Protection

4 Feb 2011: Trapping, Interrupt and Exception Handling

9 Feb 2011: Program Supervision 1: Basic Mechanisms

  • We will examine various primitives like ptrace(2), LSM, and library interception. In particular, we will look at how ptrace(2) is implemented and how one process can use it to trace another. (A small PTRACE_ATTACH probe appears after the exercises below.)
  • Notes
    • We looked at the manual page for ptrace to understand the variety of services/request types it offers and understand some of the constraints governing its behavior
    • We examined a number of places in the kernel source code dealing with the implementation of ptrace
    • we first started out by understanding how the system call handling mechanism is actually coded in the kernel
    • we also looked up the task_struct definition in the kernel's code and reviewed the data members of this structure relevant to ptrace
    • we looked up ptrace's system call number and the definition of its prototype
    • we saw how the kernel defines macros for handling different system calls with different numbers of arguments
    • we understood that the implementation of ptrace has two logical parts, a general abstract part and an architecture-dependent part, as reflected by the placement of the ptrace implementation and header files in different sections of the kernel source tree
    • we looked closely at the implementation of the "highest" layer of ptrace's implementation: the actual entry into the sys_ptrace() function
    • After diving into shallow waters in the kernel, we started to look at this topic from the user-level side
  • Links
  • Exercises
    • Can program A ptrace program B which ptraces program A?
    • Can program A ptrace program B which is ptracing program C?
    • Try to write a ptrace-based program that outputs all the return values (i.e., the value of eax) after a return from each function (i.e., after a RET instruction is executed).
  • Reading for Friday
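
Relevant to the first two exercises, here is a minimal probe (a hypothetical attach.c, assuming Linux): it attempts PTRACE_ATTACH on a given PID and reports the result. A process that already has a tracer refuses a second one with EPERM, which you can use to explore the A-traces-B-traces-A question:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
  pid_t pid;

  if (argc != 2) {
    fprintf(stderr, "usage: %s <pid>\n", argv[0]);
    return 1;
  }
  pid = atoi(argv[1]);

  /* A process that already has a tracer refuses a second one (EPERM). */
  if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
    fprintf(stderr, "attach to %d failed: %s\n", (int) pid, strerror(errno));
    return 1;
  }
  waitpid(pid, NULL, 0);   /* wait for the tracee to stop */
  printf("attached to %d; detaching\n", (int) pid);
  ptrace(PTRACE_DETACH, pid, NULL, NULL);
  return 0;
}

Run it against a process already under gdb or strace and compare the error with attaching to an untraced process.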

11 Feb 2011: Program Supervision 2: Using ptrace

  • Notes
    • We discussed the big picture of how to develop intuition for the "right" level or approach for instrumenting and extracting data about a program or process; while you can create a construct at many different system levels to extract data from another level of the system, you have to consider performance cost, available mechanisms, ease of implementation, etc.
    • We looked at the implementation of a ptrace(2)-based utility for printing out system call information (a minimal sketch in the same spirit appears below)
  • Code
  • Readings
    • None, do your flaw reports and lit reviews
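
A minimal sketch in the spirit of that utility (not the class code itself; it assumes 32-bit x86 Linux, where the system call number is preserved in orig_eax):

#include <stdio.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>

int main(void)
{
  pid_t child = fork();
  int status, entering = 1;

  if (child == 0) {
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);  /* ask to be traced */
    execlp("ls", "ls", NULL);
    return 1;                               /* only reached if exec failed */
  }
  waitpid(child, &status, 0);               /* initial stop after execve */
  while (1) {
    struct user_regs_struct regs;
    ptrace(PTRACE_SYSCALL, child, NULL, NULL);  /* run to next syscall boundary */
    waitpid(child, &status, 0);
    if (WIFEXITED(status))
      break;
    ptrace(PTRACE_GETREGS, child, NULL, &regs);
    if (entering)                           /* stops alternate: entry, then exit */
      printf("syscall %ld\n", (long) regs.orig_eax);
    entering = !entering;
  }
  return 0;
}

Compare its output against strace ls to sanity-check the numbers (the syscall numbers are listed in asm/unistd.h).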

16 Feb 2011: Program Supervision 3: From the Kernel Up

  • Here we will look at how to modify the kernel source directly. While we will focus on porting Mac OS X's PTRACE_DENY_ATTACH request to Linux, many of the lessons about how to touch the kernel, recompile, and redeploy extend directly to other modifications you might wish to make to the kernel to instrument, intercept, or observe processes. Note, however, that the kernel is seldom the best place to do this kind of observation. Sometimes (like in a rootkit case) it is the only environment you can depend on (as a rootkit author), but in most benign examples, it is better to:
    • implement a user-level process for extracting the data
    • define a pseudo-device for extracting information from the kernel
    • compose existing kernel interfaces or communicate with them (e.g., /proc)
    • employ a Loadable Kernel Module if you absolutely must have code executing in the kernel context (a minimal module sketch follows below)

Generally, you want to avoid adding extra code to the kernel: your code might contain bugs that unnecessarily complicate the kernel or interfere with performance, since you may take locks and interact with other kernel components such as the file system, disk scheduler, memory allocator, loader, or process scheduler.
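
For the LKM route mentioned above, the canonical minimal module looks like the following sketch (a hypothetical hello.c that only logs on load and unload; build it against your kernel headers with a standard kbuild Makefile, load with insmod, remove with rmmod):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>

static int __init hello_init(void)
{
  printk(KERN_INFO "hello: module loaded\n");
  return 0;   /* a nonzero return aborts loading */
}

static void __exit hello_exit(void)
{
  printk(KERN_INFO "hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");

dmesg shows the printk output; this skeleton is also the natural starting point for growing a /proc interface.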

  • Code
    • To be posted after class.
  • Readings
    • None, do the vuln reports and lit review, or go back and re-read some of the other references.

18 Feb 2011: Program Supervision 4: Frameworks

23 Feb 2011: Reading Week, no lecture

25 Feb 2011: Reading Week, no lecture

2 March 2011: Midterm Exam

In class, open book, open note, open web, open computer. Individual.

You have an hour and fifteen minutes to email me the answers to the four questions.

The questions will focus on activities involved in the analysis of binaries.

4 March 2011: Basic Countermeasures (held 7 March)

9 March 2011 (Actual 11 March): Advanced Countermeasures

  • Topics: tainted dataflow analysis, artificial diversity, control flow integrity
  • Notes
    • Class Management
      • midterm answers & discussion
      • turning off basic countermeasures (see notes from last time)
      • demo an attack (e.g., libpng via gdb, or even a simple strcpy with all protections turned off)
    • Artificial diversity
      • n version programming
      • anomaly detection based on system call sequences: "A Sense of Self for Unix Processes" (Forrest et al.)
        • asynchronous system calls
        • convergence of a profile & calibration
        • mimicry attacks
        • looking at arguments, not just syscall sequences (UCSB work)
      • ASLR (last time)
      • Instruction Set Randomization (Barrantes et al., Kc et al., CCS 2003)
        • mysql randomization, postgresql randomization
      • program reproduction by Somayaji et al.
    • Tainted Dataflow Analysis
      • TaintCheck (CMU)
      • mark sources of input (i.e., data returned by system calls such as read(2) and recv(2)) with a tag, and propagate the tag through each assembly instruction (a toy illustration appears at the end of this day's notes)
      • attack detection conditions
        • tainted data enters the contents of EIP
        • EIP points to an address containing tainted data
        • tainted data enters a system call argument, such as a parameter to execve(2)
    • Control Flow Integrity:
      • statically model the control flow graph of an application; runtime control flow (including any injected paths) should not deviate from this static "complete" model; CFI paper
      • XFI: software guards
    • Advanced Topics: Signature Generation, Recovery
      • STEM
      • FLIPS, Shadow Honeypots, DIRA
      • Vigilante
      • VSEF: Vulnerability Specific Execution Filters
      • Application Communities
      • ShieldGen
      • Bouncer
      • Rx: execute through errors; see bugs as allergens, slightly change the environment and re-execute
      • ASSURE: model control flow, recover to a "safe" state
      • SEAD: Speculative Execution for Automated Defense: an execution monitor interprets a "repair policy" (basically an after-production exception handling mechanism) when a code injection attack is detected
  • http://technet.microsoft.com/en-us/security/dd285253.aspx (Mitigations by Matt Miller)
  • Code
    • We wrote a small piece of code to gather data on the position of the stack (via the address of a local variable, a stand-in for ESP) in a randomized (i.e., ASLR) versus a non-randomized environment.
#include <stdio.h>
int main()
{
  int* x = 0;
  int y = 0xDEADBEEF;
  x = &y;
  fprintf(stdout, "%u\n", ((unsigned int)x));
  return ((int)x);
}

This produces a distribution of values showing limited reuse of the same stack position (at most 3 repetitions in our experiment). Note that the return value of main is truncated by the shell to the low 8 bits of the exit status, which is why we print the full value to stdout. We wrapped the binary in a bash shell script:

#!/bin/bash
for ((;;))
do
      ./a.out >> newstack.dat
done

and then passed newstack.dat through a chain of commands: sort -n newstack.dat | uniq -c | awk '{print $2, $1}' | sort -n. We directed that final output to a file newstack.sorted and plotted it with gnuplot (plot 'newstack.sorted' with impulses).

Turning ASLR off (via `echo '0' > /proc/sys/kernel/randomize_va_space`) produced a file containing the same value (322122588) on every run.
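
Returning to the tainted dataflow rules above, here is a toy illustration, entirely hypothetical, of the three steps: mark a source, propagate on assignment, and check the EIP detection condition. Real systems such as TaintCheck apply the same rules per byte at the instruction level, under binary instrumentation:

#include <stdio.h>

struct tval { unsigned int v; int tainted; };

/* sources (e.g., read(2), recv(2)) produce tainted values */
static struct tval t_input(unsigned int v)
{
  struct tval t = { v, 1 };
  return t;
}

/* assignment propagates the taint tag along with the data */
static struct tval t_assign(struct tval src)
{
  return src;
}

/* detection condition: tainted data entering EIP */
static void t_jump(struct tval target)
{
  if (target.tainted) {
    fprintf(stderr, "ALERT: tainted value 0x%x used as jump target\n", target.v);
    return;
  }
  printf("jump to 0x%x\n", target.v);
}

int main(void)
{
  struct tval a = t_input(0x41414141);  /* pretend this came off the network */
  struct tval b = t_assign(a);          /* taint follows the copy */
  t_jump(b);                            /* raises the alert */
  return 0;
}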

11 March 2011 (Actual 14 March): Debugging Session of Basic Code Injection

  • Topics: We will use gdb to look at a simple strcpy-based injection. The idea is to shut off all protections (make the stack executable with execstack, turn off ASLR, disable the stack protector), write a small program that uses strcpy(3) unsafely:

http://pages.cpsc.ucalgary.ca/~locasto/teaching/2011/ISSA/code/scopy.c

#include <stdio.h>
#include <string.h>
int do_work(char* src){
  char dst[10];
  strcpy(dst, src);
  return 0;
}
int main(int argc, char* argv[]){
  if(2==argc){
    do_work(argv[1]);    
  }else{
    fprintf(stdout, "./scopy [arg]\n");
    return -1;
  }
  return 0;
}

and then proceed to look at the stack using GDB:

http://pages.cpsc.ucalgary.ca/~locasto/teaching/2011/ISSA/code/session.txt

  • Notes:
  • Links

16 March 2011: Analysis of a Real (but old) Vulnerability

  • Topics: the point of this lecture is to provide a hands-on assessment of an old stack-based buffer overflow vulnerability and the resulting exploit. This takes the previous class and cranks it up a notch: the vuln is in real, widely used software, and we get a richer address space than just a simple program copying a string from argv[1].
  • Notes: In this class, we looked at a wealth of information at both the source level (the definitions of various functions) and the assembly level. We first observed how rpng-x reacted when fed the proof-of-concept test case that exercises the vulnerability: some error messages, then a segfault. We started rpng-x in gdb and began placing breakpoints, starting with the png_handle_tRNS() function in the `pngrutil.c' file. We placed breakpoints at related functions (e.g., png_crc_finish, png_read_data, png_default_read_data) and examined them both at the source level and as disassembly within gdb. Once we got the return address correct (by examining the stack to see where the NOP sled landed), we were able to modify the PoC exploit to include shellcode that spawned a shell. The key problem: a fixed-size local buffer is declared as 256 bytes -- with 300 bytes actually allocated for it on the stack -- but fread(3) ultimately stuffs 512 bytes of data into it (a distilled sketch of this pattern appears below).
  • Code Session
    • To be posted
  • Links
  • Readings
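
A distilled, hypothetical sketch of the flaw pattern described in the notes above (not the actual rpng-x/libpng code): the length passed to fread(3) comes from the input file rather than from the size of the destination buffer:

#include <stdio.h>

static void read_chunk(FILE *f, unsigned int claimed_len)
{
  unsigned char buf[256];   /* fixed-size stack buffer */

  /* BUG: claimed_len is attacker-controlled; a 512-byte chunk
     overruns the 256-byte buffer and the saved return address. */
  fread(buf, 1, claimed_len, f);
}

int main(int argc, char *argv[])
{
  FILE *f;

  if (argc != 2)
    return 1;
  f = fopen(argv[1], "rb");
  if (!f)
    return 1;
  read_chunk(f, 512);       /* pretend the chunk header claimed 512 bytes */
  fclose(f);
  return 0;
}

With the countermeasures from earlier lectures disabled, the oversized read overwrites the saved return address, which is the behavior we observed in rpng-x.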

18 March 2011: Heap-based Overwrites of Function Pointers

23 March 2011: Counter-Countermeasures 1 (rescheduled to 25 March, 3pm)

25 March 2011: Counter-Countermeasures 2

30 March 2011: Reverse Engineering Protocols and File Formats

1 April 2011: Measuring Security Code

  • Topic: this will be a hands-on class where we will attempt to measure various security primitives and learn something about their organization.
  • Cost of Information Security

6 April 2011: Intrusion Analysis: Classic Examples and the Invention of Honeypots

8 April 2011: Intrusion Recovery: Responding and Cleanup

13 April 2011: Towards a formal theory of computer insecurity: a language-theoretic approach

15 April 2011: Guest Lecture on Quality of Digital Evidence (Sergey Bratus)

Systems Infosec Theme Areas (Examples for Essay/LitReview Topics)

The recent literature has many themes. The following are but a taste.

Example Themes

Background reading:

Conference Proceedings / Programs

USENIX Security: http://www.usenix.org/events/sec10/index.html

ACM CCS: http://www.sigsac.org/ccs.html

NDSS: http://www.isoc.org/isoc/conferences/ndss/

Oakland: http://www.ieee-security.org/TC/SP-Index.html

Literature Reviews

Post links to your literature reviews (PDF preferred) here or create a new wiki page for it.

Flaw Reports

Post links to your flaw reports (PDFs, URLs) here, or create new wiki pages describing each.

Flaw report: Kimai

Megavideo and Leksah: Ethical Brief Reports

GHC: In Depth Analysis

Megavideo


ATutor Multiple Flaw Report

Cad-Kas PDFreader Flaw Report

PHFTP XSS Report

TEMS XSS Report

We will have a discussion of vulnerability disclosure policies and approaches to vulnerability disclosure sometime in the next couple of weeks. I am comfortable with posting your analysis on the public wiki (i.e., here) if you have submitted the issue to the vendor and had some acknowledgment or communication with them.

Your flaw reports will be due by the end of the semester, but I encourage you to get them in before then.

Sites To Bootstrap Your Thinking

Ethical Disclosure Policy

Our class policy is the following:

The value of this exercise for the student is threefold. First, the student puts into practice the analysis tools and techniques covered in the course. Second, the student nurtures their "security mindset" by examining code with the intent to violate its expectations about correct execution. Finally, students gain some practical experience with the ethical and practical considerations involved in the disclosure of real-life flaws.

Users and vendors have a right to be informed of potential bugs, flaws, and vulnerabilities. Our first principle is to be as precise and accurate as possible with the information we develop. Speculation should be clearly separated from the facts of the flaw report description. We will assemble information about the code flaw itself and the conditions under which it can be exercised. We will make a good faith effort to contact vendors to report this information to them ahead of any public disclosure. This contact may include personal email where the vendor is a single person responsible for maintaining the software, but we will seek to report the flaw via the vendor-preferred channels (such as a special security mailing list for the project or product, or a bugzilla report). After receiving an acknowledgement from the vendor, we may choose to post a brief, abstract description of the flaw on this wiki in terms that generally mimic CERT advisories. In any case, the full report will be sent privately to the instructor. We have no obligation to offer a fix or support for the bug.

Raw Notes from Class Discussion

There are several adjectives with which we might modify the term "disclosure", e.g.:

  • Disclosure
  • Full Disclosure
  • Partial Disclosure
  • Ethical Disclosure
  • Unethical Disclosure
  • Responsible Disclosure
  • Irresponsible disclosure

but it is worth asking whether there is any such thing as "ethical" disclosure.

This decision is complex (whom to inform, when to tell them, what to tell them, how to follow up on or support your announcement), and any choice can potentially offend some stakeholder or fall out of line with their ethical expectations or frame of reference.

There is also the issue of the legal ramifications of disclosure, ranging from violating a software license agreement to violating local, state, or federal statute.

Key question: Who do you tell, when?

  • what about your employer?
  • severity of a flaw
  • but: big hole still exists
  • how do you convince vendor?
    • your credibility, employer, background, track record
    • how hard it is to actually exploit
  • vendor response, nature of the vendor
  • users (how?) specific system or bug list
  • is timeline a consideration?
    • difficulty of deploying "fix" or workaround
    • time reporting window in terms of patch cycles
  • should we propose solution or work with vendor (potential pitfall: you may promise a fix you can't deliver b/c you didn't completely understand the vuln)
  • do you have culpability for the quality of your report? generality?
  • compensation?
Principles
  • consensus: we should post, but only after disclosure to vendor
  • level of detail on wiki: CERT or secunia advisory
  • level of detail to prof: full disclosure
  • level of detail to vendor: full disclosure
  • level of detail to public: full disclosure to public IFF vendor has patched issue ?
  • should give vendor a window of opportunity
    • standard bug reporting (companies)
    • bugzilla or mailing list for security issues
    • severity, user base, exact time window, consequence
  • where does obligation end?

Suggested Projects

These are not class assignments, but interesting research or development ideas that occur during class. If one interests you, talk to me about setting up a project.

  • interesting experiment: rewrite a selection of security policies and models in E-Prime and survey ~25 security professionals for their impression of clarity, precision, correctness, etc.
  • augment ghex2 (or your favorite hex editor) to automatically group/tag similar or equal groups of bytes and replace the grouping with the tag name
  • comparative analysis of gdb, IDAPro, and Immunity Debugger
  • Use scapy to pass network data to udcli, use "abstract payload execution" to determine whether these bytes really are code; compare with NEMU

Other Resources

Information Security Conferences and Workshops