
= Intro and Examination of x86 Assembly Language =

Announcements

 * Homeworks are released.
 * Note: this week is about (1) teaching you the basic concepts of the architecture and (2) giving you a teaser of the correspondence between "high"-level code and assembly language.

Class Notes
We started class by augmenting the "ChalkSim" picture with access to memory, especially the program stack. We now treat portions of memory as special and distinct from each other, and we will keep augmenting this picture throughout the semester. Note that accessing the stack requires dedicated instructions (e.g., PUSH and POP) and some way to keep track of where the stack is: the ESP register holds the address of the current top of the stack, and the EBP register marks the base of the current stack frame.
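To make the PUSH/POP bookkeeping concrete, here is a minimal Python sketch of a toy stack. The class name `ToyStack`, the pushed value, and the starting ESP are assumptions made for illustration (the starting address is chosen so that one push lands on the ESP value seen in the gdb session later in these notes); it models only the fact that the stack grows toward lower addresses in 4-byte steps.

```python
class ToyStack:
    """Toy model of the IA-32 stack: a dict of 32-bit slots keyed by address."""

    def __init__(self, esp):
        self.esp = esp        # stack pointer: address of the current top of stack
        self.memory = {}      # address -> 32-bit value

    def push(self, value):
        # PUSH decrements ESP by 4, then stores the value at the new ESP.
        self.esp = (self.esp - 4) & 0xFFFFFFFF
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        # POP reads the value at ESP, then increments ESP by 4.
        value = self.memory[self.esp]
        self.esp = (self.esp + 4) & 0xFFFFFFFF
        return value


stack = ToyStack(esp=0xBFFFF84C)   # assumed starting value, for illustration only
stack.push(0x11223344)             # stands in for "push ebp" with a made-up EBP
print(hex(stack.esp))              # 0xbffff848
print(hex(stack.pop()))            # 0x11223344
print(hex(stack.esp))              # 0xbffff84c
```

The stack "grows down": pushing lowers ESP and popping raises it, which is why a later `sub esp,0x20` reserves space rather than releasing it.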

We then picked up with a discussion of the 'gx' program and its code, learning about its assembly representation via the objdump -d output.

080483e4 <main>:
 80483e4:	55                  	push   ebp
 80483e5:	89 e5               	mov    ebp,esp
 80483e7:	83 e4 f0            	and    esp,0xfffffff0
 80483ea:	83 ec 20            	sub    esp,0x20
 80483ed:	ba d4 84 04 08      	mov    edx,0x80484d4
 80483f2:	a1 80 96 04 08      	mov    eax,ds:0x8049680
 80483f7:	89 54 24 04         	mov    DWORD PTR [esp+0x4],edx
 80483fb:	89 04 24            	mov    DWORD PTR [esp],eax
 80483fe:	e8 15 ff ff ff      	call   8048318
 8048403:	89 44 24 1c         	mov    DWORD PTR [esp+0x1c],eax
 8048407:	8b 44 24 1c         	mov    eax,DWORD PTR [esp+0x1c]
 804840b:	c9                  	leave
 804840c:	c3                  	ret
 804840d:	90                  	nop
 804840e:	90                  	nop
 804840f:	90                  	nop

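We learned what information each column of this output contains: the address of the instruction, the raw machine-code bytes, and the disassembled mnemonic with its operands. We learned some of the machine codes for specific instructions (for example, 0x55 encodes push ebp and 0xc3 encodes ret). We observed that the program counter depends on the instruction addresses and the instruction lengths: each instruction's address is the previous instruction's address plus the previous instruction's length in bytes. We recalled that even with this small sample, we could begin to categorize instructions into four or five groups: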
 * logic operations
 * arithmetic operations
 * data move/transfer operations
 * I/O operations (not shown)
 * control transfer instructions
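The relationship between addresses and instruction lengths can be checked directly against the listing. The sketch below is illustrative only: the helper names `next_address` and `call_target` are made up, and the (address, bytes) pairs are copied from the objdump output above. It also works through the one control transfer in the sample: the `e8` opcode is a CALL whose 4-byte operand is a signed little-endian displacement relative to the address of the next instruction.

```python
# (address, machine-code bytes) pairs copied from the objdump listing of main.
listing = [
    (0x80483E4, "55"),
    (0x80483E5, "89 e5"),
    (0x80483E7, "83 e4 f0"),
    (0x80483EA, "83 ec 20"),
    (0x80483ED, "ba d4 84 04 08"),
    (0x80483F2, "a1 80 96 04 08"),
    (0x80483F7, "89 54 24 04"),
    (0x80483FB, "89 04 24"),
    (0x80483FE, "e8 15 ff ff ff"),
    (0x8048403, "89 44 24 1c"),
]


def next_address(address, hex_bytes):
    """Address of the following instruction: this address plus this length."""
    return address + len(bytes.fromhex(hex_bytes))


def call_target(address, hex_bytes):
    """Target of a near relative CALL (opcode e8 followed by a signed rel32)."""
    raw = bytes.fromhex(hex_bytes)
    if raw[0] != 0xE8:
        raise ValueError("not an e8 call instruction")
    rel32 = int.from_bytes(raw[1:5], "little", signed=True)
    return next_address(address, hex_bytes) + rel32


# Every address in the listing equals the previous address plus the previous length.
for (addr, code), (following, _) in zip(listing, listing[1:]):
    assert next_address(addr, code) == following

print(hex(call_target(0x80483FE, "e8 15 ff ff ff")))   # 0x8048318
```

The computed target matches the operand objdump prints for the call, even though the bytes `15 ff ff ff` look nothing like `8048318`: the disassembler has already done this arithmetic for us.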

Compare this output with the output of the 'hexdump' utility for the same region of bytes in the ELF file 'gx' (see the bytes beginning at offset 0x3e4, i.e., the fifth byte of the row labeled 000003e0):

(eye@mordor l5)$ hexdump -C ../l4/gx
00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 03 00 01 00 00 00  30 83 04 08 34 00 00 00  |........0...4...|
...
000003e0  ff d0 c9 c3 55 89 e5 83  e4 f0 83 ec 20 ba d4 84  |....U....... ...|
000003f0  04 08 a1 80 96 04 08 89  54 24 04 89 04 24 e8 15  |........T$...$..|
00000400  ff ff ff 89 44 24 1c 8b  44 24 1c c9 c3 90 90 90  |....D$..D$......|
...
00001280  42 43 5f 32 2e 30 00 5f  65 64 61 74 61 00 5f 5f  |BC_2.0._edata.__|
00001290  69 36 38 36 2e 67 65 74  5f 70 63 5f 74 68 75 6e  |i686.get_pc_thun|
000012a0  6b 2e 62 78 00 6d 61 69  6e 00 5f 69 6e 69 74 00  |k.bx.main._init.|
000012b0
(eye@mordor l5)$
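One way to line the two views up is to convert the address objdump prints into an offset within the file. The sketch below assumes that the first loadable segment of 'gx' maps file offset 0 to virtual address 0x08048000, which is a common layout for 32-bit Linux executables of this kind but is not something the dumps here confirm; `readelf -l` would show the real mapping. The names `ASSUMED_LOAD_BASE`, `file_offset`, and `bytes_at` are illustrative.

```python
# Rows copied from the hexdump -C output above (file offset -> 16 bytes).
rows = {
    0x3E0: "ff d0 c9 c3 55 89 e5 83 e4 f0 83 ec 20 ba d4 84",
    0x3F0: "04 08 a1 80 96 04 08 89 54 24 04 89 04 24 e8 15",
    0x400: "ff ff ff 89 44 24 1c 8b 44 24 1c c9 c3 90 90 90",
}

ASSUMED_LOAD_BASE = 0x08048000   # assumption: file offset 0 is mapped here


def file_offset(virtual_address, base=ASSUMED_LOAD_BASE):
    """File offset of a virtual address, under the assumed mapping."""
    return virtual_address - base


def bytes_at(offset, count):
    """Read `count` bytes at a file offset from the three contiguous rows."""
    start = min(rows)
    blob = b"".join(bytes.fromhex(rows[key]) for key in sorted(rows))
    return blob[offset - start : offset - start + count]


offset = file_offset(0x80483E4)          # address of main from objdump
print(hex(offset))                       # 0x3e4
print(" ".join(f"{b:02x}" for b in bytes_at(offset, 3)))   # 55 89 e5
```

The three bytes found at that offset are exactly the machine code objdump shows for `push ebp` and `mov ebp,esp`, which is the point of the comparison: the disassembly is just another rendering of the same bytes.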

The ELF file literally contains only byte values: the machine code, i.e., the actual numbers representing the bit patterns that the circuitry of the machine will interpret. Objdump does us a favor and disassembles this machine code back into human-readable assembly. We looked around at some of the other assembly instructions and observed a common phenomenon: most programs use only a small number of distinct instructions. In other words, the distribution of instructions in real programs is heavily skewed, roughly like a power law, with a relatively small number of popular instructions accounting for most of the code. To get a sense of the distribution, we can use some command-line scripting over the output of 'objdump' (see below). Note that I cleaned up this output slightly to remove spurious items that are not instructions. We can see that data movement operations are quite common.

(eye@mordor l5)$ objdump -d -j ".text" --no-show-raw-insn ./ax | awk '{print $2}' | sort | uniq -c | sort -nr
     34 nop
     27 mov
     19 push
     12 lea
     11 pop
      7 ret
      7 call
      5 cmp
      4 sub
      4 je
      4 add
      3 test
      2 xor
      2 sar
      2 jne
      2 jb
      1 of
      1 movl
      1 movb
      1 leave
      1 jae
      1 hlt
      1 file
      1 data32
      1 cmpb
      1 and
(eye@mordor l5)$
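The same count can be sketched in Python, which makes the filtering step explicit. This is an illustrative equivalent, not the pipeline used in class: the function name `mnemonic_counts` is made up, and the sample text is the main function of 'gx' from earlier, retyped in the approximate shape that `--no-show-raw-insn` produces. Unlike the awk version, it keeps only lines whose first field is a hexadecimal address followed by a colon, which drops stray words such as "of" and "file" that come from objdump's header lines.

```python
from collections import Counter

# main() from 'gx', in the approximate layout of objdump --no-show-raw-insn.
SAMPLE = """\
./gx:     file format elf32-i386
Disassembly of section .text:
080483e4 <main>:
 80483e4:   push   ebp
 80483e5:   mov    ebp,esp
 80483e7:   and    esp,0xfffffff0
 80483ea:   sub    esp,0x20
 80483ed:   mov    edx,0x80484d4
 80483f2:   mov    eax,ds:0x8049680
 80483f7:   mov    DWORD PTR [esp+0x4],edx
 80483fb:   mov    DWORD PTR [esp],eax
 80483fe:   call   8048318
 8048403:   mov    DWORD PTR [esp+0x1c],eax
 8048407:   mov    eax,DWORD PTR [esp+0x1c]
 804840b:   leave
 804840c:   ret
 804840d:   nop
 804840e:   nop
 804840f:   nop
"""


def mnemonic_counts(disassembly):
    """Count the second field of every line that starts with 'hexaddr:'."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        try:
            int(fields[0][:-1], 16)      # keep only real instruction lines
        except ValueError:
            continue
        counts[fields[1]] += 1
    return counts


for mnemonic, count in mnemonic_counts(SAMPLE).most_common(3):
    print(count, mnemonic)
```

Even in this 16-instruction function, mov accounts for 7 of the instructions, in line with the observation that data movement dominates.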

Intro to GDB
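We then moved on to using the 'gdb' program to run and control the 'gx' program. gdb can also show us the disassembly of the code, but more importantly, it allows us to control the program and examine the state of the CPU and memory. Note that gdb prints AT&T syntax by default (source operand first, registers prefixed with %, immediates prefixed with $), whereas the objdump listing above uses Intel syntax (destination operand first); the instructions are the same.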

(eye@mordor l4)$ gdb -q ./gx
Reading symbols from /home/eye/355/lectures/l4/gx...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/eye/355/lectures/l4/gx
hello, 355
Program exited with code 013.
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.i686
(gdb) run AA
Starting program: /home/eye/355/lectures/l4/gx AA
hello, 355
Program exited with code 013.
(gdb) break main
Breakpoint 1 at 0x80483e7
(gdb) run AAa
Starting program: /home/eye/355/lectures/l4/gx AAa
Breakpoint 1, 0x080483e7 in main
(gdb) info reg
eax           0xbffff8f4	-1073743628
ecx           0xae509aa	182782378
edx           0x2	2
ebx           0x744ff4	7622644
esp           0xbffff848	0xbffff848
ebp           0xbffff848	0xbffff848
esi           0x0	0
edi           0x0	0
eip           0x80483e7	0x80483e7
eflags        0x200246	[ PF ZF IF ID ]
cs            0x73	115
ss            0x7b	123
ds            0x7b	123
es            0x7b	123
fs            0x0	0
gs            0x33	51
(gdb) disassemble main
Dump of assembler code for function main:
   0x080483e4 <+0>:	push  %ebp
   0x080483e5 <+1>:	mov   %esp,%ebp
=> 0x080483e7 <+3>:	and   $0xfffffff0,%esp
   0x080483ea <+6>:	sub   $0x20,%esp
   0x080483ed <+9>:	mov   $0x80484d4,%edx
   0x080483f2 <+14>:	mov   0x8049680,%eax
   0x080483f7 <+19>:	mov   %edx,0x4(%esp)
   0x080483fb <+23>:	mov   %eax,(%esp)
   0x080483fe <+26>:	call  0x8048318
   0x08048403 <+31>:	mov   %eax,0x1c(%esp)
   0x08048407 <+35>:	mov   0x1c(%esp),%eax
   0x0804840b <+39>:	leave
   0x0804840c <+40>:	ret
End of assembler dump.
(gdb) step
Single stepping until exit from function main, which has no line number information.
hello, 355
0x005c8ce6 in __libc_start_main from /lib/libc.so.6
(gdb) stepi
0x005c8ce9 in __libc_start_main from /lib/libc.so.6
(gdb) nexti
Program exited with code 013.
(gdb) run AAA
Starting program: /home/eye/355/lectures/l4/gx AAA
Breakpoint 1, 0x080483e7 in main
(gdb) nexti
0x080483ea in main
(gdb)
0x080483f2 in main
(gdb)
0x080483f7 in main
(gdb)
0x080483fb in main
(gdb)
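Several values in the `info reg` output can be reproduced by hand, which is a useful check on how to read it. The following Python sketch is illustrative; the helper names are made up and the flag table lists only a subset of the EFLAGS bits. It shows why eax prints as a negative number, how the raw eflags value corresponds to the flag names gdb lists, and what the next two instructions after the breakpoint will do to ESP.

```python
def as_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


# Bit positions of some EFLAGS flags (a subset, enough for this example).
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF", 21: "ID",
}


def decode_eflags(value):
    """Names of the flags that are set, from the lowest bit upward."""
    return [name for bit, name in sorted(EFLAGS_BITS.items()) if (value >> bit) & 1]


# gdb shows eax both as hex and as a signed decimal.
print(as_signed32(0xBFFFF8F4))        # -1073743628

# The bracketed list gdb prints is just the set bits of the raw value.
print(decode_eflags(0x200246))        # ['PF', 'ZF', 'IF', 'ID']

# Effect of the two instructions that follow the breakpoint.
esp = 0xBFFFF848
esp &= 0xFFFFFFF0                     # and esp,0xfffffff0 : round down to 16 bytes
esp -= 0x20                           # sub esp,0x20       : reserve 32 bytes
print(hex(esp))                       # 0xbffff820
```

The `and` clears the low four bits of ESP, aligning the stack to a 16-byte boundary, and the `sub` then reserves 0x20 bytes below that for main's local storage and outgoing arguments.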

We'll pick up again with this mode of program analysis next week and throughout the semester.

Case Study: The AMD K6-2
The physical CPU I dug out of the HP PC during the first week of class is an AMD K6-2 processor. You can search Google for this processor's data sheet.
 * http://www.amd-k6.com/cpu-specs/
 * http://www.datasheetcatalog.com/datasheets_pdf/A/M/D/-/AMD-K6-2.shtml

This processor is a binary-compatible implementation of the x86 / Intel IA-32 architecture. The existence of this chip is an excellent illustration of the difference between an architecture and a microarchitecture: the architecture is the high-level specification of the "public" part of the CPU, while the microarchitecture is the private implementation that supports the execution of programs conforming to the x86 language. See especially:
 * PDF page 22, paragraph 4 (a list of microarchitecture features, many of which we'll examine this semester)
 * PDF page 25, Section 2.2, which says, in part:

"When discussing processor design, it is important to understand the terms architecture, microarchitecture, and design implementation. The term architecture refers to the instruction set and features of a processor that are visible to software programs running on the processor. The architecture determines what software the processor can run. The architecture of the AMD-K6-2 processor is the industry-standard x86 instruction set."

"The term microarchitecture refers to the design techniques used in the processor to reach the target cost, performance, and functionality goals."

"The term design implementation refers to the actual logic and circuit designs from which the processor is created according to the microarchitecture specifications."

(All of the quotations above are from the AMD K6-2 data sheet found at the two URLs above.)

This document also illustrates the stark difference (but surprising correspondence) between the architecture (e.g., the general CPU organization picture we've been looking at and adding to every class session so far) and the microarchitecture (cf. Figure 1 (PDF page 27)).