Courses/Computer Science/CPSC 355.W2014/Lecture Notes/ELFIntro

From wiki.ucalgary.ca

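Quick intro to the ELF format and structure; important ELF sections and contents.

Our running example is find.c, a small program that prints the largest integer among its command-line arguments (or -1000 if none of them is larger):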

#include <stdio.h>
#include <stdlib.h> //see: man 3 strtol

int MAX = -1000;  // initialized global variable
int number[10];   // uninitialized global array (never used by main)

int main(int argc, char* argv[])
{
  int i = 0;
  int cn = 0;
  // iterate over argv[i], skipping argv[0] (the program name)
  for (i = 1; i < argc; i++)
  {
      // translate argv[i] into an int (base 10)
      cn = (int) strtol( argv[i], NULL, 10 );
      // compare that int with MAX and keep the larger one
      if ( cn > MAX )
      {
          MAX = cn;
      }
  }
  fprintf(stdout, "MAX is: %d\n", MAX);
  return MAX;
}
(eye@mordor l10)$ make
gcc -Wall -g -o fx find.c
(eye@mordor l10)$ ./fx
MAX is: -1000
(eye@mordor l10)$ ./fx 1 2 4 6
MAX is: 6
(eye@mordor l10)$ ./fx 1 2 4 600
MAX is: 600
(eye@mordor l10)$ ./fx 1 2 4 -1 600
MAX is: 600
(eye@mordor l10)$ file fx
fx: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
(eye@mordor l10)$ hexdump -C fx
00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 03 00 01 00 00 00  60 83 04 08 34 00 00 00  |........`...4...|
00000020  38 11 00 00 00 00 00 00  34 00 20 00 08 00 28 00  |8.......4. ...(.|
...

From the hexdump output, we learned that the ELF file 'fx' has a rich internal structure, even though we don't yet know what that structure is.

We consult the manual page for readelf(1),

(eye@mordor l10)$ man readelf

and we can glean just by looking at the options that ELF files are organized into sections, which are eventually mapped to ABI constructs. In this way, the ELF format represents a contract between the compiler and the OS (and eventually the CPU).

Below, I highlight several of these sections (excerpted from the complete listing produced by readelf -t):

  1. .init
  2. .text
  3. .data
  4. .bss

(eye@mordor l10)$ readelf -t fx
There are 38 section headers, starting at offset 0x1138:

Section Headers:

 [Nr] Name
      Type            Addr     Off    Size   ES   Lk Inf Al
      Flags


 [11] .init
      PROGBITS        080482dc 0002dc 000030 00   0   0  4
      [00000006]: ALLOC, EXEC
 [13] .text
      PROGBITS        08048360 000360 0001ec 00   0   0 16
      [00000006]: ALLOC, EXEC
 [24] .data
      PROGBITS        0804971c 00071c 000008 00   0   0  4
      [00000003]: WRITE, ALLOC
 [25] .bss
      NOBITS          08049740 000724 000048 00   0   0 32
      [00000003]: WRITE, ALLOC

We learned that the program code (actually, the binary encoding of the x86 instructions that your C code is compiled to) is contained in the .text section. The .data section holds initialized global variables/data (so: MAX) and the .bss section holds uninitialized data (so: the number array). The .init section is a code section produced by the toolchain; it holds startup code that runs before your main function (in .text) is called.

Since we believe that the variable MAX is located in .data, and we see above that .data is 8 bytes long, we should also expect to see the value -1000 (its initial value) somewhere in those 8 bytes, which start at the virtual address of the .data section.

We learned by consulting the readelf manual page that we can dump the contents of an individual ELF section. For example, to dump the hexadecimal representation of the bytes in the .text section (i.e., the program code), we say:

(eye@mordor l10)$ readelf -x ".text" fx
Hex dump of section '.text':
  0x08048360 31ed5e89 e183e4f0 50545268 b0840408 1.^.....PTRh....
  0x08048370 68c08404 08515668 14840408 e8abffff h....QVh........
  0x08048380 fff49090 90909090 90909090 90909090 ................
  0x08048390 5589e553 8d6424fc 803d4497 04080075 U..S.d$..=D....u
  0x080483a0 3ebb2c96 0408a148 97040881 eb289604 >.,....H.....(..
  0x080483b0 08c1fb02 83eb0139 d8731d90 8d742600 .......9.s...t&.
  0x080483c0 83c001a3 48970408 ff148528 960408a1 ....H......(....
  0x080483d0 48970408 39d872e8 c6054497 0408018d H...9.r...D.....
  0x080483e0 6424045b 5dc38d76 008dbc27 00000000 d$.[]..v...'....
  0x080483f0 5589e58d 6424e8a1 30960408 85c07412 U...d$..0.....t.
  0x08048400 b8000000 0085c074 09c70424 30960408 .......t...$0...
  0x08048410 ffd0c9c3 5589e583 e4f083ec 20c74424 ....U....... .D$
  0x08048420 18000000 00c74424 1c000000 00c74424 ......D$......D$
  0x08048430 18010000 00eb418b 442418c1 e0020345 ......A.D$.....E
  0x08048440 0c8b00c7 4424080a 000000c7 44240400 ....D$......D$..
  0x08048450 00000089 0424e8e1 feffff89 44241ca1 .....$......D$..
  0x08048460 20970408 3944241c 7e098b44 241ca320  ...9D$.~..D$.. 
  0x08048470 97040883 44241801 8b442418 3b45087c ....D$...D$.;E.|
  0x08048480 b68b0d20 970408ba 74850408 a1409704 ... ....t....@..
  0x08048490 08894c24 08895424 04890424 e8abfeff ..L$..T$...$....
  0x080484a0 ffa12097 0408c9c3 90909090 90909090 .. .............
  0x080484b0 5589e55d c366662e 0f1f8400 00000000 U..].ff.........
  0x080484c0 5589e557 5653e84f 00000081 c3351200 U..WVS.O.....5..
  0x080484d0 0083ec1c e803feff ff8dbb20 ffffff8d ........... ....
  0x080484e0 8320ffff ff29c7c1 ff0285ff 742431f6 . ...)......t$1.
  0x080484f0 8b451089 4424088b 450c8944 24048b45 .E..D$..E..D$..E
  0x08048500 08890424 ff94b320 ffffff83 c60139fe ...$... ......9.
  0x08048510 72de83c4 1c5b5e5f 5dc38b1c 24c39090 r....[^_]...$...
  0x08048520 5589e553 8d6424fc a1209604 0883f8ff U..S.d$.. ......
  0x08048530 7412bb20 96040890 8d5bfcff d08b0383 t.. .....[......
  0x08048540 f8ff75f4 8d642404 5b5dc390          ..u..d$.[]..

The first column is the virtual address, the middle columns are the byte values, and the final column is the ASCII representation of those byte values.

We can dump the .data section in the same way.

(eye@mordor l10)$ readelf -x ".data" fx
Hex dump of section '.data':
  0x0804971c 00000000 18fcffff                   ........

The first four bytes are zero, and the second group of four bytes reads 18fcffff. Is the value 0x18fcffff really -1000? From what we know about two's complement and hexadecimal representation, it doesn't seem like it. What does printf(1) say?

(eye@mordor l10)$ printf "%d\n" 0x18fcffff
419233791
(eye@mordor l10)$ 

Indeed, something is weird here. This is an artifact of how data is stored in memory on an x86 computer. IA-32 is a little-endian architecture, which means that the least significant byte of a multi-byte value is stored in memory at the smallest address (hence, above, the byte '18' is stored at the smallest address and thus appears in the leftmost position).

Thus, the "number" at that address is actually: 0xfffffc18

We can print this value in many different representations using gdb:

(gdb) print /t 0xfffffc18
$4 = 11111111111111111111110000011000
(gdb) print /x 0xfffffc18
$5 = 0xfffffc18
(gdb) print /d 0xfffffc18
$6 = 4294966296

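But these all treat the value as an unsigned quantity: its decimal representation above is a very large positive number. In 32-bit two's complement, the same bit pattern also encodes a small negative value. Subtracting it from zero (that is, negating it) gives 1000, so the pattern 0xfffffc18 is the encoding of -1000, the initial value of MAX.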

(gdb) print /d 0x0-0xfffffc18
$15 = 1000
(gdb)