Courses/Computer Science/CPSC 355.W2014/Lecture Notes/ComplexTypes

From wiki.ucalgary.ca

Collections of primitive types.

  • Arrays
  • Structures
  • Unions

Subtopics

  • sizeof
  • declarations in C
  • appearance in assembly
    • padding by compiler, alignment
    • access a position or field via xxx
    • location / representation, global, on stack
  • global, static, initialized, uninitialized
  • pointers to structure instances

Arrays

Arrays are contiguous collections of data items of the same type.

#include <stdio.h>
int i[2];
int data[16];
char name[8] = "michael";
char identifier[] = {0};
short beef[] = {
  0xD, 0xE, 0xA, 0xD,
  0xB, 0xE, 0xE, 0xF
};
int main(int argc, char* argv[]){
  int whereami[64];   /* 64 ints: valid indices are 0..63 */
  /* sizeof yields a size_t, so %zu is the matching conversion */
  fprintf(stdout,
          "sizeof(i)          = %zu\n"
          "sizeof(data)       = %zu\n"
          "sizeof(name)       = %zu\n"
          "sizeof(identifier) = %zu\n"
          "sizeof(beef)       = %zu\n"
          "sizeof(whereami)   = %zu\n",
          sizeof(i),
          sizeof(data),
          sizeof(name),
          sizeof(identifier),
          sizeof(beef),
          sizeof(whereami));
  whereami[60] = 0xaabb1000;
  whereami[0] = 0x41414141;
  whereami[63] = 0xffffffff;
  //is the following statement "legal"?
  //is this "possible"?
  //will this "work"?
  whereami[64] = 0x44eeeeee;
  return 0;
}

This C code gets translated to:

080483e4 <main>:
 80483e4:	55                   	push   %ebp
 80483e5:	89 e5                	mov    %esp,%ebp
 80483e7:	83 e4 f0             	and    $0xfffffff0,%esp
 80483ea:	81 ec 20 01 00 00    	sub    $0x120,%esp
 80483f0:	ba 34 85 04 08       	mov    $0x8048534,%edx
 80483f5:	a1 80 97 04 08       	mov    0x8049780,%eax
 80483fa:	c7 44 24 1c 00 01 00 	movl   $0x100,0x1c(%esp)
 8048401:	00 
 8048402:	c7 44 24 18 10 00 00 	movl   $0x10,0x18(%esp)
 8048409:	00 
 804840a:	c7 44 24 14 01 00 00 	movl   $0x1,0x14(%esp)
 8048411:	00 
 8048412:	c7 44 24 10 08 00 00 	movl   $0x8,0x10(%esp)
 8048419:	00 
 804841a:	c7 44 24 0c 40 00 00 	movl   $0x40,0xc(%esp)
 8048421:	00 
 8048422:	c7 44 24 08 08 00 00 	movl   $0x8,0x8(%esp)
 8048429:	00 
 804842a:	89 54 24 04          	mov    %edx,0x4(%esp)
 804842e:	89 04 24             	mov    %eax,(%esp)
 8048431:	e8 e2 fe ff ff       	call   8048318 <fprintf@plt>
 8048436:	c7 84 24 10 01 00 00 	movl   $0xaabb1000,0x110(%esp)
 804843d:	00 10 bb aa 
 8048441:	c7 44 24 20 41 41 41 	movl   $0x41414141,0x20(%esp)
 8048448:	41 
 8048449:	c7 84 24 1c 01 00 00 	movl   $0xffffffff,0x11c(%esp)
 8048450:	ff ff ff ff 
 8048454:	c7 84 24 20 01 00 00 	movl   $0x44eeeeee,0x120(%esp)
 804845b:	ee ee ee 44 
 804845f:	b8 00 00 00 00       	mov    $0x0,%eax
 8048464:	c9                   	leave  
 8048465:	c3                   	ret     

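Note the differences in how global data and the local array on main's stack are referenced: global objects are reached through fixed addresses (the stdout pointer is loaded from 0x8049780, and the format string's address 0x8048534 is passed as an immediate), while elements of whereami are reached as offsets from the %esp register (whereami[0] is at 0x20(%esp)). The global arrays themselves are never touched in main: every sizeof expression is a compile-time constant and appears only as an immediate (0x8, 0x40, 0x8, 0x1, 0x10 and 0x100, in argument order) stored into fprintf's argument slots.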
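A small C sketch of the address arithmetic behind those %esp offsets: element k of an int array lives k * sizeof(int) bytes past element 0. It assumes sizeof(int) == 4 and takes the 0x20 base offset of whereami from the listing above; the helper names are hypothetical.

```c
#include <stddef.h>

/* Byte distance from base[0] to base[k]; this is the scaling the compiler
   applies when it turns whereami[k] into a fixed offset. */
ptrdiff_t byte_offset_of_element(const int *base, size_t k)
{
    return (const char *)&base[k] - (const char *)&base[0];
}

/* Hypothetical helper: predicted %esp offset of whereami[k], assuming (as in
   the listing above) whereami[0] is at 0x20(%esp) and sizeof(int) == 4. */
unsigned long predicted_esp_offset(size_t k)
{
    return 0x20UL + 4UL * k;
}
```

Under these assumptions predicted_esp_offset(60) is 0x110 and predicted_esp_offset(63) is 0x11c, matching the movl instructions. predicted_esp_offset(64) is 0x120, exactly the amount subtracted by sub $0x120,%esp, so whereami[64] lands just past the space main reserved for its locals.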

How many bytes long are the .bss and .data sections? Do these sizes match our expectations, given which global variables were initialized?

(eye@mordor l14)$ readelf -S ax
There are 30 section headers, starting at offset 0x8d0:
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  ...
   [24] .data             PROGBITS        08049760 000760 00001c 00  WA  0   0  4
   [25] .bss              NOBITS          08049780 00077c 000068 00  WA  0   0 32

What is in the .data section? What do we learn about the initialized character array? (For example, the characters of "michael" appear directly in .data, rather than a "pointer" to data stored somewhere else.)

(eye@mordor l14)$ readelf -x ".data" ax
Hex dump of section '.data':
  0x08049760 00000000 6d696368 61656c00 0d000e00 ....michael.....
  0x08049770 0a000d00 0b000e00 0e000f00          ............
(eye@mordor l14)$ readelf -x ".bss" ax
Section '.bss' has no data to dump.
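To connect the hex dump with the C initializers, here is a hedged sketch that renders an object's bytes in memory order, the same order readelf prints them. The to_hex helper is hypothetical, and the expected string for beef assumes a little-endian host (such as x86) with 2-byte shorts.

```c
#include <stddef.h>
#include <string.h>

/* Same initializers as the globals in the program above. */
char name[8] = "michael";
short beef[] = { 0xD, 0xE, 0xA, 0xD, 0xB, 0xE, 0xE, 0xF };

/* Write two lowercase hex digits per byte of obj into out, in memory order.
   out must have room for 2 * n + 1 characters. */
void to_hex(const void *obj, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    const unsigned char *p = obj;
    for (size_t k = 0; k < n; k++) {
        out[2 * k]     = digits[p[k] >> 4];
        out[2 * k + 1] = digits[p[k] & 0x0f];
    }
    out[2 * n] = '\0';
}
```

to_hex(name, sizeof name, buf) yields 6d69636861656c00, the bytes shown at 0x08049764. On a little-endian machine each short of beef is stored low byte first, so 0x000D appears as 0d 00, which is why the dump reads 0d000e00 0a000d00 and so on.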

What does running the program tell us about the sizes of these variables (and about any padding the compiler inserted)?

(eye@mordor l14)$ ./ax
sizeof(i)          = 8
sizeof(data)       = 64
sizeof(name)       = 8
sizeof(identifier) = 1
sizeof(beef)       = 16
sizeof(whereami)   = 256
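Each reported size is the element count times the element size (4-byte int, 2-byte short, 1-byte char on this build). The usual element-count idiom relies on exactly this; a minimal sketch, where ARRAY_LEN is a hypothetical macro name:

```c
#include <stddef.h>

/* Element count of a true array. Do not apply this to a pointer: an array
   passed to a function decays to a pointer, and sizeof then reports only
   the pointer's size. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

int i[2];
int data[16];
char name[8] = "michael";
char identifier[] = {0};   /* size deduced from the initializer: 1 */
short beef[] = { 0xD, 0xE, 0xA, 0xD, 0xB, 0xE, 0xE, 0xF };
```

For example, ARRAY_LEN(beef) is 16 / 2 = 8 and ARRAY_LEN(data) is 64 / 4 = 16.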

Simple Structures (Structs)

(eye@mordor l14)$ cat introstruct.c
struct simple
{
  char x;
  int data;
};
struct simple s;
int main(int argc, char* argv[])
{
  s.x = 0x41;
  s.data = 0xffffff00;
  return s.x;
}
(eye@mordor l14)$ 

According to nm, 's' is located at:

08049624 B s

So is 's' in the .data section or in .bss? (Based on what we know about uninitialized global variables, where should it be? The "B" above is a hint.)

  [24] .data             PROGBITS        08049618 000618 000004 00  WA  0   0  4
  [25] .bss              NOBITS          0804961c 00061c 000010 00  WA  0   0  4

This is the assembly representation of the code (this listing uses Intel syntax, unlike the AT&T syntax of the previous one):

08048394 <main>:
 8048394:	55                   	push   ebp
 8048395:	89 e5                	mov    ebp,esp
 8048397:	c6 05 24 96 04 08 41 	mov    BYTE PTR ds:0x8049624,0x41
 804839e:	c7 05 28 96 04 08 00 	mov    DWORD PTR ds:0x8049628,0xffffff00
 80483a5:	ff ff ff 
 80483a8:	0f b6 05 24 96 04 08 	movzx  eax,BYTE PTR ds:0x8049624
 80483af:	0f be c0             	movsx  eax,al
 80483b2:	5d                   	pop    ebp
 80483b3:	c3                   	ret    
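The two addresses in this listing, 0x8049624 for s.x and 0x8049628 for s.data, differ by 4 even though x is a single char: the compiler inserted three padding bytes so that data is 4-byte aligned. A hedged sketch using offsetof to observe the same layout; the expected values assume a common ABI where int is 4 bytes and 4-byte aligned (as on i386 and x86-64), and the helper names are hypothetical.

```c
#include <stddef.h>

struct simple
{
  char x;
  int data;
};

/* Field offsets from the start of the struct. */
size_t offset_of_x(void)    { return offsetof(struct simple, x); }
size_t offset_of_data(void) { return offsetof(struct simple, data); }

/* Access through a pointer to a structure instance: p->data reads the int
   located offsetof(struct simple, data) bytes past the start of *p. */
int read_data_via_offset(const struct simple *p)
{
    const char *base = (const char *)p;
    return *(const int *)(base + offsetof(struct simple, data));
}
```

With these assumptions offset_of_data() returns 4 and sizeof(struct simple) is 8, matching the 4-byte gap between the two addresses in the assembly.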


Complex Structures and Unions

Let's get a bit more complicated. We will declare a number of structures.


Other topics

  • structures of arrays
  • arrays of structures
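A minimal sketch contrasting the two layouts, using hypothetical types (struct point, struct points_soa). The address of the same logical field is computed differently in each: an array of structures scales by the structure size and then adds the field offset, while a structure of arrays adds the field offset and then scales by the element size. Expected values assume a 4-byte int.

```c
#include <stddef.h>

#define N 4

/* Array of structures (AoS): each element keeps its fields together. */
struct point { int x; int y; };
struct point pts_aos[N];

/* Structure of arrays (SoA): each field is its own contiguous array. */
struct points_soa { int x[N]; int y[N]; };
struct points_soa pts_soa;

/* Byte offset of the k-th y value from the start of each layout. */
size_t aos_offset_of_y(size_t k)
{
  return k * sizeof(struct point) + offsetof(struct point, y);
}

size_t soa_offset_of_y(size_t k)
{
  return offsetof(struct points_soa, y) + k * sizeof(int);
}
```

For k = 2 this gives 2 * 8 + 4 = 20 bytes for the AoS layout and 16 + 2 * 4 = 24 bytes for the SoA layout.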

Invoking write(2)

A quick overview of how to invoke the write(2) system call on Linux (needed for HW4).
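A hedged C sketch of calling write(2) through the <unistd.h> wrapper. write takes a file descriptor, a buffer address and a byte count, and may transfer fewer bytes than requested, so the hypothetical helper write_all loops until everything is written; this is one common pattern, not the assignment's required approach.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes of buf to file descriptor fd.
   Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const void *buf, size_t len)
{
  const char *p = buf;
  while (len > 0) {
    ssize_t n = write(fd, p, len);
    if (n < 0) {
      if (errno == EINTR)
        continue;          /* interrupted before writing anything: retry */
      return -1;
    }
    p += n;                /* skip the bytes already written */
    len -= (size_t)n;
  }
  return 0;
}
```

For example, write_all(STDOUT_FILENO, "hello\n", 6) prints a line on standard output (file descriptor 1). In assembly on 32-bit x86 Linux, the same system call is made by loading the system call number 4 (write) into %eax, the file descriptor into %ebx, the buffer address into %ecx, the byte count into %edx, and executing int $0x80; the number of bytes written, or a negative error code, comes back in %eax.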