Courses/Computer Science/CPSC 355.W2014/Lecture Notes/InstrEncoding

Encoding x86 Instructions

Today we will continue our discussion of the 8088 instruction set and practice encoding and decoding various instructions.


  1. ISA overview
  2. FDX cycle and pipelining
    1. relationship between decoding and execution
    2. exposing instruction-level parallelism
  • instruction set architecture
  • ISA elements for decoding
  • addressing modes
  • variations
  • machine instruction examples (as time allows)
    • INT3
    • INT 0x21
    • HLT
    • MOV
    • PUSH
    • POP
    • DEC
    • NEG
    • CBW (convert byte to word)
    • CWD (convert word (16 bits) to double word (32 bits))
    • NOT
    • XOR
  1. Handouts of 8088 data sheet.
  2. Writing 8088 code for a DOS environment (DOSBox)

We spent much of today manually translating a small assembly program. To simplify the task, this program contained only instructions we have seen before, had only straight-line control flow, and dealt only with registers and immediate values. As a warmup to help ourselves scan the instruction format section of the 8088 data sheet, we reminded ourselves what these instructions

INT 0x21
NOP (really: XCHG AX, AX)

translated into (we also looked at this on Monday).


Here is a 16-bit assembly file containing the program we looked at on the whiteboard:

(eye@mordor l12)$ cat foo.asm
CPU 8086
GLOBAL _start
        HLT                     ;
        MOV AX, 0x5             ;
        PUSH BX                 ;
        POP CX                  ;
        DEC AX                  ;
        NEG CX
        NOT CX                  ;
        XOR BH, BL              
(eye@mordor l12)$ 

Here is a disassembly of the resulting assembled machine code:

(eye@mordor l12)$ ndisasm foo
00000000  F4                hlt
00000001  B80500            mov ax,0x5
00000004  53                push bx
00000005  59                pop cx
00000006  48                dec ax
00000007  F7D9              neg cx
00000009  F7D1              not cx
0000000B  30DF              xor bh,bl
(eye@mordor l12)$ 

HLT is easy enough to encode as F4: its value is right there on the data sheet and it takes no arguments. Let's take a closer look at some simple instructions, like PUSH BX, POP CX, and DEC AX. All of these instructions deal with 16-bit register values. Looking at the data sheet, we see that the PUSH, POP, and DEC instructions each have a few different formats, depending on the kind of operand involved. For each, the "register" format is relatively simple: a single byte made of a fixed 5-bit opcode followed by the 3-bit register specifier, which we look up in the table on the rear of the 8088 data sheet.
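The table lookup for these one-byte "register" formats can be sketched in a few lines of Python. This is only an illustration of the bit packing, not a real assembler; the names `REG16`, `REGISTER_FORM`, and `encode_register_form` are made up for this example.

```python
# 3-bit specifiers for the 16-bit registers (the w=1 column of the register table).
REG16 = {"AX": 0b000, "CX": 0b001, "DX": 0b010, "BX": 0b011,
         "SP": 0b100, "BP": 0b101, "SI": 0b110, "DI": 0b111}

# 5-bit opcode prefixes of the one-byte "register" formats.
REGISTER_FORM = {"DEC": 0b01001, "PUSH": 0b01010, "POP": 0b01011}


def encode_register_form(mnemonic, reg):
    """Return the single opcode byte: 5 opcode bits followed by 3 register bits."""
    return bytes([(REGISTER_FORM[mnemonic] << 3) | REG16[reg]])


for mnemonic, reg in [("PUSH", "BX"), ("POP", "CX"), ("DEC", "AX")]:
    # Prints "PUSH BX 53", "POP CX 59", "DEC AX 48", matching the disassembly.
    print(mnemonic, reg, encode_register_form(mnemonic, reg).hex().upper())
```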

Looking at NEG and NOT of the CX register, we remind ourselves that we've seen these instructions before and we know how they are semantically related: NEG is the two's complement (arithmetic negation), and NOT is the one's complement (bitwise, or logical, negation). They both modify the CX register, and again translating them is a matter of looking up the bit pattern for a NEG or NOT of a 16-bit register. These instructions also force us to deal with the w bit, the mod bits, and the r/m bits. Since we're dealing with a register operand, the two 'mod' bits are 11, and since we're dealing with 16-bit (word-sized) operands, the 'w' bit is '1'. Both instructions share the same first byte; the three bits between 'mod' and 'r/m' tell them apart (010 for NOT, 011 for NEG). The three r/m bits specify the register involved in the operation; for the NEG and NOT instructions in our little program, this is CX, whose specifier is '001'.

For example, translating "NOT CX":

        NOT CX                  ;F7D1
        ;; 1111 011w mod 010 r/m
        ;; 1111 0111 11  010 001
        ;; 1111 0111 1101 0001
        ;; F    7    D    1
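The same packing can be checked mechanically. The sketch below assumes the word-sized, register-operand case only (w=1, mod=11), as in our program; the names `REG16`, `GROUP_BITS`, and `encode_f7_register` are invented for this example.

```python
# 3-bit specifiers for the 16-bit registers (the w=1 column of the register table).
REG16 = {"AX": 0b000, "CX": 0b001, "DX": 0b010, "BX": 0b011,
         "SP": 0b100, "BP": 0b101, "SI": 0b110, "DI": 0b111}

# The three bits between 'mod' and 'r/m' select the operation.
GROUP_BITS = {"NOT": 0b010, "NEG": 0b011}


def encode_f7_register(mnemonic, reg):
    """Encode NOT/NEG of a 16-bit register: 1111011w, then mod | op | r/m."""
    opcode = 0b11110110 | 1                      # w=1: word-sized operand
    modrm = (0b11 << 6) | (GROUP_BITS[mnemonic] << 3) | REG16[reg]
    return bytes([opcode, modrm])


print(encode_f7_register("NOT", "CX").hex().upper())  # F7D1
print(encode_f7_register("NEG", "CX").hex().upper())  # F7D9
```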

The two most complicated instructions in this small program are the MOV and XOR, but even they are not much more complex than the instructions we have just worked through.

        MOV AX, 0x5             ;
        ;; we pick the third instruction format listed: "Immediate to Register"
        ;; 1011 w reg   [data] [data if w=1]
        ;; If w==1, then this means the immediate value is 2 bytes (16 bits, word-sized). If w==0, then the immediate data is
        ;;  only a byte.
        ;; 1011 1 AX [0000 0000] [0000 0101]
        ;; 1011 1000 0000 0000 0000 0101
        ;; B80005
        ;; question: then why does the disassembly above show 0xB80500?
        ;; answer: little-endian storage of the data value: the low byte (0x05) is stored first,
        ;;  followed by the high byte (0x00).
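To see the byte order fall out of the encoding, here is a minimal Python sketch of the "Immediate to Register" format for word-sized operands. The helper name `encode_mov_reg_imm16` and the `REG16` table are assumptions made for this illustration.

```python
# 3-bit specifiers for the 16-bit registers (the w=1 column of the register table).
REG16 = {"AX": 0b000, "CX": 0b001, "DX": 0b010, "BX": 0b011,
         "SP": 0b100, "BP": 0b101, "SI": 0b110, "DI": 0b111}


def encode_mov_reg_imm16(reg, value):
    """Encode MOV reg16, imm16 as 1011 w reg, then the immediate, low byte first."""
    opcode = (0b1011 << 4) | (1 << 3) | REG16[reg]   # w=1: 16-bit immediate
    return bytes([opcode]) + (value & 0xFFFF).to_bytes(2, "little")


print(encode_mov_reg_imm16("AX", 0x5).hex().upper())  # B80500
```

Swapping `"little"` for `"big"` reproduces the B80005 we first wrote on the whiteboard, which is not what the 8088 expects.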

The XOR case is also interesting as a test of our translation skills (especially: making sure we understand the semantics of the 'd' and 'w' bits):

        XOR BH, BL              ; 0x30DF
        ;; BH = 111
        ;; BL = 011
        ;; 001100dw mod reg r/m
        ;; 001100d0 11  111 011      (w=0 for byte-sized operands; reg=BH, r/m=BL)
        ;; 00110010 11  111 011      (d=1, i.e., "to" reg: BH := XOR(BH,BL))
        ;; 0011 0010 1111 1011
        ;; 3    2    F    B
        ;; is this translation correct? Yes: 0x32FB does decode to XOR BH, BL, even though it is not the
        ;; 0x30DF that the assembler produced. But a slip in the low bits of the first byte changes the
        ;; instruction entirely.

For instance, 0x31FB (d=0, w=1) does not represent XOR BH, BL! With w=1 the same specifiers name 16-bit registers, so it represents XOR BX, DI:

[michael@gondolin PC-DOS]$ echo -en "\x31\xFB" |udcli -16
0000000000000000 31fb             xor bx, di              
[michael@gondolin PC-DOS]$ 

We must be careful to use the direction bit as the data sheet specifies; the direction bit does not allow us to arbitrarily re-order the operands. It says whether the register named by the 'reg' bits is the destination (d=1, "to" reg) or the source (d=0, "from" reg); the other operand is named by the 'r/m' bits. So if we want to "XOR BH and BL, but store the result in BH" with d=1, the 'reg' bits MUST be set to BH, as above. With d=0, the 'reg' bits must instead be set to the source, BL, and the 'r/m' bits to the destination, BH. This is the encoding the assembler chose:

        ;; 001100dw mod reg r/m
        ;; 00110000  11  BL  BH
        ;; 00110000  11 011 111
        ;; 0011 0000 1101 1111
        ;; 3    0    D    F
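A small decoder makes the roles of the 'd' and 'w' bits concrete. The Python sketch below handles only the register-to-register XOR format (mod=11) and is not a general disassembler; `decode_xor_reg_reg` and the two register lists are names invented for this example.

```python
# Register names indexed by their 3-bit specifier.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]    # w=0
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]   # w=1


def decode_xor_reg_reg(code):
    """Decode a two-byte 001100dw / mod reg r/m instruction with mod=11."""
    opcode, modrm = code
    if opcode & 0b11111100 != 0b00110000:
        raise ValueError("not the XOR reg, r/m format")
    if modrm >> 6 != 0b11:
        raise ValueError("only register operands (mod=11) are handled")
    d = (opcode >> 1) & 1          # 1: reg is the destination; 0: reg is the source
    w = opcode & 1                 # 1: word-sized operands; 0: byte-sized
    names = REG16 if w else REG8
    reg = names[(modrm >> 3) & 0b111]
    rm = names[modrm & 0b111]
    dest, src = (reg, rm) if d else (rm, reg)
    return f"XOR {dest}, {src}"


print(decode_xor_reg_reg(bytes([0x30, 0xDF])))  # XOR BH, BL
print(decode_xor_reg_reg(bytes([0x32, 0xFB])))  # XOR BH, BL
print(decode_xor_reg_reg(bytes([0x31, 0xFB])))  # XOR BX, DI
```

The first two byte sequences name the same instruction with the operands placed in opposite fields; the third differs from the second only in the two low bits of the opcode, yet operates on different registers.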