Encoding x86 Instructions
Today we will continue our discussion of the 8088 instruction set and practice encoding and decoding various instructions.
Topics
- ISA overview
- FDX cycle and pipelining
- relationship between decoding and execution
- exposing instruction-level parallelism
- instruction set architecture
- ISA elements for decoding
- addressing modes
- variations
- machine instruction examples (as time allows)
- INT3
- INT 0x21
- HLT
- MOV
- PUSH
- POP
- DEC
- NEG
- CBW (convert byte to word)
- CWD (convert word (16 bits) to double word (32 bits))
- NOT
- XOR
- Handouts of 8088 data sheet.
- Writing 8088 code for a DOS environment (DOSBOX)
We spent much of today manually translating a small assembly program. To simplify the task, this program contained only instructions we have seen before, it had only straightline control flow, and it only dealt with registers and immediate values. As a warmup to help ourselves scan the instruction format section of the 8088 data sheet, we reminded ourselves what these 3 instructions
INT3
INT 0x21
NOP (really: XCHG AX, AX)
translated into (we also looked at this on Monday).
0xCC
0xCD21
0x90
Here is a 16-bit assembly file containing the program we looked at on the whiteboard:
(eye@mordor l12)$ cat foo.asm
BITS 16
CPU 8086
GLOBAL _start
SECTION .text
_start:
HLT ;
MOV AX, 0x5 ;
PUSH BX ;
POP CX ;
DEC AX ;
NEG CX
NOT CX ;
XOR BH, BL
(eye@mordor l12)$
Here is a disassembly of the resulting assembled machine code:
(eye@mordor l12)$ ndisasm foo
00000000  F4        hlt
00000001  B80500    mov ax,0x5
00000004  53        push bx
00000005  59        pop cx
00000006  48        dec ax
00000007  F7D9      neg cx
00000009  F7D1      not cx
0000000B  30DF      xor bh,bl
(eye@mordor l12)$
HLT is easy enough to encode as F4 -- its value is right there on the data sheet and it takes no arguments. Let's take a closer look at some simple instructions, like PUSH BX, POP CX, and DEC AX. All these instructions deal with 16 bit register values. In looking at the data sheet, we see that the PUSH, POP, and DEC instructions each have a few different formats they could be expressed in, depending on which operands we're talking about. For each, the "register" format is relatively simple and involves looking up the correct register specifier in the table on the rear of the 8088 data sheet.
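The register-format lookup described above can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the helper name `encode_register_format` is hypothetical, and the five-bit patterns are the one-byte "register" formats for PUSH, POP, and DEC, which agree with the bytes in the disassembly.

```python
# 16-bit register specifiers (the 3-bit "reg" field on the data sheet).
REG16 = {"AX": 0b000, "CX": 0b001, "DX": 0b010, "BX": 0b011,
         "SP": 0b100, "BP": 0b101, "SI": 0b110, "DI": 0b111}

# Leading five bits of the one-byte "register" formats.
REGISTER_FORMAT = {"PUSH": 0b01010, "POP": 0b01011, "DEC": 0b01001}


def encode_register_format(mnemonic, reg):
    """Hypothetical helper: pack 'ooooo reg' into a single byte."""
    return bytes([(REGISTER_FORMAT[mnemonic] << 3) | REG16[reg]])


for mnemonic, reg in [("PUSH", "BX"), ("POP", "CX"), ("DEC", "AX")]:
    print(mnemonic, reg, encode_register_format(mnemonic, reg).hex().upper())
```

The three printed bytes can be compared against the PUSH BX, POP CX, and DEC AX lines of the ndisasm output.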
Looking at NEG and NOT of the CX register, we remind ourselves that we've seen these instructions before and we know how they are semantically related: NEG is the two's complement (arithmetic negation), and NOT is the one's complement (logical negation). They both modify the CX register, and again translating them is a matter of looking up the bit pattern for a NEG or NOT of a 16-bit register. These instructions also force us to deal with the w bit, the mod bits, and the r/m bits. Since we're dealing with a register operand, the two 'mod' bits are '11', and since we're dealing with 16-bit (word-sized) operands, the 'w' bit is '1'. The three r/m bits specify the register involved in this operation; in the case of the NEG and NOT instructions in our little program, this is CX, whose specifier is '001'.
For example, translating "NOT CX":
NOT CX ;F7D1
;;
;; 1111 011w mod 010 r/m
;; 1111 0111 11 010 001
;; 1111 0111 1101 0001
;; F    7    D    1
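The same bit-packing can be written out as a short Python sketch. The function name `encode_unary16` is hypothetical, and the sketch assumes only the case used in these notes: a 16-bit register operand, so w = 1 and mod = 11. NEG uses the same layout as NOT with 011 in place of 010 in the middle field.

```python
# 16-bit register specifiers used in the r/m field when mod = 11.
REG16 = {"AX": 0b000, "CX": 0b001, "DX": 0b010, "BX": 0b011,
         "SP": 0b100, "BP": 0b101, "SI": 0b110, "DI": 0b111}

# Middle three bits of the second byte select the operation.
EXTENSION = {"NOT": 0b010, "NEG": 0b011}


def encode_unary16(mnemonic, reg):
    """Hypothetical helper: 1111 011w | mod ext r/m, with w = 1 and mod = 11."""
    w = 1                                   # word-sized operand
    mod = 0b11                              # operand is a register
    first = 0b11110110 | w
    second = (mod << 6) | (EXTENSION[mnemonic] << 3) | REG16[reg]
    return bytes([first, second])


print(encode_unary16("NOT", "CX").hex().upper())
print(encode_unary16("NEG", "CX").hex().upper())
```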
The two most complicated instructions in this small program are the MOV and XOR, but even they are not much more complex than the instructions we have just worked through.
MOV AX, 0x5 ;
;; we pick the third instruction format listed: "Immediate to Register"
;; 1011 w reg [data] [data if w=1]
;; If w==1, then the immediate value is 2 bytes (16 bits, word-sized). If w==0, then the immediate data is
;; only a byte.
;; 1011 1 AX [0000 0000] [0000 0101]
;; 1011 1000 0000 0000 0000 0101
;; B80005
;; question: then why does the disassembly above indicate 0xB80500?
;; answer: little-endian storage of the data values: the low byte (0x05) is stored first, then the high byte (0x00).
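To make the byte order concrete, here is a minimal Python sketch of the "Immediate to Register" format for the word-sized case only (w = 1). The name `encode_mov_imm16` is hypothetical; `int.to_bytes(2, "little")` does the low-byte-first ordering.

```python
# 16-bit register specifiers (the 3-bit "reg" field).
REG16 = {"AX": 0b000, "CX": 0b001, "DX": 0b010, "BX": 0b011,
         "SP": 0b100, "BP": 0b101, "SI": 0b110, "DI": 0b111}


def encode_mov_imm16(reg, value):
    """Hypothetical helper: 1011 w reg, then the immediate, low byte first."""
    w = 1                                            # 16-bit immediate
    opcode = (0b1011 << 4) | (w << 3) | REG16[reg]
    immediate = (value & 0xFFFF).to_bytes(2, "little")
    return bytes([opcode]) + immediate


print(encode_mov_imm16("AX", 0x5).hex().upper())
```

The printed bytes can be compared with the `mov ax,0x5` line of the disassembly.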
The XOR case is also interesting as a test of our translation skills (especially: making sure we understand the semantics of the 'd' bit):
XOR BH, BL ; 0x30DF
;; BH = 111
;; BL = 011
;; 001100dw mod reg r/m
;; 001100d0 xx 111 011
;; 00110010 11 111 011
;; 0011 0010 1111 1011
;; 3    2    F    B
;;
;; is this translation correct? Yes: with d=1 the result goes "to" the register named in 'reg',
;; i.e., ( BH := XOR(BH,BL) ), and 'reg' is BH. It is simply not the encoding the assembler chose.
It is easy to slip when reading off the first byte, though: 0011 0010 is 0x32, not 0x31. The byte 0x31 is 0011 0001 (d=0, w=1), so 0x31FB does not actually represent XOR BH, BL! With w=1 the specifiers 111 and 011 name the 16-bit registers DI and BX, so it represents XOR BX, DI:
[michael@gondolin PC-DOS]$ echo -en "\x31\xFB" |udcli -16
0000000000000000 31fb xor bx, di
[michael@gondolin PC-DOS]$
We must be careful to use the direction bit as the data sheet specifies; it does not allow us to arbitrarily re-order the operands. It says which role the 'reg' field plays: with d=1 the result is stored "to" 'reg', and with d=0 the source comes "from" 'reg' and the result is stored in 'r/m'. So if we want to XOR BH and BL but store the result in BH, then with d=1 the 'reg' bits MUST be set to BH (as in 0x32FB above), and with d=0 the 'reg' bits must be set to BL and the 'r/m' bits to BH. The assembler chose the d=0 form, which is the encoding shown in the disassembly:
;; 001100dw mod reg r/m
;; 00110000 11 BL BH
;; 00110000 11 011 111
;; 0011 0000 1101 1111
;; 3    0    D    F
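A small Python sketch can help check the roles of the d and w bits for the register-to-register XOR format. Both function names are hypothetical, and the sketch handles only mod = 11. It builds the byte-sized encoding for either value of d, and decodes a two-byte XOR back into operands so that a candidate such as 0x31FB can be examined.

```python
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def encode_xor_reg8(dest, src, d):
    """Hypothetical helper: 0011 00dw | 11 reg r/m for two 8-bit registers."""
    w = 0
    first = 0b00110000 | (d << 1) | w
    # d = 1: 'reg' holds the destination; d = 0: 'reg' holds the source.
    reg, rm = (dest, src) if d == 1 else (src, dest)
    second = (0b11 << 6) | (REG8[reg] << 3) | REG8[rm]
    return bytes([first, second])


def decode_xor_reg(code):
    """Hypothetical helper: decode a two-byte register-to-register XOR."""
    first, second = code
    if first & 0b11111100 != 0b00110000 or second >> 6 != 0b11:
        raise ValueError("not a register-to-register XOR")
    d, w = (first >> 1) & 1, first & 1
    names = {bits: name for name, bits in (REG16 if w else REG8).items()}
    reg, rm = names[(second >> 3) & 7], names[second & 7]
    dest, src = (reg, rm) if d else (rm, reg)
    return f"XOR {dest}, {src}"


print(encode_xor_reg8("BH", "BL", d=0).hex().upper())
print(encode_xor_reg8("BH", "BL", d=1).hex().upper())
print(decode_xor_reg(bytes([0x31, 0xFB])))
```

Keeping the decoder next to the encoder makes it easy to confirm that a hand translation round-trips to the instruction that was intended.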