CSci 5980/8980: Homework Assignment 1

Binary Reverse Engineering

Homework Assignment 1: basic manual binary reverse engineering

This homework assignment is due on Wednesday, March 18th, before lecture (i.e., at 9:45am). Submit a .tar.gz file containing a PDF with your answers to questions 1 and 2, and C files with your answers to questions 3 and 4.

Parts with no extra label should be answered by all students. The parts labeled (5980) are only required of students registered for the 5980 version of the course.

1. You're the disassembler

This question gives you some practice disassembling x86-64 instructions. Each part lists, in hex, the bytes of a single x86-64 instruction. Your job is to write the corresponding assembly code for the instruction.

You may use a computer to check your answers, for example by writing the instructions in an assembly file and checking that they assemble to the right hex bytes. But the point of the question is to do the disassembly yourself using non-automated resources such as the CPU manual. My specific recommendation is to use the latest version of the Combined Volume Set of Intel 64 and IA-32 Architectures Software Developer's Manuals. Whatever non-automated resource you use, you must also provide brief notes on which parts of the resource you used, to show that it was you rather than the computer doing the work. For the Intel manual or a similar book-like document, you can cite the page numbers you referenced.

You may use either AT&T or Intel syntax in your answers, but you must use one or the other consistently and declare which format you're using.

52
0f 31
41 94
20 cc
f7 e1
13 08
f3 aa (5980)
08 16
ff 04 24 (5980)
41 ff c0
48 83 c0 ff (5980)
48 8d 3c bf
83 6c 4b 18 42 (5980)
8b 35 80 89 00 00
66 c7 84 16 80 59 00 00 5c 17 (5980)

2. One-instruction decompilation

This question covers the very smallest-scale decompilation of assembly instructions to C code. Each part gives one or two x86-64 instructions; your task is to express the semantics of the instruction or instructions in C. These are self-contained versions of the systematic first step of decompilation we saw in lecture.

Your C code should represent each register as a variable of the same name, and each answer should contain an appropriate type declaration for every register variable mentioned in the instruction(s). (Of course, in a working C program the variables would need to be initialized between declaration and use, but you can ignore this.) Represent jumps with gotos to labels named after the jump target address.

Here's an example. If the question is:

add    %rax,%rbx

Then you could answer:

long rax, rbx;
rbx += rax;

The instructions are shown in the syntax of the objdump disassembler.

```
xor    %ecx,%edx
```
```
sub    $0x1,%cl
```
```
neg    %dx
```
```
callq  400040
```
```
imul   %rcx,%rax
```
```
or     %cl,(%rbx)
```
```
and    (%r11),%r12w   // (5980)
```
```
notl   0x10(%rbp)
```

```
movq   $0x64,(%rbx,%rax,8)   // (5980)
```
```
lea    0x5(%rax,%rax,4),%r15
```

```
cmp    $0xa,%edi  // (5980)
je     400110
```
```
cmp    $0x14,%rdi
jae    400120
```

```
test   %dx,%dx  // (5980)
je     400130
```
```
test   $0x400,%eax
jne    400140
```
```
test   %ecx,%ecx  // (5980)
jle    400150
```

3. Function decompilation (mystery with arrays)

The binary q3-mystery does something when provided with a non-negative integer as an argument. Most of the interesting work of the program is performed by a single function that the developer helpfully named func. Your task for this question is to decompile, understand, and document that function.

To help you a bit with testing your decompilation, we've also provided an object file test-func.o which you can link together with an implementation of func to give a stand-alone test program that exercises more of the function's functionality than the q3-mystery program does.

For the first part of the question, do a mechanical instruction-by-instruction decompilation of func into working but unreadable C code, like we've demonstrated in class.
Then, convert your mechanical decompilation into natural-looking C code, while figuring out what it does and why. Document your understanding by writing a detailed comment for the function which contains enough information about its arguments and what it does that another programmer could use the function based just on the comment, without reading the function's source.

Turn in two separate C files defining a function func, one for each part.

4. Function decompilation (tree-stripped)

The binary tree-stripped is the same one we looked at in lecture, where the name gives away that it does something related to trees. In lecture, we decompiled the function func_4011c2. In this question you'll continue the job.

(a) Do a mechanical instruction-by-instruction decompilation of func_401226 into working but unreadable C code.
(b) Convert your mechanical decompilation of func_401226 into natural-looking C code, and give it a detailed comment.
(c) (5980) Do a mechanical instruction-by-instruction decompilation of the main function into working but unreadable C code. Here, "working" means that the whole program works correctly when your decompiled functions are combined with the one we did in class.
(d) (5980) Convert your mechanical decompilation of main into natural-looking C code.

For parts (a) and (b), turn in two separate C files. The one for (a) should define a function named func_401226, while the one for (b) should define a function with a more natural name. For parts (c) and (d) (5980), turn in C files for complete programs.