University of Minnesota
Program Analysis for Security

Hands-on Assignment 1 solutions: symbolic execution and (separately) binary code

1. Magic squares with KLEE (20 pts)

a. (10 pts) Your classmate Alyssa started implementing the checking function, but didn't have time to finish it. Fill in the remaining checks:

    /* Check that each row has the correct sum. */
    for (i = 0; i < order; i++) {
        int sum = 0;
        for (j = 0; j < order; j++) {
            sum += a[order*i + j];
        }
        assert(sum == target_sum);
    }

    /* Check that each column has the correct sum. */
    for (j = 0; j < order; j++) {
        int sum = 0;
        for (i = 0; i < order; i++) {
            sum += a[order*i + j];
        }
        assert(sum == target_sum);
    }

    /* Check that both diagonals have the correct sum. */
    {
        int sum = 0;
        for (i = 0; i < order; i++) {
            sum += a[order*i + i];
        }
        assert(sum == target_sum);
    }
    {
        int sum = 0;
        for (i = 0; i < order; i++) {
            sum += a[order*i + (order - i - 1)];
        }
        assert(sum == target_sum);
    }

    printf("Check succeeded\n");

Look through the test cases KLEE generated to find which test corresponds to the successful square.

% ktest-tool klee-out-0/test000001.ktest
ktest file : 'klee-out-0/test000001.ktest'
args       : ['magic-square.o']
num objects: 1
object    0: name: 'square'
object    0: size: 36
object    0: data: '\x08\x00\x00\x00\x01\x00\x00\x00\x06\x00\x00\x00
                    \x03\x00\x00\x00\x05\x00\x00\x00\x07\x00\x00\x00
                    \x04\x00\x00\x00\t  \x00\x00\x00\x02\x00\x00\x00'
816
357
492

b. (10 pts) The code Alyssa wrote for checking whether all the numbers appear in the matrix, shown above, looks rather weird. Why do you think she implemented the check this way?

The way Alyssa wrote the checking code reduces the number of branches. The more usual way of writing a check like this would break out of the inner loop as soon as it found an occurrence of the target number, which has the advantage of reducing the number of inner-loop iterations required. But if the code were written that way, every possible combination of locations of the numbers would be a different control-flow path, greatly increasing the number of paths symbolic execution would have to explore. By contrast, Alyssa's use of the &= operator to update a boolean can be compiled without a branch. There's still a branch for the assert at the end of each loop iteration, but since the program will terminate if the assertion fails, this branch also doesn't cause a multiplicative increase in the number of paths.
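
To make the contrast concrete, here is a sketch of the two styles (the assignment's actual code isn't reproduced here, so the variable names a, order, and n are assumptions); both check that the number n appears somewhere in the matrix:

    /* Early-exit style: every possible position of n becomes a
       distinct control-flow path under symbolic execution. */
    int found = 0;
    for (i = 0; i < order * order; i++) {
        if (a[i] == n) {
            found = 1;
            break;
        }
    }
    assert(found);

    /* Alyssa's style: each comparison is folded into a flag with &=,
       which can be compiled without a per-element branch. */
    int missing = 1;
    for (i = 0; i < order * order; i++) {
        missing &= (a[i] != n);
    }
    assert(!missing);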

2. Counting paths through rot13 (25 pts)

To help Ben understand what's going wrong, explain to him how many execution paths KLEE would have to explore if you ran rot13 on a 25-character symbolic input. It won't be feasible to determine this number of paths directly by experiment, so instead, run KLEE on some short strings to see what the pattern is, and then use math to extrapolate. Be sure to explain where your number comes from: for instance if you use a formula that has a constant in it, you should explain why that constant has the value it does.

If we run the example to completion with symbolic string inputs of length l equal to 1, 2, 3, 4, and 5, KLEE reports the number of paths as 6, 31, 156, 781, and 3906:

% klee -libc=uclibc --posix-runtime rot13.bc -sym-arg 5
[...]
KLEE: done: completed paths = 3906

The formula for this sequence is (5^(l+1) - 1)/4, which is the formula for the sum of a geometric series with ratio 5 (6 = 1 + 5, 31 = 1 + 5 + 25, etc.). For a 25-character symbolic input this works out to (5^26 - 1)/4, or roughly 3.7 × 10^17 paths, far too many to enumerate. The reason for this summation is that a symbolic string of length l can hold a string of length l or of a shorter length, if one of the characters is a null. For instance for a three-character symbolic input, it could represent the unique empty string if the first character is \0, or it could represent a one-character string if the second character is \0, or a two-character string if the third character is \0, or a three-character string if none of the symbolic bytes is null.

The ratio 5 is the branching factor: the factor by which the number of paths increases for each additional symbolic byte. If we were just exploring concretely, the branching factor would be 256, but for symbolic execution it's determined by the number of possible paths through the body of the loop. For instance all upper-case characters take the same path. The case of a character being null is counted separately (as discussed in the previous paragraph), so the factor of 5 comes from the remaining possible results of the comparisons with the letter ranges. A character can be either (1) less than A, (2) an upper-case letter between A and Z, (3) greater than Z but less than a, (4) a lower-case letter between a and z, or (5) greater than z. This corresponds to treating the two parts of the AND conditions in the rot13_char function as separate branches. Note that the number of paths is still less than the 7 you would expect by looking at the control-flow graph, because some of those paths are infeasible: if a character is less than A, it's guaranteed to be less than a.

Depending on the optimization settings you use, you might instead get results that follow the same pattern as described above, except with a ratio and branching factor of 3. This corresponds to a way of compiling the rot13_char function in which each pair of ANDed conditions is a single branch, so that the three paths through the function correspond to the three return statements.
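
For reference, the rot13_char function in question presumably looks something like the following sketch (the assignment's exact code may differ in details); the two && conditions and the three return statements are what give rise to the branching factors of 5 and 3 discussed above:

/* Sketch of a typical rot13_char; not a verbatim copy of the assignment. */
char rot13_char(char c) {
    if (c >= 'A' && c <= 'Z')          /* upper-case letters rotate within A-Z */
        return 'A' + (c - 'A' + 13) % 26;
    else if (c >= 'a' && c <= 'z')     /* lower-case letters rotate within a-z */
        return 'a' + (c - 'a' + 13) % 26;
    else
        return c;                      /* everything else passes through */
}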

3. Implementation limitations (25 pts)

Give a worked-out example of a violation of one of these properties, demonstrated with an experiment using the standard KLEE implementation. In other words, construct a program that can fail an assertion when run directly, but not when explored by KLEE, or conversely a program for which KLEE proposes an assertion failure that does not occur in regular execution.

Hint (inspired by an exchange with David Gloe): one rich source of differing behaviors is KLEE's system call model, like the model for read() sketched in Figure 3 of the paper. For instance, this model embodies an assumption that if you read twice from the same offset in a file, you'll get the same value back. (Since if the file is symbolic, you get the same symbolic bytes.) If you're having trouble imagining how this assumption could ever fail in a real system, you might want to review the concepts of race conditions and time-of-check vs. time-of-use vulnerabilities.

Let's take the hint. Here's a program that reads four bytes from the beginning of a file, waits ten seconds, reads those same bytes again, and asserts that the two values are equal:

#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

int main(int argc, char **argv) {
    int fd, x1, x2;
    if (argc != 2) {
        fprintf(stderr, "Usage: reread <input>\n");
        exit(1);
    }
    fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        fprintf(stderr, "Open failed: %s\n", strerror(errno));
        exit(1);
    }
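    /* First read: the leading sizeof(int) bytes of the file. */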
    read(fd, &x1, sizeof(int));
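    /* Window during which another process could rewrite the file. */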
    sleep(10);
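    /* Re-read the same bytes from the start of the file. */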
    lseek(fd, 0, SEEK_SET);
    read(fd, &x2, sizeof(int));
    assert(x1 == x2);
    exit(0);
}

On Unix there is no guarantee that the two reads will read the same value. Files aren't automatically locked when a program opens them (and even the locks that Unix does support are usually only advisory), so another program might write to the file in between the reads. Waiting for ten seconds between the reads isn't necessary: it just makes the possibility easier to demonstrate. This sort of unexpected interleaving of operations is a significant source of security vulnerabilities, though the most common instances tend to involve operations on files in directories (which can often be exploited using symbolic links) rather than the contents of files.

KLEE's symbolic model of system calls uses a single symbolic variable for any read of a given byte from a file, so it implicitly assumes that file changes like this can't happen. As such, KLEE will conclude that the assertion cannot be triggered. But you can see that this is a false negative by running the real program and writing to the file during the 10-second wait.
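
Concretely, the demonstration might go something like this (the program and file names here are just for illustration), with the last echo issued from another shell during the 10-second sleep:

% cc -o reread reread.c
% echo AAAA > input.txt
% ./reread input.txt &
% echo BBBB > input.txt

When the background process wakes up and re-reads the file, x1 and x2 differ and the assertion fails.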

Note that writing to the file from the same program (i.e., in place of the call to sleep) doesn't trigger the same problem, because KLEE's symbolic system call layer updates its model of the file's contents in that case.

Trying to do the write in a separate process with fork(2) is another idea that does expose a limitation of KLEE's system call model, but the limitation it exposes is not the reuse of symbolic values: rather, it's that KLEE doesn't implement fork. If you have the POSIX model turned on, fork will always return an error. If you don't have it turned on, the entire KLEE process will fork, which also doesn't have the right behavior because it makes an independent copy of the symbolic state.

4. Reading binary code (30 pts)

First, here's a very literal instruction-by-instruction translation, with comments inline:
/* 00000000 <mystery>: */
/*    0:	55                   	push   %ebp */
/*    1:	57                   	push   %edi */
/*    2:	56                   	push   %esi */
/*    3:	53                   	push   %ebx */
/*    4:	83 ec 04             	sub    $0x4,%esp */
/* These instructions save the callee-saved registers and set up the
   stack frame; we don't need to translate them. */

int mystery1(int arg) {
/*    7:	8b 44 24 18          	mov    0x18(%esp),%eax */
    /* Because we pushed/reserved 20 bytes of stack in the prolog,
       0x18(%esp) corresponds to the first (and only) argument. */
    int eax = arg;

/*    b:	c7 04 24 00 a3 e1 11 	movl   $0x11e1a300,(%esp) */
    /* A local variable on the stack */
    int temp = 0x11e1a300;
    
/*   12:	85 c0                	test   %eax,%eax */
    /* This test sets ZF iff %eax is 0, SF if it's negative, and
       clears OF. */
/*   14:	7e 36                	jle    4c <mystery+0x4c> */
    /* Thus, the condition "le" corresponds to %eax being signed less
       than or equal to zero. The branch is used to skip over code, so
       we turn it into an "if" with the opposite condition. */
    if (eax > 0) {
/*   16:	be 03 00 00 00       	mov    $0x3,%esi */
	int esi = 3;
/*   1b:	bb 00 e1 f5 05       	mov    $0x5f5e100,%ebx */
	int ebx = 0x5f5e100;
/*   20:	b9 01 00 00 00       	mov    $0x1,%ecx */
	int ecx = 1;

	int edx, ebp;
/*   25:	8d 76 00             	lea    0x0(%esi),%esi */
	/* This instruction has no effect; it's just used for padding
	   because the next instruction is the target of a loop back
	   edge. */
	do {
/*   28:	89 cf                	mov    %ecx,%edi */
	    int edi = ecx;
/*   2a:	89 cd                	mov    %ecx,%ebp */
	    ebp = ecx;
/*   2c:	0f af fe             	imul   %esi,%edi */
	    edi *= esi;
/*   2f:	83 c1 01             	add    $0x1,%ecx */
	    ecx++;
/*   32:	89 da                	mov    %ebx,%edx */
	    edx = ebx;
/*   34:	89 d8                	mov    %ebx,%eax */
	    eax = ebx;
/*   36:	c1 fa 1f             	sar    $0x1f,%edx */
	    /* This is a sign-extending shift that throws away all but
	       the sign bit, so it sets edx to -1 if it was negative
	       and to 0 otherwise. */
	    edx >>= 31;
/*   39:	f7 db                	neg    %ebx */
	    ebx = -ebx;
/*   3b:	83 c6 02             	add    $0x2,%esi */
	    esi += 2;
/*   3e:	0f af f9             	imul   %ecx,%edi */
	    edi *= ecx;
/*   41:	f7 ff                	idiv   %edi */
	    {
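		/* Note: because %edx is always the sign extension of %eax
		   here, OR-ing in the sign-extended eax still yields the
		   right 64-bit value; a fully general idiv translation
		   would use (unsigned)eax for the low half. */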
		long long dividend = (long long)edx << 32 | eax;
		eax = dividend / edi;
		edx = dividend % edi;
	    }
/*   43:	01 04 24             	add    %eax,(%esp) */
	    temp += eax;
/*   46:	3b 6c 24 18          	cmp    0x18(%esp),%ebp */
/*   4a:	75 dc                	jne    28 <mystery+0x28> */
	} while (ebp != arg);
    }
/*   4c:	8b 04 24             	mov    (%esp),%eax */
    eax = temp;
/*   4f:	83 c4 04             	add    $0x4,%esp */
/*   52:	5b                   	pop    %ebx */
/*   53:	5e                   	pop    %esi */
/*   54:	5f                   	pop    %edi */
/*   55:	5d                   	pop    %ebp */
    /* Matching the pushes and sub at the beginning */
/*   56:	c3                   	ret     */
    /* The calling convention uses eax for the return value. */
    return eax;
}
And here's just the C code:
int mystery1(int arg) {
    int eax = arg;

    int temp = 0x11e1a300;
    
    if (eax > 0) {
	int esi = 3;
	int ebx = 0x5f5e100;
	int ecx = 1;

	int edx, ebp;
	do {
	    int edi = ecx;
	    ebp = ecx;
	    edi *= esi;
	    ecx++;
	    edx = ebx;
	    eax = ebx;
	    edx >>= 31;
	    ebx = -ebx;
	    esi += 2;
	    edi *= ecx;
	    {
		long long dividend = (long long)edx << 32 | eax;
		eax = dividend / edi;
		edx = dividend % edi;
	    }
	    temp += eax;
	} while (ebp != arg);
    }
    eax = temp;
    return eax;
}
This still looks rather complicated, so let's try cleaning up and simplifying it in various ways:
  • First, let's simplify the division. The compiler generated a division with a 64-bit dividend because that's what the hardware supports. But you can see that the high bits of the dividend in edx are just the sign extension of the low bits in eax, so at the C level we can get the same effect with just a 32-bit division. Also, the remainder is computed by the instruction but not used, so we can get rid of it.
    @@ -8,23 +8,17 @@
     	int ebx = 0x5f5e100;
     	int ecx = 1;
     
    -	int edx, ebp;
    +	int ebp;
     	do {
     	    int edi = ecx;
     	    ebp = ecx;
     	    edi *= esi;
     	    ecx++;
    -	    edx = ebx;
     	    eax = ebx;
    -	    edx >>= 31;
     	    ebx = -ebx;
     	    esi += 2;
     	    edi *= ecx;
    -	    {
    -		long long dividend = (long long)edx << 32 | eax;
    -		eax = dividend / edi;
    -		edx = dividend % edi;
    -	    }
    +	    eax /= edi;
     	    temp += eax;
     	} while (ebp != arg);
         }
    
  • The two large constants look weird in hexadecimal, but if we convert them to decimal they're just 100 million and 300 million. Let's guess it's not a coincidence that one is exactly three times the other.
    @@ -1,11 +1,11 @@
     int mystery1(int arg) {
         int eax = arg;
     
    -    int temp = 0x11e1a300;
    +    int ebx = 100000000;
    +    int temp = 3 * ebx;
         
         if (eax > 0) {
     	int esi = 3;
    -	int ebx = 0x5f5e100;
     	int ecx = 1;
     
     	int ebp;
    
  • The return value has to go into eax at the end, but effectively it looks like temp is the value that will be returned. Since it's formed by adding something on every round of the loop, let's call it total. Also while we're at it, note that the use of eax for the branch is unrelated to the way it's used inside the loop (it's just a copy of the argument), so let's rewrite that test in terms of arg and make eax local to the loop.
    @@ -1,15 +1,14 @@
     int mystery1(int arg) {
    -    int eax = arg;
    -
         int ebx = 100000000;
    -    int temp = 3 * ebx;
    +    int total = 3 * ebx;
         
    -    if (eax > 0) {
    +    if (arg > 0) {
     	int esi = 3;
     	int ecx = 1;
     
     	int ebp;
     	do {
    +	    int eax;
     	    int edi = ecx;
     	    ebp = ecx;
     	    edi *= esi;
    @@ -19,9 +18,8 @@
     	    esi += 2;
     	    edi *= ecx;
     	    eax /= edi;
    -	    temp += eax;
    +	    total += eax;
     	} while (ebp != arg);
         }
    -    eax = temp;
    -    return eax;
    +    return total;
     }
    
  • ecx starts at 1 and is incremented on every iteration of the loop, so let's give it the generic integer name i.
    @@ -4,19 +4,19 @@
         
         if (arg > 0) {
     	int esi = 3;
    -	int ecx = 1;
    +	int i = 1;
     
     	int ebp;
     	do {
     	    int eax;
    -	    int edi = ecx;
    -	    ebp = ecx;
    +	    int edi = i;
    +	    ebp = i;
     	    edi *= esi;
    -	    ecx++;
    +	    i++;
     	    eax = ebx;
     	    ebx = -ebx;
     	    esi += 2;
    -	    edi *= ecx;
    +	    edi *= i;
     	    eax /= edi;
     	    total += eax;
     	} while (ebp != arg);
    
  • The divisor for the division is built up in the register edi. We'll call that value divisor instead, and put its construction into a single statement. Note that because i is incremented in the middle of the computation, one of the uses is really i+1. Then we'll move the increment of i down to the end of the loop body so it's not in the way.
    @@ -9,16 +9,14 @@
     	int ebp;
     	do {
     	    int eax;
    -	    int edi = i;
    +	    int divisor = i * esi * (i + 1);
     	    ebp = i;
    -	    edi *= esi;
    -	    i++;
     	    eax = ebx;
     	    ebx = -ebx;
     	    esi += 2;
    -	    edi *= i;
    -	    eax /= edi;
    +	    eax /= divisor;
     	    total += eax;
    +	    i++;
     	} while (ebp != arg);
         }
         return total;
    
  • Next, we'll get rid of the use of eax to hold the quotient. Note that we have to move the negation of ebx to keep it after its use.
    @@ -8,14 +8,11 @@
     
     	int ebp;
     	do {
    -	    int eax;
     	    int divisor = i * esi * (i + 1);
     	    ebp = i;
    -	    eax = ebx;
    -	    ebx = -ebx;
     	    esi += 2;
    -	    eax /= divisor;
    -	    total += eax;
    +	    total += ebx / divisor;
    +	    ebx = -ebx;
     	    i++;
     	} while (ebp != arg);
         }
    
  • We can eliminate the variable ebp by replacing its one use with i - 1.
    @@ -6,15 +6,13 @@
     	int esi = 3;
     	int i = 1;
     
    -	int ebp;
     	do {
     	    int divisor = i * esi * (i + 1);
    -	    ebp = i;
     	    esi += 2;
     	    total += ebx / divisor;
     	    ebx = -ebx;
     	    i++;
    -	} while (ebp != arg);
    +	} while (i - 1 != arg);
         }
         return total;
     }
    
  • Observe that esi is being incremented by two every time around the loop, which suggests that it has a value related to 2*i. In particular from the way it was initialized, we can see that it's always equal to 2*i + 1. GCC thought the code would run faster keeping it in a separate variable, but I think the code is easier to read if you use the more complex expression.
    @@ -3,12 +3,10 @@
         int total = 3 * ebx;
         
         if (arg > 0) {
    -	int esi = 3;
     	int i = 1;
     
     	do {
    -	    int divisor = i * esi * (i + 1);
    -	    esi += 2;
    +	    int divisor = i * (2*i + 1) * (i + 1);
     	    total += ebx / divisor;
     	    ebx = -ebx;
     	    i++;
    
  • One thing that definitely still looks weird here is the do-while loop inside the if statement. The point of the do-while construct is to let you write the loop test only after the loop body, so that the loop always executes at least once. But plain while loops are more popular because it's often easier to reason about loops where the check is at the top: for instance, a top-of-loop check naturally covers the case where the loop shouldn't execute at all. The loop is counting i up to arg: specifically the last iteration is the one on which i is equal to arg. The condition in the if seems related, which suggests maybe we can combine the if and the do-while back into a regular while loop. Normally the following combination:
    if (cond) { do { ... } while (cond) }
    would be equivalent to:
    while (cond) { ... }

    The problem is that the two conditions arg > 0 and i - 1 != arg look rather different. But maybe they both came from the same condition that the compiler optimized. For the if condition, the compiler knows that the initial value of the loop counter is 1, so 0 might be the optimized version of i - 1. Similarly when we're inside the loop the compiler knows that i-1 can never be greater than arg, because it was initialized to be less than or equal and we stop after the iteration when they're equal. So it looks like both conditions are equivalent to i - 1 < arg, or more idiomatically i <= arg.

    However there's a subtle problem lurking here, which you would probably only notice if you were paranoid about integer overflow attacks, or if you were testing the function with the largest possible positive integer (INT_MAX) as an argument. If we write the condition as i <= arg, then when arg is equal to INT_MAX, the loop's exit condition will never be satisfied. (The program won't actually loop forever, because when i loops around back near zero, you'll get a divide by zero crash.) You might think you would be safe if you kept the condition as the less natural-looking i - 1 < arg, since that condition looks like it should still be false when arg is INT_MAX and i is INT_MAX + 1. (2's complement arithmetic is associative, so (x + 1) - 1 = x for all values of x.) Unfortunately, the C compiler is still allowed to "optimize" the condition i - 1 < arg into the condition i <= arg, even though they have the differing behavior we just described. The reason is that the C standard says that overflow of a signed integer causes undefined behavior. This means that a program is entitled to do whatever it wants if an overflow might occur, or equivalently, optimize as if the situation triggering undefined behavior could never occur. It's the programmer's responsibility to ensure that the undefined behavior can never occur.

    Undefined behavior has turned out to be an ongoing source of friction between C programmers and compiler makers, since programmers' intuitions about how the compiler works often differ from what the standard allows. In particular, problems tend to arise when compilers become more sophisticated at optimization: optimizations that take better advantage of the undefined-behavior rules can make some previously-working programs run faster, and other previously-working programs crash in hard-to-debug ways. John Regehr's blog has some lucid discussions of these issues. A tiny example of this kind of overflow-based optimization is sketched after the diff below.

    For now, we can work around this behavior by adding an extra condition on i to prevent overflow.

    @@ -2,15 +2,12 @@
         int ebx = 100000000;
         int total = 3 * ebx;
         
    -    if (arg > 0) {
     	int i = 1;
    -
    -	do {
    +    while (i < 0x7fffffff && i - 1 < arg) {
     	    int divisor = i * (2*i + 1) * (i + 1);
     	    total += ebx / divisor;
     	    ebx = -ebx;
     	    i++;
    -	} while (i - 1 != arg);
         }
         return total;
     }
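
    Here is a tiny self-contained illustration (not from the assignment) of the kind of reasoning the compiler is allowed to do: because signed overflow is undefined, an optimizing compiler may compile this function to always return 1, even though a wrap-around reading of the arithmetic would make it return 0 when i is INT_MAX.

    /* Naively false when i == INT_MAX (i + 1 would wrap to INT_MIN),
       but since signed overflow is undefined behavior, the compiler
       may assume it cannot happen and fold the whole test to 1. */
    int always_true(int i) {
        return i + 1 > i;
    }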
    
  • Though actually it would be even more idiomatic to write this as a for loop.
    @@ -2,12 +2,11 @@
         int ebx = 100000000;
         int total = 3 * ebx;
         
    -    int i = 1;
    -    while (i < 0x7fffffff && i - 1 < arg) {
    +    int i;
    +    for (i = 1; i < 0x7fffffff && i - 1 < arg; i++) {
     	int divisor = i * (2*i + 1) * (i + 1);
     	total += ebx / divisor;
     	ebx = -ebx;
    -	i++;
         }
         return total;
     }
    
  • This code now looks pretty good, though it still doesn't compile to the same instructions as the mystery function; in particular there's that weird condition to prevent overflow that wasn't present in the binary. In fact both of these problems can be solved at once, though it requires a somewhat non-obvious transformation: shifting the index i of the loop. You might have noticed it's a little bit more natural in C to use 0 rather than 1 as the starting index of a loop, and to compare to the bound using a strict comparison. We can achieve this by using a new index j which is related to the old index by j = i - 1 and i = j + 1. This will make the bounds of the for loop look more natural, and most importantly it will avoid the overflow problem in the extreme case when the argument is INT_MAX. The trade-off is that the formula for the divisor will look somewhat more complicated.
    @@ -2,9 +2,9 @@
         int ebx = 100000000;
         int total = 3 * ebx;
         
    -    int i;
    -    for (i = 1; i < 0x7fffffff && i - 1 < arg; i++) {
    -	int divisor = i * (2*i + 1) * (i + 1);
    +    int j;
    +    for (j = 0; j < arg; j++) {
    +	int divisor = (j + 1) * (2*j + 3) * (j + 2);
     	total += ebx / divisor;
     	ebx = -ebx;
         }
    
  • We've now managed to match the original binary code. But what does this function do, anyway? From the code, it appears to be computing the sum of some sort of alternating series. The series probably isn't a familiar one, but if you examine the function's output for an input of 1000 or so, the output of 314159263 is suggestive: it looks a lot like π times ebx. And in fact that is what the series converges to mathematically; this series was discovered by the Indian mathematician Nilakantha in the 15th century. Based on this understanding, we might rename the argument to terms; a good name for the variable ebx is more elusive, but let's call it unit. (A small driver for checking the convergence numerically is sketched after the instructor's version at the end.) Here's our final decompilation:
    int mystery1(int terms) {
        int unit = 100000000;
        int total = 3 * unit;
        
        int j;
        for (j = 0; j < terms; j++) {
    	int divisor = (j + 1) * (2*j + 3) * (j + 2);
    	total += unit / divisor;
    	unit = -unit;
        }
        return total;
    }
    
    For comparison, here's the code as the instructor originally wrote it:
    int mystery(int steps) {
        int m = 100000000;
        int sum = 3*m;
        int i;
        for (i = 0; i < steps; i++) {
    	int denom = (i+1) * (2*i+3) * (i+2);
    	int change = m / denom;
    	sum += change;
    	m = -m;
        }
        return sum;
    }
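
    As the sanity check promised above, here is a small driver (not part of the assignment; the printf format and the 1000-term cap are just choices for illustration) that can be compiled together with the final decompilation of mystery1 to watch the result approach π × 10^8. The connection to Nilakantha's series is that each term unit/((j+1)(2j+3)(j+2)) equals 4·unit/((2j+2)(2j+3)(2j+4)), which matches the series π = 3 + 4/(2·3·4) - 4/(4·5·6) + 4/(6·7·8) - ..., scaled by 10^8 and truncated to integer arithmetic.

    /* A minimal test driver; assumes it is linked against the mystery1
       definition from the final decompilation above. */
    #include <stdio.h>

    int mystery1(int terms);

    int main(void) {
        int terms;
        for (terms = 1; terms <= 1000; terms *= 10) {
            /* The result is a fixed-point value scaled by 10^8, so it
               should creep toward 314159265 as terms grows. */
            printf("%5d terms: %d\n", terms, mystery1(terms));
        }
        return 0;
    }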