University of Minnesota
Program Analysis for Security

Symbolic execution, part 1

Patrice Godefroid, Nils Klarlund, and Koushik Sen. “DART: directed automated random testing”. In Programming Language Design and Implementation (PLDI), pages 213-223, Chicago, IL, USA, June 2005.
[ACM]

Cristian Cadar, Daniel Dunbar, and Dawson Engler. “KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs”. In Operating Systems Design and Implementation (OSDI), pages 209-223, San Diego, CA, USA, December 2008.
[USENIX]

Question: In section 5.5, the KLEE paper discusses a technique for using symbolic execution to check the equivalence of two functions f1 and f2 that are supposed to implement the same behavior. They make a combined program that looks like "y1 = f1(x); y2 = f2(x); assert(y1 == y2);" and run it with x symbolic. If the symbolic execution terminates without finding a way to trigger the assertion, the two functions must be equivalent.
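
As a concrete illustration of the combined program, here is a minimal KLEE-style harness (a sketch, not the paper's exact code): f1 and f2 are hypothetical stand-ins for the two implementations, and the only KLEE-specific call is klee_make_symbolic(), which marks x as symbolic.

    /* Crosscheck harness: run both implementations on the same
       symbolic input and assert that their results agree. */
    #include <assert.h>
    #include <klee/klee.h>

    int f1(int x);  /* hypothetical: first implementation */
    int f2(int x);  /* hypothetical: second implementation */

    int main(void) {
        int x;
        klee_make_symbolic(&x, sizeof(x), "x");
        int y1 = f1(x);
        int y2 = f2(x);
        /* KLEE reports a concrete counterexample for any path
           on which this assertion can fail. */
        assert(y1 == y2);
        return 0;
    }

If the symbolic execution explores every feasible path of this harness without triggering the assertion, no input of this type distinguishes the two functions.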

Your classmate Ben thinks that this approach is slower than it needs to be, since the number of paths explored in the combined program might be the product of the numbers of paths through the two functions individually. Instead, he proposes the following approach: for each of the two functions separately, run the symbolic execution tool to create a test suite with one test per feasible path. Then combine the sets of tests generated for the two functions and, for each test, check whether it gives the same result for both implementations.
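
By contrast, Ben's scheme amounts to a purely concrete differential test driver along the following lines (again a sketch; the inputs shown are hypothetical placeholders for the union of the per-path test cases the symbolic executor would generate from f1 and from f2):

    #include <stdio.h>

    int f1(int x);  /* hypothetical: first implementation */
    int f2(int x);  /* hypothetical: second implementation */

    int main(void) {
        /* Placeholder values standing in for the combined test
           suites generated separately for f1 and for f2. */
        int tests[] = { 0, 1, -1 };
        size_t n = sizeof(tests) / sizeof(tests[0]);
        for (size_t i = 0; i < n; i++) {
            if (f1(tests[i]) != f2(tests[i]))
                printf("mismatch on input %d\n", tests[i]);
        }
        return 0;
    }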

What do you think of Ben's scheme? Is it guaranteed to find any differences between the two functions, in the same way that the approach in the KLEE paper is, or could it miss some? If you think Ben's scheme will work, explain why. If not, give an example of two implementations and the corresponding test cases for which it can miss a difference.

Historic

James C. King. “Symbolic execution and program testing”. Communications of the ACM, 19(7):385-394, July 1976.
[ACM]

The earliest work on symbolic execution dates to the 1970s; this paper by King is a representative example.

Cristian Cadar and Dawson R. Engler. “Execution generated test cases: How to make systems code crash itself”. In SPIN Workshop on Model Checking Software, pages 2-23, San Francisco, CA, USA, August 2005.
[Springer]

Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. “EXE: automatically generating inputs of death”. In ACM Conference on Computer and Communications Security (CCS), pages 322-335, Alexandria, VA, USA, November 2006.
[ACM]

Koushik Sen, Darko Marinov, and Gul Agha. “CUTE: a concolic unit testing engine for C”. In European Software Engineering Conference held jointly with Foundations of Software Engineering (ESEC/FSE), pages 263-272, Lisbon, Portugal, September 2005.
[ACM]

There was a renaissance of interest in symbolic execution starting in 2005, exemplified by the DART paper in the main readings and these three others. I would argue this shift was more a matter of realizing the power of previously described techniques than of any particular fundamental advance. The DART and CUTE papers emphasize the importance of basing symbolic execution on concrete executions, but the EGT and EXE systems achieved similarly good results without this technique. Conversely, the EXE paper emphasizes the importance of a good decision procedure, but DART and CUTE used only linear constraints.