Simple Software Verification, Part 1: Execution Tables
[hamzh_toggle title=”Series Parts” state=”closed”]
(Image by Manfred Mohr)
If you’re a software developer, and you work somewhere even remotely sane, then chances are good you employ some form of automated testing in your day-to-day practice. Automated testing is a wonderful and extremely valuable tool, but there is nevertheless something slightly fanatical and suspicious about the claims of those who prescribe it as the strategy for managing software quality. Namely, experience teaches us that even the most well-tested software, when exposed to a large enough set of users, will still have swarms of defects (aka. bugs).
The typical response to this situation is either world-weary resignation or a hopeless promise to “try harder” next time, to “write more and better tests”. But, inevitably, neither of these fix the problem.
The truth of the matter is that even experienced software engineers inject 80 – 125 defects for every 1000 lines of code they write. Even worse, techniques like TDD tend to hide this fact from us by smearing together testing and coding in a way that can make us feel over-confident in the results we’re producing. In reality, until machines replace us, we are stuck with fallible minds and hands, and that fallibility will inevitably creep into our code.
Given the above, we should instead say: knowing that I will inevitably have to find and fix a lot of defects, what is the quickest and most effective means of doing so? Automated testing is the tempting answer, but it’s problematic for two main reasons:
First, tests are expensive to write. Each test can only check one set of operating conditions, so often many tests are required to provide even a reasonable assurance that a component works as expected. Tests must then be maintained for the lifetime of the code they test. And, even with a suite of tests, when a test fails it produces only a symptom. Finding and fixing the root cause of that symptom is often quick, but occasionally it can take hours or even days. Once you start bringing in external services and components, testing becomes increasingly more expensive, and increasingly difficult to manage in a way that actually provides confidence.
Second, given human fallibility, any one verification technique will miss some errors. We’ve all experienced writing a suite of tests and feeling confident in the code tested, only to have allowed a bug to escape into production because we missed a trivial additional check. A more useful frame is to think in terms of verification rather than testing. Automated testing is one method of verification. We can think of the overall verification process as a series of filters, where each method of verification leaves a certain percentage of defects remaining. If we have only one verification method, we only have one chance to reduce the number of defects.
[hamzh_toggle title=”Verification As A Series of Filters” state=”closed”] To elaborate let’s do some math. We’ll say that your unit-tests find approximately 70% of all the defects in your programs. Your team also does code reviews on GitHub using pull requests, but they’re pretty informal so they’re only really about 20% effective. If you wrote 600 LOC we would expect you to have about 48 defects. Testing alone would find 34 of those, leaving 14 for the next phase. If you stopped here, you’d still have 14 bugs in your program. If you did an informal code review, you’d find about 3 more defects, leaving 11 defects in the final program.
Given this information you might consider formalizing your review process in order to try to catch even more bugs before they went into production. The more filters you add, the better off you’ll be. [/hamzh_toggle]
So what is the alternative? One answer is to review your code and perform some basic formal verification. It is with an eye towards this approach that I’d like to present a series of articles showcasing some software verification techniques you can add to your arsenal. The goal is to equip you with additional tools you can use when reasoning about programs or for those occasions where you’d really, really, really like to be sure your program does what it’s supposed to. In addition, you might find these techniques useful in your day-to-day work or for handling situations where testing may either not be possible or not be appropriate. Without further ado…
Symbolic Execution Tables
The first technique we’ll look at are execution tables. These allow you to do a dry-run of some code without actually executing it. Essentially, you construct a table of one linear pass through a section of code, and at the end you get the state of all the relevant variables. The bare-bones table structure is as follows:
#, Instruction, Var1, Var2, Var3, …
,Initial Values, A, B, C, …
1, CODE LINE 1, , , , …
, Final Values,,,,
The general process for constructing an execution table is:
- Create a table like the one above, providing a column for each variable used in the coe.
- Set the initial values of each variable.
- For each line of code, write the new values for any variables it modifies.
- At the end copy the most recent value from each column to get the final value of each variable.
We’ll start with a simple linear example containing neither loops nor branches:
def substitution(v1, v2, v3, v4): v1 = v2 v2 = v4 v3 = v1 v4 = v3
To verify this with an execution table we construct the table as follows:
#, Instruction, v1, v2, v3, v4
,Initial Values, A, B, C, D
1,v1 = v2, B,,,
2,v2 = v4,,D,,
3,v3 = v1,,,B,
3,v4 = v3,,,,B
To show how this works when analyzing a branching statement, let’s examine appending to a doubly linked list:
def append(self, data): new_node = Node(data, None, None) if self.head is None: self.head = self.tail = new_node else: new_node.prev = self.tail new_node.next = None self.tail.next = new_node self.tail = new_node
The if statement tells us there are two main cases we’re interested in here. We can first test the condition where the if statement evaluates to true, that is when we’re inserting a brand new node:
#, Instruction, Test, data, new_node, head, tail
,Initial Values, , A, None, None, None
1,new_node = Node(…),,,B,,
2,if self.head is None,T,,,,
3,self.head = self.tail = new_node,,,,B,B
Checking the final values, we see that things came out as expected in this case. Now let’s examine the case where the list is already populated, we’ll let nn stand in for new_node, p stand for prev, and n for next to save some space:
#, Instruction, Test, data, nn, nn.p, nn.n, head, tail, tail.n
,Initial Values, , A, -, -, -, B, C, D
1,new_node = Node(…),,,E,None,None,,,
2,if self.head is None,F,,,,,,,
3,new_node.prev = self.tail,,,,C,,,,
4,new_node.next = None,,,,,None,,,
5,self.tail.next = new_node,,,,,,,,E
6,self.tail = new_node,,,,,,,E,
While things get a little strange with references here, we can still easily verify that the old tail points to the new node, the new node gets a link to the old tail, and the new node finally becomes the tail of the list at the end.
Loops require a bit more work to handle properly. Namely, since every execution table represents a single linear run, you have to create a separate execution table for each iteration through the loop. Let’s use the following simple chunk of code
def fib(): i, j = 1, 1 while True: temp = j j += i i = temp
We now construct a new table for our first iteration through the loop:
#, Instruction, Test, i, j, temp
,Initial Values,, 1, 1, –
1, while True, True,,,
2, temp = j,,,,1
3, j += i,,,2,
4,i = temp,,1,,
Now we construct a new table for the second loop:
#, Instruction, Test, i, j, temp
,Initial Values,, 1, 2, 1
1, while True, True,,,
2, temp = j,,,,2
3, j += i,,,3,
4,i = temp,,2,,
We can continue producing tables until we are satisfied that the program works as expected, and we get easy insight into the final results at the end of each loop. Extremely useful for verifying logic by hand.
I hope I’ve demonstrated that symbolic execution can be a powerful and simple tool in the tool chest. Its main appeal is that it provides a structured and theoretically sound way of verifying your programs. I’ve personally used these to step through thorny code with co-workers, and I’ve uncovered many bugs with it in this way (for example, a conditional that accidentally always evaluated to true). If you’re interested in exploring these techniques further, I highly recommend: