Guideline: Test Ideas for Method Calls

Introduction

Here's an example of defective code:

File file = new File(stringName);
file.delete();

The defect is that File.delete can fail, but the code doesn't check for that. Fixing it requires checking the value that delete returns, as shown here:

File file = new File(stringName);
if (file.delete() == false) {...}

This guideline describes a method for detecting cases where your code does not handle the result of calling a method. (Note that it assumes that the method called produces the correct result for whatever inputs you give it. That's something that should be tested, but creating test ideas for the called method is a separate task. That is, it's not your job to test File.delete.)

The key notion is that you should create a test idea for each distinct unhandled relevant result of a method call. To define that term, let's first look at result. When a method executes, it changes the state of the world. Here are some examples:

It might push return values on the runtime stack.
It might throw an exception.
It might change a global variable.
It might update a record in a database.
It might send data over the network.
It might print a message to standard output.

Now let's look at relevant, again using some examples.

Suppose the method being called prints a message to standard output. That "changes the state of the world", but it cannot affect the further processing of this program. No matter what gets printed, even nothing at all, it can't affect the execution of your code.
If the method returns true for success and false for failure, your program very likely should branch based on the result. So that return value is relevant.
If the called method updates a database record that your code later reads and uses, the result (updating the record) is relevant.

(There's no absolute line between relevant and irrelevant. By calling print, your method might cause buffers to be allocated, and that allocation might be relevant. It's conceivable that a defect might depend on whether and what buffers were allocated. It's conceivable, but is it at all plausible?)

A method might often have a very large number of results, but only some of them will be distinct. For example, consider a method that writes bytes to disk. It might return a number less than zero to indicate failure; otherwise, it returns the number of bytes written (which might be fewer than the number requested). The large number of possibilities can be grouped into three distinct results:

a number less than zero
the number written equals the number requested
some bytes were written, but fewer than the number requested

All the values less than zero are grouped into one result because no reasonable program will make a distinction among them. All of them (if, indeed, more than one is possible) should be treated as an error. Similarly, if the code requested that 500 bytes be written, it doesn't matter if 34 were actually written or 340: the same thing will probably be done with the unwritten bytes. (If something different should be done for some value, such as 0, that will form a new distinct result.)
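The grouping above can be sketched in code. This is only an illustration: WriteOutcome, WriteResults, and classify are hypothetical names, not part of any real API, and the sketch assumes the method never reports more bytes than were requested.

```java
// Hypothetical names throughout; this only illustrates the grouping in the text.
enum WriteOutcome { FAILURE, COMPLETE, PARTIAL }

final class WriteResults {
    // Maps a raw return value onto one of the three distinct results.
    // Assumption: 'returned' never exceeds 'requested'.
    static WriteOutcome classify(int returned, int requested) {
        if (returned < 0) {
            return WriteOutcome.FAILURE;   // every negative value is the same result
        }
        if (returned == requested) {
            return WriteOutcome.COMPLETE;  // everything requested was written
        }
        return WriteOutcome.PARTIAL;       // 34 of 500 and 340 of 500 are the same result
    }
}
```

One test idea per outcome is enough for this technique. If the code should treat 0 bytes written differently, that would become a fourth outcome.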

There's one last word in the defining term to explain. This particular testing technique is not concerned with distinct results that are already handled. Consider, again, this code:

File file = new File(stringName);
if (file.delete() == false) {...}

There are two distinct results (true and false). The code handles them. It might handle them incorrectly, but test ideas from Guideline: Test Ideas for Booleans and Boundaries will check that. This test technique is concerned with distinct results that are not specifically handled by distinct code. That might happen for two reasons: you thought the distinction was irrelevant, or you simply overlooked it. Here's an example of the first case:

result = m.method();
switch (result) {
    case FAIL:
    case CRASH:
       ...
       break;
    case DEFER:
       ...
       break;
    default:
       ...
       break;
}

FAIL and CRASH are handled by the same code. It might be wise to check that that's really appropriate. Here's an example of an overlooked distinction:

result = s.shutdown();
if (result == PANIC) {
   ...
} else {
   // success! Shut down the reactor.
   ...
}

It turns out that shutdown can return an additional distinct result: RETRY. The code as written treats that case the same as the success case, which is almost certainly wrong.
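One way to make such an oversight harder to repeat is to give every distinct result its own branch and to reject anything unexpected. The following is a minimal sketch, not the original code: PANIC and RETRY come from the example, while SUCCESS, ShutdownResult, ShutdownHandling, and the returned strings are assumptions made for illustration.

```java
// Hypothetical result type: only PANIC and RETRY are named in the text.
enum ShutdownResult { SUCCESS, PANIC, RETRY }

final class ShutdownHandling {
    // Returns a description of the action taken, so a test can observe which branch ran.
    static String handle(ShutdownResult result) {
        switch (result) {
            case PANIC:
                return "handle panic";
            case RETRY:
                return "retry the shutdown";
            case SUCCESS:
                return "shut down the reactor";
            default:
                // A result added later cannot silently fall into the success branch.
                throw new IllegalStateException("unhandled result: " + result);
        }
    }
}
```

A test idea for RETRY would then check that it is not treated the same as success.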

Finding test ideas

So your goal is to think of those distinct relevant results you previously overlooked. That seems impossible: why would you realize they're relevant now if you didn't earlier?

The answer is that a systematic re-examination of your code, when in a testing frame of mind and not a programming frame of mind, can sometimes cause you to think new thoughts. You can question your own assumptions by methodically stepping through your code, looking at the methods you call, rechecking their documentation, and thinking. Here are some cases to watch for.

"Impossible" cases

Often, it will appear that error returns are impossible. Double-check your assumptions.

This example shows a Java implementation of a common Unix idiom for handling temporary files.

File file = new File("tempfile");
FileOutputStream s;
try {
    // open the temp file.
    s = new FileOutputStream(file);
} catch (IOException e) {...}
// Make sure temp file will be deleted
file.delete();

The goal is to make sure that a temporary file is always deleted, no matter how the program exits. You do this by creating the temporary file, then immediately deleting it. On Unix, you can continue to work with the deleted file, and the operating system takes care of cleaning up when the process exits. A less-than-painstaking Unix programmer might not write the code to check for a failed deletion. Since she just successfully created the file, she must be able to delete it.

This trick doesn't work on Windows. The deletion will fail because the file is open. Discovering that fact is hard: as of August 2000, the Java documentation did not enumerate the situations in which delete could fail; it merely said that it can. But perhaps, when in "testing mode", the programmer might question her assumption. Since her code is supposed to be "write once, run everywhere", she might ask a Windows programmer when File.delete fails on Windows and so discover the awful truth.
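A test for the "impossible" case needs code that makes the failure visible. Here is a minimal sketch, assuming you are willing to turn a failed deletion into an exception; TempFiles and deleteOrThrow are hypothetical names, and File.delete is the only library call involved.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of java.io.
final class TempFiles {
    // File.delete reports failure only through its boolean return value,
    // so the check has to be written explicitly.
    static void deleteOrThrow(File file) throws IOException {
        if (!file.delete()) {
            throw new IOException("could not delete " + file.getPath());
        }
    }
}
```

You don't need a Windows machine to try the failure path: deleting a file that doesn't exist also makes delete return false.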

"Irrelevant" cases

Another force against noticing a distinct relevant value is being already convinced that it doesn't matter. A Java Comparator's compare method returns a number <0, 0, or a number >0. Those are three distinct cases that might be tried. This code lumps two of them together:

void allCheck(Comparator c) {
   ...
   if (c.compare(o1, o2) <= 0) {
      ...
   } else {
      ...
   }
}

But that might be wrong. The way to discover whether it is or not is to try the two cases separately, even if you really believe it will make no difference. (Your beliefs are really what you're testing.) Note that you might be executing the then case of the if statement more than once for other reasons. Why not try one of them with the result less than 0 and one with the result exactly equal to zero?
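One inexpensive way to try the cases separately is a stub comparator that returns whatever result you ask for. The sketch below assumes a stand-in for the code under test: CompareCases, branchTaken, and fixed are hypothetical names, and the strings "then" and "else" exist only so a test can see which branch ran.

```java
import java.util.Comparator;

// Hypothetical stand-in for the code under test and a stub to drive it.
final class CompareCases {
    // Like the example in the text, this lumps "<0" and "0" together.
    static <T> String branchTaken(Comparator<T> c, T o1, T o2) {
        if (c.compare(o1, o2) <= 0) {
            return "then";
        } else {
            return "else";
        }
    }

    // Stub comparator that ignores its arguments and always yields 'result',
    // which lets a test force each distinct result in turn.
    static <T> Comparator<T> fixed(int result) {
        return (a, b) -> result;
    }
}
```

Three test ideas follow directly: drive the code once with fixed(-1), once with fixed(0), and once with fixed(1), and check in each case that the observable behavior is what you really intend.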

Uncaught exceptions

Exceptions are a kind of distinct result. By way of background, consider this code:

void process(Reader r) {
   ...
   try {
      ...
      int c = r.read();
      ...
   } catch (IOException e) {
      ...
   }
}

You'd expect to check whether the handler code really does the right thing with a read failure. But suppose an exception is explicitly unhandled. Instead, it's allowed to propagate upward through the code under test. In Java, that might look like this:

void process(Reader r) throws IOException {
    ...
    int c = r.read();
    ...
}

This technique asks you to test that case even though the code explicitly doesn't handle it. Why? Because of this kind of fault:

void process(Reader r) throws IOException {
    ...
    Tracker.hold(this);
    ...
    int c = r.read();
    ...
    Tracker.release(this);
    ...
}

Here, the code affects global state (through Tracker.hold). If the exception is thrown, Tracker.release will never be called.

(Notice that the failure to release will probably have no obvious immediate consequences. The problem will most likely not be visible until process is called again, whereupon the attempt to hold the object for a second time will fail. A good article about such defects is Keith Stobie's "Testing for Exceptions".)
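A test for the propagating exception needs a Reader that fails on demand and a way to inspect the global state afterward. The sketch below is one possible arrangement under stated assumptions: the Tracker body, isHeld, Processor, processSafely, and FailingReader are all hypothetical, and try/finally is offered as one common repair, not as the only one.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the Tracker in the text.
final class Tracker {
    private static final Set<Object> held = new HashSet<>();

    static void hold(Object o) {
        if (!held.add(o)) {
            throw new IllegalStateException("already held");
        }
    }
    static void release(Object o) { held.remove(o); }
    static boolean isHeld(Object o) { return held.contains(o); }
}

final class Processor {
    // Same shape as the faulty code: release is skipped if read throws.
    void process(Reader r) throws IOException {
        Tracker.hold(this);
        int c = r.read();
        Tracker.release(this);
    }

    // One possible repair: release in a finally block.
    void processSafely(Reader r) throws IOException {
        Tracker.hold(this);
        try {
            int c = r.read();
        } finally {
            Tracker.release(this);
        }
    }
}

// Test double: every read fails, which forces the exceptional result.
final class FailingReader extends Reader {
    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        throw new IOException("simulated read failure");
    }
    @Override
    public void close() { }
}
```

The test passes a FailingReader, expects the IOException, and then checks Tracker.isHeld. With process the object is still held; with processSafely it is not.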

Undiscovered faults

This particular technique does not address all defects associated with method calls. Here are two kinds that it's unlikely to catch.

Incorrect arguments

Consider these two lines of C code, where the first line is wrong and the second line is correct.

... strncmp(s1, s2, strlen(s1)) ...
... strncmp(s1, s2, strlen(s2)) ...

strncmp compares two strings and returns a number less than 0 if the first one is lexicographically less than the second (that is, it would come earlier in a dictionary). It returns 0 if they are equal. It returns a number greater than 0 if the first one is lexicographically larger. However, it compares at most the number of characters given by the third argument. The problem is that the length of the first string is used to limit the comparison, whereas it should be the length of the second.

This technique would require three tests, one for each distinct return value. Here are three you could use:

s1	s2	expected result	actual result
"a"	"bbb"	<0	<0
"bbb"	"a"	>0	>0
"foo"	"foo"	=0	=0

The defect is not discovered because nothing in this technique forces the third argument to have any particular value. What's needed is a test case like this:

s1	s2	expected result	actual result
"foo"	"food"	<0	=0

While there are techniques suitable for catching such defects, they are seldom used in practice. Your testing effort is probably better spent on a rich set of tests that targets many types of defects (and that you hope catches this type as a side effect).
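To see why the three tests above pass while the fourth fails, it can help to run them. The sketch below is written in Java to match the other examples and only emulates the C function: Strncmp, buggy, and fixed are hypothetical names, and the end of a Java string is assumed to behave like C's terminating NUL character.

```java
// Hypothetical Java emulation of C's strncmp, for illustration only.
final class Strncmp {
    // Compares at most n characters; the end of a string acts like the NUL terminator.
    static int strncmp(String s1, String s2, int n) {
        for (int i = 0; i < n; i++) {
            char c1 = i < s1.length() ? s1.charAt(i) : '\0';
            char c2 = i < s2.length() ? s2.charAt(i) : '\0';
            if (c1 != c2) {
                return c1 - c2;
            }
            if (c1 == '\0') {
                return 0;   // both strings ended before n characters were compared
            }
        }
        return 0;
    }

    // The wrong call from the text: limited by the length of the first string.
    static int buggy(String s1, String s2) {
        return strncmp(s1, s2, s1.length());
    }

    // The call the text gives as correct: limited by the length of the second string.
    static int fixed(String s1, String s2) {
        return strncmp(s1, s2, s2.length());
    }
}
```

With this emulation, buggy gives the expected sign for the first three rows, but buggy("foo", "food") returns 0 where fixed("foo", "food") returns a number less than 0.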

Indistinct results

There's a danger that comes when you're coding - and testing - method-by-method. Here's an example. There are two methods. The first, connect, wants to establish a network connection:

void connect() {
   ...
   Integer portNumber = serverPortFromUser();
   if (portNumber == null) {
      // pop up message about invalid port number
      return;
   }
   ...
}

When it needs a port number it calls serverPortFromUser. That method returns two distinct values. It returns a port number chosen by the user if the number chosen is valid (1000 or greater). Otherwise, it returns null. If null is returned, the code under test pops up an error message and quits.

When connect was tested, it worked as intended: a valid port number caused a connection to be established, and an invalid one led to a popup.

The code to serverPortFromUser is a bit more complicated. It first pops up a window that asks for a string and has the standard OK and CANCEL buttons. Based on what the user does, there are four cases:

If the user types a valid number, that number is returned.
If the number is too small (less than 1000), null is returned (so the message about invalid port number will be displayed).
If the number is misformatted, null is again returned (and the same message is appropriate).
If the user clicks CANCEL, null is returned.

This code also works as intended.

The combination of the two chunks of code, though, has a bad consequence: the user presses CANCEL and gets a message about an invalid port number. All the code works as intended, but the overall effect is still wrong. It was tested in a reasonable way, but a defect was missed.

The problem here is that null is one result that represents two distinct meanings ("bad value" and "user cancelled"). Nothing in this technique forces you to notice that problem with the design of serverPortFromUser.

Testing can help, though. When serverPortFromUser is tested in isolation - just to see if it returns the intended value in each of those four cases - the context of use is lost. Instead, suppose it were tested using connect. There would be four tests that would exercise both of the methods simultaneously:

input	expected result	thought process
user types "1000"	connection to port 1000 is opened	serverPortFromUser returns a number, which is used.
user types "999"	popup about invalid port number	serverPortFromUser returns null, which leads to popup
user types "i99"	popup about invalid port number	serverPortFromUser returns null, which leads to popup
user clicks CANCEL	whole connection process should be cancelled	serverPortFromUser returns null, hey wait a minute that doesn't make sense...

As is often the case, testing in a larger context reveals integration problems that escape small-scale testing. And, as is also often the case, careful thought during test design reveals the problem before the test is run. (But if the defect isn't caught then, it will be caught when the test is run.)
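The four combined tests can be sketched as follows. This is a simplified model, not the original code: the dialog is replaced by a string parameter, a null argument is assumed to stand for the user clicking CANCEL, and connect returns a description of what the user would see instead of opening a connection. ConnectSketch and the returned strings are hypothetical.

```java
// Hypothetical model of the two methods; the real ones use a dialog and a network connection.
final class ConnectSketch {
    // 'typed' is what the user entered; null models the user clicking CANCEL.
    static Integer serverPortFromUser(String typed) {
        if (typed == null) {
            return null;                    // CANCEL
        }
        try {
            int port = Integer.parseInt(typed);
            if (port < 1000) {
                return null;                // too small
            }
            return Integer.valueOf(port);   // valid
        } catch (NumberFormatException e) {
            return null;                    // misformatted
        }
    }

    // Returns what the user would see, so a test can compare it with the expected result.
    static String connect(String typed) {
        Integer portNumber = serverPortFromUser(typed);
        if (portNumber == null) {
            return "popup: invalid port number";
        }
        return "connected to port " + portNumber;
    }
}
```

Calling connect(null) returns the invalid-port popup, which does not match the expected result in the last row of the table. That mismatch is the defect, and it is visible only when the two methods are exercised together.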