Running Tests

Functions are objects you can save in data structures or pass to other functions.
Python stores local and global variables in dictionary-like structures.
A unit test performs an operation on a fixture and passes, fails, or produces an error.
A program can use introspection to find functions and other objects at runtime.

Terms defined: actual result (of test), assertion, dynamic typing, error (result of test), exception, expected result (of test), failure (result of test), fixture, global, local, pass (result of test), pretty print, raise (an exception), register (in code), scope, unit test

Not all software needs rigorous testing: for example, it’s OK to check a one-off data analysis script by looking at the output of each stage as we add it. But we should all be grateful that 98% of the lines of code in the SQLite database are there to make the other 2% always do the right thing.

The examples in this book lie somewhere between these two extremes. Together, they are over 7000 lines long; to make sure they work correctly, we wrote several hundred unit tests using pytest. We used this framework because it makes tests easier to write, and because it runs them in a reliable, repeatable way [Meszaros2007, Aniche2022]. Understanding how tools like this work will help you use them more effectively, and will reinforce one of the big ideas of this book: programs are just another kind of data.

Storing and Running Tests

As we said in Chapter 2, a function is just an object that we can assign to a variable. We can also store functions in lists, just like numbers or strings (Figure 6.1):

def first():
    print("First")

def second():
    print("Second")

def third():
    print("Third")

everything = [first, second, third]
for func in everything:
    func()

First
Second
Third

However, this trick only works if we can call every function in the same way, which means they must all have the same signature:

def zero():
    print("zero")

def one(value):
    print("one", value)

for func in [zero, one]:
    func()

zero
Traceback (most recent call last):
  File "/sdx/test/signature.py", line 8, in <module>
    func()
TypeError: one() missing 1 required positional argument: 'value'
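
One way to avoid this failure is to inspect each function’s signature before calling it. The sketch below uses inspect.signature from the standard library; the helper takes_no_arguments is a hypothetical name introduced for illustration and is not part of the test runner built in this chapter.

```python
import inspect

def zero():
    return "zero"

def one(value):
    return f"one {value}"

def takes_no_arguments(func):
    """Return True if func can be called as func() (hypothetical helper)."""
    for param in inspect.signature(func).parameters.values():
        has_default = param.default is not inspect.Parameter.empty
        is_variadic = param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        )
        # A parameter with no default that is not *args or **kwargs must be supplied.
        if not (has_default or is_variadic):
            return False
    return True

# Keep only the functions that are safe to call with no arguments.
called = [func.__name__ for func in [zero, one] if takes_no_arguments(func)]
print(called)
```

Here only zero survives the check, so the loop never attempts the call that raised TypeError above.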

Now suppose we have a function we want to test:

def sign(value):
    if value < 0:
        return -1
    else:
        return 1

and some functions that test it (two of which contain deliberate errors):

def test_sign_negative():
    assert sign(-3) == -1

def test_sign_positive():
    assert sign(19) == 1

def test_sign_zero():
    assert sign(0) == 0

def test_sign_error():
    assert sgn(1) == 1

Each test does something to a fixture (such as the number 19) and uses assertions to compare the actual result against the expected result. The outcome of each test can be:

Pass: the test subject works as expected.
Fail: something is wrong with the test subject.
Error: something is wrong in the test itself, which means we don’t know if the thing we’re testing is working properly or not.

We can implement this classification scheme as follows:

If a test function completes without raising any kind of exception, it passes. (We don’t care if it returns something, but by convention tests don’t return a value.)
If the function raises an AssertionError exception, then the test has failed. Python’s assert statement does this automatically when the condition it is checking is false, so almost all tests use assert for checks.
If the function raises any other kind of exception, then we assume the test itself is broken and count it as an error.

Translating these rules into code gives us the function run_tests that runs every test in a list and counts how many outcomes of each kind it sees:

def run_tests(all_tests):
    results = {"pass": 0, "fail": 0, "error": 0}
    for test in all_tests:
        try:
            test()
            results["pass"] += 1
        except AssertionError:
            results["fail"] += 1
        except Exception:
            results["error"] += 1
    print(f"pass {results['pass']}")
    print(f"fail {results['fail']}")
    print(f"error {results['error']}")

We use run_tests by putting all of our test functions into a list and passing that to the test runner:

TESTS = [
    test_sign_negative,
    test_sign_positive,
    test_sign_zero,
    test_sign_error
]

run_tests(TESTS)

pass 2
fail 1
error 1

Independence

Our function runs tests in the order they appear in the list. The tests should not rely on that: every unit test should work independently so that an error or failure in an early test doesn’t affect other tests’ behavior.
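
A common way to keep tests independent is to have each one build its own fixture instead of sharing one. The following is a minimal sketch under that assumption; make_fixture, outcomes, and the check_ functions are hypothetical names introduced here, not part of the chapter’s code.

```python
def make_fixture():
    """Build a fresh list for every test (hypothetical helper)."""
    return [3, 1, 2]

def check_sort():
    values = make_fixture()
    values.sort()  # modifies only this test's private copy
    assert values == [1, 2, 3]

def check_first_element():
    values = make_fixture()
    assert values[0] == 3

def outcomes(tests):
    """Return the outcome of each test in the order given."""
    results = []
    for test in tests:
        try:
            test()
            results.append("pass")
        except AssertionError:
            results.append("fail")
        except Exception:
            results.append("error")
    return results

forward = outcomes([check_sort, check_first_element])
backward = outcomes([check_first_element, check_sort])
print(forward, backward)
```

Both orderings give the same outcomes. If the two functions instead shared a single module-level list, check_first_element would fail whenever it ran after check_sort, because the list would already have been sorted in place.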

Finding Functions

Making lists of functions is clumsy and error-prone: sooner or later we’ll add a function to TESTS twice or forget to add it at all. We’d therefore like our test runner to find tests for itself, which it can do by exploiting the fact that Python stores variables in a structure similar to a dictionary.

Let’s run the Python interpreter and call the globals function. To make its output easier to read, we will pretty-print it using Python’s pprint module:

import pprint
pprint.pprint(globals())

{'__annotations__': {},
 '__builtins__': <module 'builtins' (built-in)>,
 '__cached__': None,
 '__doc__': None,
 '__file__': '/sdx/test/globals.py',
 '__loader__': <_frozen_importlib_external.SourceFileLoader object \
at 0x109d65290>,
 '__name__': '__main__',
 '__package__': None,
 '__spec__': None,
 'pprint': <module 'pprint' from \
'/sdx/conda/envs/sdxpy/lib/python3.11/pprint.py'>}

As the output shows, globals returns a dictionary containing all the variables in the program’s global scope. Since we just started the interpreter, all we see are the variables that Python defines automatically, plus the pprint module we just imported. (By convention, Python uses double underscores for names that mean something special to it.)

What happens when we define a variable of our own?

import pprint
my_variable = 123
pprint.pprint(globals())

{'__annotations__': {},
 '__builtins__': <module 'builtins' (built-in)>,
 '__cached__': None,
 '__doc__': None,
 '__file__': '/sdx/test/globals_plus.py',
 '__loader__': <_frozen_importlib_external.SourceFileLoader object \
at 0x108039290>,
 '__name__': '__main__',
 '__package__': None,
 '__spec__': None,
 'my_variable': 123,
 'pprint': <module 'pprint' from \
'/sdx/conda/envs/sdxpy/lib/python3.11/pprint.py'>}

Sure enough, my_variable is now in the dictionary.
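
Because the result is an ordinary dictionary, we can also use it to look up a variable when all we have is its name as a string. The sketch below assumes it is run as a top-level script; another_variable is a hypothetical name used only for illustration.

```python
my_variable = 123

# Look up a global variable using a name held in a string.
name = "my_variable"
print(globals()[name])

# Assigning through the dictionary creates (or updates) a global variable.
globals()["another_variable"] = 456
print(another_variable)
```

This prints 123 and then 456. Creating variables this way is rarely a good idea in everyday code, but it shows that the dictionary really is where global names live.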

If function names are just variables and a program’s variables are stored in a dictionary, we can loop over that dictionary to find all the functions whose names start with test_:

def find_tests(prefix):
    for (name, func) in globals().items():
        if name.startswith(prefix):
            print(name, func)

find_tests("test_")

test_sign_negative <function test_sign_negative at 0x105bcd440>
test_sign_positive <function test_sign_positive at 0x105bcd4e0>
test_sign_zero <function test_sign_zero at 0x105bcd580>
test_sign_error <function test_sign_error at 0x105bcd620>

The hexadecimal numbers in the output show where each function object is stored in memory, which isn’t particularly useful unless we’re extending the language, but at least it doesn’t take up much space on the screen.
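
Printing is fine for exploration, but a test runner needs the function objects themselves. One possible variation, sketched below, returns the matches as a dictionary and takes the namespace to search as a parameter so that it can be tried on any dictionary, not just globals(). The namespace parameter and the check_ prefix are assumptions made for this example.

```python
def find_tests(namespace, prefix):
    """Return the entries of namespace whose names start with prefix."""
    return {
        name: value
        for (name, value) in namespace.items()
        if name.startswith(prefix)
    }

def check_addition():
    assert 1 + 1 == 2

def check_subtraction():
    assert 3 - 1 == 2

def helper():
    pass

# Search this module's global variables for names starting with "check_".
found = find_tests(globals(), "check_")
print(sorted(found))
```

The result contains check_addition and check_subtraction but not helper, and each value is the function object itself, ready to be called.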

Having a running program find things in itself like this is called introspection, and is the key to many of the designs in upcoming chapters. Combining introspection with the pass-fail-error pattern of the previous section gives us something that finds test functions, runs them, and summarizes their results:

def run_tests():
    results = {"pass": 0, "fail": 0, "error": 0}
    for (name, test) in globals().items():
        if not name.startswith("test_"):
            continue
        try:
            test()
            results["pass"] += 1
        except AssertionError:
            results["fail"] += 1
        except Exception:
            results["error"] += 1
    print(f"pass {results['pass']}")
    print(f"fail {results['fail']}")
    print(f"error {results['error']}")

run_tests()

pass 2
fail 1
error 1

We could add many more features to this (and pytest does), but almost every modern test runner uses this design.

Summary

When reviewing the ideas introduced in this chapter (Figure 6.2), it’s worth remembering Clarke’s Third Law, which states that any sufficiently advanced technology is indistinguishable from magic. The same is true of programming tricks like introspection: the code that finds tests dynamically seems transparent to an expert who understands that code is data, but can be incomprehensible to a novice. As we said in the discussion of comprehension curves in Chapter 1, no piece of software can be optimal for both audiences; the only solution to this problem is education, which is why books like this one exist. Please see Appendix B for extra material related to these ideas.

Figure 6.2: Concept map of test runner.

Exercises

Looping Over globals

What happens if you run this code?

for name in globals():
    print(name)

What happens if you run this code instead?

name = None
for name in globals():
    print(name)

Why are the two different?

Individual Results

Modify the test framework so that it reports which tests passed, failed, or had errors and also reports a summary of how many tests produced each result.
Write unit tests to check that your answer works correctly.

Setup and Teardown

Testing frameworks often allow programmers to specify a setup function that is to be run before each test and a corresponding teardown function that is to be run after each test. (setup usually recreates complicated test fixtures, while teardown functions are sometimes needed to clean up after tests, e.g., to close database connections or delete temporary files.)

Modify the testing tool in this chapter so that if a file of tests contains a function called setup then the tool calls it exactly once before running each test in the file. Add a similar way to register a teardown function.

Timing Tests

Modify the testing tool so that it records how long it takes to run each test. (The function time.time may be useful.)

Selecting Tests

Modify the testing tool so that if a user provides -s pattern or --select pattern on the command line then the tool only runs tests that contain the string pattern in their name.

Finding Functions

Python is dynamically typed, which means it checks the types of values as code runs. We can do this ourselves using the type function, which shows that 3 is an integer:

print(type(3))

<class 'int'>

or that a function is a function:

def example():
    pass

print(type(example))

<class 'function'>

However, built-in functions have a different type:

print(type(len))

<class 'builtin_function_or_method'>

so it’s safer to use callable to check if something can be called:

def example():
    pass

print(callable(example), callable(len))

True True

Modify the test runner in this chapter so that it doesn’t try to call things whose names start with test_ but which aren’t actually functions.
Should the test runner report these cases as errors?

Local Variables

Python has a function called locals that returns all the variables defined in the current local scope.

Predict what the code below will print before running it. When does the variable i first appear and is it still there in the final line of output?
Run the code and compare your prediction with its behavior.

def show_locals(low, high):
    print(f"start: {locals()}")
    for i in range(low, high):
        print(f"loop {i}: {locals()}")
    print(f"end: {locals()}")

show_locals(1, 3)
