Running Tests
- Functions are objects you can save in data structures or pass to other functions.
- Python stores local and global variables in dictionary-like structures.
- A unit test performs an operation on a fixture and passes, fails, or produces an error.
- A program can use introspection to find functions and other objects at runtime.
Terms defined: actual result (of test), assertion, dynamic typing, error (result of test), exception, expected result (of test), failure (result of test), fixture, global, local, pass (result of test), pretty print, raise (an exception), register (in code), scope, unit test
Not all software needs rigorous testing: for example, it’s OK to check a one-off data analysis script by looking at the output of each stage as we add it. But we should all be grateful that 98% of the lines of code in the SQLite database are there to make the other 2% always do the right thing.
The examples in this book lie somewhere between these two extremes. Together, they are over 7000 lines long; to make sure they work correctly, we wrote several hundred unit tests using pytest. We used this framework because it makes tests easier to write, and because it runs them in a reliable, repeatable way [Meszaros2007, Aniche2022]. Understanding how tools like this work will help you use them more effectively, and will reinforce one of the big ideas of this book: programs are just another kind of data.
Storing and Running Tests
As we said in Chapter 2, a function is just an object that we can assign to a variable. We can also store functions in lists, just as we store numbers or strings (Figure 6.1):
def first():
    print("First")

def second():
    print("Second")

def third():
    print("Third")

everything = [first, second, third]
for func in everything:
    func()
First
Second
Third
However, we have to be able to call the functions in the same way in order for this trick to work, which means they must have the same signature:
def zero():
    print("zero")

def one(value):
    print("one", value)

for func in [zero, one]:
    func()
zero
Traceback (most recent call last):
File "/sdx/test/signature.py", line 8, in <module>
func()
TypeError: one() missing 1 required positional argument: 'value'
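One way around this, if we know in advance which arguments each function needs, is to wrap the functions so that every entry in the list can be called with no arguments. The sketch below uses `functools.partial` from the standard library. The list name `calls` and the argument value 1 are illustrative, and the functions return strings instead of printing so that the result is easy to inspect.

```python
from functools import partial

def zero():
    return "zero"

def one(value):
    return f"one {value}"

# Wrap `one` with its argument so it can be called with no arguments, like `zero`.
calls = [zero, partial(one, 1)]

outputs = [func() for func in calls]
print(outputs)
```

Every element of `calls` now has the same signature, so the loop does not need to know which function it is calling.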
Now suppose we have a function we want to test:
def sign(value):
    if value < 0:
        return -1
    else:
        return 1
and some functions that test it (two of which contain deliberate errors):
def test_sign_negative():
    assert sign(-3) == -1

def test_sign_positive():
    assert sign(19) == 1

def test_sign_zero():
    assert sign(0) == 0

def test_sign_error():
    assert sgn(1) == 1
Each test does something to a fixture (such as the number 19) and uses assertions to compare the actual result against the expected result. The outcome of each test can be:
- Pass: the test subject works as expected.
- Fail: something is wrong with the test subject.
- Error: something is wrong in the test itself, which means we don’t know if the thing we’re testing is working properly or not.
We can implement this classification scheme as follows:
- If a test function completes without raising any kind of exception, it passes. (We don’t care if it returns something, but by convention tests don’t return a value.)
- If the function raises an AssertionError exception, then the test has failed. Python’s assert statement does this automatically when the condition it is checking is false, so almost all tests use assert for checks.
- If the function raises any other kind of exception, then we assume the test itself is broken and count it as an error.
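The difference between the last two rules can be seen by catching the exceptions and looking at their types. This is a small sketch: the function names `check_false_assertion`, `check_undefined_name`, and `undefined_function` are made up for illustration, and it assumes Python is run without the `-O` flag, which removes `assert` statements.

```python
def check_false_assertion():
    # The condition is false, so Python raises AssertionError.
    assert 1 + 1 == 3

def check_undefined_name():
    # The name lookup fails with NameError before any comparison happens.
    assert undefined_function(1) == 1

caught = []
for func in [check_false_assertion, check_undefined_name]:
    try:
        func()
    except Exception as exc:
        # Record the name of the exception class that was raised.
        caught.append(type(exc).__name__)

print(caught)
```

The first function corresponds to a failure and the second to an error under the rules above.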
Translating these rules into code gives us the function run_tests
that runs every test in a list
and counts how many outcomes of each kind it sees:
def run_tests(all_tests):
    results = {"pass": 0, "fail": 0, "error": 0}
    for test in all_tests:
        try:
            test()
            results["pass"] += 1
        except AssertionError:
            results["fail"] += 1
        except Exception:
            results["error"] += 1
    print(f"pass {results['pass']}")
    print(f"fail {results['fail']}")
    print(f"error {results['error']}")
We use run_tests
by putting all of our test functions into a list
and passing that to the test runner:
TESTS = [
    test_sign_negative,
    test_sign_positive,
    test_sign_zero,
    test_sign_error
]
run_tests(TESTS)
pass 2
fail 1
error 1
Independence
Our function runs tests in the order they appear in the list. The tests should not rely on that: every unit test should work independently so that an error or failure in an early test doesn’t affect other tests’ behavior.
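To see why independence matters, consider this sketch of two checks that share a fixture. The module-level list `shared`, the two check functions, and the helper `outcomes` are hypothetical names invented for this example.

```python
# Hypothetical shared fixture: a module-level list that both checks use.
shared = []

def check_append():
    shared.append(1)
    assert len(shared) == 1

def check_starts_empty():
    assert shared == []

def outcomes(tests):
    """Run tests in the given order and record whether each one passed."""
    result = []
    for test in tests:
        try:
            test()
            result.append("pass")
        except AssertionError:
            result.append("fail")
    return result

first_order = outcomes([check_starts_empty, check_append])
shared.clear()  # reset the fixture before trying the other order
second_order = outcomes([check_append, check_starts_empty])
print(first_order, second_order)
```

Both checks pass in the first order, but `check_starts_empty` fails in the second because `check_append` has already modified the shared list. Giving each test its own fresh fixture avoids this kind of coupling.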
Finding Functions
Making lists of functions is clumsy and error-prone:
sooner or later we’ll add a function to TESTS
twice
or forget to add it at all.
We’d therefore like our test runner to find tests for itself,
which it can do by exploiting the fact that
Python stores variables in a structure similar to a dictionary.
Let’s run the Python interpreter and call the globals
function.
To make its output easier to read,
we will pretty-print it
using Python’s pprint
module:
import pprint
pprint.pprint(globals())
{'__annotations__': {},
'__builtins__': <module 'builtins' (built-in)>,
'__cached__': None,
'__doc__': None,
'__file__': '/sdx/test/globals.py',
'__loader__': <_frozen_importlib_external.SourceFileLoader object \
at 0x109d65290>,
'__name__': '__main__',
'__package__': None,
'__spec__': None,
'pprint': <module 'pprint' from \
'/sdx/conda/envs/sdxpy/lib/python3.11/pprint.py'>}
As the output shows,
globals
is a dictionary containing
all the variables in the program’s global scope.
Since we just started the interpreter,
all we see are the variables that Python defines automatically.
(By convention,
Python uses double underscores for names that mean something special to it.)
What happens when we define a variable of our own?
import pprint
my_variable = 123
pprint.pprint(globals())
{'__annotations__': {},
'__builtins__': <module 'builtins' (built-in)>,
'__cached__': None,
'__doc__': None,
'__file__': '/sdx/test/globals_plus.py',
'__loader__': <_frozen_importlib_external.SourceFileLoader object \
at 0x108039290>,
'__name__': '__main__',
'__package__': None,
'__spec__': None,
'my_variable': 123,
'pprint': <module 'pprint' from \
'/sdx/conda/envs/sdxpy/lib/python3.11/pprint.py'>}
Sure enough,
my_variable
is now in the dictionary.
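Function definitions are stored in the same way. In the minimal sketch below, the function `greet` is a made-up example, and the code is assumed to run at the top level of a script or interpreter session so that the definition lands in the global scope.

```python
def greet():
    return "hello"

# The name `greet` is a key in the dictionary returned by globals(),
# and the value stored under that key is the function object itself.
found = globals()["greet"]
print(found is greet)
print(found())
```

Looking the function up by name and calling the result has the same effect as calling `greet()` directly.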
If function names are just variables
and a program’s variables are stored in a dictionary,
we can loop over that dictionary
to find all the functions whose names start with test_
:
def find_tests(prefix):
    for (name, func) in globals().items():
        if name.startswith(prefix):
            print(name, func)

find_tests("test_")
test_sign_negative <function test_sign_negative at 0x105bcd440>
test_sign_positive <function test_sign_positive at 0x105bcd4e0>
test_sign_zero <function test_sign_zero at 0x105bcd580>
test_sign_error <function test_sign_error at 0x105bcd620>
The hexadecimal numbers in the output show where each function object is stored in memory, which isn’t particularly useful unless we’re extending the language, but at least it doesn’t take up much space on the screen.
Having a running program find things in itself like this is called introspection, and is the key to many of the designs in upcoming chapters. Combining introspection with the pass-fail-error pattern of the previous section gives us something that finds test functions, runs them, and summarizes their results:
def run_tests():
    results = {"pass": 0, "fail": 0, "error": 0}
    for (name, test) in globals().items():
        if not name.startswith("test_"):
            continue
        try:
            test()
            results["pass"] += 1
        except AssertionError:
            results["fail"] += 1
        except Exception:
            results["error"] += 1
    print(f"pass {results['pass']}")
    print(f"fail {results['fail']}")
    print(f"error {results['error']}")

run_tests()
pass 2
fail 1
error 1
We could add many more features to this (and pytest does), but almost every modern test runner uses this design.
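One small variation, sketched here as an assumption and not as the way pytest works, is to pass the dictionary of names to the runner and return the counts instead of printing them. This makes it possible to try the runner on a hand-built dictionary. The names `run_tests_in`, `example`, and the three sample functions are invented for this illustration.

```python
def run_tests_in(namespace, prefix="test_"):
    """Run every entry in `namespace` whose name starts with `prefix`."""
    results = {"pass": 0, "fail": 0, "error": 0}
    # Copy the items so that tests that define new names do not disturb the loop.
    for name, test in list(namespace.items()):
        if not name.startswith(prefix):
            continue
        try:
            test()
            results["pass"] += 1
        except AssertionError:
            results["fail"] += 1
        except Exception:
            results["error"] += 1
    return results

def passing():
    assert 1 + 1 == 2

def failing():
    assert 1 + 1 == 3

def broken():
    raise ValueError("deliberate error")

# A hand-built namespace: the key "helper" lacks the prefix and is skipped.
example = {
    "test_passing": passing,
    "test_failing": failing,
    "test_broken": broken,
    "helper": passing,
}
summary = run_tests_in(example)
print(summary)
```

Calling `run_tests_in(globals())` at the top level of a script would behave like the version above, except that the caller decides what to do with the counts.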
Summary
When reviewing the ideas introduced in this chapter (Figure 6.2), it’s worth remembering Clarke’s Third Law, which states that any sufficiently advanced technology is indistinguishable from magic. The same is true of programming tricks like introspection: the code that finds tests dynamically seems transparent to an expert who understands that code is data, but can be incomprehensible to a novice. As we said in the discussion of comprehension curves in Chapter 1, no piece of software can be optimal for both audiences; the only solution to this problem is education, which is why books like this one exist. Please see Appendix B for extra material related to these ideas.
Exercises
Looping Over globals
What happens if you run this code?
for name in globals():
    print(name)
What happens if you run this code instead?
name = None
for name in globals():
    print(name)
Why are the two different?
Individual Results
- Modify the test framework so that it reports which tests passed, failed, or had errors and also reports a summary of how many tests produced each result.
- Write unit tests to check that your answer works correctly.
Setup and Teardown
Testing frameworks often allow programmers to specify a setup
function
that is to be run before each test
and a corresponding teardown
function
that is to be run after each test.
(setup
usually recreates complicated test fixtures,
while teardown
functions are sometimes needed to clean up after tests,
e.g., to close database connections or delete temporary files.)
Modify the testing tool in this chapter so that
if a file of tests contains a function called setup
then the tool calls it exactly once before running each test in the file.
Add a similar way to register a teardown
function.
Timing Tests
Modify the testing tool so that it records how long it takes to run each test.
(The function time.time
may be useful.)
Selecting Tests
Modify the testing tool so that if a user provides -s pattern
or --select pattern
on the command line
then the tool only runs tests that contain the string pattern
in their name.
Finding Functions
Python is dynamically typed,
which means it checks the types of values as code runs.
We can do this ourselves using the type
function,
which shows that 3 is an integer:
print(type(3))
<class 'int'>
or that a function is a function:
def example():
    pass

print(type(example))
<class 'function'>
However, built-in functions have a different type:
print(type(len))
<class 'builtin_function_or_method'>
so it’s safer to use callable
to check if something can be called:
def example():
    pass

print(callable(example), callable(len))
True True
- Modify the test runner in this chapter so that it doesn’t try to call things whose names start with test_ but which aren’t actually functions.
- Should the test runner report these cases as errors?
Local Variables
Python has a function called locals
that returns all the variables defined in the current local scope.
- Predict what the code below will print before running it. When does the variable i first appear and is it still there in the final line of output?
- Run the code and compare your prediction with its behavior.
def show_locals(low, high):
    print(f"start: {locals()}")
    for i in range(low, high):
        print(f"loop {i}: {locals()}")
    print(f"end: {locals()}")

show_locals(1, 3)