Introduction to software testing

This talk introduces the concepts of software testing. I will outline what software testing is and summarize current real-world best practices and procedures. We will then discuss how to apply this to scientific software, and the difficulties of doing so, but we won't actually cover how to do it this week: next time will be a tutorial for that, based on this week's feedback.

Introduction to software testing


  • First, I will discuss basic concepts of testing and how other projects use it.
  • Then, I will describe some of the common tools for testing
  • Finally, we will discuss things that make scientific code testing hard

Basic concepts of unit testing

How do you write correct software?

  • You are a scientist, and your code is what gives you results.
  • How do you personally convince yourself that what you write is correct?
    • Do you have enough confidence that you just write and use?
    • Do you write and then run it a few times and see if the results look good?

What is (automated) software testing?

  • Instead of running some tests and looking at output yourself...
  • ... make it automatic
  • Tests can be re-run at any time in the future
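
As a minimal sketch of the idea (the function and its test are invented for illustration), an automated test is just code that checks code, with no human in the loop:

```python
# A hypothetical function under test.
def celsius_to_kelvin(c):
    return c + 273.15

# The automated test: it runs without user interaction and checks itself.
def test_celsius_to_kelvin():
    assert celsius_to_kelvin(0) == 273.15
    assert celsius_to_kelvin(-273.15) == 0

test_celsius_to_kelvin()  # a test runner would normally find and run this
```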

Testing is considered one of the cornerstones of good software.

  • Benefits:
    • Find problems early
    • Find regressions when you make a big change
    • Simplifies integration
    • Documentation
    • Design
  • This talk is about testing systematically and automatically, instead of only testing while developing.
  • The key to making testing work is balance.

Examples of unit tests of different libraries

Good projects have a policy of never accepting contributions without tests.

Good projects have a policy: if someone reports a bug, add a test that reproduces the bug, then fix it. The bug can never silently reappear.

Tests can be a lot of work! Sometimes you will write more lines of test code than code that solves the problem. But if the code is designed well, testing can be very easy.

Different types of testing

  • Unit testing
    • Testing the smallest atomic components: each function in isolation, with no risk of other functions affecting the result.
    • A test failure here is easy to track down, since only one function is involved.
  • Integration testing
    • Testing how things work together: whether functions and components communicate properly.
  • System testing
    • Testing everything together: an entire run, including interaction with the OS.
  • You can make tests at different levels of this hierarchy.
  • Each level of this hierarchy is good for different things
  • In research, it's not worth testing every function at every level.
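
A hypothetical two-function pipeline (both functions are invented for illustration) shows the first two levels of the hierarchy:

```python
# Hypothetical pipeline: parse lines of text, then compute a statistic.
def parse_line(line):
    return float(line.strip())

def mean(values):
    return sum(values) / len(values)

# Unit test: one function, in isolation.
def test_mean():
    assert mean([1.0, 2.0, 3.0]) == 2.0

# Integration test: the two functions working together.
def test_parse_and_mean():
    lines = ["1.0\n", "2.0\n", "3.0\n"]
    assert mean([parse_line(l) for l in lines]) == 2.0

test_mean()
test_parse_and_mean()
```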

Example of a system test: a test dataset that computes quickly. You can run on the test data quickly and often to make sure the script completes and produces proper output.

Example of tests for research: you may want unit tests on calculations, integration tests on some components, and no system tests: if the whole thing breaks, you'll notice anyway.

How to test

  • Tests should be automatic - can run and check themselves in a script, without any user interaction.
  • Tests should generally be fast to run.
  • Commit hooks: tests can be automatically run when you commit to version control. (Continuous integration)
  • There can be significant technical overhead in testing certain applications (e.g. web applications)

As scientists, should we, and how should we, use testing in our work?

  • With code always changing, things could break without you noticing.
  • With good tests, you have the ability to change things up with less risk of wrong results.
  • Good testing will rely on knowing the testing hierarchy and writing the right tests for the right jobs.

Unit testing tools and workflows

unittest libraries

  • unittest: In python standard library, provides a base to build on
    • Fully object oriented (to the point of being annoying to use)
  • nose: Python module to make unit testing nicer
    • "nose extends unittest to make testing easier."
    • Provides a wrapper "nosetests" to automatically find and run tests
    • Tests can also be simple functions.

Python nose example

from nose.tools import assert_true, assert_equal, assert_greater_equal, assert_less

from sole_model import *   # hypothetical module name; the original omits it

def test_sole():
    # For small graphs we can exactly specify what the outcome should be:

    # alpha=0, delta=0
    assert_isomorphic(sole(T=3, alpha=0, delta=0),
                      G({0:(1,2), 1:(0,2)}))

    assert_isomorphic(sole(T=4, alpha=0, delta=0),
                      G({0:(1,2), 1:(0,2), 3:(1,2)}))


  • Put tests in the docstrings of functions:

    def factorial(n):
        """
        >>> factorial(5)
        120
        """
  • When run with the doctest framework, the >>> lines are input, and the expected output is below.

  • Input is evaluated and must match output.

  • Very simple to make, and documents as well as tests

Python doctest example


def factorial(n):
    """Return the factorial of n, an exact integer >= 0.

    If the result is small enough to fit in an int, return an int.
    Else return a long.

    >>> [factorial(n) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> [factorial(long(n)) for n in range(6)]
    [1, 1, 2, 6, 24, 120]
    >>> factorial(30)
    265252859812191058636308480000000
    >>> factorial(30L)
    265252859812191058636308480000000L
    """


Assertions

  • Assertions are inline sanity checks (not unit tests)!
  • They catch things that your code and unit tests don't catch.
  • They should exist in any good language - if not, make them yourself.
  • Recommendation: write assertions when making new functions. Remove them later once the function works AND if speed is an issue.
  • Can be removed automatically for performance purposes.
    • python -O runs Python without assertions; gcc -DNDEBUG compiles C code without assertions.
    • I personally leave them in as long as possible - correctness is more important to me than speed.

Assertions example

  • I am making a growing model of a network.
  • My calculations say the next edge should be added between a and b.
  • Before calling g.add_edge(a, b), I ...
  • ... write assert not g.has_edge(a, b).
  • If my calculations were wrong, I will know instead of it passing silently.
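
The pattern can be sketched with a minimal stand-in graph (a real project might use a graph library; the edge-set representation here is invented for illustration):

```python
# A minimal stand-in graph: edges stored as a set of unordered pairs.
edges = set()

def has_edge(a, b):
    return frozenset((a, b)) in edges

def add_edge(a, b):
    edges.add(frozenset((a, b)))

def grow_step(a, b):
    # Sanity check: the growth calculation should never propose an
    # edge that already exists. Fail loudly instead of passing silently.
    assert not has_edge(a, b), "edge (%r, %r) already present" % (a, b)
    add_edge(a, b)

grow_step("a", "b")   # first addition passes the assertion
```

Calling grow_step("a", "b") a second time raises AssertionError, exposing the wrong calculation immediately.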

Python syntax:

assert test_expression, message
    # test_expression - evaluated; if True, nothing happens; if False, AssertionError is raised
    # message - only evaluated if test_expression is False; used as the assertion message.

C syntax:

#include <assert.h>

assert(expression);   /* if expression is false, print a message and abort */

Assertions are especially useful when making new functions and code. It is an important, and cheap, sanity check.

Test driven development

  • Testing taken to the extreme
  • You write the tests first, then write code to make the test pass.
  • Nothing exists without a test.
  • You can feel free to change anything and not think about it, as long as the tests pass you are good to go.
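
A minimal sketch of the cycle, with an invented slugify function: the test is written first (and initially fails, since the function does not exist), then just enough code is written to make it pass:

```python
# Step 1: write the test first. Running it now fails: slugify does not exist.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Testing 101  ") == "testing-101"

# Step 2: write the simplest code that makes the test pass.
def slugify(text):
    return "-".join(text.lower().split())

test_slugify()  # now it passes
```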

Code coverage

  • Tools that run your unit tests and report which lines were NOT executed.
  • Integrated with other tools.


Thought process behind making test scripts

  • Think about the simplest problem with an easily computed answer. That is your benchmark.
    • You will need to make mock data that has known properties
  • Write tests to verify those mock properties.
  • Make other small changes and test them.
  • Test all options to the functions.
    • Do they work together?
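
The steps above can be sketched with invented mock data whose properties are known by construction:

```python
# Mock data constructed to have a known property: mean exactly 5.0.
mock = [3.0, 4.0, 5.0, 6.0, 7.0]

def mean(xs):
    return sum(xs) / len(xs)

def centered(xs):
    m = mean(xs)
    return [x - m for x in xs]

# First, verify the mock data really has the property we claimed.
def test_mock_properties():
    assert mean(mock) == 5.0

# Then the benchmark: the simplest problem with an easily computed answer.
def test_centered():
    assert centered(mock) == [-2.0, -1.0, 0.0, 1.0, 2.0]

test_mock_properties()
test_centered()
```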

Benefits from this:

  • Forces you to think about testing.
  • Better design earlier.
  • Less chance of random bugs being introduced later on.

Scientific software testing

"Is this worth it?"

  • Making test scripts is hard.
  • But you _do_ always test your code anyway, just interactively and non-repeatably (you just run things). Right?
  • In fact, as a scientist your obligation is to make sure that your code is correct (reproducible!)
  • You "just" need to think about this some and turn it into an automatic system.
    • Replaces the "interactive debugging" steps.
  • Something you should do anyway, even though it's hard.

Code structure issues

  • You need to design code in a testable fashion
  • Suggestion: Separate input/output/processing from calculation. It's easy to test calculation in isolation.
  • Sometimes, you'll need to make some real scripts and functions that can be called automatically, instead of just running everything interactively.

Combinatorial issues

  • With 5 different on/off options, that is 2^5 = 32 different combinations to test! Do all combinations need testing?
  • Ideally, yes, but practically, no, unless you automatically write something to test them all.
  • Test corner cases: invalid input, overflowing inputs.
  • Ideally, try to make sure that all code paths are hit at least once (see the coverage tests)
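
One way to write something that tests them all automatically (the process function and its options are hypothetical) is to generate every combination and check invariants that must hold for all of them:

```python
import itertools

# A hypothetical function with several on/off options.
def process(data, normalize=False, sort=False, reverse=False):
    out = list(data)
    if normalize:
        total = sum(out)
        out = [x / total for x in out]
    if sort:
        out.sort()
    if reverse:
        out.reverse()
    return out

# Exercise every combination of options automatically.
def test_all_option_combinations():
    data = [3, 1, 2]
    for normalize, sort, reverse in itertools.product([False, True], repeat=3):
        result = process(data, normalize=normalize, sort=sort, reverse=reverse)
        assert len(result) == len(data)           # invariant: length preserved
        if normalize:
            assert abs(sum(result) - 1.0) < 1e-9  # invariant: sums to 1

test_all_option_combinations()
```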

Stochastic issues

  • What happens if the function depends on randomness? You can't test that the output matches a fixed value.
  • Possible solutions:
    • Seeding for reproducibility.
      • Makes the test immediately reproducible, but ties it to the internal structure of the random number generator.
    • Compare results to a distribution.
    • Taking extreme values to eliminate stochasticity.
      • I tested a model by using extreme parameter values. The output then should have been either a clique or a tree. It's easy to verify that, and then I hope that the middle values work.
    • Making the stochastic part modular and mocking it.
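
The first two strategies can be sketched as follows (the noisy_measure function is invented for illustration):

```python
import random

# Hypothetical stochastic function: a noisy measurement around a true value.
def noisy_measure(true_value, rng):
    return true_value + rng.gauss(0, 0.1)

# Strategy 1: seed the generator, so the run is exactly reproducible.
def test_reproducible():
    a = noisy_measure(1.0, random.Random(42))
    b = noisy_measure(1.0, random.Random(42))
    assert a == b  # same seed, same sequence, same result

# Strategy 2: compare many samples against the expected distribution.
def test_distribution():
    rng = random.Random(0)
    samples = [noisy_measure(1.0, rng) for _ in range(10000)]
    mean = sum(samples) / len(samples)
    assert abs(mean - 1.0) < 0.01  # sample mean close to the true value

test_reproducible()
test_distribution()
```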

Other issues in research

  • You don't know a "true" answer
    • Compare to theory
    • Compare different implementations
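
A sketch of comparing implementations: a naive, obviously-correct version serves as the reference for an optimized one (variance is used as an invented example):

```python
# Naive reference implementation: obviously correct, possibly slow.
def var_naive(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# "Optimized" one-pass implementation, checked against the reference.
def var_onepass(xs):
    n = len(xs)
    s = sum(xs)
    s2 = sum(x * x for x in xs)
    return s2 / n - (s / n) ** 2

def test_implementations_agree():
    data = [1.0, 2.0, 4.0, 8.0]
    assert abs(var_naive(data) - var_onepass(data)) < 1e-9

test_implementations_agree()
```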

Conclusions

  • Testing is a key point of modern software development
  • There are many tools and procedures to help people do this
  • Making the tests can be significant work in itself
  • As scientists, we have some unique difficulties in making tests, but also a unique responsibility to do so.

What do you want for the next talk?

Please give me feedback and requests.


Simply doing an internet search for most of these topics will yield plenty of reading material and tutorials at all levels.

Reading list