Practical ways to write correct code

We've covered testing. We've covered version control. But have we put it all together yet?

I am surprised (and I think most people are) by the number of trivial bugs we find in our code. This discussion is based on techniques I use to minimize bugs; most of them summarize things covered in other talks. Since I am not an expert here, I hope that others will contribute ideas to this talk.

See also: http://arxiv.org/abs/1210.0530, "Best Practices for Scientific Computing", Greg Wilson et al.

Life-critical programming

Can you imagine programming a traffic control system? Or a Mars rover? We would completely fail at that.

Professionals in these fields use specific techniques to guard against human error.

I can't claim to know these techniques, but I can talk about what I do.

Types of bugs

Use version control a lot

Every change should be seen at least twice: once when you write it, and again when you commit it.

When you commit, don't commit everything at once (git commit -a); use git commit -p to review each change individually.

Use lots of functions
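A minimal sketch of the idea, assuming a toy physics calculation (all function names here are hypothetical): each step lives in its own small function, so a unit-conversion slip shows up in a one-line check instead of hiding inside the final number.

```python
def kmh_to_ms(speed_kmh):
    """Convert a speed from km/h to m/s."""
    return speed_kmh * 1000.0 / 3600.0


def kinetic_energy(mass_kg, speed_ms):
    """Kinetic energy in joules: E = m * v**2 / 2."""
    return 0.5 * mass_kg * speed_ms ** 2


def car_energy_joules(mass_kg, speed_kmh):
    """Composed from the two small, separately checkable pieces above."""
    return kinetic_energy(mass_kg, kmh_to_ms(speed_kmh))


# Each small function is easy to check by hand before trusting the whole.
assert kmh_to_ms(36.0) == 10.0
assert kinetic_energy(2.0, 3.0) == 9.0
print(car_energy_joules(1000.0, 36.0))  # 50000.0
```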

Reuse code as much as possible

The more a piece of code is used, the more different circumstances it gets tested in. That is _good_.

You should try to reuse your code in as many projects as possible.

Different people should use the same code in different contexts, to maximize the chances that its bugs show up.

But never copy and paste code! Every pasted copy is one more place where the same bug has to be found and fixed.
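One way to get reuse without pasting is to move the shared logic into a single function. This is a minimal sketch with a hypothetical rescale helper; the commented-out lines show the pasted version it replaces.

```python
# Copy-and-paste version (avoid): the same formula typed twice, so a
# fix applied to one copy silently misses the other.
#   a_norm = [(x - min(a)) / (max(a) - min(a)) for x in a]
#   b_norm = [(x - min(b)) / (max(b) - min(b)) for x in b]


def rescale(values):
    """Linearly map values onto [0, 1]; defined once, reused everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Guard against division by zero, added in exactly one place.
        raise ValueError("cannot rescale constant data")
    return [(x - lo) / (hi - lo) for x in values]


a_norm = rescale([2.0, 4.0, 6.0])
b_norm = rescale([10.0, 20.0])
print(a_norm, b_norm)  # [0.0, 0.5, 1.0] [0.0, 1.0]
```

Because the constant-data guard lives in one function, every caller gets it; with pasted copies, each one would need the same fix applied separately.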

Write as clearly as possible

Even if something is more verbose or slower, clearer is better.

Beautiful code is easy to read and understand; code that is easy to understand is more likely to be correct; and code that is more likely to be correct makes you more productive.

Write it as if you want someone else to be able to read and use it.
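As an illustration (the Measurement record and both functions are hypothetical), the two versions below return the same result, but the second spells out what each condition means, which makes a wrong index or an inverted flag much easier to spot.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    sample_id: str
    signal: float
    flagged: bool


# Terse version: correct, but the reader must remember what x[1] and x[2] mean.
def f(d, t):
    return [x[0] for x in d if x[1] > t and not x[2]]


# Clearer version: longer, but each condition says what it checks.
def ids_above_threshold(measurements, threshold):
    """Return IDs of unflagged measurements whose signal exceeds threshold."""
    selected = []
    for m in measurements:
        if m.flagged:
            continue  # skip measurements marked as bad
        if m.signal > threshold:
            selected.append(m.sample_id)
    return selected


data = [
    Measurement("a", 5.0, False),
    Measurement("b", 9.0, True),
    Measurement("c", 7.5, False),
]
print(ids_above_threshold(data, 6.0))  # ['c']
```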

Program incrementally

Complete rewrites are bad

Once I saw something about complete rewrites being bad:

Run on test data first

When writing anything non-trivial, first compare your output to known results. Make sure you reproduce existing results correctly before trusting anything new.

Corollary: your code should be flexible enough to run on smaller or test data first for verification.
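A minimal sketch of both points, assuming the "analysis" is numerical integration with a hand-written trapezoid rule (trapezoid and integrate_all are hypothetical names). The integral of sin(x) over [0, pi] is exactly 2, which gives a known result to check against before running on anything new; the limit parameter allows a quick run on a small subset first.

```python
import math


def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] using n equal trapezoids."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def integrate_all(integrands, a, b, n, limit=None):
    """Integrate each function; `limit` restricts the run to the first few."""
    if limit is not None:
        integrands = integrands[:limit]
    return [trapezoid(f, a, b, n) for f in integrands]


# 1. Known result first: the integral of sin(x) over [0, pi] is exactly 2.
assert math.isclose(trapezoid(math.sin, 0.0, math.pi, 1000), 2.0, rel_tol=1e-5)

# 2. Then a small, fast run before the full one.
quick = integrate_all([math.sin, math.cos, math.exp], 0.0, 1.0, 100, limit=2)
print(len(quick))  # 2
```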

Use assertions
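Assertions state what must be true at a given point and stop the program as soon as it isn't, close to the bug rather than far downstream. A minimal Python sketch, using a hypothetical normalize function with checks on both its input and its output:

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    # Preconditions: catch bad input where it enters the function.
    assert len(weights) > 0, "weights must not be empty"
    assert all(w >= 0 for w in weights), f"negative weight in {weights}"
    total = sum(weights)
    assert total > 0, "weights must not all be zero"

    probs = [w / total for w in weights]

    # Postcondition: the result should sum to 1 up to rounding error.
    assert math.isclose(sum(probs), 1.0), sum(probs)
    return probs


print(normalize([1, 1, 2]))  # [0.25, 0.25, 0.5]
```

Note that Python skips assert statements when run with the -O flag, so assertions are for catching programming errors, not for validating input that must always be checked.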

Reproducible

A single command repeats my entire analysis and produces the final output, even if all the intermediate results have already been computed.
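A minimal sketch of such a single entry point in Python, assuming a three-step pipeline; the step functions, file names, and command-line arguments are all hypothetical. Every step runs from the raw input each time, so the final output never depends on stale intermediate files.

```python
import argparse
import json
from pathlib import Path


def load(path):
    """Step 1: read raw, whitespace-separated numbers from a text file."""
    return [float(token) for token in Path(path).read_text().split()]


def clean(values):
    """Step 2: drop values assumed to be invalid (here: negative ones)."""
    return [v for v in values if v >= 0]


def summarize(values):
    """Step 3: reduce the cleaned data to the final result."""
    return {"n": len(values), "mean": sum(values) / len(values)}


def main(argv=None):
    """Run every step from raw input to final output in one go."""
    parser = argparse.ArgumentParser(description="Rerun the whole analysis.")
    parser.add_argument("raw", help="raw input file")
    parser.add_argument("output", help="where to write the final JSON result")
    args = parser.parse_args(argv)

    result = summarize(clean(load(args.raw)))
    Path(args.output).write_text(json.dumps(result, indent=2))
    return result


if __name__ == "__main__":
    main()
```

It would be run as, for example, python run_analysis.py raw.txt summary.json (names hypothetical). For longer pipelines where rerunning every step is expensive, a build tool such as make can rerun only the steps whose inputs have changed.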

References

Other notes