We've covered testing. We've covered version control. But have we put it all together yet?
I am surprised (and I think most people are) by the number of trivial bugs we find in our code. This discussion is based on techniques I use to minimize bugs; most of them summarize things discussed in other talks. Since I am not an expert here, I hope that others will contribute ideas to this talk.
See also: http://arxiv.org/abs/1210.0530, "Best Practices for Scientific Computing", Greg Wilson et al.
Can you imagine programming a traffic control system? Or a Mars rover? We would completely fail at that.
Professionals use specific techniques to guard against human error.
I can't claim to know these techniques, but I can talk about what I do.
Every change should be seen at least twice:
When you commit, don't commit everything at once (git commit -a); use git commit -p to review each change individually.
The more something is used, the more different circumstances it gets tested in. That is _good_.
You should try to reuse your code in as many projects as possible.
Different people should use the same code in different contexts, to maximize the chances for bugs to surface.
But never copy and paste code!
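The alternative to copy-and-paste is to put shared logic in one place and call it from everywhere. A minimal sketch, assuming Python; `normalize` is a hypothetical helper, not something from the talk. Because every project calls the same function, a bug fixed once is fixed for all of them.

```python
def normalize(values):
    """Scale a sequence of numbers so that they sum to 1.

    Hypothetical shared helper: imported by every script instead of
    being pasted into each one, so a fix here reaches all callers.
    """
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


# Two different "projects" reuse the same function rather than copies of it.
histogram_weights = normalize([1, 1, 2])
survey_shares = normalize([30, 70])
```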
Even if something is more verbose or slower, clearer is better.
Beautiful code is easy to read and understand; code that is easy to understand is more likely to be correct; and code that is more likely to be correct makes you more productive.
Write as if someone else will need to read and use it.
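As an illustration (our own, not from the talk), here are two equivalent tests for whether a time `t` falls in the half-open window [start, end). The first is "clever" and hard to check by eye; the second states the intent directly and documents it. The function names are hypothetical.

```python
def in_window_clever(t, start, end):
    # Correct, but the double negation makes it hard to verify at a glance.
    return not (t < start or not t < end)


def in_window_clear(t, start, end):
    """Return True if start <= t < end (half-open window)."""
    return start <= t < end
```

Both return the same answers; only the second one can be checked in a few seconds of reading.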
I once read an argument that complete rewrites are bad:
When writing anything non-trivial, compare against known results first. Make sure you reproduce existing results correctly.
Corollary: your code should be flexible enough to run on a smaller subset or on test data first, for verification.
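For example, before trusting a numerical routine on real data, check it against cases with a known answer. A minimal sketch, assuming a hand-written trapezoidal integrator (the name `trapezoid` is hypothetical): the integral of sin(x) from 0 to π is exactly 2, and a straight line is integrated exactly even with a tiny input.

```python
import math


def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] using n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Known result: the integral of sin(x) from 0 to pi is exactly 2.
estimate = trapezoid(math.sin, 0.0, math.pi, 1000)
assert abs(estimate - 2.0) < 1e-5, estimate

# Small case checkable by hand: the rule is exact for a straight line.
assert abs(trapezoid(lambda x: 2 * x, 0.0, 1.0, 4) - 1.0) < 1e-12
```

Only once checks like these pass is it worth pointing the same function at the real dataset.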
Wikipedia: Fail-fast
A fail-fast system is designed to immediately report at its interface any failure or condition that is likely to lead to failure. Fail-fast systems are usually designed to stop normal operation rather than attempt to continue a possibly flawed process.
Consider whether you can sanity-check all function inputs.
Raise exceptions for any part of the input domain that is not handled yet.
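A minimal fail-fast sketch, assuming Python: the hypothetical function `growth_rate` checks its inputs and raises immediately, including for an input domain it deliberately does not handle yet, instead of returning a silently wrong number.

```python
import math


def growth_rate(initial, final, years):
    """Compound annual growth rate from `initial` to `final` over `years`.

    Hypothetical example; the input checks, not the formula, are the point.
    """
    # Sanity-check every input before doing any work.
    if any(math.isnan(v) for v in (initial, final, years)):
        raise ValueError("inputs must not be NaN")
    if years <= 0:
        raise ValueError(f"years must be positive, got {years!r}")
    # This input domain is not handled yet: say so loudly rather than guess.
    if initial <= 0 or final <= 0:
        raise NotImplementedError("non-positive values are not supported yet")
    return (final / initial) ** (1.0 / years) - 1.0
```

Using `NotImplementedError` for the unhandled domain keeps "bad input" and "not written yet" distinguishable when the error surfaces.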
A single command repeats my entire analysis and produces the final output, even if all the intermediate results already exist.
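One way to get there, sketched in Python under our own assumptions: a single script whose `main()` runs every step in order, so running the script regenerates the final output from the raw input. The step functions and the data are hypothetical placeholders.

```python
import statistics


def load_data():
    # Placeholder for reading the raw input (hypothetical data).
    return [3.1, 2.9, 3.4, 3.0, 2.8]


def clean(values):
    # Placeholder cleaning step: drop values outside a plausible range.
    return [v for v in values if 0.0 < v < 10.0]


def summarize(values):
    # Final result of the analysis.
    return {"n": len(values), "mean": statistics.mean(values)}


def main():
    # The whole pipeline, from raw data to final output, in one call.
    result = summarize(clean(load_data()))
    print(f"n={result['n']} mean={result['mean']:.3f}")
    return result


if __name__ == "__main__":
    main()
```

For long-running steps, a build tool such as Make can play the role of `main()`, but the principle is the same: one entry point, no manual steps.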