Can I make my code faster with parallelism, or just plain better coding?

When optimizing or refactoring code like this, I always use a simple form of Golden Master Testing. All you are testing is that your new version produces the same output as the original version. Whether the original was correct to begin with, who knows, but at least the new code behaves the same.

This is usually trivial to set up. Example:

using Random             # standard library; provides Random.seed!
Random.seed!(0)          # fix random seed, so you always get the same data

A = rand(100, 100)        # create input data
B = complex_algorithm(A)  # output data returned as a matrix

# Now calculate some hash of the output (problem specific)
checksum = sum(B)         # named `checksum` so it cannot clash with Base.hash

# And test against what you got with the original version of your code
expected = 8425.2178752625
checksum ≈ expected || println("Expected $expected, got $checksum")

Of course, a sum is not a perfect hash (and if you have NaNs it’s worthless), but I don’t think you need to go overboard in creating a perfect hash. A simple sum will almost always catch refactoring errors (like the one you had), and the above takes a minute or so to implement.

(Note: if you rearrange floating point operations, the results may differ slightly due to rounding, so don’t do exact comparisons. Use an approximate comparison such as ≈ instead.)
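To see why exact comparison is too strict, here is a minimal sketch that only regroups an addition. The values are arbitrary and chosen purely for illustration; the `rtol` value in the last line is an assumption you would tune to your own problem.

```julia
# Same three numbers, added with different grouping
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

println(a == b)                       # exact comparison
println(a ≈ b)                        # isapprox with the default relative tolerance
println(isapprox(a, b; rtol=1e-12))   # isapprox with an explicit (assumed) tolerance
```

The exact comparison fails even though nothing is wrong with either expression, while both approximate comparisons pass.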

There are many variations of this. A more robust option is to keep a copy of your original code, call both it and your new code with the same input, and check that the results agree (up to rounding).
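A rough sketch of that side-by-side variant follows. The two functions `algorithm_original` and `algorithm_refactored` are hypothetical stand-ins (a naive and a built-in matrix product), and the tolerance is an assumption; substitute your own code and a tolerance that suits your data.

```julia
using Random

# Hypothetical stand-ins: replace these with your original and refactored code.
algorithm_original(A) = [sum(A[i, :] .* A[:, j]) for i in axes(A, 1), j in axes(A, 2)]
algorithm_refactored(A) = A * A

Random.seed!(0)           # same input on every run
A = rand(100, 100)

expected = algorithm_original(A)
actual = algorithm_refactored(A)

# Compare element by element with a relative tolerance (assumed value)
if size(actual) == size(expected) && all(isapprox.(actual, expected; rtol=1e-10))
    println("OK: results match")
else
    println("Mismatch between original and refactored results")
end
```

Comparing every element catches errors that a single sum could hide, for example two entries being swapped, and it removes the need to hard-code an expected number.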
