Performance deviations when summing an integer matrix

It will be a lot easier to see what is going on if you benchmark the summation separately from importing the CSV file and converting it to a matrix with Tables.matrix.

julia> a = rand(Int,100,200);

julia> @btime process1($a);
  7.625 μs (0 allocations: 0 bytes)

julia> @btime process2($a);
  1.933 μs (0 allocations: 0 bytes)

process1 is slower because it has worse spatial locality. A Julia Matrix is stored column-major (each column is contiguous in memory), so the less-consecutive access pattern in process1 makes poor use of each cache line.
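To make the loop-order effect concrete, here is a minimal sketch. The functions `sum_row_order` and `sum_column_order` are hypothetical stand-ins, not the original process1 and process2 (whose definitions are not shown here); they only differ in which index varies in the inner loop.

```julia
# Hypothetical stand-ins for the two loop orders (not the original
# process1/process2, whose definitions are not shown in this post).

# Inner loop runs over j: consecutive accesses are size(a, 1) elements
# apart in memory, because a Matrix is stored column by column.
function sum_row_order(a::AbstractMatrix)
    s = zero(eltype(a))
    for i in axes(a, 1), j in axes(a, 2)   # j is the inner loop
        s += a[i, j]
    end
    return s
end

# Inner loop runs over i: consecutive accesses are adjacent in memory.
function sum_column_order(a::AbstractMatrix)
    s = zero(eltype(a))
    for j in axes(a, 2), i in axes(a, 1)   # i is the inner loop
        s += a[i, j]
    end
    return s
end

a = rand(1:100, 100, 200)

# Both orders compute the same total; only the memory access pattern differs.
sum_row_order(a) == sum_column_order(a) == sum(a)
```

Timing both with `@btime` on your own machine should show the column-order version ahead, though the size of the gap depends on the matrix dimensions and the hardware.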

julia> @btime process3($a);
  16.393 μs (100 allocations: 176.56 KiB)

julia> @btime process4($a);
  21.190 μs (200 allocations: 175.00 KiB)

julia> @btime process5($a);
  24.343 μs (2 allocations: 156.33 KiB)

These are much slower because a slice like a[i, :] creates a copy in Julia, so an expression like sum(a[i, :]) allocates a little 1d array over and over. You can speed things up by using a view instead, e.g. by putting @views in front of your function definition. If I do that for all three of these functions, then the allocations go away:

julia> @btime process3($a);
  8.856 μs (0 allocations: 0 bytes)

julia> @btime process4($a);
  3.696 μs (0 allocations: 0 bytes)

julia> @btime process5($a);
  2.217 μs (0 allocations: 0 bytes)

and process4 is faster than process3 because of the spatial locality mentioned above, while process5 is the fastest of the three because it calls an optimized sum routine on the whole array at once.
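As an illustration of the copy-versus-view difference, here is a small sketch. The names `rowsums_copy` and `rowsums_view` are hypothetical (they are not the original process3 to process5); the two bodies are identical and only the `@views` prefix differs.

```julia
# Hypothetical example functions (not the original process3..process5).

# Each a[i, :] allocates a fresh 1d array holding a copy of row i.
function rowsums_copy(a::AbstractMatrix)
    total = zero(eltype(a))
    for i in axes(a, 1)
        total += sum(a[i, :])
    end
    return total
end

# With @views in front of the definition, every slice in the body
# becomes a view (equivalent to view(a, i, :)), so no row is copied.
@views function rowsums_view(a::AbstractMatrix)
    total = zero(eltype(a))
    for i in axes(a, 1)
        total += sum(a[i, :])
    end
    return total
end

a = rand(1:100, 100, 200)

# Call each function once first so compilation is not measured.
rowsums_copy(a); rowsums_view(a)

bytes_copy = @allocated rowsums_copy(a)
bytes_view = @allocated rowsums_view(a)
println("copy: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
```

The view version should report far fewer allocated bytes than the copying version, and both return the same total.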

(Note also that the ::Matrix{Int} argument-type declaration is irrelevant for performance. You would get exactly the same performance if the argument type were not declared at all.)
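A quick way to convince yourself of this is to write the same function with and without the declaration. The names `colsum_typed` and `colsum_untyped` below are hypothetical; the point of the sketch is that Julia compiles a specialized method for the concrete type of the argument actually passed, whether or not the signature declares it.

```julia
# Hypothetical pair of functions differing only in the argument declaration.

function colsum_typed(a::Matrix{Int})
    s = zero(eltype(a))
    for x in a          # iterates in memory (column-major) order
        s += x
    end
    return s
end

function colsum_untyped(a)   # no type declared
    s = zero(eltype(a))
    for x in a
        s += x
    end
    return s
end

a = rand(1:100, 100, 200)

# Same result from both versions.
colsum_typed(a) == colsum_untyped(a)

# Inferred return type of the undeclared version when called with a
# Matrix{Int}; inspect it to see that inference does not need the annotation.
Base.return_types(colsum_untyped, (Matrix{Int},))
```

The declaration still matters for dispatch (which method gets called) and for documentation, just not for the speed of the compiled code.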
