No significant performance difference of row- vs column-wise loops

Hi everyone,

Based on my last question here and from what I understand in the performance tips section, Julia is a column major and I should try to (1) start the most inner loop by column and (2) fill pre-allocated output by column if possible.

I tried to verify this for myself by an example using the Distributions package, but I did not see any performance discrepancy of row- or columnwise allocation. Does this only occur with double loops normally? I would actually rather write code row-by-row normally as time-series data is usually stated like that, but if performance is an issue I can also just transpose data and write my functions col-by-col.

Based on my limited understanding, the code I wrote seems to be type stable (at least @code_warntype did not show any ::Any types), here is the code I used:

#Import modules
using Distributions
using BenchmarkTools

function fun_rowwise(t::Int, distr)
x = zeros(Float64, (t, Distributions.length(distr)) )
for i in 1:size(x,1)
x[i,:] = rand(distr, 1)
end
x
end

function fun_colwise(t::Int, distr)
x = zeros(Float64, (Distributions.length(distr), t) )
for i in 1:size(x,2)
x[:,i] = rand(distr, 1)
end
x
end

#Assign time index and distributions
d_univ = Normal(0., 1.)
d_multiv = MvNormal([0., 0., 0., 0., 0.],[1., 1., 1., 1., 1.])
t = 10^6

fun_rowwise(t, d_univ)
fun_rowwise(t, d_multiv)
fun_colwise(t, d_univ)
fun_colwise(t, d_multiv)

@code_warntype fun_rowwise(t, d_univ)
@code_warntype fun_colwise(t, d_univ)

@benchmark fun_rowwise(t, d_univ)
@benchmark fun_colwise(t, d_univ)
@benchmark fun_rowwise(t, d_multiv)
@benchmark fun_colwise(t, d_multiv)
1 Like

In this test case, the speed is limited by the random number generator, not by the memory access. Thus it will make no difference which way you do it.

However if the random number generator is outside the loop it makes a big difference:

julia> using BenchmarkTools

julia> d1 = 10000
10000

julia> d2 = 10000
10000

julia> data = randn(d1, d2)
10000×10000 Array{Float64,2}:
...

julia> function sum_row(data)
         answer = eltype(data)(0)
         for cnt1 = 1:size(data,1)
           for cnt2 = 1:size(data, 2)
             answer += data[cnt1, cnt2]
           end
         end
         answer
         end
sum_row (generic function with 1 method)

julia> function sum_col(data)
         answer = eltype(data)(0)
         for cnt2 = 1:size(data,2)
           for cnt1 = 1:size(data, 1)
             answer += data[cnt1, cnt2]
           end
         end
         answer
         end
sum_col (generic function with 1 method)

julia> @benchmark sum_row($data)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.343 s (0.00% GC)
  median time:      1.362 s (0.00% GC)
  mean time:        1.358 s (0.00% GC)
  maximum time:     1.366 s (0.00% GC)
  --------------
  samples:          4
  evals/sample:     1

julia> @benchmark sum_col($data)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     95.393 ms (0.00% GC)
  median time:      97.126 ms (0.00% GC)
  mean time:        97.391 ms (0.00% GC)
  maximum time:     101.757 ms (0.00% GC)
  --------------
  samples:          52
  evals/sample:     1

julia> 

6 Likes

Thanks for your answer - makes sense!