No significant performance difference of row- vs column-wise loops

mrVeng · July 15, 2019, 11:53am

Hi everyone,

Based on my last question here and from what I understand in the performance tips section, Julia is a column major and I should try to (1) start the most inner loop by column and (2) fill pre-allocated output by column if possible.

I tried to verify this for myself by an example using the Distributions package, but I did not see any performance discrepancy of row- or columnwise allocation. Does this only occur with double loops normally? I would actually rather write code row-by-row normally as time-series data is usually stated like that, but if performance is an issue I can also just transpose data and write my functions col-by-col.

Based on my limited understanding, the code I wrote seems to be type stable (at least @code_warntype did not show any ::Any types), here is the code I used:

#Import modules
using Distributions
using BenchmarkTools

function fun_rowwise(t::Int, distr)
x = zeros(Float64, (t, Distributions.length(distr)) )
for i in 1:size(x,1)
x[i,:] = rand(distr, 1)
end
x
end

function fun_colwise(t::Int, distr)
x = zeros(Float64, (Distributions.length(distr), t) )
for i in 1:size(x,2)
x[:,i] = rand(distr, 1)
end
x
end

#Assign time index and distributions
d_univ = Normal(0., 1.)
d_multiv = MvNormal([0., 0., 0., 0., 0.],[1., 1., 1., 1., 1.])
t = 10^6

fun_rowwise(t, d_univ)
fun_rowwise(t, d_multiv)
fun_colwise(t, d_univ)
fun_colwise(t, d_multiv)

@code_warntype fun_rowwise(t, d_univ)
@code_warntype fun_colwise(t, d_univ)

@benchmark fun_rowwise(t, d_univ)
@benchmark fun_colwise(t, d_univ)
@benchmark fun_rowwise(t, d_multiv)
@benchmark fun_colwise(t, d_multiv)

lwabeke · July 15, 2019, 12:13pm

In this test case, the speed is limited by the random number generator, not by the memory access. Thus it will make no difference which way you do it.

However if the random number generator is outside the loop it makes a big difference:

julia> using BenchmarkTools

julia> d1 = 10000
10000

julia> d2 = 10000
10000

julia> data = randn(d1, d2)
10000×10000 Array{Float64,2}:
...

julia> function sum_row(data)
         answer = eltype(data)(0)
         for cnt1 = 1:size(data,1)
           for cnt2 = 1:size(data, 2)
             answer += data[cnt1, cnt2]
           end
         end
         answer
         end
sum_row (generic function with 1 method)

julia> function sum_col(data)
         answer = eltype(data)(0)
         for cnt2 = 1:size(data,2)
           for cnt1 = 1:size(data, 1)
             answer += data[cnt1, cnt2]
           end
         end
         answer
         end
sum_col (generic function with 1 method)

julia> @benchmark sum_row($data)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.343 s (0.00% GC)
  median time:      1.362 s (0.00% GC)
  mean time:        1.358 s (0.00% GC)
  maximum time:     1.366 s (0.00% GC)
  --------------
  samples:          4
  evals/sample:     1

julia> @benchmark sum_col($data)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     95.393 ms (0.00% GC)
  median time:      97.126 ms (0.00% GC)
  mean time:        97.391 ms (0.00% GC)
  maximum time:     101.757 ms (0.00% GC)
  --------------
  samples:          52
  evals/sample:     1

julia>

mrVeng · July 15, 2019, 12:22pm

Thanks for your answer - makes sense!

Topic		Replies	Views
The columns and the rows - puzzled about performance Performance question	8	904	October 14, 2021
Column-wise calculations in arrays are slower New to Julia	7	402	December 19, 2023
Looping over rows and columns: what am I doing wrong? Performance	4	1115	February 17, 2020
Performance difference when accessing square matrix rows-first or cols-first Performance	14	1729	April 13, 2021
Understanding major order performance when broadcasting in column vs row operations Performance question , array , benchmark	9	1003	June 21, 2021

No significant performance difference of row- vs column-wise loops

Related topics