Based on my last question here and on what I understand from the performance tips section of the manual, Julia arrays are stored in column-major order, so I should try to (1) have the innermost loop run down a column (i.e. vary the first index fastest) and (2) fill pre-allocated output column by column where possible.
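To make sure I have the idea right, here is a minimal sketch of what I understand by (1) and (2). The function names `fill_colmajor!` and `fill_rowmajor!` are just hypothetical names for illustration, and the value written (`i + j`) is arbitrary. As far as I understand, the first version touches memory contiguously and the second one does not, so I would expect the first to be the faster one on large matrices, although I have not measured it here.

```julia
# Inner loop runs over the first index (rows within one column),
# which should be the contiguous direction for a column-major array.
function fill_colmajor!(x::AbstractMatrix)
    for j in axes(x, 2)        # outer loop: columns
        for i in axes(x, 1)    # inner loop: down a single column
            x[i, j] = i + j
        end
    end
    return x
end

# Inner loop runs over the second index (along one row),
# so consecutive writes are a full column length apart in memory.
function fill_rowmajor!(x::AbstractMatrix)
    for i in axes(x, 1)        # outer loop: rows
        for j in axes(x, 2)    # inner loop: along a single row
            x[i, j] = i + j
        end
    end
    return x
end

a = fill_colmajor!(zeros(Float64, 3, 4))
b = fill_rowmajor!(zeros(Float64, 3, 4))
a == b   # both orders produce the same matrix, only the access pattern differs
```

Both functions produce the same result; the only difference is the order in which the elements are visited.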
I tried to verify this myself with an example using the Distributions package, but I did not see any performance difference between filling the output row-wise and column-wise. Does the difference normally only show up with nested (double) loops? I would actually prefer to write code row by row, since time-series data is usually laid out with one observation per row, but if performance is an issue I can also just transpose the data and write my functions column by column.
As far as my limited understanding goes, the code I wrote seems to be type stable (at least @code_warntype did not show any ::Any types). Here is the code I used:
    # Import modules
    using Distributions
    using BenchmarkTools

    function fun_rowwise(t::Int, distr)
        x = zeros(Float64, (t, Distributions.length(distr)))
        for i in 1:size(x, 1)
            x[i, :] = rand(distr, 1)
        end
        x
    end

    function fun_colwise(t::Int, distr)
        x = zeros(Float64, (Distributions.length(distr), t))
        for i in 1:size(x, 2)
            x[:, i] = rand(distr, 1)
        end
        x
    end

    # Assign time index and distributions
    d_univ = Normal(0., 1.)
    d_multiv = MvNormal([0., 0., 0., 0., 0.], [1., 1., 1., 1., 1.])
    t = 10^6

    fun_rowwise(t, d_univ)
    fun_rowwise(t, d_multiv)
    fun_colwise(t, d_univ)
    fun_colwise(t, d_multiv)

    @code_warntype fun_rowwise(t, d_univ)
    @code_warntype fun_colwise(t, d_univ)

    @benchmark fun_rowwise(t, d_univ)
    @benchmark fun_colwise(t, d_univ)
    @benchmark fun_rowwise(t, d_multiv)
    @benchmark fun_colwise(t, d_multiv)