I just read the excellent article on improving calculation performance in loops on the Julia blog fast-numeric.
In the article we can read …
Julia arrays are stored in column-major order, which means that the rows of a column are contiguous, but the columns of a row are generally not. It is therefore generally more efficient to access data column-by-column than row-by-row.
The stated principle seems correct to me. However, benchmarking it doesn’t confirm it.
using BenchmarkTools
@btime begin
global a
a = rand(1_000, 1_000);
end;
# Natural implementation (not column-wise)
@btime begin
m, n = size(a)
r = Vector{Float64}(undef, m)
for i = 1:m
s = 0.
for j = 1:n
s += a[i,j]
end
r[i] = s
end
end
# Column-wise implementation
@btime begin
m, n = size(a)
r = Vector{Float64}(undef, m)
for i = 1:m
r[i] = a[i,1]
end
for j = 2:n, i = 1:m
r[i] += a[i,j]
end
end
I get the following results:
The conclusion is:
Using column-wise calculation loops is slower (some 40% in this case) and uses more allocations (25% in this case).
Therefore, column-wise calculations shouls be discouraged.
I am surprised by my own conclusion, am I doing anything wrong?
Your methods are dissimilar. If I make them symmetrical:
using BenchmarkTools
@btime begin
global a
a = rand(1_000, 1_000);
end;
@btime begin
m, n = size(a)
r = Vector{Float64}(undef, m)
for i = 1:m
s = 0.0
for j = 1:n
s += a[i, j]
end
r[i] = s
end
end
@btime begin
m, n = size(a)
r = Vector{Float64}(undef, n)
for j = 1:n
s=0.0
for i = 1:m
s += a[i, j]
end
r[j] = s
end
end
I get:
1.178 ms (2 allocations: 7.63 MiB)
89.509 ms (3980985 allocations: 76.04 MiB)
76.960 ms (3980985 allocations: 76.04 MiB)
And putting things in functions as @John_Gibson suggests:
using BenchmarkTools
function docolumns(a)
m, n = size(a)
r = Vector{Float64}(undef, n)
for j = 1:n
s = 0.0
for i = 1:m
s += a[i, j]
end
r[j] = s
end
return nothing
end
function dorows(a)
m, n = size(a)
r = Vector{Float64}(undef, m)
for i = 1:m
s = 0.0
for j = 1:n
s += a[i, j]
end
r[i] = s
end
return nothing
end
a = rand(1_000, 1_000)
@btime dorows($a)
@btime docolumns($a)
When you say that I should avoid “untyped global variables”, are you referring to “a” ?
Concerning the enclosing of code blocks in functions, you are perfectly right