Hi there,
while optimizing some code of mine I run into sum behavior that I don’t fully understand.
I have an array of point coordinates, which I need to access. I stored the coordinates in column-major order as it is recommended in the official Julia documentation. However, accessing the point coordinates in the array turned out to be a performance bottleneck in my code. I was able to track it down to Julia’s built-in getindex(A::AbstractArray, I...)
method. Which puzzles me a lot because
- I thought my point list is of type
Array{Float64,2}
so why is it defaulting to theAbstracArray
and - Playing around with the way I am indexing into the array, I can significantly boost the computations by using a single index.
Here is a minimum working example:
points=ones(Float64,2,50000)
# indexing by slicing column using the colon operator
function indexing1(points::Array{Float64, 2})
dim=2
x=0.0
for i = 1:50000
x += sum(points[:,i])
end
return x
end
# indexing with a single index
function indexing2(points::Array{Float64, 2})
dim=2
x=0.0
for i=0:50_000-1
for d=1:dim
x += points[i*dim+d]
end
end
return x
end
#slicing without the colon-opetrator
function indexing3(points::Array{Float64, 2})
dim=2
x=0.0
for i=0:50_000-1
x += sum(points[i*dim+1:(i+1)*dim])
end
return x
end
## Run this section twice in order to avoid measuring the compile time
println("scheme 1")
@time indexing1(points)
println("scheme 2")
@time indexing2(points)
println("scheme 3")
@time indexing3(points)
Running the computational section twice in order to avoid measuring the compilation time gives me the following output:
scheme 1
0.002707 seconds (50.00 k allocations: 4.578 MiB)
scheme 2
0.000140 seconds
scheme 3
0.002973 seconds (50.00 k allocations: 4.578 MiB)
Unsurprisingly schemes 1 and 3 are performing identical. However, scheme 2 is notably faster with (zero?) memory allocations.
I really don’t understand why this is the case and would appreciate any explanation.
Bonus point, if you can tell me how to still have a readable two-index scheme achieving the same performance as with the one index in scheme 2.
Thank you very much
G