getindex(A::AbstractArray, I...) slows down computations with array of type Array{Float64,2}

Hi there,

while optimizing some code of mine I run into sum behavior that I don’t fully understand.

I have an array of point coordinates, which I need to access. I stored the coordinates in column-major order as it is recommended in the official Julia documentation. However, accessing the point coordinates in the array turned out to be a performance bottleneck in my code. I was able to track it down to Julia’s built-in getindex(A::AbstractArray, I...) method. Which puzzles me a lot because

  1. I thought my point list is of type Array{Float64,2} so why is it defaulting to the AbstracArray and
  2. Playing around with the way I am indexing into the array, I can significantly boost the computations by using a single index.

Here is a minimum working example:

points=ones(Float64,2,50000)

# indexing by slicing column using the colon operator
function indexing1(points::Array{Float64, 2})
    dim=2
    x=0.0
    for i = 1:50000
        x += sum(points[:,i])
    end
    return x
end

# indexing with a single index
function indexing2(points::Array{Float64, 2})
    dim=2
    x=0.0
    for i=0:50_000-1
        for d=1:dim
            x += points[i*dim+d]
        end
    end
    return x
end

#slicing without the colon-opetrator
function indexing3(points::Array{Float64, 2})
    dim=2
    x=0.0
    for i=0:50_000-1
            x += sum(points[i*dim+1:(i+1)*dim])
    end
    return x
end

## Run this section twice in order to avoid measuring the compile time
println("scheme 1")
@time indexing1(points)
println("scheme 2")
@time indexing2(points)
println("scheme 3")
@time indexing3(points)

Running the computational section twice in order to avoid measuring the compilation time gives me the following output:

scheme 1
  0.002707 seconds (50.00 k allocations: 4.578 MiB)
scheme 2
  0.000140 seconds
scheme 3
  0.002973 seconds (50.00 k allocations: 4.578 MiB)

Unsurprisingly schemes 1 and 3 are performing identical. However, scheme 2 is notably faster with (zero?) memory allocations.

I really don’t understand why this is the case and would appreciate any explanation.
Bonus point, if you can tell me how to still have a readable two-index scheme achieving the same performance as with the one index in scheme 2. :wink:

Thank you very much
G

You are allocating a copy of the slice in your loop here. Use @views to make this non-allocating.

Much better to use a 1d array of static arrays for this kind of thing. For example:

using StaticArrays
points = fill(@SVector[1.0, 1.0], 50000)

function indexing(points)
    x = 0.0
    for i = 1:length(points)
        x += sum(points[i])
    end
    return x
end

Thank you very much.
@views is indeed solving the problem. I will also have a look into 1d static arrays.