Performance of loop over multi-dimensional array limited by getindex

Hey everyone,

I am trying to write a decently fast solver for the electric potential on a mesh, given a charge distribution, using the Gauss-Seidel method with SOR (successive over-relaxation).

Essentially, I have two 3D arrays inside a struct and need to loop over them on every iteration. This has been remarkably slow (it is part of a larger code and takes roughly 80–90% of the execution time), and I’d like to know if that’s just how it is or if I’m doing something wrong.

Here is a MWE:

struct World
    nn::Vector{Int64}     # number of nodes along each axis
    dh::Vector{Float64}   # cell spacing along each axis
    ρ::Array{Float64}     # charge density
    φ::Array{Float64}     # electric potential
end

const ϵ0::Float64 = 8.8541878e-12  # vacuum permittivity

N = 20
nn = fill(N, 3)

ρ = zeros(Float64, nn...)
φ = zeros(Float64, nn...)

world = World(nn, [0.01, 0.01, 0.01], ρ, φ)

function potential_solver!(world::World, n::Int64)
    idx2 = @. 1/world.dh^2  # inverse squared cell spacing (allocates a small vector)

    φ = world.φ
    ρ = world.ρ

    for _ in 1:n
        for i in 2:(world.nn[1] - 1), j in 2:(world.nn[2] - 1), k in 2:(world.nn[3] - 1)
            # array reads pulled into locals so the profiler attributes time clearly
            r = ρ[i, j, k]
            p1 = φ[i - 1, j, k]
            p2 = φ[i + 1, j, k]
            p3 = φ[i, j - 1, k]
            p4 = φ[i, j + 1, k]
            p5 = φ[i, j, k - 1]
            p6 = φ[i, j, k + 1]

            φ_new = (r/ϵ0 + idx2[1]*(p1 + p2) + idx2[2]*(p3 + p4) + idx2[3]*(p5 + p6))/(2.0 * sum(idx2))

            # SOR update with relaxation factor 1.4
            φ[i, j, k] += 1.4*(φ_new - φ[i, j, k])
        end
    end
end

After profiling this code, I found that about 25% of the time is spent on the line that sets the new value of φ, and another 70% on the lines that read r and p1 to p6 from the arrays (I isolated those reads so that it became clearer what was happening). Also, @profview_allocs shows nothing, while @btime says there are 89 MiB of allocations (I think those allocations appear and disappear so fast that the sample_rate doesn’t catch them, so I’m not too worried about that).

using BenchmarkTools, Profile  # @profview and @profview_allocs come from the VS Code Julia extension

@btime potential_solver!($world, 1000)
@profview potential_solver!(world, 1000)
@profview_allocs sample_rate=0.01 potential_solver!(world, 1000)

I also noticed that the profiler flags the time inside getindex() as time spent in GC, so I’m trying to figure out if there is some way to speed up this bit of code.

I’ve already tried @inbounds, changing the nesting order of the loops, and wrapping the loop in GC.enable(false)/GC.enable(true). The values assigned to r and p1 to p6 were originally inlined in the definition of φ_new, so pulling them out into local variables also doesn’t affect performance (they’re local, so that shouldn’t be an issue anyway, but I’m grasping at straws here).

Essentially, just wanted to know if there’s something I’m missing or if this is just as fast as this bit of code will get.


You have a type instability: the fields φ and ρ have abstract types, because Array{Float64} does not specify the number of dimensions. You probably want Array{Float64, 3} for both of them.
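Concretely, that’s just:

struct World
    nn::Vector{Int64}
    dh::Vector{Float64}
    ρ::Array{Float64, 3}   # concrete type: element type and dimensionality fixed
    φ::Array{Float64, 3}
end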

On my laptop, this change gives a ~22× speedup:

1.541 s (58320003 allocations: 889.89 MiB) # before

69.437 ms (3 allocations: 96 bytes) # after the change
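
One way to spot this kind of instability yourself is @code_warntype, which highlights abstractly inferred types:

# prints the inferred type of each variable in the function body;
# the abstractly typed fields φ and ρ show up highlighted
@code_warntype potential_solver!(world, 1)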

Good catch.

I think the other 3 allocations come from the idx2 line

idx2 = @. 1/world.dh^2

It doesn’t change during the loop, so OP could just store idx2 in world instead of dh, as sketched below.
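
A minimal sketch of what that might look like, reusing N, ρ and φ from the MWE (idx2 is a hypothetical field replacing dh):

struct World
    nn::Vector{Int64}
    idx2::Vector{Float64}   # hypothetical field: precomputed 1/dh^2
    ρ::Array{Float64, 3}
    φ::Array{Float64, 3}
end

world = World(fill(N, 3), 1 ./ [0.01, 0.01, 0.01].^2, ρ, φ)

# inside potential_solver!, the allocating broadcast becomes a plain field read:
idx2 = world.idx2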

Yeah, it’s there because that’s the only part of the code that would use that value, so it makes more sense to compute it within the function alone.

Depending on how far down I optimize the final code, I might just store it in the World struct, but it’ll probably stay like this for now.

You, sir, are an absolute godsend; that completely solved it. Thank you very much!


Is the role of the field nn to store the size of the arrays? That is redundant, and even unsafe if you were to use @inbounds. It’s much better and more idiomatic to get the sizes with size(φ) etc.
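
For example, the loop bounds in potential_solver! could come straight from the array:

# bounds derived from the array itself instead of world.nn
for i in 2:size(φ, 1) - 1, j in 2:size(φ, 2) - 1, k in 2:size(φ, 3) - 1
    # ... same stencil update as before
end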

Yep, that’s what it does. It’s there because a lot of arrays in the larger code share that size, and since I’m sure their size never changes, it’s probably safe to use it like this. I might rewrite that at some point, but I don’t think there’s enough reason to do it right now.