Hey everyone,
I am trying to write a decently fast solver for the electric potential on a mesh, given a charge distribution, using the Gauss-Seidel method with SOR.
Essentially, I have two 3D arrays inside a struct and need to loop over them on every iteration. This has been remarkably slow (it is part of a larger code and accounts for roughly 80-90% of the execution time), and I'd like to know if that's just how it is or if I'm doing something stupid.
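For reference, the update I'm applying at each interior node is the standard finite-difference SOR step for the Poisson equation $\nabla^2 \varphi = -\rho/\epsilon_0$ (spelled out here mainly so the loop in the MWE below is easier to follow):

$$
\varphi^{\mathrm{new}}_{i,j,k}
= \frac{\rho_{i,j,k}/\epsilon_0
      + \frac{\varphi_{i-1,j,k} + \varphi_{i+1,j,k}}{\Delta x^2}
      + \frac{\varphi_{i,j-1,k} + \varphi_{i,j+1,k}}{\Delta y^2}
      + \frac{\varphi_{i,j,k-1} + \varphi_{i,j,k+1}}{\Delta z^2}}
       {2\left(\frac{1}{\Delta x^2} + \frac{1}{\Delta y^2} + \frac{1}{\Delta z^2}\right)},
\qquad
\varphi_{i,j,k} \leftarrow \varphi_{i,j,k} + \omega\,(\varphi^{\mathrm{new}}_{i,j,k} - \varphi_{i,j,k})
$$

with $\omega = 1.4$ and $(\Delta x, \Delta y, \Delta z)$ given by dh.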
Here is a MWE:
struct World
    nn::Vector{Int64}    # number of nodes in each dimension
    dh::Vector{Float64}  # cell spacing in each dimension
    ρ::Array{Float64}    # charge density
    φ::Array{Float64}    # electric potential
end

const global ϵ0::Float64 = 8.8541878e-12 # vacuum permittivity

N = 20
nn = fill(N, 3)
ρ = zeros(Float64, nn...)
φ = zeros(Float64, nn...)
world = World(fill(N, 3), [0.01, 0.01, 0.01], ρ, φ)
function potential_solver!(world::World, n::Int64)
    idx2 = @. 1/world.dh^2
    φ = world.φ
    ρ = world.ρ
    for _ in 1:n
        for i in 2:(world.nn[1] - 1), j in 2:(world.nn[2] - 1), k in 2:(world.nn[3] - 1)
            # read the charge and the six neighbouring potentials
            r  = ρ[i, j, k]
            p1 = φ[i - 1, j, k]
            p2 = φ[i + 1, j, k]
            p3 = φ[i, j - 1, k]
            p4 = φ[i, j + 1, k]
            p5 = φ[i, j, k - 1]
            p6 = φ[i, j, k + 1]
            # Gauss-Seidel value, then over-relax with ω = 1.4
            φ_new = (r/ϵ0 + idx2[1]*(p1 + p2) + idx2[2]*(p3 + p4) + idx2[3]*(p5 + p6))/(2.0*sum(idx2))
            φ[i, j, k] += 1.4*(φ_new - φ[i, j, k])
        end
    end
end
After profiling this code, I found that about 25% of the time is spent on the line that writes the new value of φ, and another 70% is spent on the lines that just read values from the arrays (I pulled those reads out into separate variables so it would be clearer what was happening). Also, @profview_allocs shows nothing, while @btime says there are 89 MiB of allocations (I think those allocations appear and disappear so quickly that the sample_rate doesn't catch them, so I'm not too worried about that).
using BenchmarkTools, Profile
@btime potential_solver!($world, 1000)
@profview potential_solver!(world, 1000)
@profview_allocs sample_rate=0.01 potential_solver!(world, 1000)
I also noticed that the profiler flags some of the time inside getindex() as time spent in GC, so I'm trying to figure out if there is some way to speed up this bit of code. I've already tried @inbounds, messing around with the nesting of the loops, and wrapping the loop between GC.enable(false) and GC.enable(true). The values now assigned to r and p1 through p6 were originally inlined in the definition of φ_new, so pulling them out into variables doesn't affect performance either (they're local, so that shouldn't matter anyway, but I'm grasping at straws here).
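For concreteness, the @inbounds + GC.enable variant I timed looked roughly like this (same update as above; the function name is just for illustration):

function potential_solver_inbounds!(world::World, n::Int64)
    idx2 = @. 1/world.dh^2
    φ = world.φ
    ρ = world.ρ
    GC.enable(false)  # tried switching GC off around the hot loop
    for _ in 1:n
        @inbounds for i in 2:(world.nn[1] - 1), j in 2:(world.nn[2] - 1), k in 2:(world.nn[3] - 1)
            φ_new = (ρ[i, j, k]/ϵ0 +
                     idx2[1]*(φ[i - 1, j, k] + φ[i + 1, j, k]) +
                     idx2[2]*(φ[i, j - 1, k] + φ[i, j + 1, k]) +
                     idx2[3]*(φ[i, j, k - 1] + φ[i, j, k + 1]))/(2.0*sum(idx2))
            φ[i, j, k] += 1.4*(φ_new - φ[i, j, k])
        end
    end
    GC.enable(true)
end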
Essentially, I just wanted to know if there's something I'm missing or if this is as fast as this bit of code will get.