Alternative permutation/reducing for loops

Hello, I am writing this function where I need to combine all terms in this structure (called Basis):

struct GaussianBasisSet <: AbstractBasisSet
    R
    α::Matrix{Float64}
    d::Matrix{Float64}
    ℓ::Int
    m::Int
    n::Int
end

Example:

13-element Vector{BasisSets.GaussianBasisSet}:
 BasisSets.GaussianBasisSet([0.0 -0.007156 0.965491], [5484.67166 825.234946 … 16.8975704 5.79963534], [0.00183107443 0.0139501722 … 0.470192898 0.358520853], 0, 0, 0)
 BasisSets.GaussianBasisSet([0.0 -0.007156 0.965491], [15.53961625 3.599933586 1.01376175], [-0.1107775495 -0.1480262627 1.130767015], 0, 0, 0)
 BasisSets.GaussianBasisSet([0.0 -0.007156 0.965491], [15.53961625 3.599933586 1.01376175], [0.07087426823 0.3397528391 0.7271585773], 1, 0, 0)
 BasisSets.GaussianBasisSet([0.0 -0.007156 0.965491], [15.53961625 3.599933586 1.01376175], [0.07087426823 0.3397528391 0.7271585773], 0, 1, 0)
 BasisSets.GaussianBasisSet([0.0 -0.007156 0.965491], [15.53961625 3.599933586 1.01376175], [0.07087426823 0.3397528391 0.7271585773], 0, 0, 1)
 BasisSets.GaussianBasisSet([0.0 -0.007156 0.965491], [0.2700058226;;], [1.0;;], 0, 0, 0)
 BasisSets.GaussianBasisSet([0.0 -0.007156 0.965491], [0.2700058226;;], [1.0;;], 1, 0, 0)
 BasisSets.GaussianBasisSet([0.0 -0.007156 0.965491], [0.2700058226;;], [1.0;;], 0, 1, 0)
 BasisSets.GaussianBasisSet([0.0 -0.007156 0.965491], [0.2700058226;;], [1.0;;], 0, 0, 1)
 BasisSets.GaussianBasisSet([0.0 0.001486 -0.003471], [18.73113696 2.825394365 0.6401216923], [0.03349460434 0.2347269535 0.8137573261], 0, 0, 0)
 BasisSets.GaussianBasisSet([0.0 0.001486 -0.003471], [0.1612777588;;], [1.0;;], 0, 0, 0)
 BasisSets.GaussianBasisSet([0.0 0.931026 1.207929], [18.73113696 2.825394365 0.6401216923], [0.03349460434 0.2347269535 0.8137573261], 0, 0, 0)
 BasisSets.GaussianBasisSet([0.0 0.931026 1.207929], [0.1612777588;;], [1.0;;], 0, 0, 0)
function overlap(basis)
    K = length(basis)
    S = zeros(K, K)

    for (i, basisᵢ) in enumerate(basis)
        for (j, basisⱼ) in enumerate(basis)
            for (αᵢ, dᵢ) in zip(basisᵢ.α, basisᵢ.d)
                for (αⱼ, dⱼ) in zip(basisⱼ.α, basisⱼ.d)

                    Rᵢ = basisᵢ.R
                    Rⱼ = basisⱼ.R

                    ℓᵢ, mᵢ, nᵢ = basisᵢ.ℓ, basisᵢ.m, basisᵢ.n
                    ℓⱼ, mⱼ, nⱼ = basisⱼ.ℓ, basisⱼ.m, basisⱼ.n

                    S[i, j] += (
                        exp(-αᵢ * αⱼ * distance(Rᵢ, Rⱼ) / (αᵢ + αⱼ)) *
                        normalization(αᵢ, ℓᵢ, mᵢ, nᵢ) *
                        normalization(αⱼ, ℓⱼ, mⱼ, nⱼ) *
                        dᵢ *
                        dⱼ *
                        Sxyz(Rᵢ, Rⱼ, αᵢ, αⱼ, ℓᵢ, ℓⱼ, mᵢ, mⱼ, nᵢ, nⱼ)
                    )
                end
            end
        end
    end

    return S
end

However, this is expensive and I would like to reduce the number of for loops to make it more performant. Any tips?

First things to do are to put a type on your R parameter which will make this more type stable. The next is to move your computation of distance(Rᵢ, Rⱼ) out two loops.

And use a SVector{3,Float64} for R

Does SVector() allocate less memory?

No (but also yes). You still have to allocate space for the 3 numbers, but in a type stable context, you can often inline that allocation into the parent object which will reduce the amount of space the object takes up.

Also arithmetic operations with SVectors return SVectors, and these resulting vectors then do not allocate (on the heap, which is what matters). Thus it may be possible that you get speedups depending on what you have inside Sxyz.

1 Like

I have moved the distance(R_i, R_j) out of the loops, but nothing has changed in terms of allocations or computing time.

julia> @benchmark overlap_2(basis, hydrogen)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  254.334 μs …  26.683 ms  ┊ GC (min … max): 0.00% … 98.85%
 Time  (median):     276.459 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   300.024 μs ± 558.434 μs  ┊ GC (mean ± σ):  6.98% ±  3.78%

       ▂▅▇▆▃ ▂▄▆▇█▆▅▅▂▁
  ▂▂▃▅█████████████████████▇█▆▆▅▆▄▄▄▃▃▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂ ▄
  254 μs           Histogram: frequency by time          340 μs <

 Memory estimate: 317.41 KiB, allocs estimate: 13672.

julia> @benchmark overlap(basis, hydrogen)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  252.375 μs …  31.246 ms  ┊ GC (min … max): 0.00% … 98.93%
 Time  (median):     278.708 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   300.581 μs ± 616.198 μs  ┊ GC (mean ± σ):  6.59% ±  3.26%

     ▂    ▂▄▅█▇▆▄▃▃▃▂▂▁▁▁
  ▁▂▇█▇▄▅▇███████████████▇▆▅▄▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  252 μs           Histogram: frequency by time          366 μs <

 Memory estimate: 317.41 KiB, allocs estimate: 13672.

Any other suggestions?

Did you fix the type of R in the struct? It will be easier to help if you provide the updated code and a running example.

2 Likes