Reducing allocations

I have a function that is going to be called many hundreds of thousands of times, so I want it as optimized as possible. To reduce allocations, I decided I would create a struct that stores some buffer arrays, and then assign values to them "in-place" to avoid repeated allocations. Here is the function:

function propose_shear_step(atoms::ase.atoms,stepsize::Float64; buffers::Union{NS_buffers,Nothing} = nothing)
    rmv_vec = rand(1:3)
#  rmv_vec = rand(Xoshiro(),1:3)
    # Don't allocate a vector of indices, a tuple is better for memory management.
    remaining_vecs = rmv_vec == 1 ? (2,3) : rmv_vec == 2 ? (1,3) : (1,2)
    new_cell = buffers !== nothing ? buffers.new_cell : MMatrix{3,3,Float64,9}(undef)
    new_cell .= atoms.lVecs
    
    v1 = SVector{3,Float64}(atoms.lVecs[:,remaining_vecs[1]])
    v2 = SVector{3,Float64}(atoms.lVecs[:,remaining_vecs[2]])
    norm_v1 = sqrt(dot(v1,v1))
    v1 = v1 / norm_v1
    v2 = v2 -  v1 * dot(v1, v2)  # Make v2 orthogonal to v1
    norm_v2 = sqrt(dot(v2,v2))
    v2 = v2/norm_v2

    if abs(dot(v1, v2)) > 1e-4
        println("v1 and v2 are not orthogonal, something is wrong.")
    end
    rv = MVector{2,Float64}(undef)
    rand!(Xoshiro(), Normal(0, stepsize), rv)
    rv1, rv2 = rv
#  rv1,rv2 = rand(Normal(0,stepsize),2)
    new_cell[:,rmv_vec] = atoms.lVecs[:,rmv_vec] + rv1 * v1 + rv2 * v2
    if buffers !== nothing
        mul!(buffers.transform, new_cell, buffers.inv_lVecs)
        return (1.0, SMatrix{3,3,Float64,9}(buffers.transform))
    else
        inv_lVecs = inv(SMatrix{3,3,Float64,9}(atoms.lVecs))
        transform = new_cell * inv_lVecs
        return (1.0,SMatrix{3,3,Float64,9}(transform))
    end
    
end

and here is what ase.atoms and NS_buffers look like:

struct NS_buffers
    new_cell :: MMatrix{3,3,Float64,9}
    transform :: MMatrix{3,3,Float64,9}
    inv_lVecs :: SMatrix{3,3,Float64,9}
    
end

mutable struct atoms
    title::String
    latpar::Float64
    lVecs::MMatrix{3,3,Float64,9}
    coordSys::Vector{String} # 'D' for Direct or 'C' for cartesian
    positions:: Vector{SVector{3,Float64}}  # List of all the atomic basis vector separated by type 
    velocities:: Vector{SVector{3,Float64}} # Velocities of the atoms
    masses:: Vector{Float64}
    atomTypes:: Vector{Int64} # Integers representing the type for each basis atom
    species:: Vector{String}  # Species of the atoms 
    FP_total_energy:: Float64  # First principles total or peratom energy
    model_energy:: Float64  # Energy that should be used in the fitting process
    lj_vec::Vector{Float64} # Vector of 1/r^6 and 1/r^12 values for each unique pair of atom types
end

But the results of @btime indicate that the performance is worse when I send in the optional buffers as compared to not sending buffers in at all. I’ve always had a hard time identifying allocations. Any help here? Thanks in advance

These lines allocate memory. The first one can be solved by @views, the second one by

@views @. new_cell[:,rmv_vec] = atoms.lVecs[:,rmv_vec] + rv1 * v1 + rv2 * v2

I’ve always had a hard time identifying allocations.

The allocation profiler (@profview_allocs) in the Julia VS Code plugin is very helpful.
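For quick line-by-line checks, the built-in @allocated also works, as long as you wrap the code in a function and warm it up first. A small sketch (f_slice/f_view are made-up names) showing that a plain column slice copies while a @views slice does not:

```julia
f_slice(m) = sum(m[:, 1])            # m[:, 1] copies the column -> allocates
f_view(m)  = @views sum(m[:, 1])     # a view -> no copy, no allocation

m = rand(3, 3)
f_slice(m); f_view(m)                # warm up so compilation isn't counted
@allocated f_slice(m)                # nonzero
@allocated f_view(m)                 # 0
```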

I am curious what effect a String field has on a struct's efficiency.

mutable struct atoms
    title::String

Does this force the whole struct to be on the heap, making it less efficient?

Just the fact that atoms is mutable means it has to be on the heap.
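A quick way to see the difference with toy types (the names P and Q are just for illustration):

```julia
struct P           # immutable
    x::Float64
    y::Float64
end

mutable struct Q   # mutable
    x::Float64
    y::Float64
end

isbitstype(P)   # true:  P values can live on the stack or inline in arrays
isbitstype(Q)   # false: Q instances are heap-allocated references
```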

What if it were a struct rather than a mutable struct?

I might be able to make that work, but there are float fields in there that would become immutable, and I need to be able to change them on the fly.

Maybe store FP_total_energy and model_energy in a Vector and just remember which one slots in first and which one second?
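Something along these lines (a sketch only; AtomsImmutable is a made-up name and most of your fields are trimmed out). The struct itself is immutable, but the MMatrix and Vector fields still have mutable contents:

```julia
using LinearAlgebra, StaticArrays

struct AtomsImmutable
    title::String
    lVecs::MMatrix{3,3,Float64,9}   # MMatrix contents can still be mutated
    energies::Vector{Float64}       # slot 1: FP_total_energy, slot 2: model_energy
end

a = AtomsImmutable("example", MMatrix{3,3,Float64,9}(1.0I), [0.0, 0.0])
a.energies[1] = -3.7                # update the energy without a mutable struct
a.lVecs[1, 1] = 2.0                 # the lattice vectors stay editable too
```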

Maybe it would be better if I just stepped back and asked the question: I have this function that is going to be called many, many times (I have others that will do the same, but if someone can help me see what to look for, I can apply it to the other ones too). What are some opportunities for efficiency improvements that you can see? Generally speaking, how can I know how close I am to optimal efficiency?

Originally, I thought that the repeated creation of new_cell would be a bottleneck and if I could use a pre-allocated buffer that would help things out. Not convinced that is the case now.

function propose_shear_step(atoms::ase.atoms,stepsize::Float64)
    rmv_vec = rand(1:3)
    remaining_vecs = rmv_vec == 1 ? (2,3) : rmv_vec == 2 ? (1,3) : (1,2)
    new_cell = MMatrix{3,3,Float64,9}(undef)
    new_cell .= atoms.lVecs
    
    v1 = atoms.lVecs[:,remaining_vecs[1]]
    v2 = atoms.lVecs[:,remaining_vecs[2]]
    norm_v1 = sqrt(dot(v1,v1))
    v1 = v1 / norm_v1
    v2 = v2 -  v1 * dot(v1, v2)  # Make v2 orthogonal to v1
    norm_v2 = sqrt(dot(v2,v2))
    v2 = v2/norm_v2

    if abs(dot(v1, v2)) > 1e-4
        println("v1 and v2 are not orthogonal, something is wrong.")
    end

    rv1,rv2 = rand(Normal(0,stepsize),2)
    new_cell[:,rmv_vec] = atoms.lVecs[:,rmv_vec] + rv1 * v1 + rv2 * v2
    inv_lVecs = inv(SMatrix{3,3,Float64,9}(atoms.lVecs))
    transform = new_cell * inv_lVecs
    return (1.0,SMatrix{3,3,Float64,9}(transform))

end

Did you profile it to see precisely where the bottlenecks are?

I have not had any luck using profilers and understanding the output. I’ve tried using @allocated line by line with some amount of success.

If the sizes are fixed the way they seem to be, you could simply do it all with SVectors and SMatrices without bothering with any buffers.

function propose_shear_step(atoms, stepsize::Float64)
    rmv_vec = rand(1:3)
    remaining_vecs = rmv_vec == 1 ? (2,3) : rmv_vec == 2 ? (1,3) : (1,2)

    l = SMatrix(atoms.lVecs)
    
    v1 = l[:, remaining_vecs[1]]
    v2 = l[:, remaining_vecs[2]]
    norm_v1 = sqrt(dot(v1,v1))
    v1 = v1 / norm_v1
    v2 = v2 -  v1 * dot(v1, v2)  # Make v2 orthogonal to v1
    norm_v2 = sqrt(dot(v2, v2))
    v2 = v2/norm_v2

    if abs(dot(v1, v2)) > 1e-4
        println("v1 and v2 are not orthogonal, something is wrong.")
    end

    rv1 = rand(Normal(0, stepsize))
    rv2 = rand(Normal(0, stepsize))
    
    new_vec = l[:, rmv_vec] + rv1 * v1 + rv2 * v2
    l1 = l[:, 1]
    l2 = l[:, 2]
    l3 = l[:, 3]
    if rmv_vec == 1
        new_cell = hcat(new_vec, l2, l3)
    elseif rmv_vec == 2
        new_cell = hcat(l1, new_vec, l3)
    elseif rmv_vec == 3
        new_cell = hcat(l1, l2, new_vec)
    else
        error("BAD")
    end

    inv_lVecs = inv(l)
    transform = new_cell * inv_lVecs
    return 1.0, transform

end

No allocations for this one. I think I followed the logic of your code, but I'm not 100% sure. Regardless, I think something like this should be achievable.

I removed the type annotation for atoms so I could send in a named tuple rather than a struct, but I don’t think that should matter.

I like it. I’m surprised you didn’t allocate when you created new_cell. Was that just because it was constructed out of SVectors?

atoms.lVecs is an MMatrix and if I just copy it over to new_cell and then modify it, I can get it down to 1 allocation for the copy.

I think I just need to learn a little bit more about SVectors/SMatrices. For example, why doesn't this line produce allocations:

new_vec = l[:, rmv_vec] + rv1 * v1 + rv2 * v2

Is it just because everything on the right hand side is a static array/vector?

Also, does

l = SMatrix(atoms.lVecs)

create a copy of atoms.lVecs, or is it by reference? I suppose it doesn't matter since l never gets modified, but…

Yes, new_cell does not allocate because the output is a 3x3 SMatrix, and because, by using an SVector as each argument to hcat, the output type (and size) of the SMatrix is entirely inferrable from the inputs.

The new_vec line does not allocate because everything is an SVector.

The l = SMatrix(atoms.lVecs) creates a copy, but it does not allocate because it’s an SMatrix, and the type of the SMatrix can be inferred from the type of the MMatrix.

It's possible that Julia in the future might be able to elide the allocation used for temporary MMatrices that do not escape the function in which they are used, but it seems to be a bit hit and miss at the moment.

Edit: I just realized that this may not have been clear, but generally, creating an SVector will not allocate any memory, while creating an MVector will. To avoid allocations when creating the SVector, though, one must make sure that its size is known at compile time.
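To illustrate the SVector/MVector difference (make_s/make_m are made-up helper names):

```julia
using StaticArrays

make_s() = SVector{3,Float64}(1.0, 2.0, 3.0)
make_m() = MVector{3,Float64}(1.0, 2.0, 3.0)

make_s(); make_m()       # warm up so compilation isn't measured
@allocated make_s()      # 0: an SVector is a plain immutable value
@allocated make_m()      # usually nonzero: the returned MVector escapes, so it's heap-allocated
```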

Do these help?

This surely does not do what you intend. For one, Xoshiro is a mutable struct, so this possibly causes a heap allocation:

julia> dump(Xoshiro)
mutable struct Xoshiro <: AbstractRNG
  s0::UInt64
  s1::UInt64
  s2::UInt64
  s3::UInt64
  s4::UInt64

More importantly, the semantics are surely not what you want. Instead of creating a new random-generator state each time you call rand!, why not just let it use the default random generator by omitting the argument? It might make sense to provide an RNG explicitly, but then you would want to take it as an argument to propose_shear_step as well.
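I.e., something along these lines (a sketch; draw_shear_steps is a made-up helper name, and the function is trimmed down to just the random draws):

```julia
using Random, Distributions

function draw_shear_steps(rng::AbstractRNG, stepsize::Float64)
    # Reuse the caller's RNG instead of constructing a fresh Xoshiro() per call
    rv1 = rand(rng, Normal(0, stepsize))
    rv2 = rand(rng, Normal(0, stepsize))
    return rv1, rv2
end

rng = Xoshiro(1234)          # create and seed once, reuse on every call
draw_shear_steps(rng, 0.1)
```

This also makes the sampling reproducible: the same seed gives the same sequence of proposals.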

Why not norm from LinearAlgebra?

Instead of inv and * (multiplying by the inverse), I guess just dividing directly, using /, should be more efficient.
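Roughly like this (a sketch with random 3x3 matrices; for StaticArrays matrices this small the difference may be minor, but / avoids forming the inverse explicitly):

```julia
using LinearAlgebra, StaticArrays

l        = SMatrix{3,3}(rand(3, 3))
new_cell = SMatrix{3,3}(rand(3, 3))

t1 = new_cell * inv(l)   # forms the inverse explicitly
t2 = new_cell / l        # right division: same result without the explicit inverse
t1 ≈ t2                  # true (up to floating-point error)
```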

It should be more efficient to convert to a static array before doing the multiplication/division, instead of after.

I think I've not seen this mentioned yet in the above, but you may want to replace rand(Normal(0, stepsize)) with sqrt(stepsize)*randn(); this avoids needing to create a Normal(0, stepsize) object.

My bad: Normal's second parameter is already the standard deviation, so you'd use stepsize*randn(), no square root!

Didn’t I do that:

inv_lVecs = inv(SMatrix{3,3,Float64,9}(atoms.lVecs))

Turn it into an SMatrix first, then take the inverse?

Ah, sorry, I thought that new_cell was not statically sized. Disregard, then.