# Stop allocating memory when replacing structs in for-loops

I’m working on a project where I have to simulate many systems and track some observable across the many systems. When I try to create new systems in for-loops, I end up getting allocations of `O(n_trials)` when it isn’t necessary. Here is my very contrived MWE:

``````
mutable struct InputData{T}
couplings::Vector{T}
constant::Float64
end

mutable struct OutputData{T}
couplings::Vector{T}
constant::Float64

OutputData(N) = new{Float64}(Vector{Float64}(undef, N), 0.0)
end

N = 1_000
input = InputData(rand(N), 0.5)

function solve_system(input::InputData)
N = length(input.couplings)
# Make new output
out = OutputData(N)
# To solve the system, out.couplings
# is the ordered input.couplings plus some randomness
out.couplings .= (input.couplings .+ rand(N)) #should take about 3μs
# out constant is the penultimate coupling
out.constant = input.couplings[N-1]
out
end

# Now I would like to generate many systems,
# and measure the average of the second coupling

function observe_second_coupling(trials, N)
second_coupling_mean = 0.0
input = InputData(rand(N), 0.5)
for trial in 1:trials
output = solve_system(input) #This is the problematic point
second_coupling_mean += output.couplings[2] / trials
end
second_coupling_mean
end

using BenchmarkTools
@benchmark solve_system(\$input)
@benchmark observe_second_coupling(\$1000, \$N)
``````

Solving one system takes about 30μs on my machine. The solving arithmetic adds up to about 6, so I think a lot of the execution time comes from allocations. Running the observable code takes Nx30μs, so I suspect that I’m paying for reallocating `output` every time. I am actually running N = 2000, ntrials = 1e6 for the full code and it takes about an hour, which is a pain. Over loops of ntrials I would like to avoid this allocation and hopefully get a speed up.

The problem seems to be the line `output = solve_system(input)`. It is type stable and size stable given the input, so it should be possible to just overwrite the memory used by the old output system and thus avoid any allocations. I don’t know how to do this in Julia yet. Some ideas are:

1. Use a mutable buffer struct that is passed into a mutating `solve_system!` function. This seems to work, but gets very messy, and means maintaining a mutating and n0n-mutating version of `solve_system` in parallel.
2. Pass everything around as arrays to be mutated, which seems to be an even less coherent version of 1.
3. Find a fancy package that fixes the problem - I looked at Accessors.jl and Setfields.jl but I don’t think they work like this.
4. Find a way of telling Julia enough about the memory requirements of OutputData.new() that it goes “oh this is fine we can reuse this memory”.

I think 4 would be my preferred solution. Does anyone have any ideas? TIA!

Just make `InputData` and `OutputData` immutable, and construct new ones (that share `couplings`.

``````function solve_system(input::InputData)
N = length(input.couplings)
# Make new output
out = OutputData(N)
# To solve the system, out.couplings
# is the ordered input.couplings plus some randomness
out.couplings .= (input.couplings .+ rand(N)) #should take about 3μs
# out constant is the penultimate coupling
out = OutputData(out.couplings, input.couplings[N-1])
out
end
``````

Well, you can allocate OutputData as a global variable and directly reuse this mutable struct. And it’s your responsibility to ensure that you won’t overwrite the struct in usage…
A simple solution like this:

``````const GlobalCache = OutputData(0)
const isGlobalCacheAllocated = Ref{Bool}(false)
function allocateOne()
@assert !isGlobalCacheAllocated[]
isGlobalCacheAllocated[] = true
return GlobalCache
end

function deallocate()
@assert isGlobalCacheAllocated[]
isGlobalCacheAllocated[] = false
end
``````

Thanks for this @Oscar_Smith! This seems to get rid of one of the allocs in the solve step:

``````struct SInputData{T}
couplings::Vector{T}
constant::Float64
end

struct SOutputData{T}
couplings::Vector{T}
constant::Float64

SOutputData(N) = new{Float64}(Vector{Float64}(undef, N), 0.0)
SOutputData(C, K) = new{Float64}(C, K)
end

N = 1_000
input = SInputData(rand(N), 0.5)

function solve_system(input::SInputData)
N = length(input.couplings)
# Make new output
out = SOutputData(N)
# To solve the system, out.couplings
# is the ordered input.couplings plus some randomness
out.couplings .= (input.couplings .+ rand(N)) #should take about 3μs
# out constant is the penultimate coupling
out = SOutputData(out.couplings, input.couplings[N-1])
out
end
``````

But I still get ~2000 allocs for ntrials = 1000.

Your other allocation is `rand(N)` which you can replace with `rand.()` to remove the allocation.

1 Like

You can use a broadcasting trick here that avoids allocations:

``````out.couplings .= (input.couplings .+ rand.())
``````

Notice the dot in `rand.()`.