I’m working on a project where I have to simulate many systems and track some observable across the many systems. When I try to create new systems in for-loops, I end up getting allocations of O(n_trials)
when it isn’t necessary. Here is my very contrived MWE:
mutable struct InputData{T}
couplings::Vector{T}
constant::Float64
end
mutable struct OutputData{T}
couplings::Vector{T}
constant::Float64
OutputData(N) = new{Float64}(Vector{Float64}(undef, N), 0.0)
end
N = 1_000
input = InputData(rand(N), 0.5)
function solve_system(input::InputData)
N = length(input.couplings)
# Make new output
out = OutputData(N)
# To solve the system, out.couplings
# is the ordered input.couplings plus some randomness
out.couplings .= (input.couplings .+ rand(N)) #should take about 3μs
sort!(out.couplings) #should take about 3μs
# out constant is the penultimate coupling
out.constant = input.couplings[N-1]
out
end
# Now I would like to generate many systems,
# and measure the average of the second coupling
function observe_second_coupling(trials, N)
second_coupling_mean = 0.0
input = InputData(rand(N), 0.5)
for trial in 1:trials
output = solve_system(input) #This is the problematic point
second_coupling_mean += output.couplings[2] / trials
end
second_coupling_mean
end
using BenchmarkTools
@benchmark solve_system($input)
@benchmark observe_second_coupling($1000, $N)
Solving one system takes about 30μs on my machine. The solving arithmetic adds up to about 6, so I think a lot of the execution time comes from allocations. Running the observable code takes Nx30μs, so I suspect that I’m paying for reallocating output
every time. I am actually running N = 2000, ntrials = 1e6 for the full code and it takes about an hour, which is a pain. Over loops of ntrials I would like to avoid this allocation and hopefully get a speed up.
The problem seems to be the line output = solve_system(input)
. It is type stable and size stable given the input, so it should be possible to just overwrite the memory used by the old output system and thus avoid any allocations. I don’t know how to do this in Julia yet. Some ideas are:
- Use a mutable buffer struct that is passed into a mutating
solve_system!
function. This seems to work, but gets very messy, and means maintaining a mutating and n0n-mutating version ofsolve_system
in parallel. - Pass everything around as arrays to be mutated, which seems to be an even less coherent version of 1.
- Find a fancy package that fixes the problem - I looked at Accessors.jl and Setfields.jl but I don’t think they work like this.
- Find a way of telling Julia enough about the memory requirements of OutputData.new() that it goes “oh this is fine we can reuse this memory”.
I think 4 would be my preferred solution. Does anyone have any ideas? TIA!