Hi everyone,
I have to evaluate a Flux model inside a Monte Carlo simulation. I am currently working on the CPU, and the problem is that calling `model(configuration)` at every Monte Carlo step allocates memory. The loop cannot be parallelized: each iteration depends on the result of the previous one, so batching is not an option. As a result I get tons of allocations, and the GC runs continuously.
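To make the constraint concrete, here is a minimal sketch of a sequential Metropolis loop. It assumes a small hypothetical Flux energy model; `energy`, `metropolis`, and the network shape are illustrative and not taken from the original program. Each proposal depends on the current state, so the model calls cannot be batched, and every `model(x)` call returns freshly allocated arrays.

```julia
using Flux, Random

# Hypothetical energy model: maps a 16-dimensional configuration to a scalar
model = Chain(Dense(16 => 8, relu), Dense(8 => 1))

# Each call allocates the intermediate and output arrays of the network
energy(model, x) = only(model(x))

function metropolis(model, x0, nsteps; stepsize = 0.1f0, beta = 1.0f0,
                    rng = Random.default_rng())
    x = copy(x0)
    proposal = similar(x)      # reused buffer for proposed configurations
    E = energy(model, x)
    accepted = 0
    for _ in 1:nsteps
        # The proposal is built from the *current* state, so step k+1
        # cannot start before step k has finished
        randn!(rng, proposal)
        proposal .= x .+ stepsize .* proposal
        E_new = energy(model, proposal)
        if rand(rng, Float32) < exp(-beta * (E_new - E))
            x .= proposal
            E = E_new
            accepted += 1
        end
    end
    return x, accepted / nsteps
end

x_final, acc = metropolis(model, zeros(Float32, 16), 1000)
```

The proposal buffer is reused, so the allocations reported by `@allocated metropolis(model, zeros(Float32, 16), 100)` come mainly from the model evaluations themselves.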
Profiling the allocations, I get the following flame graph:
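For reference, an allocation profile like this can be collected with the `Profile.Allocs` standard library (Julia 1.8 or later). The sketch below uses a hypothetical `mc_step` function as a stand-in for one Monte Carlo step; external tools such as PProf.jl or the VS Code Julia extension can then render the recorded allocations as a flame graph.

```julia
using Profile

# Hypothetical stand-in for one Monte Carlo step: the broadcast returns a
# freshly allocated array on every call, like `model(configuration)` does
mc_step(x) = x .+ 1.0f0

function run_steps(x, n)
    for _ in 1:n
        x = mc_step(x)  # each step depends on the previous one
    end
    return x
end

run_steps(zeros(Float32, 16), 1)  # warm up so compilation is not profiled

Profile.Allocs.clear()
# sample_rate=1.0 records every allocation (slow, but fine for a short run)
Profile.Allocs.@profile sample_rate=1.0 run_steps(zeros(Float32, 16), 100)
results = Profile.Allocs.fetch()

total_bytes = sum(a.size for a in results.allocs)
println("allocations recorded: ", length(results.allocs), ", bytes: ", total_bytes)
```

On a real program, a lower `sample_rate` (e.g. 0.01) keeps the profiling overhead manageable.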
Wow, AllocArrays seems to drastically reduce the memory allocated by Flux with almost zero effort. Some memory is still allocated, but the amount is reduced by a factor of 50. I will implement it in the actual program and see how it goes.
```julia
using Bumper, Flux, AllocArrays

x = zeros(Float32, 32, 32, 1, 1)
x_new = AllocArray(x)  # wrapper whose `similar` calls go through the active allocator

model = Chain(Conv((3, 3), 1 => 4, relu; pad = 1),
              x -> reshape(x, :),
              Dense(4 * 32 * 32 => 4))

# Baseline: every call to `model` allocates its intermediate arrays on the GC heap
function simple_run(model, data, iterations)
    result = model(data)
    tmp_input = similar(data)
    for i in 2:iterations
        tmp_input .= data
        tmp_input .+= i
        result .+= model(tmp_input)
    end
    return result
end

# Same loop, but the intermediates come from a bump allocator that is reset
# after each step. `result` is allocated outside `with_allocator`, so nothing
# that outlives a step points into the bump allocator's memory.
function bumper_run(model, data, iterations)
    b = UncheckedBumperAllocator(2^20)  # 1 MiB buffer
    result = model(data)
    tmp_input = similar(data)
    with_allocator(b) do
        for i in 2:iterations
            tmp_input .= data
            tmp_input .+= i
            result .+= model(tmp_input)  # copy into `result` before resetting
            reset!(b)
        end
    end
    return result
end

@time simple_run(model, x, 1)
# 0.000129 seconds (58 allocations: 76.188 KiB)
@time simple_run(model, x, 10000)
# 0.501996 seconds (560.00 k allocations: 703.434 MiB, 1.98% gc time)
@time bumper_run(model, x_new, 1)
# 0.000151 seconds (79 allocations: 1.075 MiB)
@time bumper_run(model, x_new, 10000)
# 0.554891 seconds (580.02 k allocations: 37.540 MiB, 1.21% gc time)
```