I have a rather naive question. I have a function that computes the heuristic value of a state in planning using a graph neural network. This function takes a state, converts it to a graph, runs the network, and returns a single scalar value. So it seems like an ideal candidate for Bumper.jl.
So ideally, I would like to do something like @no_escape(my code goes here) and magically have all allocations happen through the bump allocator.
Is it possible? Is there a plan for this? I can see how this could be useful, for example, for training neural networks.
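For reference, my understanding of the current Bumper.jl API is that each allocation has to be marked explicitly with @alloc inside a @no_escape block, rather than being intercepted automatically. A minimal sketch, where heuristic_sketch and the doubling are made-up stand-ins for the real computation:

using Bumper

function heuristic_sketch(xs)
    @no_escape begin
        tmp = @alloc(Float32, length(xs))  # scratch buffer on the bump stack
        tmp .= 2 .* xs                     # stand-in for the graph/GNN work
        sum(tmp)                           # only a scalar escapes the block
    end
end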
That would make sense. I have tried to naively use AllocArrays, but no luck:
julia> @benchmark map(states) do s
           with_allocator(buffer) do
               r = model(pddle(s))
               AllocArrays.reset!(buffer)
               r
           end
       end
BenchmarkTools.Trial: 6 samples with 1 evaluation.
 Range (min … max):  190.468 ms … 201.215 ms  ┊ GC (min … max): 13.92% … 17.44%
 Time  (median):     193.559 ms               ┊ GC (median):    14.00%
 Time  (mean ± σ):   194.199 ms ±   3.948 ms  ┊ GC (mean ± σ):  14.67% ± 1.60%

 [histogram: frequency by time, 190 ms … 201 ms]

 Memory estimate: 985.04 MiB, allocs estimate: 20556.
julia> @benchmark map(model ∘ pddle, states)
BenchmarkTools.Trial: 6 samples with 1 evaluation.
 Range (min … max):  182.934 ms … 192.560 ms  ┊ GC (min … max): 10.02% … 14.59%
 Time  (median):     188.578 ms               ┊ GC (median):    11.65%
 Time  (mean ± σ):   188.544 ms ±   3.385 ms  ┊ GC (mean ± σ):  11.97% ± 1.61%

 [histogram: frequency by time, 183 ms … 193 ms]

 Memory estimate: 985.04 MiB, allocs estimate: 20137.
I think we should probably ask @ericphanson for more help. But in general you are still at the mercy of Bumper.jl's limitations. For full control we should wait until mmtk (custom GC heuristics) is integrated as a GC backend; then we can see how developers are able to use it.
Well, it's hard to say without your actual code. The way AllocArrays works is that you create an AllocArray yourself, pass it into your code, and then whenever the code calls similar, that call will use the bump allocator to produce a new AllocArray with bump-allocated memory. For example,
using Flux, AllocArrays

model = Chain(Dense(1 => 23, tanh), Dense(23 => 1; bias=true), only)
data = [[x] for x in -2:0.001f0:2]
alloc_data = AllocArray.(data)  # wrap inputs so internal `similar` calls are bump-allocated

function run_model(b, model, data)
    with_allocator(b) do
        result = sum(model, data)  # `result` is a plain scalar, so it survives the reset
        reset!(b)
        return result
    end
end

b = BumperAllocator(2^25) # 32 MiB
run_model(b, model, alloc_data)
The idea here is that we can skip the intermediate allocations that may be hidden in library code, as long as that library code uses similar.
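To make the mechanism concrete, here is a minimal sketch (the doubling is just a stand-in for whatever the library computes internally):

using AllocArrays

b = BumperAllocator(2^20)
a = AllocArray(rand(Float32, 8))
with_allocator(b) do
    tmp = similar(a)  # dispatches to the bump allocator instead of the GC
    tmp .= 2 .* a
    s = sum(tmp)      # reduce to a plain scalar before resetting
    reset!(b)         # reclaims all bump memory, including tmp
    s
end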
See here for a more complicated example, using this for low-allocation inference of a Flux model.
edit: though make sure r is a scalar here, not a 1-element array or such! You can use a CheckedAllocArray for a slow version that verifies all memory accesses are safe (useful for testing).
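For example, to check the pipeline above (a sketch, reusing run_model and data from the earlier example):

# Wrap the same inputs in CheckedAllocArray so that any access to memory
# that has already been reset throws instead of silently reading garbage.
checked_data = CheckedAllocArray.(data)
run_model(b, model, checked_data)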
BTW if you get it working, I'd be interested to see the before-and-after benchmarking. When I tried it on that Flux model I saw 100x less allocation but similar runtime; I'm curious how it fares in other settings.
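For instance, a before/after comparison along these lines (reusing the names from the example above):

using BenchmarkTools
@benchmark sum($model, $data)                  # before: plain arrays, GC-allocated intermediates
@benchmark run_model($b, $model, $alloc_data)  # after: bump-allocated intermediates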
It seems pretty featureful at this point and none of the open issues look like bad bugs, so my assumption is that it is reasonably mature and code changes would be motivated by compat updates, new ideas, or bugfixes. Therefore 7 months of inactivity doesn't seem like a bad sign. But @Mason could probably say more.