Seamless integration of Bumper.jl

Tomas_Pevny · June 20, 2024, 11:53am

Hi All,

I have a rather naive question. I have a function that computes the heuristic value of a state in planning using a graph neural network. The function takes a state, converts it to a graph, runs the network on it, and returns a single scalar value. So it seems like an ideal candidate for Bumper.jl.

So ideally, I would like to write something like @no_escape(my code goes here) and have all allocations magically go through the bump allocator.

Is that possible? Is there a plan for this? I can see it being useful, for example, for training neural networks.

ufechner7 · June 20, 2024, 12:56pm

Probably difficult, because the packages you use need to have explicit Bumper support, which most packages currently don’t have.
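To make the point concrete, here is a minimal sketch of what explicit Bumper.jl support looks like, following the `@no_escape`/`@alloc` pattern from the Bumper.jl README (the function name `shifted_sum` is made up for illustration). Only the array requested with `@alloc` comes from the bump buffer; allocations a library makes internally, such as the temporaries inside a neural-network layer, still go through the GC unless that library is written this way.

```julia
using Bumper

# Hypothetical example function: sum of (x .+ 1), with the temporary vector
# taken from Bumper's default buffer instead of the GC heap.
function shifted_sum(x::AbstractVector)
    @no_escape begin
        # Only this explicit @alloc call uses the bump buffer; any allocation
        # made by ordinary code (e.g. inside a library) still goes to the GC.
        y = @alloc(eltype(x), length(x))
        y .= x .+ 1   # fused in-place broadcast, no extra allocation
        sum(y)        # the scalar may leave the block; `y` itself must not
    end
end

shifted_sum([1.0, 2.0, 3.0])  # 2.0 + 3.0 + 4.0 == 9.0
```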

AMJ · June 20, 2024, 2:30pm

You can use AllocArrays.jl.

Tomas_Pevny · June 21, 2024, 8:42am

That would make sense. I tried to naively use AllocArrays, but with no luck:

julia> @benchmark map(states) do s
           with_allocator(buffer) do
               r = model(pddle(s))
               AllocArrays.reset!(buffer)
               r
           end
       end
BenchmarkTools.Trial: 6 samples with 1 evaluation.
 Range (min … max):  190.468 ms … 201.215 ms  ┊ GC (min … max): 13.92% … 17.44%
 Time  (median):     193.559 ms               ┊ GC (median):    14.00%
 Time  (mean ± σ):   194.199 ms ±   3.948 ms  ┊ GC (mean ± σ):  14.67% ±  1.60%

  █   █    █               ██                                 █
  █▁▁▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  190 ms           Histogram: frequency by time          201 ms <

 Memory estimate: 985.04 MiB, allocs estimate: 20556.

julia> @benchmark map(model ∘ pddle, states)
BenchmarkTools.Trial: 6 samples with 1 evaluation.
 Range (min … max):  182.934 ms … 192.560 ms  ┊ GC (min … max): 10.02% … 14.59%
 Time  (median):     188.578 ms               ┊ GC (median):    11.65%
 Time  (mean ± σ):   188.544 ms ±   3.385 ms  ┊ GC (mean ± σ):  11.97% ±  1.61%

  █                           █ █          █          █       █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁█▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█ ▁
  183 ms           Histogram: frequency by time          193 ms <

 Memory estimate: 985.04 MiB, allocs estimate: 20137.

AMJ · June 21, 2024, 8:57am

I think we should probably ask @ericphanson for more help. But in general, you are still at the mercy of Bumper.jl’s limitations. For full control, we would have to wait until MMTk (custom GC heuristics) is integrated as a GC backend; then we can see how developers are able to use it.

ericphanson · June 21, 2024, 9:05am

It’s hard to tell: are you actually using an AllocArray there?

Tomas_Pevny · June 21, 2024, 9:14am

How do I check this?
It seems to me that I am not using it.

ericphanson · June 21, 2024, 9:31am

Well, it’s hard to say without your actual code. The way AllocArrays works is that you create an AllocArray yourself and pass it into your code; then, whenever the code calls similar, that call uses the bump allocator to produce a new AllocArray backed by bump-allocated memory. For example,

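using Flux, AllocArrays
model = Chain(Dense(1 => 23, tanh), Dense(23 => 1; bias=true), only)
data = [[x] for x in -2:0.001f0:2]
alloc_data = AllocArray.(data)  # wrap inputs so `similar` calls inside Flux see AllocArrays
function run_model(b, model, data)
    with_allocator(b) do
        result = sum(model, data)
        AllocArrays.reset!(b)  # release the bump memory; only the scalar `result` escapes
        return result
    end
end

b = BumperAllocator(2^25) # 32 MiB
run_model(b, model, alloc_data)  # pass the wrapped data, not the plain `data`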

The idea here is we can skip the intermediate allocations which may be hidden in library code, as long as that library code uses similar.
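The mechanism can be illustrated with a small sketch, assuming the AllocArrays.jl API used above (`AllocArray`, `BumperAllocator`, `with_allocator`, `AllocArrays.reset!`). `library_scale_sum` is a hypothetical stand-in for library code that knows nothing about AllocArrays but creates its temporary with `similar`. Checking `similar(x) isa AllocArray` inside the allocator block is also one way to confirm that an AllocArray actually reaches the code; a plain `Vector` never touches the bump buffer.

```julia
using AllocArrays

# Hypothetical "library" function: no AllocArrays-specific code, but its
# temporary is created with `similar`, so it follows the current allocator.
function library_scale_sum(x)
    tmp = similar(x)
    tmp .= 2 .* x
    return sum(tmp)
end

b = BumperAllocator(2^20)   # 1 MiB bump buffer
x = AllocArray(Float32[1, 2, 3])

r, probe_is_allocarray = with_allocator(b) do
    probe = similar(x)               # what library code would receive here
    out = (library_scale_sum(x), probe isa AllocArray)
    AllocArrays.reset!(b)            # only plain values escape, never `probe`
    out
end

# A plain Vector is unaffected: `similar` returns another ordinary Vector.
plain_is_allocarray = similar(Float32[1, 2, 3]) isa AllocArray
```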

See here for a more complicated example, using this for low-allocation inference of a Flux model.

Tomas_Pevny · June 21, 2024, 9:33am

Now I see, thanks a lot, this makes perfect sense.

That might be a bit complicated to change though.

ericphanson · June 21, 2024, 9:35am

Would it just be this?

Tomas_Pevny:

map(states) do s
    with_allocator(buffer) do
        r = model(pddle(AllocArray(s)))
        AllocArrays.reset!(buffer)
        r
    end
end

Edit: make sure r is a scalar here, though, not a 1-element array or similar! You can use a CheckedAllocArray for a slow version that verifies all memory accesses are safe (useful for testing).
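A sketch of the safe pattern, assuming the same AllocArrays.jl API: `toy_model` is a hypothetical stand-in for `model(pddle(s))` that deliberately returns a 1-element array, and the value is copied out as a scalar with `only` before `reset!` invalidates the buffer. While testing, `AllocArray` can be swapped for `CheckedAllocArray` to get the slower, checked behavior described above.

```julia
using AllocArrays

# Hypothetical stand-in for `model(pddle(s))`: it returns a 1-element array,
# which lives in bump-allocated memory when the input is an AllocArray.
toy_model(x) = sum(x; dims=1)

b = BumperAllocator(2^20)
states = [Float32[1, 2, 3], Float32[2, 4, 6]]

results = map(states) do s
    with_allocator(b) do
        out = toy_model(AllocArray(s))  # use CheckedAllocArray(s) while testing
        r = only(out)                   # extract a plain scalar *before* reset!
        AllocArrays.reset!(b)           # `out` must not be used from here on
        r
    end
end
```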

Tomas_Pevny · June 21, 2024, 10:01am

Nice, this is neat.

ericphanson · June 21, 2024, 10:35am

BTW, if you get it working, I’d be interested to see the before-and-after benchmarks. When I tried it on that Flux model, I saw 100x less allocation but similar runtime; I’m curious how it fares in other settings.
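One possible way to produce such a before-and-after comparison, sketched with Base’s `@allocated` rather than BenchmarkTools so it has no extra dependencies. `workload`, `run_plain` and `run_bumper` are hypothetical names; `workload` stands in for `model ∘ pddle` and allocates its temporaries through broadcasting. Both versions are run once first so that compilation is not counted.

```julia
using AllocArrays

# Hypothetical workload: the broadcast creates a temporary array, the way
# library code typically allocates internally.
workload(x) = sum(abs2, (x .+ 1) .* 2)

run_plain(xs) = map(workload, xs)

function run_bumper(b, xs)
    return map(xs) do x
        with_allocator(b) do
            r = workload(AllocArray(x))  # temporaries now come from the bump buffer
            AllocArrays.reset!(b)        # only the scalar `r` escapes
            r
        end
    end
end

xs = [rand(Float32, 1_000) for _ in 1:100]
b = BumperAllocator(2^20)   # 1 MiB is plenty, since we reset after each item

# Warm up so compilation is not counted, then compare bytes allocated.
run_plain(xs); run_bumper(b, xs)
bytes_plain  = @allocated run_plain(xs)
bytes_bumper = @allocated run_bumper(b, xs)
println("plain: $bytes_plain bytes, AllocArrays: $bytes_bumper bytes")
```

The same comparison can be done with `@benchmark`, as in the runs earlier in this thread, where the "Memory estimate" line is the number to watch.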

svretina · June 21, 2024, 11:08am

Sorry for the slightly off-topic question: is Bumper.jl still being developed? It seems that the last activity was 7 months ago.

ericphanson · June 21, 2024, 12:01pm

It seems pretty featureful at this point, and none of the open issues look like bad bugs, so my assumption is that it is reasonably mature and that code changes would be motivated by compat updates, new ideas, or bugfixes. Therefore, 7 months of inactivity doesn’t seem like a bad sign. But @Mason could probably say more.
