Avoid allocation of a Flux model on the CPU

Hi everyone,
I have to execute a Flux model inside a Monte Carlo simulation. I am currently working on the CPU, and the problem is that calling `model(configuration)` at each Monte Carlo step allocates memory. Since this runs in a non-parallelizable loop (each iteration depends on the result of the previous one, so using batches is not a solution), I get tons of allocations that make the GC kick in continuously.
By profiling the allocations, I get the following flame graph

It seems that most of the allocation happens in the convolutional layers of the neural network.

The model is defined as

model = Chain(x -> 2x .- 1,
              x -> reshape(x, (8, 8, 1, 1)),
              Conv((3, 3), 1 => 4, tanh; pad=1),
              Conv((3, 3), 4 => 8, tanh; pad=1),
              x -> reshape(x, :),
              Dense(8 * 64, n_observables, tanh))

And the function `monte_carlo_run!` calls the model repeatedly inside a for loop.
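For context, the loop in question looks roughly like this. This is only a minimal sketch: `monte_carlo_run!` is the real function's name, but the stand-in `update` step and toy model are illustrative, not the actual code.

```julia
# Hypothetical sketch of the sequential Monte Carlo loop: each iteration
# depends on the previous one, so the model calls cannot be batched.
# `update` stands in for the real MC acceptance/update step.
function monte_carlo_run!(model, config, nsteps; update = (c, o) -> c .+ o)
    for _ in 1:nsteps
        obs = model(config)          # allocates fresh arrays on every call
        config = update(config, obs)
    end
    return config
end

# Stand-in model: any callable works, e.g. a Flux Chain in the real code.
toy_model = x -> 0.5 .* x
monte_carlo_run!(toy_model, ones(4), 3)
```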

How can I avoid the allocations inside the convolutional layers?


If it is only for inference, you can try AllocArrays.jl (GitHub - ericphanson/AllocArrays.jl: arrays that use a dynamically-scoped allocator). If it works, please report the results.


If you don’t have to train the network, you can define your own custom Conv layer that calls the non-allocating NNlib.conv!(y, x, w, cdims). See here:
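A sketch of such a layer, assuming the first conv of the model above. The `FixedConv` name and its fields are my own for illustration, not a Flux API; weights, bias, and the output buffer are allocated once, and `NNlib.conv!` writes into the buffer on every call.

```julia
using NNlib

struct FixedConv{W,B,Y,C}
    w::W        # (kh, kw, cin, cout) weights
    b::B        # bias reshaped to (1, 1, cout, 1) for broadcasting
    y::Y        # preallocated output buffer
    cdims::C    # precomputed convolution dimensions
end

function FixedConv(w::AbstractArray{T,4}, b::AbstractVector, xsize; pad=1) where {T}
    cdims = DenseConvDims(xsize, size(w); padding=pad)
    y = zeros(T, NNlib.output_size(cdims)..., size(w, 4), xsize[end])
    FixedConv(w, reshape(b, 1, 1, length(b), 1), y, cdims)
end

function (c::FixedConv)(x)
    NNlib.conv!(c.y, x, c.w, c.cdims)
    c.y .= tanh.(c.y .+ c.b)   # in-place bias + activation, no allocation
    return c.y                 # caution: the same buffer is reused each call
end

# Usage, mirroring the first Conv((3,3), 1=>4) of the model above:
w = randn(Float32, 3, 3, 1, 4)
layer = FixedConv(w, zeros(Float32, 4), (8, 8, 1, 1); pad=1)
x = rand(Float32, 8, 8, 1, 1)
layer(x)   # writes the (8, 8, 4, 1) result into the preallocated buffer
```

Since the output buffer is reused, this only works when the layer's result is consumed before the next call, which is the case in a sequential Monte Carlo loop.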


Wow, AllocArrays seems to dramatically reduce the memory allocated by Flux with almost zero effort. Some memory is still allocated, but it is reduced by a factor of about 50. I will implement it in the actual program and see how it goes.

using Bumper, Flux, AllocArrays

x = zeros(Float32, 32, 32, 1, 1)
x_new = AllocArray(x)   # wrapped input, so internal allocations go through the bump allocator

model = Chain(Conv((3, 3), 1 => 4, relu; pad=1),
              x -> reshape(x, :),
              Dense(4 * 32 * 32 => 4))


function simple_run(model, data, iterations)
    result = model(data)
    tmp_input = similar(data)

    for i in 2:iterations
        tmp_input .= data
        tmp_input .+= i
        result .+= model(tmp_input)
    end
    return result
end

function bumper_run(model, data, iterations)
    b = UncheckedBumperAllocator(2^20)  # 1 MiB bump allocator
    result = model(data)
    tmp_input = similar(data)
    with_allocator(b) do
        for i in 2:iterations
            tmp_input .= data
            tmp_input .+= i
            result .+= model(tmp_input)
            reset!(b)  # free everything bump-allocated this iteration
        end
    end
    return result
end

@time simple_run(model, x, 1)
#  0.000129 seconds (58 allocations: 76.188 KiB)

@time simple_run(model, x, 10000)
#  0.501996 seconds (560.00 k allocations: 703.434 MiB, 1.98% gc time)

@time bumper_run(model, x_new, 1)
#  0.000151 seconds (79 allocations: 1.075 MiB)

@time bumper_run(model, x_new, 10000)
#  0.554891 seconds (580.02 k allocations: 37.540 MiB, 1.21% gc time)