Reusing temporary variables (ed: Object Pools)

In most of my production code I take care to make performance-critical parts non-allocating, passing in temporary variables (e.g. large arrays, NamedTuples of arrays, etc.). This results in code of the form:

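# Allocating convenience method: creates the temporaries, then calls the in-place version
myf(model, x) = myf!(alloc_tmp(model, x), model, x)

# Non-allocating method: all scratch memory is supplied through `tmp`
function myf!(tmp, model::MyModel, x)
    # implement the model, using only the buffers in `tmp`
end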

This mostly works, but it requires quite a lot of extra manual code management, and as the codebase grows more complex I keep running into more and more awkward edge cases.

Are there any code patterns or packages that help manage and simplify this kind of strategy? E.g. I’m thinking of some kind of “heap” of allocated temporaries where a function can retrieve them when called again.

Apologies if the question is a bit vague - I’m not 100% certain what I’m looking for.


You are probably looking for the object pool pattern (see the Wikipedia article “Object pool pattern”).
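To make the idea concrete, here is a minimal sketch of an object pool for scratch arrays in Julia. The names `ArrayPool`, `acquire!` and `release!` are hypothetical (they are not the API of ObjectPools.jl or any other package), and the sketch is deliberately not thread safe: free buffers are kept in stacks keyed by element type and size, and a request is served from the matching stack when one is available.

```julia
# Hypothetical minimal object pool for scratch arrays (NOT thread safe).
# Free buffers are kept in separate stacks keyed by (element type, size), so a
# caller always gets back an array of exactly the shape and eltype it asked for.
struct ArrayPool
    free::Dict{Any,Vector{Any}}
end
ArrayPool() = ArrayPool(Dict{Any,Vector{Any}}())

# Hand out a free buffer if one with this key exists, otherwise allocate one.
# Like `Array{T}(undef, dims)`, the contents are uninitialised.
function acquire!(pool::ArrayPool, ::Type{T}, dims::Vararg{Int,N}) where {T,N}
    stack = get!(() -> Any[], pool.free, (T, dims))
    return isempty(stack) ? Array{T,N}(undef, dims) : pop!(stack)::Array{T,N}
end

# Give a buffer back so that a later `acquire!` with the same key can reuse it.
# The caller must not touch `A` after releasing it.
function release!(pool::ArrayPool, A::Array{T}) where {T}
    push!(get!(() -> Any[], pool.free, (T, size(A))), A)
    return nothing
end

# Usage: the second request is served from the pool instead of a fresh allocation.
pool = ArrayPool()
a = acquire!(pool, Float64, 1_000)
release!(pool, a)
b = acquire!(pool, Float64, 1_000)   # same object as `a`
```

Keying on the element type means that, for example, a caller working with dual numbers simply gets its own stack of buffers. The price is that correctness now depends on every `acquire!` being matched by exactly one `release!`, and on nobody using a buffer after releasing it.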


Yes, that sounds exactly right. Has anybody experimented with this in Julia?

Can I think of it, a little bit, as hacking the garbage collector?

I see there is a package for this, but it says “experimental”. Does anyone have experience with this or similar packages?


It’s not hacking the garbage collector, but circumventing it with ostensibly more efficient management. I’d use an object pool only as a last resort. The advantage over your current practice is that it isolates all the management to one point of contact.

Thanks for your comments. You say “as a last resort”, but also that this has an advantage over my current practice. I didn’t understand your point about “isolating all the management to one point of contact”, though.

“Morally”, a pool and what I do seem very similar, if not the same, but it feels like I’m manually implementing something like an object pool again and again and again.

I say that partly because of the corner-case problems you are encountering, or are yet to encounter, and partly because it may not outperform the GC.

Yes, the single point of contact is that the code is implemented once and reusable, and you can encapsulate all the corner case handling, such as thread safety.
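As an illustration of handling a corner case such as thread safety in one place, here is a hedged sketch that uses a Base `Channel` as a pool of equally sized buffers. `Channel` operations are safe to call from multiple threads, so `take!` and `put!` provide all the synchronisation. `make_buffer_pool` and `with_buffer` are hypothetical names, and the fixed `Vector{Float64}` buffer type is an assumption for brevity.

```julia
# Hypothetical thread-safe pool of equally sized scratch vectors.
# A Channel can be shared between tasks/threads, so it is the single place
# where the synchronisation is handled.
function make_buffer_pool(n::Int, len::Int)
    pool = Channel{Vector{Float64}}(n)        # buffered channel with room for n buffers
    for _ in 1:n
        put!(pool, Vector{Float64}(undef, len))
    end
    return pool
end

# Run `f(buf)` with a buffer borrowed from the pool and always hand it back,
# even if `f` throws. `take!` blocks while every buffer is in use.
function with_buffer(f, pool::Channel)
    buf = take!(pool)
    try
        return f(buf)
    finally
        put!(pool, buf)
    end
end

# Usage: each iteration borrows a buffer instead of allocating its own.
pool = make_buffer_pool(Threads.nthreads(), 1_000)
results = zeros(100)
Threads.@threads for i in 1:100
    results[i] = with_buffer(pool) do buf
        buf .= i          # fill the scratch space
        sum(buf)          # the returned value does not alias `buf`
    end
end
```

Because `with_buffer` returns the buffer in a `finally` clause, an exception inside `f` cannot leak a buffer; that is the kind of corner case worth handling once rather than at every call site.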


Thanks, I appreciate your thoughts; that really helps.

I’m still hoping somebody can tell me about practical experience with a Julia package.

@Tamas_Papp can you comment why your package isn’t registered?

Can you share some of these difficult edge cases? For the most part, it seems that all you have to do is write a non-allocating function foo!() and then add an allocating version foo() which is usually a one-liner.

It is almost always related to AD: when the input is, e.g., a vector of duals, I have to work out the derived element types needed for the temporary arrays. I’m afraid I have no MWE. Since we are starting to switch to ChainRules, this may well stop being a problem altogether.
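One common way to handle the element-type problem is to derive the temporary's eltype from every input that flows into it, using `promote_type`. The sketch below uses hypothetical functions `alloc_tmp_like`, `scaled!` and `scaled`; it is shown with plain `Int` and `Float64` inputs, but the same promotion applies to number types that define promotion rules, such as ForwardDiff's dual numbers.

```julia
# Hypothetical helper: choose the element type of a temporary from *all* the
# inputs that flow into it, not just from `x`. If `θ` or `x` holds dual
# numbers, promote_type yields the dual type, so derivatives fit in `tmp`.
function alloc_tmp_like(θ::AbstractVector, x::AbstractVector)
    T = promote_type(eltype(θ), eltype(x))
    return similar(x, T)
end

# In-place kernel: eltype(tmp) must be able to hold θ[i] * x[i].
function scaled!(tmp, θ, x)
    tmp .= θ .* x
    return tmp
end

scaled(θ, x) = scaled!(alloc_tmp_like(θ, x), θ, x)

scaled([0.5, 1.5], [1, 3])   # Float64 temporary → [0.5, 4.5]
# By contrast, an Int temporary built with similar(x) alone fails:
# scaled!(similar([1, 3]), [0.5, 1.5], [1, 3])   # throws InexactError
```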

Sometimes things also get awkward when the temporary variables depend too heavily on the input.

Note that it is not quite as easy as having an allocating and a non-allocating version. The issue is that the allocation code must normally be called by an outer routine that makes many calls to myf!, hence the “interface” must be reasonably generic. A pool would solve that problem completely and remove the need for a generic interface for alloc_tmp.
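To illustrate the outer-routine situation, here is a hedged sketch with a hypothetical `MyModel` whose temporaries come from `alloc_tmp` as a NamedTuple of arrays. `run_many` is a hypothetical outer routine that allocates once and reuses the buffers for every call of `myf!`. It has to know how to call `alloc_tmp` with arguments compatible with every later call, which is the generic interface the paragraph above refers to.

```julia
using LinearAlgebra: mul!

# Hypothetical model: output = sum(tanh.(A * x)).
struct MyModel
    A::Matrix{Float64}
end

# The single place that knows which temporaries `myf!` needs.
function alloc_tmp(model::MyModel, x)
    T = promote_type(eltype(model.A), eltype(x))
    return (y = similar(x, T, size(model.A, 1)),)   # NamedTuple of buffers
end

# Non-allocating evaluation using only the buffers in `tmp`.
function myf!(tmp, model::MyModel, x)
    mul!(tmp.y, model.A, x)    # tmp.y = A * x, written in place
    tmp.y .= tanh.(tmp.y)      # fused in-place broadcast
    return sum(tmp.y)
end

myf(model, x) = myf!(alloc_tmp(model, x), model, x)

# Outer routine: allocates the temporaries once and reuses them for every call.
function run_many(model::MyModel, xs)
    tmp = alloc_tmp(model, first(xs))
    return [myf!(tmp, model, x) for x in xs]
end
```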

I remember reading somewhere on this forum that the garbage collector already does some form of memory pooling, i.e. if you allocate and free a large number of equally sized arrays, then the garbage collector is smart enough to just reuse the same chunk of memory.

We can test this hypothesis in two ways. First, we can simply allocate and free a large number of equally sized arrays and take a look at the memory addresses:

julia> a = [pointer(Vector{Int}(undef, 100_000)) for i = 1:10_000]; length(unique(a))
532  # <- Much less than 10_000!

julia> a = [pointer(Vector{Int}(undef, 1_000_000)) for i = 1:10_000]; length(unique(a))
56  # <- Even fewer if we increase the array size.

Second, we can check whether allocating and freeing equally sized arrays is actually faster than allocating and freeing varyingly sized arrays:

julia> using BenchmarkTools

# Constant size
julia> @btime for i = 1:10_000; Vector{Float64}(undef, 55_000); end
  7.807 ms (20000 allocations: 4.10 GiB)

# Varying size
julia> @btime for i = 1:10_000; Vector{Float64}(undef, rand(10_000:100_000)); end
  13.001 ms (20000 allocations: 4.05 GiB)

Both of these experiments indicate that the vanilla garbage collector indeed does some form of memory pooling. It may of course still be possible to improve on this by writing your own memory manager which exploits domain-specific information, but doing so is quite a bit of work, incurs a high risk of introducing subtle bugs, and probably requires a lot of hand-tuning to really be worthwhile. So all in all, I’d say your options are 1) write your code to be allocating and trust that the garbage collector will handle temporary memory for you, or 2) write your code to be non-allocating and eliminate all the guesswork regarding whether or not memory management will be a performance bottleneck in your application.
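If you go with option 2, you can check that a kernel really is non-allocating with Base's `@allocated` macro (BenchmarkTools' `@btime` reports allocation counts too, as in the benchmarks above). The kernel `muladd_into!` and the wrapper `allocations_of_kernel` are hypothetical examples; the measurement is wrapped in a function so that the argument types are concrete and the first (compiling) call is not counted.

```julia
# Hypothetical in-place kernel: out .= a .* b .+ c, with no temporaries.
function muladd_into!(out, a, b, c)
    @. out = a * b + c
    return out
end

# Measure inside a function so argument types are concrete; the first call
# is a warm-up, only the second call is measured.
function allocations_of_kernel(out, a, b, c)
    muladd_into!(out, a, b, c)
    return @allocated muladd_into!(out, a, b, c)
end

a, b, c = rand(1_000), rand(1_000), rand(1_000)
out = similar(a)
allocations_of_kernel(out, a, b, c)   # expected to be 0 bytes
```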


So, all things considered, would you revise your thoughts on Zulip and conclude that it may just not be worth worrying about allocations?

I guess it’s a trade-off between having ultimate control over what’s going on in your code and ease of coding. I don’t have much experience with how these two aspects play out in your application. In linear algebra, writing non-allocating code is mostly a matter of quickly checking which temporaries you actually need and then following some simple design patterns to separate out all the allocations. For automatic differentiation I could imagine things get more complicated, but we probably need some concrete examples to get to the bottom of this.

In the end I rewrote the code so that I didn’t end up using this approach. I still think it is viable but I expect corner cases will turn up with use that would need some careful handling. If someone wants to follow up and contribute to this approach, I am happy to register it, but can’t provide support at the moment.


Thank you for the explanation

I explored a similar concept in this thread, and you can find the code I used on GitHub at halleysfifthinc/SafeBuffers.jl (“Concurrency/multi-threading safe pre-allocated mutable buffers (e.g. arrays, etc.)”).

It is usable as a package, but it is not registered, and I have no plans at the moment to register or maintain it. That said, it solved my problem. After a quick look at Tamas’ ObjectPools.jl, the two main differences are that my interface is thread safe and that the type of object held in the pool isn’t restricted (e.g. Refs would be valid).
