Say my package needs some global buffers to avoid allocating many small vectors with a fixed element type but variable length (benchmarks show that this does give a nice speedup). This could look as follows:
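const BUFFER = Vector{Vector{Int}}()
const BUFFER_LENGTH = 64

function __init__()
    # one preallocated buffer per thread, set up at package load time
    Threads.resize_nthreads!(BUFFER, Vector{Int}(undef, BUFFER_LENGTH))
end

function do_some_work(args...)
    # some_length: the buffer size needed for this call, derived from args
    if some_length <= BUFFER_LENGTH
        buffer = BUFFER[Threads.threadid()]
    else
        buffer = Vector{Int}(undef, some_length)  # allocate
    end
    # use buffer etc.
end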
This worked with static scheduling in the past. Now, we have two more challenges:
- Dynamic scheduling of threads
- Precompilation that executes work, as SnoopPrecompile.jl does, does not see buffers set up in `__init__`.
What is currently the best way to set up some global buffers for a case like this?
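To make the question concrete, here is a minimal sketch of one direction I could imagine: buffers kept in task-local storage and created lazily on first use. I am not claiming this is the recommended approach. The names `BUFFER_KEY` and `task_buffer` are made up for this illustration, and `do_some_work` takes `some_length` directly only to keep the example self-contained. Since the buffer belongs to the current task, it does not depend on `Threads.threadid()`, and since it is created on demand, it does not depend on `__init__` having run.

```julia
const BUFFER_LENGTH = 64
const BUFFER_KEY = :my_package_buffer  # hypothetical key, chosen for this sketch

# Return the buffer owned by the current task, creating it on first use.
# Nothing here relies on __init__, so this also works while precompiling.
function task_buffer()
    tls = task_local_storage()
    return get!(() -> Vector{Int}(undef, BUFFER_LENGTH), tls, BUFFER_KEY)::Vector{Int}
end

function do_some_work(some_length::Integer)
    # Reuse the task-owned buffer when it is large enough, otherwise allocate.
    buffer = some_length <= BUFFER_LENGTH ? task_buffer() : Vector{Int}(undef, some_length)
    for i in 1:some_length
        buffer[i] = i  # placeholder for the real work
    end
    return sum(view(buffer, 1:some_length))
end
```

The obvious downside is that every new task allocates its own buffer once, so this only pays off if a task calls `do_some_work` many times. With many short-lived tasks, something like a pool of buffers might be a better fit.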