Reusing buffer arrays by providing them as arguments to functions

I have a function that needs a Vector{Int} for buffering, which I would like to reuse when I call that function over and over again in a loop. My idea was to create that vector outside and clearing it before every use. I had hoped that empty! doesn’t change the vector’s capacity (similar to C++) and would have constant complexity (omit calling Int destructors) without allocations (simply overwriting some length attribute).

However, empty! isn’t “free”:

julia> tmp = Int[];

julia> @time empty!(tmp);
  0.000036 seconds (4 allocations: 160 bytes)

How do I prevent allocation when clearing buffers?
What is the proper way to reuse those buffers?

A more elaborate example: consider computing the prime factorization of an integer

function factorize(n,
                   primes = Primes(2_000_000),
                   factors = Int[],
                   exponents = Int[])
  n >= 4 || return [n], [1]

  limit = floor(Int, sqrt(n))
  for p in primes
    p <= limit || break
    n % p == 0 || continue

    push!(factors, p)
    push!(exponents, 0)
    while n % p == 0
      exponents[end] += 1
      n ÷= p
      n % p == 0 || break
      exponents[end] += 1
      n ÷= p
      limit ÷= p
    n != 1 || break

  if n != 1
    push!(factors, n)
    push!(exponents, 1)

  return factors, exponents

which is called very often when looking for the first number x = bar(n) having at least 500 divisors:

function foo()
  primes = Primes(2_000_000)

  # buffers
  f = Array{Int}(undef, length(primes)÷2)
  e = similar(f)

  n = 1
  x = bar(n)
  _, e = factorize(x, f, e)
  while numdivisors(e) < 500
    n += 1
    x = bar(n)
    _, e = factorize(t, primes, f, e)

An alternative would be to rewrite factorize to be used like for (f, e) in primefactors(n, primes) but that would be a rather big undertaking. Another way would be to replace all the push!es and do all the index operations manually (plus returning the length in some form as resize! isn’t free either).

1 Like

However, empty! isn’t “free”

It actually is; you’re seeing some overhead from using tmp as a global variable. Here’s a better way to benchmark:

julia> tmp = Int[];

julia> using BenchmarkTools

julia> @btime empty!($tmp)
  3.054 ns (0 allocations: 0 bytes)
0-element Array{Int64,1}

Also, running @show itself in global scope will allocate.

1 Like

This might be how it’s currently implemented, but is that behavior documented? If not, I’d be quite wary of relying on that.

I would instead do something similar to your final suggestion (indexed operations), but instead of doing it all manually, create a small wrapper around an array and use that as a reusable buffer. Indexed operations are also slightly faster than push!, since push! does a library ccall for each invocation. Usually that’s not something you need to worry about, but since you’re concerned about allocations and buffering in the first place, it sounds like you’re after maximum performance.

You can read about the performance issue associated with push! here, and on that page there’s also a sample implementation of a wrapped array (PushVector) that might serve as inspiration.

Before refactoring too much though – please make sure that you’ve benchmarked your code and established that these allocations are actually a bottleneck and the right thing to optimize. I’m a bit concerned that you fret over 160 bytes allocated which doesn’t even seem to be in a hot code path.

1 Like

Thanks for the hints! I was wrong about where the allocations came from. After some proper profiling I discovered that return factors, exponents was the issue, so I refactored my code:

function factorize(n, primes = Primes(2_000_000))
  f = Int[]
  e = Int[]
  factorize!(f, e, n, primes)
  return f, e

function factorize!(factors, exponents, n, primes)
  # deleted: n >= 4 || return [n], [1]
  if n < 4
    push!(factors, n)
    push!(exponents, 1)

  # ...

  # deleted: return factors, exponents

No, it’s not documented I think. Calling empty! eventually leads to jl_array_del_at_end(a, n - dec, dec, n); (in src/array.c). My guess was that it behaved as I assumed originally, but I didn’t research how STORE_ARRAY_LEN is used/resolved.

Yes I am, I really enjoy squeezing all the performance I can get out of my code (without “hacking” too much :wink:). I’m learning the language while solving some Project Euler riddles.

Your link mentions some effort porting arrays to Julia. As I understand it, the whole runtime is written in C. Are there any plans to port the runtime to Julia?

Nice! All that is left is

@inline function Base.setindex!(v::PushVector, val, i)
  @boundscheck checkbounds(v, i)
  @inbounds v.v[i] = val

function Base.empty!(v::PushVector)
    v.l = 0

function Base.similar(v::PushVector)
  PushVector(similar(v.v), 0)

which leads to a ~5% performance increase in my example (measured after the ~50% increase due to reusing the buffers). In another riddle I earned ~18% by using PushVectors.

1 Like