Pool allocation at run time - how hard is it?



The main loop of nonlinear conjugate gradient looks like this:

  g = grad_f(x)        # gradient at the current iterate
  <compute beta>
  p = -g + beta * p    # update the search direction
  <compute alpha>
  x += alpha * p       # take the step

Here, g, p, and x are all n-vectors. If implemented in Julia exactly as written, this loop causes many allocations and deallocations as x, p, and g are recomputed. There are a number of techniques available in Julia to avoid this overhead, but all of them obfuscate the plain meaning of the code.
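For concreteness, here is the loop written exactly as above for an assumed quadratic objective f(x) = sum(abs2, x) (the objective, the fixed beta and alpha, and the names are illustrative only, standing in for the elided computations); every right-hand side allocates a fresh n-vector:

```julia
grad_f(x) = 2 .* x                 # gradient of the assumed f(x) = sum(abs2, x)

function cg_naive(x0; iters = 5)
    x = copy(x0)
    p = zero(x)
    for _ in 1:iters
        g = grad_f(x)              # allocates a new n-vector
        beta = 0.5                 # placeholder for <compute beta>
        p = -g + beta * p          # allocates temporaries plus a new p
        alpha = 0.1                # placeholder for <compute alpha>
        x += alpha * p             # x += is x = x + ...: another new vector
    end
    return x
end

cg_naive(randn(1000))
```

Each of the three update lines binds its name to a freshly allocated array, so the old arrays become garbage on every iteration.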

In principle, these allocations and deallocations could be completely avoided if the run-time system notices that it is repeatedly allocating and deallocating vectors of length n, and in response creates a finite-length pool of such vectors and quickly selects the first free vector in the pool for each new assignment statement.
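Written in user code rather than in the runtime, the pool idea amounts to a free list of same-length buffers. A minimal sketch (the names `VectorPool`, `acquire!`, and `release!` are hypothetical):

```julia
# A free list of same-length vectors: `acquire!` hands out a free buffer,
# allocating only when the pool is empty; `release!` returns one for reuse.
struct VectorPool
    n::Int
    free::Vector{Vector{Float64}}
end
VectorPool(n::Int) = VectorPool(n, Vector{Float64}[])

acquire!(pool::VectorPool) =
    isempty(pool.free) ? Vector{Float64}(undef, pool.n) : pop!(pool.free)

release!(pool::VectorPool, v::Vector{Float64}) = (push!(pool.free, v); nothing)

pool = VectorPool(4)
a = acquire!(pool)      # pool is empty: a fresh allocation
release!(pool, a)       # a goes back on the free list
b = acquire!(pool)      # reuses the same buffer: b === a
```

The hard part, of course, is what the question asks: having the runtime detect the pattern and do this automatically, rather than the user threading a pool through the code by hand.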

This pattern of frequently reusing vectors and matrices of a particular size occurs commonly in scientific computation. How difficult would it be for the run-time system to detect it and to switch to a finite pool when it would be useful?


Not impossible, but very hard. You can, however, achieve the same effect with broadcast fusion, now that the dot operators are parsed to fuse:

  <compute beta>
  p .= beta .* p .- g   # reordered so the whole expression fuses (a bare -g would allocate a temporary)
  <compute alpha>
  x .+= alpha .* p

In v0.6, once broadcast fuses, this shouldn’t allocate, as @yuyichao said. I don’t think that’s bad at all.
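A quick way to check that claim on a current Julia is to wrap the fused updates in a function and measure with `@allocated` (a sketch; the in-place gradient and the fixed step sizes are assumed placeholders):

```julia
function step!(x, g, p, beta, alpha)
    g .= 2 .* x             # in-place gradient of an assumed f(x) = sum(abs2, x)
    p .= beta .* p .- g     # fully fused: writes into p's existing buffer
    x .+= alpha .* p        # fused in-place update of x
    return nothing
end

function measure(n)
    x = randn(n); g = similar(x); p = zeros(n)
    step!(x, g, p, 0.5, 0.1)            # warm up (compile) before measuring
    @allocated step!(x, g, p, 0.5, 0.1) # should be 0 bytes once compiled
end

measure(10_000)
```

All three updates write into preexisting buffers, so after compilation the iteration itself performs no heap allocation.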


Another idea: the runtime could spot that p has only one reference (and would otherwise be marked for GC) and that the new value has the same size (in some cases even the compiler could determine this), so the array could simply be recycled.
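Written out by hand, that recycling transformation turns the allocating update into an in-place one over the same buffer; a sketch showing the two forms agree:

```julia
n = 1000
g = randn(n); p = randn(n); beta = 0.7

q = -g + beta * p       # allocating form: a new vector; p's old buffer becomes garbage
p .= beta .* p .- g     # recycled form: the same values written into p's existing buffer

q == p                  # true: elementwise identical results
```

The optimization being proposed would have the compiler or runtime perform exactly this rewrite when it can prove p's buffer is not referenced anywhere else.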


That had not occurred to us, Dude.

Seriously, though, this is a well-known optimization technique that we will eventually implement, but it’s not exactly a “why don’t you just” kind of thing: