The main loop of nonlinear conjugate gradient looks like this:
g = grad_f(x)
p = -g + beta * p
x += alpha * p
Here g, p, and x are all n-vectors. If implemented in Julia exactly as written, each iteration allocates fresh vectors for g, p, and x, and the previous ones become garbage, so the loop incurs many allocations and deallocations. Julia offers several techniques to avoid this overhead, but all of them obscure the plain meaning of the code.
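For concreteness, here is a minimal sketch of one such technique: preallocated buffers combined with fused in-place broadcasting. The names grad_f! and cg_step! are hypothetical, and the gradient is that of an assumed toy objective f(x) = sum(x.^2)/2, chosen only so that the example runs.

```julia
# Hypothetical in-place gradient of the toy objective f(x) = sum(x.^2) / 2.
# It overwrites g rather than returning a newly allocated vector.
grad_f!(g, x) = (g .= x; g)

# One iteration of the loop body, written so that no new vectors are created.
function cg_step!(x, p, g, alpha, beta)
    grad_f!(g, x)          # overwrite g instead of allocating a new gradient
    @. p = -g + beta * p   # fused broadcast writes into the existing p
    @. x += alpha * p      # fused broadcast updates x in place
    return x
end

x = [1.0, 2.0, 3.0]
p = zeros(3)
g = similar(x)             # scratch buffer, allocated once outside the loop
cg_step!(x, p, g, 0.1, 0.5)
```

The arithmetic is the same as in the three-line version, but the buffer management (the extra g argument, the @. macro, the ! naming convention) is now visible in the code, which is the loss of clarity described above.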
In principle, these allocations and deallocations could be completely avoided if the run-time system notices that it is repeatedly allocating and deallocating vectors of length n, and in response creates a finite-length pool of such vectors and quickly selects the first free vector in the pool for each new assignment statement.
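To make the idea concrete, here is a rough user-level sketch of such a pool. This is not something the Julia runtime does today; VectorPool, acquire!, and release! are hypothetical names, and a real run-time mechanism would have to decide on its own when a vector is free instead of relying on explicit release calls.

```julia
# Hypothetical fixed-capacity pool of length-n Float64 vectors.
struct VectorPool
    buffers::Vector{Vector{Float64}}
    in_use::Vector{Bool}
end

VectorPool(n::Integer, capacity::Integer) =
    VectorPool([Vector{Float64}(undef, n) for _ in 1:capacity], fill(false, capacity))

# Hand out the first free buffer, or fail if every buffer is in use.
function acquire!(pool::VectorPool)
    i = findfirst(!, pool.in_use)
    i === nothing && error("pool exhausted")
    pool.in_use[i] = true
    return pool.buffers[i]
end

# Mark a buffer as free again so that a later acquire! can reuse it.
function release!(pool::VectorPool, v::Vector{Float64})
    i = findfirst(b -> b === v, pool.buffers)
    i === nothing && error("vector does not belong to this pool")
    pool.in_use[i] = false
    return nothing
end

pool = VectorPool(3, 2)
a = acquire!(pool)
b = acquire!(pool)
release!(pool, a)
c = acquire!(pool)   # reuses the storage that a was using
```

After the release, the next acquire! returns the same array object that a referred to, so no new memory is requested. The linear scan for a free slot is cheap for a small pool, which is the situation described here.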
This pattern of frequently reusing vectors and matrices of a particular size occurs commonly in scientific computation. How difficult would it be for the run-time system to detect it and to switch to a finite pool when it would be useful?
Another idea: the runtime could detect that the array bound to p has only one reference (and so would become garbage after the assignment) and that the new value has the same size (in some cases even the compiler could determine this), so the array could simply be recycled.