Function closure with large arrays/matrices scaling horribly


So I am trying to set up a useful function f to pass around the rest of my code as follows:

function f_prepostproc(params)
   # do stuff
   f = f_generator(params)
   # evaluate f at certain x values, save to file
   return f
end

function f_generator(params)
   A = A_gen(params)
   B = B_gen(params)
   function f(x)
      # do an operation on A, B(x), get C (large matrix)
      return C
   end
   return f
end

function main(params)
   f = f_prepostproc(params)
   # ... pass f around the rest of the code ...
end

Unfortunately, this call stack takes upwards of an hour to return, especially as the matrix A and the array B get large (~14,000 x ~14,000 and ~80,000 entries, respectively). After that, actually evaluating f(x) as it is passed around in the main function takes a negligible amount of time. I am new to function closures in Julia, which I understand are involved here. Is there anything immediately wrong with this setup, or should I rework it to define f(x) in the main scope?

Many thanks

Are you sure it is not a swapping issue?


So the OP should probably scale the problem down a bit and post a complete MWE?

I doubt it, as I should have more than enough memory to deal with these sparse matrices.

But that is a good guess, and I think I was running into that before I got an extra 32 GB stick in my machine. I am not sure how to check swap usage from within Julia; I will look into that.

The swap file is managed by your OS, so maybe you should look into its manual? Without a complete MWE we can only guess about possible memory usage.
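A cheap first check can be done from inside Julia itself, since Base exposes the machine's memory counters. This is only a sketch of the idea; if free memory is near zero while the generator runs, swapping is a plausible culprit:

```julia
# Report how much physical RAM is free vs. total, in GB.
# Sys.free_memory() and Sys.total_memory() return bytes (UInt64).
free_gb  = Sys.free_memory()  / 2^30
total_gb = Sys.total_memory() / 2^30
println("free: $(round(free_gb, digits=2)) GB of $(round(total_gb, digits=2)) GB")
```

Calling this before and after building A and B would show how much headroom the process actually has.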

Since the inner function captures the outer variables, my guess would be that this capturing requires redefinition (and thus recompilation) of the inner function. This would be consistent with the observed behavior of f being fast after the initial call, as the function is already compiled.
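One way to test this guess is to time the first and second calls to the closure separately: the first call pays the JIT-compilation cost, the second only the run time. A minimal stand-in sketch (all names here are hypothetical, not the OP's real `A_gen`/`B_gen`):

```julia
# Stand-in for f_generator(params): captures A and B in a closure.
function f_generator_demo(n)
    A = rand(n, n)          # stand-in for A_gen(params)
    B = rand(n)             # stand-in for B_gen(params)
    function f(x)
        C = A .* (B[1] * x) # stand-in for the real operation on A, B(x)
        return C
    end
    return f
end

f = f_generator_demo(500)
@time f(1.0)   # first call: includes compiling the closure's method
@time f(1.0)   # second call: run time only
```

If the first `@time` dominates, the cost is compilation, not the closure capture itself.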

Do you have timings/proper benchmarks to compare to? An avenue for investigating which part is actually slow is PProf.jl, the Profile stdlib, or tools like ProfileView.jl.
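For the Profile stdlib route, a minimal pattern looks like the following (the generator here is a hypothetical stand-in for the real code):

```julia
using Profile

# Hypothetical closure standing in for f_generator(params).
make_f(n) = (A = rand(n, n); x -> A .* x)

f = make_f(200)
f(1.0)                 # warm up first so compilation is not profiled
Profile.clear()
@profile for _ in 1:100
    f(1.0)
end
Profile.print()        # shows where the samples landed
```

Profiling the call to the generator itself (instead of the loop) would show whether `A_gen`/`B_gen` or the closure construction dominates.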

Makes sense. First-call compilation for code operating on large arrays can be slow. To monitor swap usage, one can just use htop interactively (on *nix systems).
@vivian-rogers perhaps calling f once with smaller matrices (just to trigger compilation) might speed things up.
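A sketch of that warm-up idea, using a stand-in generator (the real code would call `f_generator` with a small parameter set instead): because the generator returns the same closure type every time it is called with same-typed inputs, compiling it once at a tiny size also covers the full-size call.

```julia
# Stand-in generator; returns a closure capturing A.
function make_f(n)
    A = rand(n, n)
    return x -> sum(A .* x)   # stand-in for the real operation
end

small = make_f(10)
small(1.0)                    # first call compiles the closure's method

big = make_f(2_000)
@time big(1.0)                # same method, already compiled: mostly run time
```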