Interpreting profiling flame charts: function runs dramatically faster after calling unrelated function

I am working on a code base to solve a large dynamic economic model with heterogeneous agents.

I coded up two different iterative algorithms for solving the model. These are wrapped in functions that I’ll call f1 and f2. These each make repeated calls to functions that carry out a single iteration step, call these g1 and g2.

So f1 looks somethign like:

function f1!(x,tol)
        while resid<tol
                 xnew = g1(x)
                 resid = abs(xnew - xold)
                 @show resid
                 x = xnew

f2 would have the same structure, but calling g2 instead of g1.

g1 and g2 inturn make calls to some shared helper functions (also to functions that are unique to each) but do not calls each other.

I’m finding that the execution speed of f2 (and by extension g2) is dramatically faster if I call if g1 just once before executing f2. Since f2 performs many iteration, it seems unlikely that this would be a compile time issue; timing using @time shows that compile time is minimal either when g1 has been called first or otherwise.

I tried profiling the code and got quite strange results. Here is the profile for g2 when g1 has been called first (using ProfileView.jl):

And here is the profile for g2 when g1 was not called at all (i.e., start julia, then call only g2):

The red areas in both profiles are “boot.jl, eval”, which from what I understand is to be expected given I was calling the functions from the REPL.

I don’t know what to make of the large white space to the left of the second flame chart. It almost looks like the first chart was simply squeezed to the right but a close inspection shows this is not exactly the case.

Grateful for any ideas anyone might have.

What happens if you run g2 twice, without ever running g1? Do you get the second profile and then the first?

More interesting behaviour. I had been running these functions from inside a script that I would execute from the REPL using include("... .

If I just write two calls to g2 in a script and then profile, I get the second flame chart.

If I then execute g2 from the REPL and then again profile, I get a flame chart that looks more like the first one. In addition, if I then execute f2, I get the fast run time that I was getting when my script included a call to g1 before executing f2.

This is a conjecture but it seems as though g2 (or more likely one of the helper functions that is shared with g1) is failing to properly compile until I run either g1 first or g2 from the REPL.

Honestly that behavior doesn’t make much sense to me. Would you be able to create a reproducible example?

1 Like

Doesn’t make sense to me either. I’ll try to make a reproducible example but I’m not optimistic that it’ll be able to.

Still, that seems like the only path towards figuring this out.