Struggling to understand how to use PrecompileTools

Hello!

I wrote a bunch of libraries in Julia that all work together. One library in particular exists just to read and stream certain data from HDF5 files. One of the functions in this library, given one of these HDF5 files, reads the contents of its summary and displays it. Note that it doesn’t compute the summary from the data; that has already been done and stored inside the HDF5 file alongside the data. This function only reads the summary and displays it on the screen. It’s just a few lines long.

Now, the thing I just described takes 15 seconds to run the first time, after I restart Julia and after all the imports have run. This is a crazy amount of time. My guess is that this time is spent compiling something, so I thought I would look into PrecompileTools so that this is done just once and doesn’t have to happen every time I restart Julia. Schematically I have:

import ...
(...)

@time printstats(filename)

The last line prints

  2.680033 seconds (847.55 k allocations: 44.609 MiB, 99.26% compilation time);

First off, this isn’t accurate. I timed it on my wristwatch and that line takes 15 seconds the first time around. No idea why @time says 2.7 secs.

Second time around it prints

  0.016743 seconds (162.97 k allocations: 6.526 MiB)

Alright, this is what I would expect.

So I tried using PrecompileTools. I added it to the package’s env, added using PrecompileTools, and then wrote:

@setup_workload begin
    @compile_workload begin
        printstats(filename)
    end
end
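One detail worth double-checking (this is a sketch, not the actual package code, and the sample-file path is a hypothetical placeholder): anything the workload needs, such as filename, has to be defined in the setup portion of @setup_workload, outside @compile_workload, typically pointing at a small sample file shipped with the package:

```julia
using PrecompileTools

@setup_workload begin
    # Setup code runs during precompilation but is not itself cached.
    # The workload needs a real file to read, so point at a small sample
    # file shipped with the package (hypothetical path).
    filename = joinpath(@__DIR__, "..", "data", "sample.h5")
    @compile_workload begin
        # Calls made here are compiled and cached into the package image.
        printstats(filename)
    end
end
```

If filename only exists in the running session (not inside the package at precompile time), the workload would fail silently or precompile the wrong signatures.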

I restart Julia, do my imports, and then run the timed line, which now prints

  0.105645 seconds (168.01 k allocations: 6.763 MiB, 82.82% compilation time)

Again not accurate: my wristwatch shows nine and a half seconds.

Let me emphasize what a microscopic amount of work reading and printing this data is. None of these times can be explained by the work that actually matters to me.

What’s going on here? Why are the @times wrong? Why does it seem that PrecompileTools has very little effect on the times? What can I do to avoid these constant recompilations?

Thank you

PrecompileTools likely did its job there, taking 2.68 seconds down to 0.106. PrecompileTools can’t precompile everything, though; the typical reason is runtime dispatches of call signatures that weren’t cached by @compile_workload.

Could you clarify exactly when you started and stopped the wristwatch? Are you timing a script run from a command prompt? Are you starting the moment you hit Enter to execute that specific line in the REPL and stopping when the timing printout appears? Etc.

That said, there are a few things worth noting. @time expands to a lot of code besides the input expression: it only starts its timer just before a compilation-specific timer and the input expression, and stops right after. The prior setup and the timing printout aren’t counted, though that can’t account for a discrepancy of more than 10 seconds; you can check with @time @time .... The compiler may also do some work on the input expression before the timer starts, and that work isn’t counted either. This can be remedied with @time @eval ..., but again, more than 10 seconds isn’t normal, especially for a single call. Still, it’s worth a try whenever the compilation timing seems strangely low.

I also don’t think it could be explained by your system pausing the Julia process for a weirdly long time to work on other things, @time takes a difference in the time_ns and that uses the system clock AFAIK. If there is a deeper issue with your system clock, you can compare a wristwatch with other timers outside Julia to see, give 1-2 seconds leeway for printing and human reaction latency.
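For what it’s worth, the measurement itself is just a difference of two nanosecond counters around the expression. Here is a minimal sketch of that core idea (not @time’s actual implementation, which additionally tracks allocations, GC, and compilation time; crude_time is a made-up name):

```julia
# Minimal sketch of the timing core: capture a monotonic nanosecond
# counter before and after the work, then take the difference.
function crude_time(f)
    t0 = time_ns()
    result = f()
    elapsed = (time_ns() - t0) / 1e9  # convert nanoseconds to seconds
    println("elapsed: $(round(elapsed, digits = 6)) seconds")
    return result
end

crude_time(() -> sum(1:1_000_000))  # times only the call itself
```

Note that nothing outside the two time_ns calls is counted, which is exactly why setup and printing overhead never appear in the reported number.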

Could you clarify exactly when you started and stopped the wristwatch? Are you timing a script run from a command prompt? Are you starting the moment you hit Enter to execute that specific line in the REPL and stopping when the timing printout appears? Etc.

The second. I started the watch the moment I pressed Enter on the @time expression and stopped it when it returned. So I didn’t include the imports above it.

it only starts its timer before a compilation-specific timer and the input expression, then stops after.

I’m surprised by this. I thought the point of @time was to time everything after it. Are you saying that not matching a wristwatch is intended behaviour?

I also didn’t understand the part about @time @time. Are you suggesting that I literally run

@time @time printstats(filename)

?

@time @eval is what I would suggest to include all of the compilation that might happen.

That’s what I’d do, just on my phone because I don’t have a wristwatch.

@time is intended to time the input expression of interest, not the timing setup or timing printouts. However, that overhead should be negligible compared to human reaction latency.

Yes, the outer @time will set up a timing expression for the inner @time, thus timing the inner timing setup and printout on top of the input expression. On my system, it shows that if I stopped a stopwatch at the inner @time printout, my reaction-time delay (tens or hundreds of milliseconds) would be much larger than what the inner @time added:

julia> @time @time nothing
  0.000000 seconds
  0.000334 seconds (17 allocations: 904 bytes)

In your case where something is going >10s wrong, it might be different.

Ok so.

I did: restarted Julia, ran imports, ran @time @time .... Then I restarted Julia again, ran imports, then ran @time @eval ...

The results are, respectively:

  3.071135 seconds (847.55 k allocations: 44.607 MiB, 99.29% compilation time)
  3.095743 seconds (849.02 k allocations: 44.678 MiB, 99.27% compilation time)
  8.923549 seconds (12.02 M allocations: 610.802 MiB, 1.40% gc time, 99.78% compilation time)

If I may make an observation… the last datapoint says the total is 9 seconds and 99.78% of it is compilation. That implies 9 s × (1 − 0.9978) ≈ 20 ms is my code, which is consistent with the run time the second time around. So I don’t think this is a measurement problem; it’s just a genuinely long compilation time. And we’re drifting into investigating @time’s behavior.

How are you making the package available in the Julia session? Are you running in the package’s own env, or in an env where the package is added, or in an env where it’s dev’ed, or are you using include?

I’m running the above experiments in an environment where the package in question is dev’d.

If you think this matters, then I’ll give you more detail. I have package A, which defines an API and a function, and which I have in my local registry; I add this package to the environment. Then I have package B, which implements that function, and which I have dev’ed into the environment. So in the setup part of the experiment, alongside the imports, I have

(...)
import A: printstats
import B
(...)
@time printstats(filename)

So the function belongs to A which is added, but the code belongs to B which is dev’d.

Sounds like there may be some invalidation afoot. I assume filename is a String? In that case, the method definition in package B is type piracy: B owns neither printstats nor String. This can reduce the value of precompilation, depending on what else A and B contain. Does A add any methods to printstats at all, or does it only define the empty function function printstats end?

It’s not really type piracy - I use Val types to dispatch depending on file extension.

The code in A is:

function printstats(path)
    _, ext = splitext(path)
    if ext == ".h5"
        printstats(Val{:H5}(), path)
    elseif ext == ".zst"
        printstats(Val{:ZST}(), path)
    else
        printstats(Val{:NOEXT}(), path)
    end
end

printstats(::Val{:ZST}, path) =
    ImplementationA(path; compression = :zstd)
printstats(::Val{:NOEXT}, path) =
    ImplementationA(path; compression = nothing)

The code in B is:

A.printstats(::Val{:H5}, path) = ImplementationB(path)

The reason I did it like this is that B’s implementation reads HDF5 files, and HDF5.jl is a huge dependency. If I had put it into A directly, which contains the API definition, then anything that needs to depend on the API would pull in HDF5.jl as well.

I have no idea how any of this affects invalidations. Thoughts?

I’m not the most knowledgeable about these things, but the first thing you can try is wrapping the method definition in B in a @recompile_invalidations block. That is,

@recompile_invalidations begin
    A.printstats(::Val{:H5}, path) = ImplementationB(path)
end

Keep the @compile_workload block as-is. Does this make a difference?

And yes, this is a package structure that incurs invalidations. A printstats(::String) method that was compiled with only A loaded would have to be thrown out and recompiled after loading B. However, it’s not clear that this would have any implications for you in practice if the only precompilation workload is in B.


Ugh. The difference it makes is that @time @eval ... now shows 3 seconds, yet the line still takes 10 seconds to run (wristwatch). I mean with the @compile_workload, with your @recompile_invalidations suggestion, after a restart and after the imports. Just the printstats line by itself still takes 10 seconds. What a nightmare.

Did you also run a Pkg.update() or similar in your test environment so that precompilation actually had a chance to happen?

EDIT: never mind, I guess import/using should also trigger that, so that it’s done by the time you get to @time ... either way

I hadn’t, I didn’t realize it was required. I did now, restarted, no difference.

Given the structure you’ve described, it sounds like B may be better suited as a package extension to A, triggered by loading HDF5.jl. Maybe that would make it easier to achieve successful precompilation?
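For reference, that would look roughly like this in A’s Project.toml (a sketch; the extension module name HDF5Ext is an assumption, and the UUID must be copied from HDF5.jl’s own Project.toml):

```toml
[weakdeps]
HDF5 = "..."  # copy HDF5.jl's UUID here

[extensions]
# HDF5Ext is a hypothetical name; the extension module would live in
# A/ext/HDF5Ext.jl, can `import A` and `import HDF5`, and would hold
# the printstats(::Val{:H5}, path) method currently defined in B.
HDF5Ext = "HDF5"
```

The extension only loads once both A and HDF5 are present, so depending on A’s API alone would not pull in HDF5.jl.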

In any case, it would be easier to diagnose this with some runnable example code, like minimized versions of A and B that still exhibit the problem (even if the timings are different/less bad) that I can download and try myself. But I understand that minimizing is often difficult and laborious.

Ok so I was considering how much effort it would require for me to craft a “minimized” version of the above.

The first thing I tried was removing the unnecessary imports that existed in that notebook.

So now @time @eval ... prints 3 seconds, and it actually takes 3 seconds. I interpret this as: one of the other imports makes some recompilation necessary even though package B is precompiled.

Let me ask you something, earlier you said this:

And yes, this is a package structure that incurs invalidations. A printstats(::String) method that was compiled with only A loaded would have to be thrown out and recompiled after loading B. However, it’s not clear that this would have any implications for you in practice if the only precompilation workload is in B.

Both A and B depend on other packages. Are you saying that if I run B’s @compile_workload with, say, 6 packages in its environment (basically B’s dependencies), then when I try to use B in any environment that contains anything beyond those same 6 packages, the precompiled code will be invalidated?

I currently have 14 packages in this project; it’s unlikely that the packages I have loaded when doing research (i.e. when using the packages) will be exactly the packages I had loaded when I precompiled B…

No, this is not a general problem with having different environments. The problem with your specific structure is that the methodinstance that’s compiled if I do

using A
printstats("myfile")

is different from the one that’s compiled if I do

using A
using B
printstats("myfile")

Thus, if I do the following, an already compiled methodinstance has to be thrown away:

using A
printstats("myfile")  # this triggers compilation of printstats(::String)
using B  # this _invalidates_ the compiled methodinstance for printstats(::String)
printstats("myotherfile")  # this re-triggers compilation of printstats(::String)

The reason for the invalidation is that B’s definition of printstats(::Val{:H5}, path) changes the code that should be compiled for the first branch in printstats(::String).

(Everything above assumes no precompilation at all, but you can imagine how invalidations can also mess with precompiled methodinstances.)
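A self-contained toy version of that sequence behaves the same way at the dispatch level (standalone definitions standing in for A and B; all names here are made up):

```julia
# Stand-in for A: a generic entry point dispatching on file extension via Val.
stats(path) = endswith(path, ".h5") ? stats(Val{:H5}(), path) : "generic"

# Stand-in for A's pre-existing handler for .h5 before "B" is loaded.
stats(::Val{:H5}, path) = "h5 (A fallback)"

@assert stats("file.h5") == "h5 (A fallback)"   # compiles stats(::String)

# Stand-in for loading B: a more specific method is added afterwards.
# This invalidates the already-compiled stats(::String), which had baked
# in a call to the old target for its first branch.
stats(::Val{:H5}, path::String) = "h5 (B implementation)"

@assert stats("file.h5") == "h5 (B implementation)"  # recompiled to the new target
```

The second call succeeds with the new method precisely because the compiled stats(::String) was thrown out and recompiled after the late method definition, which is the invisible cost being discussed here.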


Ok, but the compiled code without the HDF5 branch is not part of this conversation, right? I’m precompiling with HDF5 and calling with HDF5. So it’s a surprise to me that removing other packages’ imports makes a difference.

I guess that at this stage this is too messy and too me-specific for you to help any further… I guess what I have to do now is re-introduce the other imports one by one to see which one causes recompilation. Maybe that will suggest a cause or a solution.

Yeah, it sounds like the problem may be some of your other dependencies invalidating something deeper down in your implementation. Especially since, even if a compiled instance of printstats(::String) were invalidated, recompiling that simple method should only take an instant as long as printstats(::Val{:H5}, ::String) and everything inside it have already been compiled.

I’d definitely try starting from the barest possible environment and building up from there to see when the problem occurs.
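One way to find the culprit more directly, instead of bisecting imports by hand, is SnoopCompile’s invalidation snooping. A sketch (assumes the SnoopCompile and SnoopCompileCore packages are installed; SuspectPackage is a placeholder for whichever import you’re testing):

```julia
using SnoopCompileCore

# Record the method invalidations triggered while loading the suspect package.
invalidations = @snoop_invalidations begin
    import SuspectPackage  # placeholder for the import under test
end

# Load SnoopCompile only after recording, so its own loading isn't snooped.
using SnoopCompile

# Summarize which method definitions invalidated which compiled code.
trees = invalidation_trees(invalidations)
```

Inspecting trees should show whether one of your 14 dependencies is knocking out the precompiled code underneath printstats.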