What is the proper way to benchmark a using statement?

Calling using for the first time since a restart takes longer than the same call made later in a different Julia instance:
First instance since restart

julia> @time using SpecialFunctions
  1.596852 seconds (68.60 k allocations: 4.347 MiB, 0.33% compilation time)

Second instance

julia> @time using SpecialFunctions
  0.188694 seconds (84.58 k allocations: 5.072 MiB, 5.71% compilation time)

So what is the proper way to benchmark the time it takes to call using on a module? Do you actually have to restart the computer every time to get an accurate measurement?

For using I would be more interested in the first execution time (time to first plot), because I wouldn’t be using a package twice.

That’s the thing. If you restart your computer and @time the first using in two different Julia processes, you get significantly different times, with the first always much longer than the second, as in the example above.

Are you using Windows in a corporate environment by any chance?

I suffered a similar issue (but much worse latency) and concluded it was because of corporate IT security policies. Extended discussion here.


I think if you want worst-case measurements, you can start julia with JULIA_DEPOT_PATH set to a temporary (empty) folder. Then you can @time both Pkg.add("SpecialFunctions") and using SpecialFunctions.
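
For example, a rough sketch (the launch hint in the first comment assumes a bash-like shell, and the Pkg.add timing will include registry and download time):

# start Julia with an empty depot, e.g.:  JULIA_DEPOT_PATH=$(mktemp -d) julia

julia> using Pkg

julia> @time Pkg.add("SpecialFunctions")

julia> @time using SpecialFunctions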

This just sounds like HDD latency. Are you using an SSD or an HDD?

The second time you do using, the data presumably isn’t read from the HDD again; it’s still cached in RAM by the operating system.
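
If that is the explanation, you can approximate a cold start without rebooting by asking the OS to drop its page cache between runs. A rough, Linux-only sketch (needs root, use at your own risk), run from any session before starting the Julia process you want to time:

julia> run(`sync`);   # flush pending writes to disk first

julia> run(`sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"`);   # drop the page cache (needs root)

A fresh Julia process should then read the package files from disk again, much like after a reboot.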


There are several different things you might wish to measure. This is covered in one subsection of the Performance tips in the Manual:

The in-development version of the docs is more expansive:

You may also want to try @time_imports:
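
For reference, a minimal invocation looks like this (assuming Julia 1.8 or newer, where the macro is available in the default REPL via InteractiveUtils):

julia> @time_imports using SpecialFunctions   # prints a per-dependency load-time breakdown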


This is the behavior I see on my computer, and it would explain the different timings in different Julia processes: once Windows has scanned a file, it doesn’t have to intercept read operations from other processes.


I don’t see it in the case of SpecialFunctions, but sometimes the @time_imports timings add up to more than the first @time using. I’m using Downloads as an example: 159ms > 120ms, more or less consistent across several restarts. When I tried this a couple of months ago, the difference was even greater (205ms >> 74.3ms), which I can’t explain because I was using v1.11.5 both times.

One process:

julia> @time_imports using Downloads
               ┌ 0.0 ms NetworkOptions.__init__()
    115.0 ms  NetworkOptions 97.13% compilation time
      9.9 ms  ArgTools
               ┌ 0.3 ms nghttp2_jll.__init__()
      3.3 ms  nghttp2_jll
               ┌ 1.7 ms LibCURL_jll.__init__()
      4.6 ms  LibCURL_jll
               ┌ 0.0 ms MozillaCACerts_jll.__init__()
      4.2 ms  MozillaCACerts_jll
               ┌ 0.0 ms LibCURL.__init__()
      2.2 ms  LibCURL
               ┌ 0.4 ms Downloads.Curl.__init__()
     20.0 ms  Downloads

julia> +(115,9.9,3.3,4.6,4.2,2.2,20) # ms
159.2

julia> @time_imports using Downloads # already loaded so nothing happens

julia> @time using Downloads # only needs to handle names, 1.3ms
  0.001338 seconds (730 allocations: 46.570 KiB)

Another process:

julia> @time using Downloads # < 0.1592s
  0.119868 seconds (68.07 k allocations: 3.933 MiB, 39.20% gc time)

julia> @time using Downloads # consistent with previous process
  0.001373 seconds (730 allocations: 46.570 KiB)

This is unrelated to the timing on first process vs other processes, but I wonder if it could also be explained by interruptions.

Can’t test myself because I only have SSDs, but would it be right to presume that the cache is eventually cleared if the OS starts thinking we won’t start Julia anytime soon?
