Slowdown _after_ package load

I have a somewhat odd “my package is slow to load” problem that I cannot find another example of, because the most serious slowdown occurs after the package loads:

julia> using Dates
julia> now()
2020-01-18T19:22:19.392

julia> @time using MyPkg
@time print("HI")
now()
 33.425458 seconds (35.61 M allocations: 1.904 GiB, 2.54% gc time)

julia> @time print("HI")
HI  0.000068 seconds (10 allocations: 464 bytes)

julia> now()
2020-01-18T19:33:11.904

I typed the commands as quickly as possible, one after the other. You can see that only ~33.5 seconds of the roughly 11 minutes my commands took to execute are accounted for. The @time print("HI") line was what the interpreter appeared to be processing during the unaccounted-for time. It doesn’t actually matter what that line is; sometimes I don’t even get to finish typing it before the interpreter lags. No other windows are affected. Also, this doesn’t happen after loading other packages, only the MyPkg package I’m writing.

If I run a script containing the same commands, it takes ~35 seconds.

Here are my toaster’s details:

versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.3 LTS
  uname: Linux 5.0.0-37-generic #40~18.04.1-Ubuntu SMP Thu Nov 14 12:06:39 UTC 2019 x86_64 x86_64
  CPU: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz: 
              speed         user         nice          sys         idle          irq
       #1  2904 MHz    3142631 s       3458 s      74752 s    5914124 s          0 s
       #2  2904 MHz    2673208 s       2577 s      70172 s    6380178 s          0 s
       
  Memory: 7.7875518798828125 GB (1355.8671875 MB free)
  Uptime: 92171.0 sec
  Load Avg:  1.00146484375  1.1064453125  1.28125
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 2
  JULIA_CMDSTAN_HOME = /home/dev/cmdstan-2.21.0
  HOME = /home/dev
  WINDOWPATH = 2
  TERM = xterm-256color
  PATH = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

I do not use OhMyREPL, and Revise was not loaded for this example (just Julia run from the terminal). The package was already precompiled and does nothing with InteractiveUtils. A single CPU core goes to 100% during the mystery time, so it is doing something; I’m just not sure what.

What could cause that ~10-minute interpreter lag?

Most likely something in MyPkg is invalidating much of the code in Base. I’m guessing that you’re adding a method that applies very high in the hierarchy of some fundamental and widely used function. For example, if it redefines Base.getindex(a::Vector{T}, i::Int) where T, then every part of Julia that indexes a vector will have to be recompiled. convert methods might be another candidate.
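To make that concrete, here is a minimal made-up sketch of the pattern being described (PiratePkg and its method are purely illustrative, not something from MyPkg):

module PiratePkg

# "Type piracy": this package owns neither convert nor Int nor AbstractFloat,
# so adding this method invalidates already-compiled code throughout Base and
# any loaded package that converts floats to integers. That code has to be
# recompiled the next time it runs -- i.e. *after* `using PiratePkg` returns.
Base.convert(::Type{Int}, x::AbstractFloat) = round(Int, x)

end # module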

You might be able to debug this with SnoopCompile’s @snoopi so you can see what inference is working on.
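In case it helps, the rough usage looks like this (hedged; check SnoopCompile’s documentation for the exact keyword names in the version you have installed):

# In a fresh session, before MyPkg has been loaded:
using SnoopCompile

# Record everything inference spends more than 10 ms on while loading MyPkg;
# returns a vector of (time, MethodInstance) pairs you can inspect.
inf_timing = @snoopi tmin=0.01 using MyPkg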

1 Like

I do extend:

  • Broadcast.broadcastable
  • length
  • show
  • print
  • convert
  • size
  • getindex
  • promote_rule

Sometimes I extend an abstract type, but never a type that I didn’t define.
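To illustrate the distinction (MyType below is hypothetical, not an actual type from MyPkg), my extensions all look like this:

struct MyType
    data::Vector{Float64}
end

# Extending Base functions for a type this package owns -- the ordinary,
# non-piratical pattern, unlike the convert example shown earlier.
Base.length(x::MyType) = length(x.data)
Base.getindex(x::MyType, i::Int) = x.data[i]
Base.show(io::IO, x::MyType) = print(io, "MyType with ", length(x), " entries")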

As a first pass, I commented out all import Base: ... statements so my definitions should no longer extend anything. I still observe the same interpreter behavior.

I will check out SnoopCompile.jl later. Thanks for the reply.

You have a genuine mystery on your hands. If you can share the package publicly it would likely help. Otherwise, it’s hard to guess what might be going on. I’d still try SnoopCompile if you want to debug it. (You can alternatively start Julia with --trace-compile=compiles.log and compiled functions will be logged.)

2 Likes

I used --trace-compile=compiles.log as follows:

  1. Started julia --trace-compile=compiles.log
  2. Ran using MyPkg
  3. After it completed (in ~30 seconds), copied the log file elsewhere
  4. Typed print("HI") and hit enter
  5. Waited ~10 minutes for “HI” to show up
  6. Compared the logs

The logs were, unfortunately, identical. No new entries were made after I copied it.

  • I also tried @snoopi tmin=0.1 using MyPkg and it gave me similar results in terms of what it thought was taking time / being compiled.
  • I tried @profile using MyPkg, but it didn’t really hint at what the problem is either, as far as I could tell.

I think my main issue when troubleshooting is that the lag I’m experiencing comes after my package loads, which means all of these profiling attempts are poorly targeted: they profile a command that has, from Julia’s point of view, already finished. The CPU is at 100%, so something is clearly still happening, but I’m not sure what it is.
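One idea I haven’t tried yet (so take it as a rough sketch rather than a known fix): profile the first command after the load instead of the load itself, since that’s when the CPU is busy:

julia> using MyPkg            # returns after ~30 seconds

julia> using Profile

julia> @profile print("HI")   # the command that actually hangs

julia> Profile.print(mincount = 100)   # look for compiler/inference frames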

Anyway, the MyPkg code is very modular, so my next route of investigation is going to be chopping it in half and seeing if both halves have the problem or only one.

Before taking the steps described above, I decided to activate the dev environment (keep in mind that I have loaded this package before) and run using on each of its components one by one, to see if any single one of them causes the problem. I found a few things that struck me as odd:

  • Some of the packages had to precompile themselves (why? I have loaded MyPkg before and the versions haven’t changed, so shouldn’t they already be precompiled?)
  • CmdStan caused massive input lag after load. Digging through the CmdStan github issues, it appears I am not the only one having trouble on Julia 1.3: https://github.com/StanJulia/CmdStan.jl/issues/79#issuecomment-570215597

Jumping to CmdStan, I repeated the trick and identified NamedArrays as the culprit. Hopping to NamedArrays, none of its dependencies are problematic, but the package itself is. NamedArrays seems to extend large swaths of Base for the AbstractDict type in addition to its own NamedArray type. I think this is the problem.

I am going to try to remove it from CmdStan and see what happens.

1 Like

I’ve confirmed that removing NamedArrays from CmdStan fixes the problem. I will write an issue and link to this thread.

1 Like

It is an interaction of MCMCChains (one of CmdStan’s dependencies) and NamedArrays (another of the dependencies). Order doesn’t matter, but loading both causes the problem.

Trying to see if anyone else can replicate this issue.

For me, just using MCMCChains causes a multi-minute latency for tab-completion, but not if I’d first usinged all of its dependencies. Might be completely unrelated, but see https://github.com/JuliaLang/julia/issues/34098.

I ran Pirate Hunter on MCMCChains. It came up with several hits, but none of them looked like obvious candidates for the cause.

@tim.holy I’m a bit stumped by this as well – I think we’re finally at the point of frustration that we’re prioritizing fixing this, but it’s really not obvious to me what’s going on. See the issue.

The issue doesn’t really appear if you call the import/using statement at global scope. Take one line we know causes issues (using MCMCChains behaves the same way):

import MCMCChains: Chains

If you do this at global scope, there’s not really an issue; load time in that case is basically the same as for anything else. The problem only appears when import MCMCChains: Chains is inside the module of some imported package.

There’s a repository that replicates this behavior: https://github.com/itsdfish/HangTest. It takes approximately 111 seconds to call using HangTest, which is defined as

module HangTest
    import MCMCChains: Chains
end

Why should it matter if the import is used at global scope vs. module scope?

You can try https://github.com/JuliaLang/julia/pull/34596, which fixes a similar issue (https://github.com/JuliaLang/julia/issues/34098).

Hi @Peter_Adelman, did you write up your findings? I removed NamedArrays as a dependency in CmdStan but I don’t think that solved the issue completely.

With https://github.com/JuliaLang/julia/pull/34596 (Version 1.5.0-DEV.207 (2020-01-30) jb/repllatency/e797c78c5a (fork: 1 commits, 0 days)) and MCMCChains#master and HangTest#master I get

julia> @time import MCMCChains: Chains
 10.259800 seconds (18.49 M allocations: 964.282 MiB, 4.18% gc time)

and

julia> @time using HangTest
 10.610315 seconds (18.51 M allocations: 965.310 MiB, 4.28% gc time)

So it seems the PR fixes the MCMCChains/HangTest problem (on Travis the problems with HangTest could also be reproduced with Julia master, so I guess the improvement is actually due to the PR) :partying_face: However, IMO 10 seconds just for loading MCMCChains still doesn’t feel right, so I also timed loading MCMCChains’ dependencies:

julia> @time using AbstractMCMC                                    
  1.286339 seconds (1.86 M allocations: 89.479 MiB, 1.08% gc time)

julia> @time using AxisArrays
  0.351461 seconds (269.31 k allocations: 15.214 MiB)

julia> @time using DataFrames                                                                                                          
  1.076051 seconds (1.43 M allocations: 90.064 MiB, 1.66% gc time)

julia> @time using Distributions
  2.720949 seconds (2.75 M allocations: 148.510 MiB, 0.67% gc time)

julia> @time using KernelDensity
  4.961526 seconds (6.81 M allocations: 361.708 MiB, 1.85% gc time)

julia> @time using RecipesBase
  1.201428 seconds (1.83 M allocations: 86.964 MiB, 2.53% gc time)

julia> @time using Showoff
  0.007566 seconds (6.55 k allocations: 387.750 KiB)

julia> @time using SpecialFunctions
  0.000162 seconds (325 allocations: 18.469 KiB)

julia> @time using StatsBase
  0.000184 seconds (335 allocations: 18.969 KiB)

It seems a bit surprising how long it takes to load KernelDensity, but clearly that’s not a very detailed analysis…

Yip, similar numbers here:

➜  ~ julia

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0-DEV.212 (2020-01-31)
 _/ |\__'_|_|_|\__'_|  |  Commit 51f1710d7e (0 days old master)
|__/                   |

julia> @time using HangTest
  7.778694 seconds (19.24 M allocations: 999.451 MiB, 4.43% gc time)

and

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0-DEV.212 (2020-01-31)
 _/ |\__'_|_|_|\__'_|  |  Commit 51f1710d7e (0 days old master)
|__/                   |

julia> @time using KernelDensity
  4.431102 seconds (9.09 M allocations: 491.456 MiB, 3.48% gc time)

julia> 

Good news!

1 Like

Interesting, it seems you get similar results even though you didn’t use the PR. So maybe not the PR but some other recent change caused the improvements. Actually the CI tests of HangTest (https://travis-ci.com/itsdfish/HangTest/builds/146948402) show the same timings on Julia 1.3 and the latest Julia nightly, which is exactly the version you used. So I’m not sure why and how you could get the improved performance :confused:

Ever since this issue came up, I’ve felt like I’ve been chasing my tail…

I think that my issue might have been slightly different from the one described here, in that I could never catch the delay with an @time statement and was seeing delays much longer than 110 seconds.

I believe the PR has now been merged. I still see results similar to what I showed previously (I think).

For CmdStan.jl, Stan.jl, and soon StatisticalRethinking.jl (v2.0.0), I’ve now removed most packages from the respective Project.toml files and put glue code behind a Requires.jl barrier.
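For anyone unfamiliar with the pattern, the barrier looks roughly like this (a sketch only: the glue file name is made up, and the UUID placeholder must be replaced with MCMCChains’ real UUID from its Project.toml):

module CmdStan

using Requires

function __init__()
    # The MCMCChains-specific glue code is only included if the user also
    # loads MCMCChains, so plain `using CmdStan` no longer pays for it.
    @require MCMCChains="00000000-0000-0000-0000-000000000000" include("mcmcchains_glue.jl")
end

end # module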

In my opinion this makes these packages usable again and lets them pass Travis, but it basically makes my favorite workflow, working in project environments, useless. I wonder if this could somehow be fixed by using artifacts?

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0-DEV.230 (2020-02-03)
 _/ |\__'_|_|_|\__'_|  |  Commit 3720edfc06 (0 days old master)
|__/                   |

julia> @time using HangTest
  7.372015 seconds (19.71 M allocations: 1021.494 MiB, 3.32% gc time)

julia> @time using MCMCChains
  0.883836 seconds (1.96 M allocations: 93.365 MiB, 12.77% gc time)

julia> @time using KernelDensity
  0.000266 seconds (493 allocations: 25.938 KiB)

julia> @time using Optim
  0.000329 seconds (573 allocations: 30.047 KiB)

Just KernelDensity:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0-DEV.230 (2020-02-03)
 _/ |\__'_|_|_|\__'_|  |  Commit 3720edfc06 (0 days old master)
|__/                   |

julia> @time using KernelDensity
  5.355851 seconds (10.91 M allocations: 578.610 MiB, 2.79% gc time)

Just Optim:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.0-DEV.230 (2020-02-03)
 _/ |\__'_|_|_|\__'_|  |  Commit 3720edfc06 (0 days old master)
|__/                   |

julia> @time using Optim
  3.096964 seconds (7.38 M allocations: 392.797 MiB, 4.15% gc time)

That PR was only about the delay caused by having to recompile a bunch of code after a package was loaded. You wouldn’t see its effect just by timing the load of the package, and it wasn’t intended to help with that.