Precompilation gridlock on HPC cluster

I’m running a batch script on an HPC cluster where each Julia execution is expected to be <1 min, within a bash loop. I ran into this weird “precompilation gridlock”, shown below. Has anybody experienced something like this?

Precompiling MLJBase
  Progress [=======================================> ]  35/36
  ✓ FixedPointNumbers
  ✓ ColorTypes
  ? Distributions → DistributionsChainRulesCoreExt
  ◐ MLJBase Being precompiled by another machine (hostname: worker6062, pid: 603518, pidfile: /mnt/home/mcranmer/.julia/compiled/v1.10/MLJBase/jaWQl…

Basically it looks like all the workers wait for one worker to finish precompiling. Then when that worker finally finishes[1], the next one decides that the precompilation cache was invalidated, and it needs to precompile again. This process repeats over and over.

The result is that out of 3200 cores across the cluster, only 1 is ever in use, since precompilation ends up taking longer than the processing itself.

How can I prevent this? Or is there a way I can force each worker to avoid waiting on another process to finish precompilation?

Alternatively, is there a way to disable precompilation altogether, so that only “compilation” occurs?

  1. Which takes forever as this machine has many very slow cores. ↩︎

I might be jumping to conclusions a bit, but at first glance it looks like a variant of:

I think they started to pop up in 1.9 because that’s when we tightened the criteria for invalidation (i.e. caches are more easily invalidated)

and the solution is to play with JULIA_CPU_TARGET: `compilecache` failed when `@everywhere using` from remote machines · Issue #48217 · JuliaLang/julia · GitHub
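For example, something like this at the top of the batch script (a sketch; `generic` is the most portable target and trades some runtime performance for a cache that every node can reuse, and the `myscript.jl` launch line is hypothetical):

```shell
# Pin the CPU target so batch workers and the interactive node compute
# the same cache hash and can share one precompilation cache.
export JULIA_CPU_TARGET="generic"
echo "JULIA_CPU_TARGET=$JULIA_CPU_TARGET"
# julia --project=. myscript.jl   # hypothetical job launch
```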


One other clue is that this gridlock only started when I tried to interact with the environment from another interactive node to visualize some stuff. That seemed to invalidate the cache (maybe due to -O2 vs -O3). After that point the gridlock started (it’s a shared filesystem, so the cache would be shared by both my workers and interactive REPL).

That seemed to make the workers go into a loop where they would keep invalidating the cache of the previous one.


yeah, I want to add that IMO it’s overly tight: in the github issue (last link above), the login and remote nodes use the same CPUs, but one has 2 NUMA nodes while the other has only 1, and I don’t believe that should have changed the compile cache hash

You have a heterogeneous cluster, and your processes are overwriting each other’s precompilation caches.

You might want to use a distinct JULIA_DEPOT_PATH for your visualization node, or somehow set JULIA_CPU_TARGET appropriately.
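For instance (a sketch; the `.julia-viz` depot name is hypothetical; the first entry in `JULIA_DEPOT_PATH` is the writable depot, and later entries are searched read-only):

```shell
# Give the visualization session its own writable depot so its precompile
# caches cannot clobber the batch workers' shared cache.
export JULIA_DEPOT_PATH="$HOME/.julia-viz:$HOME/.julia"
echo "$JULIA_DEPOT_PATH"
```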


I don’t know what’s going on since I’ve never worked with clusters, but the keywords here should help you find the relevant sections in the docs.

Just to note: my cluster is not heterogeneous. The node I am visualising things from is the same type as the nodes I am running from.

My guess is that it’s either:

  • My login shell sets different Julia options (like -O2), which triggered re-precompilation (and so the workers got stuck waiting for it to finish), or
  • I added a package while the batch script was already running.

The weird thing is that even after one worker finished precompiling, the others started precompiling again, one after the other, even though the worker nodes are identical. I don’t understand that. Maybe it depends on where each worker was in the precompilation process the moment the shared mutable cache changed, so its freshly written cache was somehow immediately invalid?

In either case I’d like to figure out how to prevent this. Is there a way I can freeze the precompilation cache when I execute my job, so that it doesn’t interact with a global mutable cache? Or, just turn off precompilation?
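To make concrete what I mean by turning precompilation off, something like this in the job script might work (a sketch; `--compiled-modules=no` skips loading and writing cache files, at the cost of compiling everything in-memory on each run, and `myscript.jl` is a placeholder):

```shell
# Hypothetical job line: skip the shared precompile cache entirely, so
# each worker compiles in its own process and never touches the pidlock.
JULIA_FLAGS="--compiled-modules=no"
echo "would run: julia $JULIA_FLAGS myscript.jl"
```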

@ianshmean is the pidlock per package or per cache file? The latter includes the optimization flags in the hash, but the former could cause this issue.

ENV["JULIA_DEBUG"] = "loading" may help debug why it is invalidating the cache.
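The same thing can be set from the shell before launching the job; for example (a sketch; the script name and log filename are hypothetical):

```shell
# Enable code-loading debug messages so Julia reports why each cache
# file is rejected, then capture stderr from a hypothetical job run.
export JULIA_DEBUG="loading"
echo "JULIA_DEBUG=$JULIA_DEBUG"
# julia myscript.jl 2> loading-debug.log
```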

It’s as specific as everything that goes into the hash in the cache filename, except that it:

  • ignores the active project, so two processes with different projects will result in only one doing the work
  • ignores preferences because they cannot be hashed before spawning the precompilation process