Julia 1.5.1 slows down when running multiple instances in parallel

Hello, I noticed strange performance behavior in Julia 1.5.1 (e.g. when loading packages) compared to Julia 1.1.0.

I have a batch script which starts various julia scripts simultaneously on a cluster.
In a simplified version it looks like

...
#SBATCH --ntasks 16
...

for ((i=1; i<=16; i++))
do
    srun -n 1 -N 1 --mem=500 /path/to/julia_version-number/bin/julia -O3 -- script.jl $i > $i.log &
done
wait
...

As a testing script I use

# script.jl
using Dates 
println(now())
@time using DataFrames 

Now, if I use my old Julia 1.1.0 installation, I get the following times from each job, depending on the number of tasks requested:

1 task

2020-09-02T08:38:58.465
  1.940183 seconds (3.31 M allocations: 204.015 MiB, 5.53% gc time)

4 tasks

2020-09-02T08:39:49.865
  1.856393 seconds (3.31 M allocations: 204.015 MiB, 5.14% gc time)

2020-09-02T08:39:49.912
  2.126351 seconds (3.31 M allocations: 204.019 MiB, 5.49% gc time)

2020-09-02T08:39:49.987
  1.908475 seconds (3.31 M allocations: 204.019 MiB, 6.49% gc time)

2020-09-02T08:39:50.014
  2.124697 seconds (3.31 M allocations: 204.015 MiB, 5.55% gc time)

16 tasks 

2020-09-02T08:41:25.748
  2.578776 seconds (3.31 M allocations: 204.021 MiB, 5.83% gc time)

2020-09-02T08:41:25.819
  2.109553 seconds (3.31 M allocations: 204.021 MiB, 4.57% gc time)

2020-09-02T08:41:25.636
  2.213523 seconds (3.31 M allocations: 204.023 MiB, 5.78% gc time)

2020-09-02T08:41:25.422
  2.611317 seconds (3.31 M allocations: 204.021 MiB, 5.21% gc time)
...

As you can see, the time stays roughly the same (about 2 seconds) in all cases, and as expected all jobs start at the same time.

However, these are the times with Julia 1.5.1:

1 task 

2020-09-02T08:32:12.112
  1.209848 seconds (1.43 M allocations: 90.376 MiB)


4 tasks 

2020-09-02T08:30:08.977
  5.184917 seconds (1.43 M allocations: 90.380 MiB)

2020-09-02T08:30:09.131
  5.163921 seconds (1.43 M allocations: 90.382 MiB)

2020-09-02T08:30:09.074
  5.063213 seconds (1.43 M allocations: 90.384 MiB)

2020-09-02T08:30:09.145
  5.118585 seconds (1.43 M allocations: 90.380 MiB)

16 tasks 

2020-09-02T08:33:07.054
 20.093713 seconds (1.43 M allocations: 90.380 MiB)

2020-09-02T08:33:04.616
 20.067529 seconds (1.43 M allocations: 90.382 MiB)

2020-09-02T08:33:06.904
 19.924340 seconds (1.43 M allocations: 90.378 MiB)

2020-09-02T08:33:05.201
 20.263399 seconds (1.43 M allocations: 90.376 MiB)
...

The jobs start simultaneously again but the times get much larger as I increase the number of tasks.

Does anyone have an idea what has changed in Julia 1.5.1 that would cause this behavior?

Could someone remind me, please: there is a Julia package which deals with race conditions when packages are being compiled. It was referenced recently.

While we wait for someone to answer your question: do you suggest that I recompile everything on 1.5.1? Could that help?

Unfortunately it affects all code, not only the using SomePackage line. Changing the script to

using Dates 
println(now())
@time A = rand(10_000_000)

gives the very same increase in time approximately proportional to the number of tasks.

number of tasks   time in s
40                1.663574
20                0.728958
10                0.378481
1                 0.030064
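
To gather numbers like these without opening each log by hand, the elapsed times can be pulled out of the per-task log files. This is a minimal sketch, assuming the logs are the N.log files written by the batch script above and that each contains the line printed by @time. The function name collect_times is made up for illustration.

```shell
# collect_times: hypothetical helper, not part of Julia or Slurm.
# Prints "<file> <seconds>" for every @time line found in the given logs.
collect_times() {
    # @time lines look like "  1.209848 seconds (1.43 M allocations: ...)",
    # so field 2 is the word "seconds" and field 1 is the elapsed time.
    awk '$2 == "seconds" { print FILENAME, $1 }' "$@"
}
```

Run in the directory that holds the logs, for example as collect_times *.log, this should print one line per task with the file name and the time in seconds.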

I think you are referring to this:


I have found the part which is responsible for the slowdown. It is the memory request (the --mem option; I accidentally left it out of the batch script in this post, which is now edited). When I do not set the memory, the times are fine…

Edit: sorry, this does not solve the issue because if I do not provide the memory limit, the tasks start one after the other…

Related to @johnh’s comment, the issue may be due to a recent change that makes the generation of the precompilation caches atomic, to avoid race conditions.

If that’s the issue, you can take a look at the docs for possible workarounds. The easiest thing to try is to start Julia with --compiled-modules=no.
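
For the batch script from the original post, that would mean adding the flag to each launch line. Below is a minimal sketch that only prints the commands so they can be checked before use. The helper name julia_cmd is made up for illustration, and the Julia path is the placeholder from the post. Note that the follow-up above reports the same slowdown for a plain rand call, so precompilation may well not be the cause, and disabling compiled modules makes package loading itself slower.

```shell
# Placeholder path, copied from the original post.
JULIA=/path/to/julia_version-number/bin/julia

# julia_cmd: hypothetical helper that prints the launch command for task $1
# with precompiled package caches disabled (--compiled-modules=no).
julia_cmd() {
    echo "srun -n 1 -N 1 --mem=500 $JULIA --compiled-modules=no -O3 -- script.jl $1"
}

# Dry run: print the commands for all 16 tasks instead of executing them.
i=1
while [ "$i" -le 16 ]; do
    julia_cmd "$i"
    i=$((i + 1))
done
```

If the timings with this flag still grow with the number of tasks, the precompilation cache is probably not what is slowing the jobs down.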

Hello, thanks for the explanation.

I was able to finally solve it by using

srun -n 1 --exclusive /path/to/julia_version-number/bin/julia -O3 -- script.jl $i > $i.log &

and including #SBATCH --mem-per-cpu=500 in the batch script.

Could it be that this memory/CPU management affects Julia 1.5.1 differently than it affected Julia 1.1.0?