Weird fluctuation in running time

I’ve been observing a weird behaviour with a piece of code I’ve been working on. The running time, as reported by BenchmarkTools, will suddenly jump from around 60 ms to over a minute. Once that happens, the running time stays around a minute and does not return to the previous value. I’ve been racking my brain trying to figure out what might be going on. One clue is that, once the jump happens, profiling the code using Profile reveals that the vast majority of the time appears to be spent in @Base/task.jl. If I profile the exact same code in two different REPL sessions, one in which the running time is normal and one in which it is long, those lines do indeed appear to account for the difference.
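For reference, the per-session profiling comparison was done roughly like this (a sketch, assuming the Q matrix from the benchmark below; format and sortedby are standard Profile keywords):

```julia
using Profile, LinearAlgebra, StandardBasisVectors

q, _ = qr(randn(75, 75))
Q = Matrix{Float64}(q)

# Warm up once so compilation is not in the profile
standardize_basis(Q)

# Collect samples over repeated calls
Profile.clear()
@profile for _ in 1:20
    standardize_basis(Q)
end

# Flat report, heaviest lines first; in the slow session the
# top entries point into @Base/task.jl
Profile.print(format=:flat, sortedby=:count)
```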
I can see how something like that could happen intermittently by chance (a hiccup in the task-scheduling mechanism, say), but as I mentioned, once the running time enters the “long regime”, it tends to stay there.
I initially discovered this when running a lengthy calculation, in which this specific code plays an important part, and noticed that the running time would fluctuate from around 2-3 minutes to over an hour. This suggests that whatever triggers the change in running time is not intermittent, but that something else is going on.
I realize this is all a bit vague, and I’m trying to come up with a scenario that can reliably trigger the change, but so far I haven’t been able to nail it down. If anyone has any idea of what might be going on, or any suggestions for how to debug it, I would be grateful.
The piece of code that undergoes the running-time weirdness can be found here, and this is how I’ve been testing it:

using StandardBasisVectors, BenchmarkTools, Profile, LinearAlgebra
q, r = qr(randn(75, 75))
Q = Matrix{Float64}(q)
@benchmark standardize_basis($Q)
BenchmarkTools.Trial: 77 samples with 1 evaluation.
 Range (min … max):  57.890 ms … 67.230 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     65.026 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   64.961 ms ±  1.300 ms  ┊ GC (mean ± σ):  0.05% ± 0.43%

                                     ▁▃▁▁  ▆▁█▃▆██▃  ▁▃        
  ▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▇████▄▄████████▇▄██▄▄▇▄▄▇ ▁
  57.9 ms         Histogram: frequency by time        67.2 ms <

 Memory estimate: 804.23 KiB, allocs estimate: 8642.

# after the running-time change

@benchmark StandardBasisVectors.standardize_basis($Q)
BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min … max):  1.066 s …   1.101 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.072 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.077 s ± 13.807 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █    █   █     █                                        █  
  █▁▁▁▁█▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.07 s         Histogram: frequency by time         1.1 s <

 Memory estimate: 804.23 KiB, allocs estimate: 8642.

The result of running @profile StandardBasisVectors.standardize_basis(Q) can be seen here.

I’m sure there is something obvious wrong here that I just can’t see.

Edit: Julia version info

Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 12 × Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 12 virtual cores

A couple of questions:

  • What Julia version are you running?
  • Just to confirm: if you keep running @benchmark StandardBasisVectors.standardize_basis($Q), then at some point the code slows down? You don’t do anything else in between?
  • The link to your profile does not work for me.

While reading, my first suspicion was an interaction between GC, threading, and calls to external libraries (like in this recent thread). But your code is just some normal Julia loops, with no long BLAS operations and no multi-threading… So it does seem very odd that task scheduling should be to blame for this slowdown. I am not even sure where Tasks come into the picture… Maybe the profile sheds some light on this.
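If you want to rule those out quickly, here are a few checks worth running in the affected session (a sketch using only stdlib calls, nothing specific to your package):

```julia
using LinearAlgebra

# How many Julia threads and BLAS threads are actually active?
@show Threads.nthreads()
@show BLAS.get_num_threads()

# Log every GC pause to stderr (available since Julia 1.8); if the
# slow regime coincides with frequent or long collections, GC is implicated
GC.enable_logging(true)
```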

Apologies for leaving out the version info and the broken link; I’ve edited the post to fix both.

As I mentioned in the post, the code showing the slow-down is part of a larger pipeline, and it looks like I only see the slow-down after running the larger pipeline a few times. The larger code can be found in ParallelTransportUnfolding.jl. I was able to trigger the slow-down with a toy example in which at least the dimensions of the input data match my actual data. Here is the code:

using ParallelTransportUnfolding
using ParallelTransportUnfolding: StandardBasisVectors
using LinearAlgebra

Xf = randn(75,2011)
pq = fit(PTU, Xf;k=15,K=30, maxoutdim=3)

For the first couple of runs of the last line, the running time is around 2-3 minutes, which is expected. But after a variable number of runs (e.g. 6 when I tested it just now), the running time jumps to around 50 minutes. To check that it is indeed the code in StandardBasisVectors that slows down, the fit function prints out timing estimates for the different parts of the computation (this is in the detailed_timing branch of the repo), and it is indeed the call to standardize_basis that slows down. I can also confirm this by running the same benchmark as before, which now shows that standardize_basis(Q) takes around 1.7 s instead of around 65 ms.

Initially, I thought that multithreading might have something to do with it. Even though my code doesn’t use multithreading, I usually start Julia with 12 threads. However, I see the slow-down even when starting Julia with a single thread.
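For completeness, the loop I’ve been using to trigger the slow-down looks roughly like this (a sketch; it assumes the packages above are installed, and the standardize_basis timing check reuses a Q matrix like the one from the earlier benchmark):

```julia
using ParallelTransportUnfolding
using ParallelTransportUnfolding: StandardBasisVectors
using LinearAlgebra

Xf = randn(75, 2011)
q, _ = qr(randn(75, 75))
Q = Matrix{Float64}(q)

for i in 1:10
    print("fit run $i: ")
    @time fit(PTU, Xf; k=15, K=30, maxoutdim=3)
    # Check whether standardize_basis itself has entered the slow regime:
    # ~65 ms before the jump, ~1.7 s after
    print("  standardize_basis: ")
    @time StandardBasisVectors.standardize_basis(Q)
end
```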