Windows 11 KB5089573 multithreading issues

Today I installed the KB5089573 update, which brings the so-called “Low Latency Profile” feature to Windows 11 (more information here). Until now I had never had any issues with multithreading. My use case is a while loop that numerically integrates two trajectories simultaneously by spawning two tasks each iteration:

@inbounds while t[it-1] < tf
    dt = min(dt0, tf - t[it-1])

    thr1 = Threads.@spawn velocityverlet!(...)
    thr2 = Threads.@spawn velocityverlet!(...)

    wait(thr1)
    wait(thr2)

    # Perform some tasks over the results

end

I usually start Julia in VSCode with the argument --threads=32, and the computational speed is about 100 iterations per 1.5 seconds with a CPU usage of about 28%. Today, after the update, I ran the same simulation and noticed a significant performance degradation: 100 iterations per 4 seconds, with a CPU usage of 80% and almost 14 of the 16 cores maxed out.

After some trial and error, I found that starting Julia with --threads=auto fixes the issue.

Am I missing something? Thanks in advance for your help.

Additional information

julia> versioninfo()
Julia Version 1.12.6
Commit 15346901f0 (2026-04-09 19:20 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × Intel(R) Core(TM) Ultra 7 255H
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, arrowlake)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 16 virtual cores)

Can you give more details?

From the current info, you are running at most 2 tasks in parallel, so how can the CPU usage be 80%? (I assume that velocityverlet! occupies at most 1 thread.)

If velocityverlet! is a CPU-bound task, then perhaps each task should be associated with a physical core rather than a virtual processor, so --threads=32 may not be ideal.

Yes. The issue is exactly that: there are just 2 tasks and the CPU is almost maxed out, which is indeed strange. As I said, before the update I started Julia with --threads=32 without issues. Now, to achieve the same performance, I need to start it with --threads=auto.

The following are some tests I did before discovering that --threads=auto solves the issue.

No running simulation

Simulation with --threads=2

Simulation with --threads=32

Simulation with --threads=auto

The performance with --threads=2 and --threads=32 is almost equal (about 100 iterations/4 seconds), while --threads=auto gives the usual performance I got before the update, that is, 100 iterations/1.5 seconds.

Edit: With --threads=auto, Threads.nthreads() = 16
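For anyone who wants to compare thread settings without the full simulation, here is a minimal timing sketch of the same spawn-two-tasks-per-iteration pattern. It is not the original code: fake_step! is a hypothetical CPU-bound stand-in for velocityverlet!, and the array size and step are arbitrary assumptions, so the absolute timings will not match the numbers above.

```julia
using Base.Threads

# Hypothetical stand-in for velocityverlet!: a CPU-bound in-place update.
function fake_step!(x::Vector{Float64}, dt::Float64)
    @inbounds for i in eachindex(x)
        x[i] += dt * sin(x[i])
    end
    return x
end

# Run `niter` iterations, spawning two tasks per iteration and waiting
# for both, as in the loop from the first post.
function run_iterations(niter::Int; n::Int = 100_000, dt::Float64 = 1e-3)
    a = rand(n)
    b = rand(n)
    for _ in 1:niter
        t1 = Threads.@spawn fake_step!(a, dt)
        t2 = Threads.@spawn fake_step!(b, dt)
        wait(t1)
        wait(t2)
    end
    return a, b
end

run_iterations(1)                       # warm-up, so compilation is not timed
elapsed = @elapsed run_iterations(100)  # wall-clock seconds for 100 iterations
println("threads = ", Threads.nthreads(),
        ", 100 iterations in ", round(elapsed; digits = 3), " s")
```

Saving this to a file and running it once each with --threads=2, --threads=32 and --threads=auto should show whether the slowdown follows the thread count alone or depends on what velocityverlet! actually does.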

The GC could automatically be using more threads (all tasks would be idle or paused while it runs), but I’d expect that to make things faster. It’s also not certain how many interactive, worker, and GC threads are being used in these scenarios. The documented default is 1 additional interactive thread; I don’t know where the default number of GC threads is described (on my system it appears to be the number of worker threads if that is less than the number of virtual cores, and the number of virtual cores otherwise). At least we know that auto meant 16 worker threads in that one run (nthreads counts only the default pool unless told otherwise, while versioninfo reports all pools).

Another possibility is that the CPU is being triggered into running other processes. I have no idea what other processes could warrant so much activity, but the KB5089573 update’s description of “low latency” might be hinting at that. I’m very uncertain, because the description is mostly about boosting the clock for interactive apps, so I wouldn’t have expected it to affect a program that’s left alone to run. Yet it has, so you might as well check the Processes tab of Task Manager.
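To take the guesswork out of the thread counts mentioned above, something like the following could be run inside each session. It is only a sketch: thread_report is a hypothetical helper name, and it assumes a Julia version that has Threads.ngcthreads (1.10 or later, which covers the 1.12 session shown earlier).

```julia
# Hypothetical helper: collect the per-pool thread counts of this session.
function thread_report()
    return (
        default     = Threads.nthreads(:default),      # worker threads (what --threads sets)
        interactive = Threads.nthreads(:interactive),  # interactive pool
        gc          = Threads.ngcthreads(),            # GC threads
        logical     = Sys.CPU_THREADS,                 # logical CPUs seen by Julia
    )
end

report = thread_report()
println(report)
```

If the GC thread count turns out to differ between the --threads=32 and --threads=auto sessions, it can be fixed explicitly with the --gcthreads command-line flag, so that only one variable changes between runs.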