Thanks everyone for your help. Since most of you can’t reproduce this behavior, I tried another experiment: installing Julia directly on Windows instead of WSL (on the same machine). The code runs a bit more slowly there, and my_useless_func! still seems slightly faster (median 135.5 μs vs. 145.2 μs, roughly 7%), but the improvement is nowhere near as dramatic as before.
julia> @benchmark my_func!($useless, $x)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 134.300 μs … 784.400 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 145.200 μs ┊ GC (median): 0.00%
Time (mean ± σ): 147.662 μs ± 14.218 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▁▂ █▅ ▁█ ▁▁▇ ▁▁█ ▁▂▆ ▁▇ ▁▆ ▁▅ ▅ ▁ ▁ ▃
█▅███▇██▇██▇███████▇███▇███▇████▇▇██▆▆▇▇█▅▇▇▆█▄▂▆▅█▄▆▆▆▅▇▄▃▄▄ █
134 μs Histogram: log(frequency) by time 181 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark my_useless_func!($useless, $x)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 125.300 μs … 452.600 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 135.500 μs ┊ GC (median): 0.00%
Time (mean ± σ): 139.672 μs ± 22.715 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▅█▃▄▄
▆▃█████▄▅▄▃▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▂▂▂▂▁▂▂▁▂ ▃
125 μs Histogram: frequency by time 269 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> versioninfo()
Julia Version 1.11.7
Commit f2b3dbda30 (2025-09-08 12:10 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 32 × 13th Gen Intel(R) Core(TM) i9-13900HX
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)
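So this strange outcome appears to be highly platform-specific: the speedup is dramatic under WSL but only marginal on native Windows on the same machine.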