@debug has massive performance impact on Windows

Well, it’s absolutely cheap, not relatively. There is some small but fixed overhead (use @macroexpand to see the expression generated by @debug), and because the workload in your example is trivial, it becomes very noticeable. Make x a hundred times larger, and the impact of the debug statement is “only” about 3% (on my machine).

Please also note that you should always interpolate external variables into benchmark expressions

@btime my_sum($x)

for an accurate measurement.

Edit: I concur with Mason below. The overhead in the original example is ~2x for me as well. 10x seems really excessive.

1 Like

Thanks!

Yep, I edited the “$”.

I see that the overhead for a single evaluation is not much, but in my special case it had a noticable effect, because there where thousands of @debug evaluations per second (which I thought were totally ignored as long as debugging is disabled) - I basically used it for easy print-debugging a entire software library.

Hm interesting, I can’t reproduce your benchmark. Here’s what I see:

Version info
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a* (2023-08-24 14:43 UTC)

Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 12 × AMD Ryzen 5 5600X 6-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
  Threads: 6 on 12 virtual cores
Environment:
  JULIA_NUM_THREADS = 6

This text will be hidden

Definitions
using BenchmarkTools

function my_sum(x)
    return sum(x)
end

function my_sum_debug(x)
    @debug "The given x is " x 
    return sum(x)
end
x = rand(1000)

This text will be hidden

julia> @btime my_sum($x);
  51.152 ns (0 allocations: 0 bytes)

julia> @btime my_sum_debug($x);
  114.565 ns (0 allocations: 0 bytes)

Importantly, I’m not incurring any allocations from the debug statements, and they’re only taking ~60 extra nanoseconds rather than ~600 extra nanoseconds.

Are you sure you’re not in a mode where the debug statements are being saved to a file or something?

Just started with a blank REPL (Julia 1.9.2) and ENV["JULIA_DEBUG"] is not set.
I give 1.9.3 a try…

EDIT:
Same allocations/timing in Julia 1.9.3 with undefinded ENV["JULIA_DEBUG"]

And even more intersting: If I set ENV["JULIA_DEBUG"] = nothing, benchmark worsens to 5 allocations and 264 bytes…

julia> @btime my_sum($x)
  35.591 ns (0 allocations: 0 bytes)
506.50792005648816

julia> @btime my_sum_debug($x)
  56.696 ns (0 allocations: 0 bytes)
506.50792005648816

for me on 1.9.3.

Hm… so am I doing something wrong?

Julia Version 1.9.3
Commit bed2cd540a (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

Even in 1.9.3. I have:

julia> @btime my_sum($x)
  61.876 ns (0 allocations: 0 bytes)
500.0120162080791

julia> @btime my_sum_debug($x)
  479.487 ns (2 allocations: 112 bytes)
500.0120162080791

What is the output of you running

using Logging
Logging.min_enabled_level(global_logger())

?

Do you have anything in your startup that could affect this for any reason? Tried starting julia with julia --startup-file=no?

julia> using Logging
julia> Logging.min_enabled_level(global_logger())
Info

Thank you all for the fast replies BTW!

same result with julia --startup-file=no

Just as an extra data point you’re not alone:

julia> @btime my_sum($x)
  67.076 ns (0 allocations: 0 bytes)
481.4370737577461

julia> @btime my_sum_debug($x)
  755.372 ns (2 allocations: 112 bytes)
481.4370737577461

julia> using Logging

julia> Logging.min_enabled_level(global_logger())
Info

on 1.10-beta2, Windows 10 (I assume Kristoffer and Mason you are on Linux?)

1 Like

Another Windows data point with a fresh Julia v1.9.3:

julia> @btime my_sum($x)
  94.249 ns (0 allocations: 0 bytes)
487.6221750988597

julia> @btime my_sum_debug($x)
  650.575 ns (2 allocations: 112 bytes)
487.6221750988597

julia> using Logging

julia> Logging.min_enabled_level(global_logger())
Info
1 Like

@mihalybaci @nilshg @ThummeTo do you see similar performance problems on earlier versions of julia? e.g. 1.6 or 1.0? I’m curious if this is something that’s always been this way or is a recent regression.

Julia 1.6.7 is similiar for me …

julia> @btime my_sum($x) 
  63.544 ns (0 allocations: 0 bytes)
494.6890586136828

julia> @btime my_sum_debug($x)
  461.224 ns (2 allocations: 144 bytes)
494.6890586136828

Julia 1.0.5:

julia> @btime my_sum($x) 
  59.858 ns (0 allocations: 0 bytes)
489.8734842999011

julia> @btime my_sum_debug($x)
  471.429 ns (4 allocations: 176 bytes)
489.8734842999011

Try Logging.disable_logging(Debug) to filter debug messages at the earliest time. The remaining overhead should really be tiny.

On a tangent, does anybody know why sum is seemingly really slow on an M1…? That’s a ~3 fold increase over an 8th generation mobile i7 (from OP).

julia> @btime sum($x) # 1000 elements Float64
  174.389 ns (0 allocations: 0 bytes)
versioninfo
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 4 on 4 virtual cores
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_EDITOR = code
  JULIA_CONDAPKG_BACKEND = Current

Compiler failing to simd?

1 Like

On 1.0:

julia> x = rand(1000);

julia> @btime my_sum($x)
  33.971 ns (0 allocations: 0 bytes)
497.3346280689935

julia> @btime my_sum_debug($x)
  664.145 ns (4 allocations: 176 bytes)
497.3346280689935

The difference is that on Windows, accessing the environment seems to allocate:

julia> f() = get(ENV, "JULIA_DEBUG", "")
f (generic function with 1 method)

julia> @btime f(); # Linux
  26.734 ns (0 allocations: 0 bytes)

julia> @btime f(); # Windows
  112.293 ns (2 allocations: 112 bytes)
7 Likes

I also see the slowdown on v1.6.7, but @skleinbo 's trick of disabling logging pretty much eliminates it on the older and newer versions. There is a persistent ~20 ns difference (roughly 20%), but maybe that’s within the error?

I can confirm that Logging.disable_logging(Debug) solves this in Julia 1.6.7 as well as in 1.9.3, the timing difference almost vanishes and we have zero allocations:

Julia 1.9.3 (Windows)

 @btime my_sum($x) 
  62.016 ns (0 allocations: 0 bytes)
511.22678570928684

julia> @btime my_sum_debug($x) 
  69.152 ns (0 allocations: 0 bytes)
511.22678570928684

So could this be Julia-Default?

Don’t think so. It’s important that it primarily respect the setting in ENV by default, but checking ENV is what causes the problem.

1 Like