@debug has massive performance impact on Windows

Hey guys,

I found that the @debug macro has a massive performance impact on my code. I use the macro very often, because the documentation says:

[The macro] will produce no output by default. Furthermore, it’s very cheap to leave debug statements like this in the source code because the system avoids evaluating the message if it would later be ignored.

But it looks like this is not cheap, see the MWE, that is 10x slower by placing the macro in a fast evaluating function from the documentation examples.

Am I doing something wrong? Otherwise there should be a note in the docu that the macro should not be used in (performance) optimized code.

Thank you & best regards!

MWE (Julia 1.9.2)

using BenchmarkTools

function my_sum(x)
    return sum(x)
end

function my_sum_debug(x)
    @debug "The given x is " x 
    return sum(x)
end

x = rand(1000)
@btime my_sum($x) # 61.702 ns (0 allocations: 0 bytes)
@btime my_sum_debug($x) # 755.752 ns (2 allocations: 112 bytes)
2 Likes

Well, it’s absolutely cheap, not relatively. There is some small but fixed overhead (use @macroexpand to see the expression generated by @debug), and because the workload in your example is trivial, it becomes very noticeable. Make x a hundred times larger, and the impact of the debug statement is “only” about 3% (on my machine).

Please also note that you should always interpolate external variables into benchmark expressions

@btime my_sum($x)

for an accurate measurement.

Edit: I concur with Mason below. The overhead in the original example is ~2x for me as well. 10x seems really excessive.

1 Like

Thanks!

Yep, I edited the “$”.

I see that the overhead for a single evaluation is not much, but in my special case it had a noticable effect, because there where thousands of @debug evaluations per second (which I thought were totally ignored as long as debugging is disabled) - I basically used it for easy print-debugging a entire software library.

Hm interesting, I can’t reproduce your benchmark. Here’s what I see:

Version info
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a* (2023-08-24 14:43 UTC)

Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 12 × AMD Ryzen 5 5600X 6-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
  Threads: 6 on 12 virtual cores
Environment:
  JULIA_NUM_THREADS = 6

This text will be hidden

Definitions
using BenchmarkTools

function my_sum(x)
    return sum(x)
end

function my_sum_debug(x)
    @debug "The given x is " x 
    return sum(x)
end
x = rand(1000)

This text will be hidden

julia> @btime my_sum($x);
  51.152 ns (0 allocations: 0 bytes)

julia> @btime my_sum_debug($x);
  114.565 ns (0 allocations: 0 bytes)

Importantly, I’m not incurring any allocations from the debug statements, and they’re only taking ~60 extra nanoseconds rather than ~600 extra nanoseconds.

Are you sure you’re not in a mode where the debug statements are being saved to a file or something?

Just started with a blank REPL (Julia 1.9.2) and ENV["JULIA_DEBUG"] is not set.
I give 1.9.3 a try…

EDIT:
Same allocations/timing in Julia 1.9.3 with undefinded ENV["JULIA_DEBUG"]

And even more intersting: If I set ENV["JULIA_DEBUG"] = nothing, benchmark worsens to 5 allocations and 264 bytes…

julia> @btime my_sum($x)
  35.591 ns (0 allocations: 0 bytes)
506.50792005648816

julia> @btime my_sum_debug($x)
  56.696 ns (0 allocations: 0 bytes)
506.50792005648816

for me on 1.9.3.

Hm… so am I doing something wrong?

Julia Version 1.9.3
Commit bed2cd540a (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

Even in 1.9.3. I have:

julia> @btime my_sum($x)
  61.876 ns (0 allocations: 0 bytes)
500.0120162080791

julia> @btime my_sum_debug($x)
  479.487 ns (2 allocations: 112 bytes)
500.0120162080791

What is the output of you running

using Logging
Logging.min_enabled_level(global_logger())

?

Do you have anything in your startup that could affect this for any reason? Tried starting julia with julia --startup-file=no?

julia> using Logging
julia> Logging.min_enabled_level(global_logger())
Info

Thank you all for the fast replies BTW!

same result with julia --startup-file=no

Just as an extra data point you’re not alone:

julia> @btime my_sum($x)
  67.076 ns (0 allocations: 0 bytes)
481.4370737577461

julia> @btime my_sum_debug($x)
  755.372 ns (2 allocations: 112 bytes)
481.4370737577461

julia> using Logging

julia> Logging.min_enabled_level(global_logger())
Info

on 1.10-beta2, Windows 10 (I assume Kristoffer and Mason you are on Linux?)

1 Like

Another Windows data point with a fresh Julia v1.9.3:

julia> @btime my_sum($x)
  94.249 ns (0 allocations: 0 bytes)
487.6221750988597

julia> @btime my_sum_debug($x)
  650.575 ns (2 allocations: 112 bytes)
487.6221750988597

julia> using Logging

julia> Logging.min_enabled_level(global_logger())
Info
1 Like

@mihalybaci @nilshg @ThummeTo do you see similar performance problems on earlier versions of julia? e.g. 1.6 or 1.0? I’m curious if this is something that’s always been this way or is a recent regression.

Julia 1.6.7 is similiar for me …

julia> @btime my_sum($x) 
  63.544 ns (0 allocations: 0 bytes)
494.6890586136828

julia> @btime my_sum_debug($x)
  461.224 ns (2 allocations: 144 bytes)
494.6890586136828

Julia 1.0.5:

julia> @btime my_sum($x) 
  59.858 ns (0 allocations: 0 bytes)
489.8734842999011

julia> @btime my_sum_debug($x)
  471.429 ns (4 allocations: 176 bytes)
489.8734842999011

Try Logging.disable_logging(Debug) to filter debug messages at the earliest time. The remaining overhead should really be tiny.

On a tangent, does anybody know why sum is seemingly really slow on an M1…? That’s a ~3 fold increase over an 8th generation mobile i7 (from OP).

julia> @btime sum($x) # 1000 elements Float64
  174.389 ns (0 allocations: 0 bytes)
versioninfo
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 4 on 4 virtual cores
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_EDITOR = code
  JULIA_CONDAPKG_BACKEND = Current

Compiler failing to simd?

1 Like

On 1.0:

julia> x = rand(1000);

julia> @btime my_sum($x)
  33.971 ns (0 allocations: 0 bytes)
497.3346280689935

julia> @btime my_sum_debug($x)
  664.145 ns (4 allocations: 176 bytes)
497.3346280689935

The difference is that on Windows, accessing the environment seems to allocate:

julia> f() = get(ENV, "JULIA_DEBUG", "")
f (generic function with 1 method)

julia> @btime f(); # Linux
  26.734 ns (0 allocations: 0 bytes)

julia> @btime f(); # Windows
  112.293 ns (2 allocations: 112 bytes)
7 Likes

I also see the slowdown on v1.6.7, but @skleinbo 's trick of disabling logging pretty much eliminates it on the older and newer versions. There is a persistent ~20 ns difference (roughly 20%), but maybe that’s within the error?

I can confirm that Logging.disable_logging(Debug) solves this in Julia 1.6.7 as well as in 1.9.3, the timing difference almost vanishes and we have zero allocations:

Julia 1.9.3 (Windows)

 @btime my_sum($x) 
  62.016 ns (0 allocations: 0 bytes)
511.22678570928684

julia> @btime my_sum_debug($x) 
  69.152 ns (0 allocations: 0 bytes)
511.22678570928684

So could this be Julia-Default?