@debug allocates even when not in DEBUG mode

I would like to add a debug statement to a function that is meant to be called in loops, so I would like the performance to be independent of the @debug statement when not in DEBUG mode. As an MRE, I have the function foo, and I would like to add an @debug statement to it.

function foo()
    return 0
end

function foo_debug()
    @debug ""
    return 0
end

I want to call the function foo in a loop.

function loop_foo()
    for _ in 1:1000
        foo()
    end
end

function loop_foo_debug()
    for _ in 1:1000
        foo_debug()
    end
end

Even when not in DEBUG mode, the function with @debug allocates and takes much longer.

julia> using BenchmarkTools

julia> @btime loop_foo()
  1.608 ns (0 allocations: 0 bytes)

julia> @btime loop_foo_debug()
  69.707 μs (2000 allocations: 62.50 KiB)

How can I add a debug statement to a function meant to be called in loops without losing performance when not in DEBUG mode?

Can you please share versioninfo()? Are you running this from a terminal and not VSCode (VSCode interferes with logging)? For me, loop_foo_debug does not allocate, and the timing is similar.

julia> @btime loop_foo_debug()
  49.583 μs (0 allocations: 0 bytes)

Note that loop_foo is optimized to a no-op:

julia> @code_llvm loop_foo()
define void @julia_loop_foo_1202() #0 {
top:
  ret void
}

I use the latest version:

julia> versioninfo()
Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 22 × Intel(R) Core(TM) Ultra 7 165H
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 22 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_VSCODE_REPL = 1

It is true that in my original post I was inside VSCode. If I run it as a script, I don't get the allocations, but the performance cost is still huge.

$ julia main.jl 
  0.981 ns (0 allocations: 0 bytes)
  33.058 μs (0 allocations: 0 bytes)

In my real use case, since most people (myself included) use Julia in VSCode, I cannot use @debug.

Regarding vscode: I raised this issue some time ago https://github.com/julia-vscode/julia-vscode/issues/3414. The solution mentioned there of wrapping your code in a with_logger does not seem to work.

But I think the bigger problem is that @debug is quite costly at runtime even when run normally. I can't find an issue about this, so it could be worthwhile to raise one?

Yes, @debug is expensive for performance-critical code even when not in debug mode. I believed that @debug would be compiled away when not in debug mode, which is not the case. It might be worth raising an issue, or at least adding a note to the documentation that @debug should be avoided in performance-critical code.

However, I have found a weird way to achieve what I want to do.

module SRC

debug() = false

function foo()
    return 0
end

function foo_debug()
    debug() && println("debug information...")
    return 0
end

end

When debug() = false, the two functions compile to exactly the same code.

julia> @code_llvm SRC.foo()
define i64 @julia_foo_8868() #0 {
top:
  ret i64 0
}

julia> @code_llvm SRC.foo_debug()
define i64 @julia_foo_debug_8879() #0 {
top:
  ret i64 0
}

When debug() = true then you get the print statement.

julia> SRC.debug() = true

julia> SRC.foo_debug()
debug information...
0

To be clear, debug messages are controlled at runtime with the environment variable JULIA_DEBUG, so they can never be removed by the compiler, unless you put a compile-time barrier in front of them such as that debug() function (but this is true for any code, not specific to logging messages). See Logging · The Julia Language for more details.
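If a global switch is acceptable, the Logging standard library also offers disable_logging, which its documentation describes as a global setting intended to make debug logging very cheap when disabled. The sketch below only checks the behaviour (messages are dropped), not the speed-up, which has not been benchmarked here. The helper captured_log and the message text are made up for illustration. Note that while the switch is active, neither JULIA_DEBUG nor a custom logger can turn debug messages back on.

```julia
using Logging

function foo_debug()
    @debug "inside foo_debug"
    return 0
end

# Hypothetical helper: run `f` under a logger that accepts Debug-level
# messages and return everything that logger wrote.
function captured_log(f)
    io = IOBuffer()
    with_logger(ConsoleLogger(io, Logging.Debug)) do
        f()
    end
    return String(take!(io))
end

# With the default global threshold, the Debug-level logger sees the message.
out_enabled = captured_log(foo_debug)

# Globally drop every message at level Debug or below.
Logging.disable_logging(Logging.Debug)
out_disabled = captured_log(foo_debug)

# Restore the default global threshold.
Logging.disable_logging(Logging.BelowMinLevel)
```

This is still a runtime check, so unlike the debug() function above it does not remove the logging code from the compiled function.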

Note that you can also just write your own @debug-like macro.

Parse-time version

When should_print_pt is set to false, println will not be put into the Julia code.

const should_print_pt = false

macro debug_pt(ex)
    should_print_pt ? esc(:(println($ex))) : nothing
end

function foo_debug_pt()
    @debug_pt "Hello"
    return 0
end

function loop_foo_debug_pt()
    for _ in 1:1000
        foo_debug_pt()
    end
end

julia> @code_lowered foo_debug_pt()
CodeInfo(
1 ─     return 0
)

julia> @btime loop_foo_debug_pt()  # (No idea why we don't get 1.2 ns like in the loop_foo case on my machine)
  16.316 ns (0 allocations: 0 bytes)

julia> @code_llvm loop_foo_debug_pt()
; Function Signature: loop_foo_debug_pt()
;  @ REPL[4]:1 within `loop_foo_debug_pt`
; Function Attrs: uwtable
define void @julia_loop_foo_debug_pt_7305() #0 {
top:
;  @ REPL[4]:4 within `loop_foo_debug_pt`
  ret void
}

The downside is that this is not flexible at all: to set should_print_pt to true you have to reload the code.

julia> const should_print_pt = true
WARNING: redefinition of constant Main.should_print_pt. This may fail, cause incorrect answers, or produce other errors.
true

julia> foo_debug_pt()  # Does not print
0

julia> macro debug_pt(ex)
           should_print_pt ? esc(:(println($ex))) : nothing
       end;

julia> function foo_debug_pt()
           @debug_pt "Hello"
           return 0
       end;

julia> foo_debug_pt()
Hello
0

Compile-time version

When should_print_ct() returns false, println is still put into the Julia code, but it gets compiled away.

should_print_ct() = false

macro debug_ct(ex)
    esc(:((should_print_ct() && println($ex)); nothing))
end

function foo_debug_ct()
    @debug_ct "Hello"
    return 0
end

function loop_foo_debug_ct()
    for _ in 1:1000
        foo_debug_ct()
    end
end

julia> @code_lowered foo_debug_ct()  # Still has a println
CodeInfo(
1 ─ %1 = Main.should_print_ct()
└──      goto #3 if not %1
2 ─      Main.println("Hello")
└──      goto #3
3 ┄      Main.nothing
└──      return 0
)

julia> @btime loop_foo_debug_ct() 
  1.200 ns (0 allocations: 0 bytes)

julia> @code_llvm loop_foo_debug_ct()  # No more println
; Function Signature: loop_foo_debug_ct()
;  @ REPL[4]:1 within `loop_foo_debug_ct`
; Function Attrs: uwtable
define void @julia_loop_foo_debug_ct_7774() #0 {
top:
;  @ REPL[4]:4 within `loop_foo_debug_ct`
  ret void
}

Changing should_print_ct will automatically trigger recompilation of foo_debug_ct, making this more flexible, at the cost of recompilations.

julia> foo_debug_ct()  # We still have should_print_ct() = false
0

julia> @time begin should_print_ct() = true; foo_debug_ct() end
Hello
  0.003950 seconds (751 allocations: 36.633 KiB, 93.69% compilation time: 100% of which was recompilation)
0

Runtime version

The println will always remain in the compiled code, but will not be executed when !should_print_rt[]. This will make this version a bit slower.

const should_print_rt = Ref(false)

macro debug_rt(ex)
    esc(:((should_print_rt[] && println($ex)); nothing))
end

function foo_debug_rt()
    @debug_rt "Hello"
    return 0
end

function loop_foo_debug_rt()
    for _ in 1:1000
        foo_debug_rt()
    end
end

julia> @code_llvm foo_debug_rt()
; Function Signature: foo_debug_rt()
;  @ REPL[3]:1 within `foo_debug_rt`
; Function Attrs: uwtable
define i64 @julia_foo_debug_rt_7422() #0 {
top:
;  @ REPL[3]:2 within `foo_debug_rt`
; ┌ @ refvalue.jl:59 within `getindex`
; │┌ @ Base.jl:49 within `getproperty`
    %0 = load i8, ptr @"jl_global#7426.jit", align 16
    %1 = and i8 %0, 1
    %.x.not = icmp eq i8 %1, 0
; └└
  br i1 %.x.not, label %L5, label %L4

L4:                                               ; preds = %top
  call void @j_println_7428(ptr nonnull @"jl_global#7429.jit")
  br label %L5

L5:                                               ; preds = %L4, %top
;  @ REPL[3]:3 within `foo_debug_rt`
  ret i64 0
}

julia> @btime loop_foo_debug_rt()
  255.251 ns (0 allocations: 0 bytes)

So this is still much faster than the full @debug, mainly because we’re using a RefValue{Bool} instead of the ENV dictionary.

Altering should_print_rt[] does not have any cost, making this the most flexible.

julia> foo_debug_rt()  # should_print_rt[] == false
0

julia> @time begin should_print_rt[] = true; foo_debug_rt() end
Hello
  0.000162 seconds (5 allocations: 80 bytes)
0
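
One more property of the runtime version that may matter in hot loops: because the macro expands to a short-circuit &&, the message expression is only evaluated when the flag is set, so an expensive message costs nothing while debugging is off. Here is a minimal sketch of that, using made-up names (debug_enabled, @debug_lazy, expensive_message, do_work) and a counter to observe how often the message is built.

```julia
# Hypothetical runtime flag, same idea as should_print_rt above.
const debug_enabled = Ref(false)

# Same shape as @debug_rt: the message is only evaluated if the flag is true.
macro debug_lazy(ex)
    esc(:((debug_enabled[] && println($ex)); nothing))
end

# Counts how many times the message was actually constructed.
const message_builds = Ref(0)

function expensive_message()
    message_builds[] += 1
    return "state = " * string(sum(1:10))
end

function do_work()
    @debug_lazy expensive_message()
    return 0
end

do_work()                              # flag is false: message is never built
builds_when_disabled = message_builds[]

debug_enabled[] = true
do_work()                              # flag is true: message is built and printed
builds_when_enabled = message_builds[]

debug_enabled[] = false                # leave the flag as we found it
```

The same holds for the compile-time version, since it uses the same && pattern.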