@debug has massive performance impact on Windows

Don’t think so. It’s important that it primarily respect the setting in ENV by default, but checking ENV is what causes the problem.


One could do something ugly and special-case this in the environment lookup on Windows, avoiding the allocation for this specific variable, on the argument that the overhead of the string comparison is negligible compared with the allocation from cwstring. But I am not sure that is acceptable.

diff --git a/base/env.jl b/base/env.jl
index 077b3a4ed2..c3c60da9c0 100644
--- a/base/env.jl
+++ b/base/env.jl
@@ -7,8 +7,10 @@ if Sys.iswindows()
     _hasenv(s::Vector{UInt16}) = _getenvlen(s) != 0 || Libc.GetLastError() != ERROR_ENVVAR_NOT_FOUND
     _hasenv(s::AbstractString) = _hasenv(cwstring(s))
 
+    const JULIA_DEBUG_CWSTRING = cwstring("JULIA_DEBUG")
+
     function access_env(onError::Function, str::AbstractString)
-        var = cwstring(str)
+        var = str == "JULIA_DEBUG" ? JULIA_DEBUG_CWSTRING : cwstring(str)
         len = _getenvlen(var)
         if len == 0
             return Libc.GetLastError() != ERROR_ENVVAR_NOT_FOUND ? "" : onError(str)
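
For readers not on Windows, the idea behind the diff can be sketched in portable Julia. This is only an illustration, not the code in Base: `encode_key`, `CACHED_KEY` and `lookup_key` are hypothetical names, and `transcode(UInt16, ...)` plus a trailing NUL stands in for what `cwstring` does.

```julia
# Hypothetical stand-in for cwstring: UTF-16 encode and NUL-terminate.
# Every call allocates a fresh Vector{UInt16}.
encode_key(s::AbstractString) = push!(transcode(UInt16, String(s)), 0x0000)

# Encode the hot key once, at definition time.
const CACHED_KEY = encode_key("JULIA_DEBUG")

# A string comparison is cheap next to an allocation, so check for the
# hot key first and reuse the pre-encoded buffer when it matches.
lookup_key(s::AbstractString) = s == "JULIA_DEBUG" ? CACHED_KEY : encode_key(s)
```

Every lookup of "JULIA_DEBUG" then returns the same buffer, while other keys still take the allocating path. The cached buffer must never be mutated by the caller, which is part of why this kind of special-casing feels ugly.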

Nice idea @kristoffer.carlsson. I’ve created a PR with that diff since I know you’re a busy guy, and it’s probably best to discuss the merits of this idea, or alternatives to it, on GitHub rather than here.


I can confirm @skleinbo’s results on M1.

julia> @btime my_sum($x);
  173.135 ns (0 allocations: 0 bytes)

julia> @btime my_sum_debug($x);
  285.606 ns (0 allocations: 0 bytes)

julia> @btime sum($x)
  173.192 ns (0 allocations: 0 bytes)
julia> versioninfo()
Julia Version 1.9.3
Commit bed2cd540a (2023-08-24 14:43 UTC)

Platform Info:
  OS: macOS (arm64-apple-darwin21.6.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 5 on 8 virtual cores
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_EDITOR = Emacs
  JULIA_PKG_DEVDIR = /Users/foo/Documents/julia/dev

So cool that this (from the thread to the feedback on the PR) was resolved within hours :+1:
That’s something that impresses me again and again about Julia (and its great community).
Thank you all!


The allocations you get are not Windows-specific. I get them too on Linux with 1.9.2, though without the 10x slowdown, and they go away in 1.9.3, so they likely do on Windows too.

I noticed that @btime my_sum_debug($x) gave different results on repeated runs, so I tested further: both it and the non-debug version show very large variance even with the allocations gone:

julia> @benchmark my_sum_debug($x)
BenchmarkTools.Trial: 10000 samples with 343 evaluations.
 Range (min … max):  259.067 ns … 501.595 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     285.913 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   291.185 ns ±  20.333 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

And at worst I get only 501.595/161.899 ≈ 3.1x slower, comparing the max for debug against the min for non-debug.

Can you post Windows results for 1.9.3? Maybe allocations and/or @debug are unusually slow on Windows, even though I saw nothing Windows-specific in the macro definition. If so, is it slow just because of the allocations, which would then affect other code as well?
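
In case it helps with reproducing, here is a minimal sketch that checks allocations without BenchmarkTools. The definitions of `my_sum` and `my_sum_debug` are my guess at the functions from earlier in the thread (a plain loop, and the same loop with an `@debug` call inside), and `count_allocs` is a hypothetical helper.

```julia
# Assumed reconstruction: a plain summation loop.
function my_sum(x)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    return s
end

# Assumed reconstruction: the same loop with a (disabled by default) @debug.
function my_sum_debug(x)
    s = zero(eltype(x))
    for v in x
        @debug "adding element" v
        s += v
    end
    return s
end

# Hypothetical helper: run once to compile, then report bytes allocated.
function count_allocs(f, x)
    f(x)
    return @allocated f(x)
end

x = rand(1000)
println("my_sum:       ", count_allocs(my_sum, x), " bytes")
println("my_sum_debug: ", count_allocs(my_sum_debug, x), " bytes")
```

A nonzero count for `my_sum_debug` would point at the allocation discussed above; comparing the two numbers on Windows with 1.9.2 and 1.9.3 should show whether it is gone there too.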