Logging performance

To turn verbosity on or off, I used to write functions in the old-fashioned style

julia> using Printf
julia> function do_something(verbose=false)
         for i in 1:10000
           verbose && @printf("print some info")
         end
       end

and I’m considering rewriting such functions as

julia> using Logging
julia> function do_something_with_logger()
         for i in 1:10000
           @debug("print some info")
         end
       end

I was under the impression that logging should essentially be free unless the log level instructs that something should indeed be logged (that’s what the MicroLogging.jl README seems to say). But in Julia 0.7 and 1.0 I observe the following:

julia> using BenchmarkTools

julia> @btime do_something()
  4.849 ns (0 allocations: 0 bytes)

julia> my_null_logger = Logging.NullLogger();

julia> @btime with_logger(my_null_logger) do
         do_something_with_logger()
       end
  976.781 μs (1 allocation: 32 bytes)

julia> my_custom_logger = Logging.ConsoleLogger(stderr, Logging.Error);

julia> @btime with_logger(my_custom_logger) do
         do_something_with_logger()
       end
  974.228 μs (1 allocation: 32 bytes)

So it turns out that the difference is pretty stark. Is this expected? Adding a second @debug in the loop basically adds another millisecond to the time. Am I doing something wrong?
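For reference, here is the premise I'm relying on, sketched with only exported `Logging` functions: the macros are supposed to compare the statement's level against the current logger's minimum enabled level before constructing anything, so `@debug` under these loggers should be filtered out almost for free.

```julia
using Logging

# The logging macros are supposed to consult this before constructing
# any log record; Debug sits below both thresholds here, so in theory
# the @debug statements above should hit a cheap early-out.
Logging.min_enabled_level(NullLogger())                          # Logging.AboveMaxLevel
Logging.min_enabled_level(ConsoleLogger(stderr, Logging.Error))  # Logging.Error
```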

Thanks.


This might not answer your question, but the first timing (4.8 ns) is probably not measuring what you think it’s measuring. A 3 GHz CPU executes 3 clock cycles in 1 ns, so 4.8 ns means ~15 clock cycles, which obviously is not enough to iterate 10,000 times. So it seems that the compiler is clever enough to remove the entire for loop in the first case.
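One cheap way to check that suspicion, with a hypothetical `do_something_n` variant that takes the trip count as a parameter: an honest loop should take roughly 10x longer for 10x the iterations, while near-identical timings mean the side-effect-free loop was elided.

```julia
using Printf

# Hypothetical variant of do_something that takes the trip count as an argument
function do_something_n(n, verbose=false)
    for i in 1:n
        verbose && @printf("print some info")
    end
end

do_something_n(1)  # warm up so we don't time compilation

# If these two report nearly the same time, the compiler removed the loop.
t1 = @elapsed do_something_n(10_000)
t2 = @elapsed do_something_n(100_000)
println(t1, "  ", t2)
```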


Thanks for pointing that out. Adding the computation of sin(i) in the loop in both functions still shows a large discrepancy though: about 132 µs without the logger vs about 1.1 ms with it. I just tried to cook up a minimal example here.
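Concretely, the variants I benchmarked look roughly like this (the accumulator and return value are my addition, to keep the result live so the loop can't be elided):

```julia
using Printf, Logging

# Same loops as before, but with real work so the compiler can't remove them
function do_something(verbose=false)
    s = 0.0
    for i in 1:10000
        verbose && @printf("print some info")
        s += sin(i)
    end
    return s
end

function do_something_with_logger()
    s = 0.0
    for i in 1:10000
        @debug("print some info")
        s += sin(i)
    end
    return s
end
```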

Hi, I think the main problem you’re seeing here is a super nasty bug affecting log ids, which is fixed in Julia 1.0.2, so you should install that before proceeding further.

Even so, in 1.0.2 the default performance profile for early-filtered logging is not what I hoped for, and it has significantly regressed compared to MicroLogging on 0.6. On 0.6 I have

julia> using MicroLogging

julia> function do_something_with_logger()
           for i in 1:10000
               @debug "print some info"
           end
       end

julia> @time do_something_with_logger()
  0.000533 seconds (4 allocations: 160 bytes)

which is respectable enough, though I think we can do better in the future. However, in 1.0.2 I have

julia> @time do_something_with_logger()
  0.003034 seconds (4 allocations: 160 bytes)

As a workaround, you can disable Debug logging globally, which should speed things up greatly. On 1.0.2:

julia> disable_logging(Logging.Debug)

julia> @time do_something_with_logger()
  0.000030 seconds (4 allocations: 160 bytes)

The global flag is a definite design compromise though, and it shouldn’t be necessary to get reasonable performance, so I’ve created an issue for this: https://github.com/JuliaLang/julia/issues/30003
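For completeness, since this is global state that is easy to forget about: the same function can lower the global floor back down again, per its docstring.

```julia
using Logging

# Raise the global floor: Debug and below get a cheap early-out everywhere
disable_logging(Logging.Debug)

# ...and later restore the default "nothing disabled" state
disable_logging(Logging.BelowMinLevel)
```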


Not really related to the early filtering from https://github.com/JuliaLang/julia/issues/30003, but maybe relevant: the presence of debug macros also significantly bloats the code of the containing function, preventing inlining of the simplest of functions:

using InteractiveUtils

function foo()
    1+1
end
xfoo() = foo()
@code_warntype xfoo()

function foobar()
    println(42)
    1+1
end
xfoobar() = foobar()
@code_warntype xfoobar()

function bar()
    @debug 42
    1+1
end
xbar() = bar()
@code_warntype xbar()
Body::Int64
6 1 ─     return 2

Body::Int64
13 1 ─     invoke Main.println(42::Int64)
   └──     return 2

Body::Int64
20 1 ─ %1 = invoke Main.bar()::Core.Compiler.Const(2, false)
   └──      return %1 

I guess it’s a compromise between code size and having specialized code to construct the log message, but the increase in code seems pretty bad.


Yes, it’s all tradeoffs. At least with the macros we have the option of modifying the implementation in the future to make it more efficient without changing the semantics and breaking anyone’s code.

Early on, I had the log message generation in a closure, which might or might not have helped with inlining. Somebody (I think it was Jeff) noted this was hard on the compiler, so it’s all inlined for now.

Part of the high inlining cost is possibly that the logging is wrapped in a try/catch to avoid the situation where you enable a debug statement at runtime and it crashes your program due to an untested code path. I’ve seen this happen in practice and I thought it best to avoid it by design. There’s a certain performance hit with the current system though.
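To make that concrete, here is a hypothetical hand-expansion of the shape of a single `@debug` site (the real code in `Base.CoreLogging` is more involved and also consults `shouldlog` and the global disable flag; names and arguments here are illustrative only):

```julia
using Logging

# Hypothetical sketch of one expanded `@debug "print some info"` site.
function debug_site(logger)
    if Logging.Debug >= Logging.min_enabled_level(logger)   # early filter
        try
            # Message construction and dispatch happen inside the try,
            # so enabling a buggy debug statement can't crash the program.
            Logging.handle_message(logger, Logging.Debug, "print some info",
                                   Main, :group, :id, "file.jl", 1)
        catch err
            # The real implementation reports logging errors rather than rethrowing
            @error "Exception while handling log record" exception = err
        end
    end
end
```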

Regarding optimizations, the following seems related: https://github.com/JuliaLang/julia/issues/29688


Hope you’ll appreciate this slightly off-topic but (hopefully) still enlightening observation. :stuck_out_tongue_winking_eye:

Light travels approximately one foot in one nanosecond (ns).

And so I find it simply amazing that the speed of light c, being the limiting factor for (non-quantum) transfer of information, is another way to know that 10,000 iterations could NOT have occurred in that span of time.

In other words (implied, but maybe not immediately obvious from this discussion), algorithmic design that makes smarter use of memory and parallel processing (like our brains do) is likely the best way forward for compiler design and faster code execution.