In order to turn verbosity on or off, I used to write functions in the old-fashioned style
julia> using Printf
julia> function do_something(verbose=false)
for i in 1:10000
verbose && @printf("print some info")
end
end
and I’m considering rewriting such functions as
julia> using Logging
julia> function do_something_with_logger()
for i in 1:10000
@debug("print some info")
end
end
I was under the impression that logging should essentially be free unless the log level instructs that something should indeed be logged (that’s what the MicroLogging.jl README seems to say). But in Julia 0.7 and 1.0 I observe the following:
julia> using BenchmarkTools
julia> @btime do_something()
4.849 ns (0 allocations: 0 bytes)
julia> my_null_logger = Logging.NullLogger();
julia> @btime with_logger(my_null_logger) do
do_something_with_logger()
end
976.781 μs (1 allocation: 32 bytes)
julia> my_custom_logger = Logging.ConsoleLogger(stderr, Logging.Error);
julia> @btime with_logger(my_custom_logger) do
do_something_with_logger()
end
974.228 μs (1 allocation: 32 bytes)
So it turns out that the difference is pretty stark. Is this expected? Adding a second @debug in the loop basically adds another millisecond to the time. Am I doing something wrong?
This might not answer your question, but the first timing (4.8 ns) is probably not measuring what you think it’s measuring. A 3 GHz CPU executes 3 clock cycles in 1 ns, so 4.8 ns means ~15 clock cycles, which obviously is not enough to iterate 10,000 times. So it seems that the compiler is clever enough to remove the entire for loop in the first case.
Thanks for pointing that out. Adding the computation of sin(i) in the loop in both functions still shows a large discrepancy though: about 132 µs without logger vs about 1.1 ms with logger. I just tried to cook up a minimum example here.
Hi, I think the main problem you’re seeing here is a super nasty bug affecting log ids which is now fixed in julia-1.0.2, so you should install that before proceeding further.
Even so, the 1.0.2 the default performance profile for early-filtered logging is not what I hoped for and has significantly regressed compared to MicroLogging in 0.6. In 0.6 I have
julia> using MicroLogging
julia> function do_something_with_logger()
for i in 1:10000
@debug "print some info"
end
end
julia> @time do_something_with_logger()
0.000533 seconds (4 allocations: 160 bytes)
which is respectable enough, though I think we can do better in the future. However, in 1.0.2 I have
The global flag is a definite design compromise though and shouldn’t be necessary to get reasonable performance so I’ve created an issue for this: https://github.com/JuliaLang/julia/issues/30003
Not really related to the early-filtering from https://github.com/JuliaLang/julia/issues/30003, but maybe relevant: Presence of debugging macro’s also significantly bloats the code of the containing function, preventing inlining of the simplest of functions:
using InteractiveUtils
function foo()
1+1
end
xfoo() = foo()
@code_warntype xfoo()
function foobar()
println(42)
1+1
end
xfoobar() = foobar()
@code_warntype xfoobar()
function bar()
@debug 42
1+1
end
xbar() = bar()
@code_warntype xbar()
Yes, it’s all tradeoffs, at least with the macros we have the option of modifying the implementation in the future to make it more efficient, but without changing the semantics and breaking anyone’s code.
Early on, I had the log message generation in a closure which might or might not have helped with inlining. Sombody (I think it was Jeff) noted this was hard on the compiler, so it’s all inlined for now.
Part of the high inlining cost is possibly that the logging is wrapped in a try/catch to avoid the situation where you enable a debug statement at runtime and it causes your program to crash due to an untested code path. I’ve seen this happen in practice and I thought it best to avoid by design. There’s a certain performance hit with the current system though.
Hope you might appreciate this slightly off topic - but (hopefully) still enlightening observation.
Light travels approximately one foot in one nanosecond (ns).
And so I find it simply amazing that (the speed of light C is a limiting factor for (non-quantum) transfer of information) may be another way to know that 10,000 iterations could NOT have occurred in that time frame span of space-time.
IOW (implied, but maybe not immediately obvious from this discussion) Big-O algorithmic design utilizing more intelligence, memory and parallel processing (like our brains) is likely the best way forward for compiler design and faster code execution.