Inaccurate sampling of profiler?

I try to benchmark multiplication between matrices and multilication between one matrix and one number. Here’s the code:

using LinearAlgebra

BLAS.set_num_threads(1)

A = rand(ComplexF64, 1000, 1000)
B = rand(ComplexF64, 1000, 1000)

function foo(A, B)
    for i in 1:10
        1.0*A*B
    end
    return nothing
end

function bar(A, B)
    for i in 1:10
        A*B
    end
end

However, I found inconsistencies between the result of the @btime macro from BenchmarkTools and the profiler.

using BenchmarkTools
@btime foo($A, $B)
# 1.808 s (40 allocations: 305.18 MiB)
@btime bar($A, $B)
# 1.764 s (20 allocations: 152.59 MiB)

which indicates the two functions take roughly the same amount of time. But if I use the profiler

function foobar(A, B)
    foo(A, B)
    bar(A, B)
    return nothing
end

foobar(A, B) # compile once

using Profile

@profile foobar(A, B)

Profile.print(mincount=60)

gives

168 ./task.jl:268; (::getfield(Revise, Symbol("##80#82")){REPL.R...
 168 ...es/Revise/439di/src/Revise.jl:975; run_backend(::REPL.REPLBackend)
  168 .../stdlib/v1.2/REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL.REPLBackend)
   168 ./boot.jl:330; eval(::Module, ::Any)
    106 ./REPL[57]:2; foobar(::Array{Complex{Float64},2}, ::Array{...
     106 ./REPL[3]:3; foo(::Array{Complex{Float64},2}, ::Array{Co...
      106 ./operators.jl:529; *
       67 ...LinearAlgebra/src/matmul.jl:143; *
        67 ...inearAlgebra/src/matmul.jl:145; mul!
         67 ...inearAlgebra/src/matmul.jl:464; gemm_wrapper!(::Array{Complex{Float64},2...
          67 ...LinearAlgebra/src/blas.jl:1131; gemm!(::Char, ::Char, ::Complex{Float64...
    62  ./REPL[57]:3; foobar(::Array{Complex{Float64},2}, ::Array{...
     62 ./REPL[4]:3; bar
      61 ...LinearAlgebra/src/matmul.jl:143; *
       61 ...LinearAlgebra/src/matmul.jl:145; mul!
        61 ...inearAlgebra/src/matmul.jl:464; gemm_wrapper!(::Array{Complex{Float64},2}...
         61 ...LinearAlgebra/src/blas.jl:1131; gemm!(::Char, ::Char, ::Complex{Float64}...

which implies a ~ 40% percent time difference between the two functions.

I tend to believe the result of @btime is correct since @time gives consistent results. What happens to the profiler then?

Edit: the behavior is the same if I run the profiler twice.

1 Like

You are measuring compile time of the profiler

Not sure about your statement. I ran the profiler and cleared the results several times in REPL.

Never mind then. In your code it looked like from your code that you were printing the first use of @profile

Root cause found in BLAS calls not assigned to callers in profiler · Issue #33605 · JuliaLang/julia · GitHub, although workarounds are still needed.

1 Like