@code_lowered vs @code_typed

Where can I find documentation of the difference between these? The main Julia docs aren’t very clear about it.

I have an example of two functions where @code_typed is identical, but @code_lowered is not. There’s a difference in performance, and I’d like to understand what’s going on.

If you haven’t noticed already, the docstrings for code_lowered and code_typed are better than those of their @-macro cousins.

You probably want to start with optimize=false when comparing the outputs of code_lowered and code_typed. The rough pipeline is lowering → type inference → optimization. After type inference, Julia knows which methods will be called (when the types are concretely inferable), which opens up all sorts of additional optimization possibilities.
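As a minimal sketch of how the stages differ (g here is a made-up toy function, not one from this thread):

```julia
g(a) = a + 1

code_lowered(g, (Int,))                 # lowered IR: `+` is still a generic call
code_typed(g, (Int,); optimize=false)   # after inference: types attached, no inlining yet
code_typed(g, (Int,); optimize=true)    # after optimization: the call is inlined away
```

Comparing the optimize=false and optimize=true outputs shows exactly what the optimizer did with the inference results.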


Thanks @tim.holy. What’s the idea of starting with optimize=false? Maybe to compare that with optimized code to see what tricks the compiler can do?

What’s really confusing me here is: how can two function calls have the same @code_typed output but different performance?

Here’s the example:

using Setfield

x = (a=1,b=(c=2,d=(e=3,f=(g=4,h=(i=5,j=(k=6,l=(m=7,n=(o=8,p=(q=9,r=(s=10,t=(u=11,v=(w=12,x=(y=13,z=14)))))))))))))

ℓ = @lens _.b.d.f.h.j.l.n.p.r.t.v.x.z
f(x) = x.b.d.f.h.j.l.n.p.r.t.v.x.z
julia> @code_typed get(x,ℓ)
CodeInfo(
1 ─ %1  = Base.getfield(obj, :b)::NamedTuple{(:c, :d),Tuple{Int64,NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}}}
│   %2  = Base.getfield(%1, :d)::NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}
│   %3  = Base.getfield(%2, :f)::NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}
│   %4  = Base.getfield(%3, :h)::NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}
│   %5  = Base.getfield(%4, :j)::NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}
│   %6  = Base.getfield(%5, :l)::NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}
│   %7  = Base.getfield(%6, :n)::NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}
│   %8  = Base.getfield(%7, :p)::NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}
│   %9  = Base.getfield(%8, :r)::NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}
│   %10 = Base.getfield(%9, :t)::NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}
│   %11 = Base.getfield(%10, :v)::NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}
│   %12 = Base.getfield(%11, :x)::NamedTuple{(:y, :z),Tuple{Int64,Int64}}
│   %13 = Base.getfield(%12, :z)::Int64
└──       return %13
) => Int64


julia> @code_typed f(x)
CodeInfo(
1 ─ %1  = Base.getfield(x, :b)::NamedTuple{(:c, :d),Tuple{Int64,NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}}}
│   %2  = Base.getfield(%1, :d)::NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}
│   %3  = Base.getfield(%2, :f)::NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}
│   %4  = Base.getfield(%3, :h)::NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}
│   %5  = Base.getfield(%4, :j)::NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}
│   %6  = Base.getfield(%5, :l)::NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}
│   %7  = Base.getfield(%6, :n)::NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}
│   %8  = Base.getfield(%7, :p)::NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}
│   %9  = Base.getfield(%8, :r)::NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}
│   %10 = Base.getfield(%9, :t)::NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}
│   %11 = Base.getfield(%10, :v)::NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}
│   %12 = Base.getfield(%11, :x)::NamedTuple{(:y, :z),Tuple{Int64,Int64}}
│   %13 = Base.getfield(%12, :z)::Int64
└──       return %13
) => Int64

but

julia> @btime get($x,$ℓ)
  1.162 ns (0 allocations: 0 bytes)
14

julia> @btime f($x)
  0.020 ns (0 allocations: 0 bytes)
14

Your last measured time is completely spurious; 0.02 ns would imply you have a CPU with at least a 50 GHz clock and that the whole operation takes a single cycle. Try using the setup argument of @btime.
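For reference, setup is passed like this (a sketch only; y is just a placeholder name, and x and f are the definitions from the question above):

```julia
using BenchmarkTools

# `setup` runs before each sample, so the input is a fresh local
# rather than an interpolated compile-time constant:
@btime f(y) setup = (y = deepcopy(x))
```

This doesn’t guarantee the compiler won’t still fold the computation, but it removes one common source of artificially small timings.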


What do you mean by spurious? The results are consistent over many runs.

It’s really common for me to see 0.01ns or 0.02ns when everything just compiles away. I know at such small values we can’t really assume high accuracy, but we still see

julia> judge(median(@benchmark get($x,$ℓ)), median(@benchmark f($x)))
BenchmarkTools.TrialJudgement: 
  time:   +5860.00% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

What would you suggest including in the setup?

So you are probably seeing benchmarking artifacts. ~1 ns is roughly the overhead of a function call, and what you are trying to measure is below the resolution of BenchmarkTools.

The difference in time you are seeing probably comes down to a difference in how the benchmarking harness got compiled.
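One common trick (a sketch, not something from this thread) to keep the compiler from constant-folding the whole call away is to hide the argument behind a Ref and unwrap it inside the benchmarked expression:

```julia
using BenchmarkTools

# The load through the Ref is opaque to constant folding,
# so the getfield chain actually has to run at benchmark time:
@btime f($(Ref(x))[])
@btime get($(Ref(x))[], $ℓ)
```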


Nice thread to learn from. I do reproduce the OP’s benchmarks, but indeed, if we do a larger benchmark by hand, the differences go away:

julia> function bench(f,x...)
         s = 0
         for i in 1:10000
           s = s + f(x...)
         end
         s
       end
bench (generic function with 1 method)

julia> @btime bench(get,$x,$ℓ)
  1.749 ns (0 allocations: 0 bytes)
140000

julia> @btime bench(f,$x)
  1.749 ns (0 allocations: 0 bytes)
140000


Still, I am being tricked by compiler optimizations, since the time does not depend on the number of evaluations.


Your benchmark is flawed though, because the loop gets compiled away:

julia> @code_native bench(f, x)
	.text
; ┌ @ REPL[16]:3 within `bench'
	imulq	$10000, 104(%rdi), %rax # imm = 0x2710
; │ @ REPL[16]:6 within `bench'
	retq
	nopl	(%rax)
; └

You can use this instead

function bench(f, x...)
    s = 0
    for s in 1:1_000_000
        s ^= f(x...)
    end
    s
end

With this I see

julia> @btime bench(f, x)
  45.023 ms (2968238 allocations: 152.10 MiB)
0

julia> @btime bench(get, x, ℓ)
  39.292 ms (1968238 allocations: 30.03 MiB)
0

The difference in timing is purely noise, because you can verify that both @code_native bench(f, x) and @code_native bench(get, x, ℓ) generate the same native code (and they retain the loop).


Note those allocations. They appear because compiler heuristics cause bench not to specialize on its function and Vararg arguments.
We can get rid of them via

using Setfield
function bench(f::F, x::Vararg{Any,K}) where {F,K}
    s = 0
    for s in 1:1_000_000
        s ^= f(x...)
    end
    s
end
x = (a=1,b=(c=2,d=(e=3,f=(g=4,h=(i=5,j=(k=6,l=(m=7,n=(o=8,p=(q=9,r=(s=10,t=(u=11,v=(w=12,x=(y=13,z=14)))))))))))))

ℓ = @lens _.b.d.f.h.j.l.n.p.r.t.v.x.z
f(x) = x.b.d.f.h.j.l.n.p.r.t.v.x.z

@btime bench(f, $x)
@btime bench(get, $x, $ℓ)

yielding

julia> @btime bench(f, $x)
  2.821 ms (0 allocations: 0 bytes)
0

julia> @btime bench(get, $x, $ℓ)
  2.985 ms (0 allocations: 0 bytes)
0

Ah yes, thanks. I often forget about this. If I understand correctly, the following should specialize too, right?

function bench(f::Any, x::Vararg{Any})