@code_lowered vs @code_typed

Where can I find documentation of the difference between these? The main Julia docs aren’t very clear about it.

I have an example of two functions whose @code_typed output is identical but whose @code_lowered output is not. There's a difference in performance between them, and I'd like to understand what's going on.

If you haven't noticed already, the docs for the function forms code_lowered and code_typed are more detailed than those for their @-prefixed cousins.

You probably want to start with optimize=false when comparing the outputs of code_lowered and code_typed. The rough pipeline is lowering → type inference → optimization. After type inference, Julia knows which methods will be called (when the types are concretely inferrable), which opens up all sorts of additional optimization opportunities.
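
For example (a sketch; g here is just an illustrative function, not one from this thread), the function forms accept optimize as a keyword:

g(a) = a + 1

code_lowered(g, (Int,))                  # lowered IR: no type information yet
code_typed(g, (Int,); optimize=false)    # inferred types, before the optimizer runs
code_typed(g, (Int,); optimize=true)     # after inlining etc. (the default)

Comparing the last two shows exactly what the optimizer did with the inference results.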

Thanks @tim.holy. What’s the idea of starting with optimize=false? Maybe to compare that with optimized code to see what tricks the compiler can do?

What's really confusing me here is: how can two function calls have the same @code_typed output but different performance?

Here’s the example:

using Setfield

x = (a=1,b=(c=2,d=(e=3,f=(g=4,h=(i=5,j=(k=6,l=(m=7,n=(o=8,p=(q=9,r=(s=10,t=(u=11,v=(w=12,x=(y=13,z=14)))))))))))))

ℓ = @lens _.b.d.f.h.j.l.n.p.r.t.v.x.z
f(x) = x.b.d.f.h.j.l.n.p.r.t.v.x.z
julia> @code_typed get(x,ℓ)
CodeInfo(
1 ─ %1  = Base.getfield(obj, :b)::NamedTuple{(:c, :d),Tuple{Int64,NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}}}
│   %2  = Base.getfield(%1, :d)::NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}
│   %3  = Base.getfield(%2, :f)::NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}
│   %4  = Base.getfield(%3, :h)::NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}
│   %5  = Base.getfield(%4, :j)::NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}
│   %6  = Base.getfield(%5, :l)::NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}
│   %7  = Base.getfield(%6, :n)::NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}
│   %8  = Base.getfield(%7, :p)::NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}
│   %9  = Base.getfield(%8, :r)::NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}
│   %10 = Base.getfield(%9, :t)::NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}
│   %11 = Base.getfield(%10, :v)::NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}
│   %12 = Base.getfield(%11, :x)::NamedTuple{(:y, :z),Tuple{Int64,Int64}}
│   %13 = Base.getfield(%12, :z)::Int64
└──       return %13
) => Int64


julia> @code_typed f(x)
CodeInfo(
1 ─ %1  = Base.getfield(x, :b)::NamedTuple{(:c, :d),Tuple{Int64,NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}}}
│   %2  = Base.getfield(%1, :d)::NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}
│   %3  = Base.getfield(%2, :f)::NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}
│   %4  = Base.getfield(%3, :h)::NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}
│   %5  = Base.getfield(%4, :j)::NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}
│   %6  = Base.getfield(%5, :l)::NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}
│   %7  = Base.getfield(%6, :n)::NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}
│   %8  = Base.getfield(%7, :p)::NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}
│   %9  = Base.getfield(%8, :r)::NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}
│   %10 = Base.getfield(%9, :t)::NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}
│   %11 = Base.getfield(%10, :v)::NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}
│   %12 = Base.getfield(%11, :x)::NamedTuple{(:y, :z),Tuple{Int64,Int64}}
│   %13 = Base.getfield(%12, :z)::Int64
└──       return %13
) => Int64

but

julia> @btime get($x,$ℓ)
  1.162 ns (0 allocations: 0 bytes)
14

julia> @btime f($x)
  0.020 ns (0 allocations: 0 bytes)
14

Your last measured time is completely spurious; 0.02 ns would imply a CPU with at least a 50 GHz clock on which the whole operation takes a single cycle. Try using the setup argument of @btime.

What do you mean by spurious? The results are consistent over many runs.

It’s really common for me to see 0.01ns or 0.02ns when everything just compiles away. I know at such small values we can’t really assume high accuracy, but we still see

julia> judge(median(@benchmark get($x,$ℓ)), median(@benchmark f($x)))
BenchmarkTools.TrialJudgement: 
  time:   +5860.00% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

What would you suggest including in the setup?

So you are probably seeing benchmarking artifacts. ~1 ns is basically the overhead of a function call, and what you are trying to measure is below the resolution of BenchmarkTools.

The difference in time you are seeing probably comes down to a difference in how the benchmarking harness got compiled.
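
For reference, setup can be used roughly like this (a sketch; the vector v is illustrative, not from the thread). It rebuilds the input before each sample, which makes it harder for the compiler to constant-fold the benchmarked call away:

using BenchmarkTools

@btime sum(v) setup=(v = rand(1000)) evals=1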

Nice thread to learn from. I do reproduce the OP's benchmarks, but indeed, if we do a larger benchmark by hand, the differences go away:

julia> function bench(f,x...)
         s = 0
         for i in 1:10000
           s = s + f(x...)
         end
         s
       end
bench (generic function with 1 method)

julia> @btime bench(get,$x,$ℓ)
  1.749 ns (0 allocations: 0 bytes)
140000

julia> @btime bench(f,$x)
  1.749 ns (0 allocations: 0 bytes)
140000


Still, I am being tricked by the compiler optimizations, as the time and results do not depend on the number of evaluations.

Your benchmark is flawed though, because the loop gets compiled away:

julia> @code_native bench(f, x)
	.text
; ┌ @ REPL[16]:3 within `bench'
	imulq	$10000, 104(%rdi), %rax # imm = 0x2710
; │ @ REPL[16]:6 within `bench'
	retq
	nopl	(%rax)
; └

You can use this instead

function bench(f, x...)
    s = 0
    for s in 1:1_000_000
        s ^= f(x...)
    end
    s
end

With this I see

julia> @btime bench(f, x)
  45.023 ms (2968238 allocations: 152.10 MiB)
0

julia> @btime bench(get, x, ℓ)
  39.292 ms (1968238 allocations: 30.03 MiB)
0

The difference in timing is purely noise, because you can verify that both @code_native bench(f, x) and @code_native bench(get, x, ℓ) generate the same native code (and they retain the loop).

Note those allocations. They are due to heuristics that cause bench not to specialize on its arguments.
We can get rid of them via

using Setfield
function bench(f::F, x::Vararg{Any,K}) where {F,K}
    s = 0
    for s in 1:1_000_000
        s ^= f(x...)
    end
    s
end
x = (a=1,b=(c=2,d=(e=3,f=(g=4,h=(i=5,j=(k=6,l=(m=7,n=(o=8,p=(q=9,r=(s=10,t=(u=11,v=(w=12,x=(y=13,z=14)))))))))))))

ℓ = @lens _.b.d.f.h.j.l.n.p.r.t.v.x.z
f(x) = x.b.d.f.h.j.l.n.p.r.t.v.x.z

@btime bench(f, $x)
@btime bench(get, $x, $ℓ)

yielding

julia> @btime bench(f, $x)
  2.821 ms (0 allocations: 0 bytes)
0

julia> @btime bench(get, $x, $ℓ)
  2.985 ms (0 allocations: 0 bytes)
0

Ah yes, thanks. I often forget about this. If I understand correctly, the following should specialize too, right?

function bench(f::Any, x::Vararg{Any})
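
One quick empirical check, following the allocation heuristic used above (a sketch; it assumes the definitions from the previous posts are in scope):

bench(f, x)            # run once so compilation cost is excluded
@allocated bench(f, x) # in this example: 0 when the call specialized,
                       # nonzero when it stayed dynamic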