@code_lowered vs @code_typed

Where can I find documentation of the difference between these? The main Julia docs aren’t very clear about it.

I have an example of two functions where @code_typed is identical, but @code_lowered is not. There’s a difference in performance, and I’d like to understand what’s going on.

If you haven’t noticed already, the docstrings for code_lowered and code_typed are better than those of their @-macro cousins.

You probably want to start with optimize=false when comparing the outputs of code_lowered and code_typed. The rough pipeline is lowering → type inference → optimization. After type inference, Julia knows which methods will be called (when the types are concretely inferable), which opens up all sorts of additional optimization possibilities.
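As a minimal sketch of how the stages differ (g here is a made-up toy function, not one from this thread):

```julia
g(a) = a + 1

code_lowered(g, (Int,))                 # lowered IR: `+` is still a generic call
code_typed(g, (Int,); optimize=false)   # after inference: types attached, no inlining yet
code_typed(g, (Int,); optimize=true)    # after optimization: the call is inlined away
```

Comparing the optimize=false and optimize=true outputs shows exactly what the optimizer did with the inference results.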


Thanks @tim.holy. What’s the idea of starting with optimize=false? Maybe to compare that with optimized code to see what tricks the compiler can do?

What’s really confusing me here is: how can two function calls have the same @code_typed output but different performance?

Here’s the example:

using Setfield

x = (a=1,b=(c=2,d=(e=3,f=(g=4,h=(i=5,j=(k=6,l=(m=7,n=(o=8,p=(q=9,r=(s=10,t=(u=11,v=(w=12,x=(y=13,z=14)))))))))))))

ℓ = @lens _.b.d.f.h.j.l.n.p.r.t.v.x.z
f(x) = x.b.d.f.h.j.l.n.p.r.t.v.x.z
julia> @code_typed get(x,ℓ)
CodeInfo(
1 ─ %1  = Base.getfield(obj, :b)::NamedTuple{(:c, :d),Tuple{Int64,NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}}}
│   %2  = Base.getfield(%1, :d)::NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}
│   %3  = Base.getfield(%2, :f)::NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}
│   %4  = Base.getfield(%3, :h)::NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}
│   %5  = Base.getfield(%4, :j)::NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}
│   %6  = Base.getfield(%5, :l)::NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}
│   %7  = Base.getfield(%6, :n)::NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}
│   %8  = Base.getfield(%7, :p)::NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}
│   %9  = Base.getfield(%8, :r)::NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}
│   %10 = Base.getfield(%9, :t)::NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}
│   %11 = Base.getfield(%10, :v)::NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}
│   %12 = Base.getfield(%11, :x)::NamedTuple{(:y, :z),Tuple{Int64,Int64}}
│   %13 = Base.getfield(%12, :z)::Int64
└──       return %13
) => Int64


julia> @code_typed f(x)
CodeInfo(
1 ─ %1  = Base.getfield(x, :b)::NamedTuple{(:c, :d),Tuple{Int64,NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}}}
│   %2  = Base.getfield(%1, :d)::NamedTuple{(:e, :f),Tuple{Int64,NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}}}
│   %3  = Base.getfield(%2, :f)::NamedTuple{(:g, :h),Tuple{Int64,NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}}}
│   %4  = Base.getfield(%3, :h)::NamedTuple{(:i, :j),Tuple{Int64,NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}}}
│   %5  = Base.getfield(%4, :j)::NamedTuple{(:k, :l),Tuple{Int64,NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}}}
│   %6  = Base.getfield(%5, :l)::NamedTuple{(:m, :n),Tuple{Int64,NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}}}
│   %7  = Base.getfield(%6, :n)::NamedTuple{(:o, :p),Tuple{Int64,NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}}}
│   %8  = Base.getfield(%7, :p)::NamedTuple{(:q, :r),Tuple{Int64,NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}}}
│   %9  = Base.getfield(%8, :r)::NamedTuple{(:s, :t),Tuple{Int64,NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}}}
│   %10 = Base.getfield(%9, :t)::NamedTuple{(:u, :v),Tuple{Int64,NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}}}
│   %11 = Base.getfield(%10, :v)::NamedTuple{(:w, :x),Tuple{Int64,NamedTuple{(:y, :z),Tuple{Int64,Int64}}}}
│   %12 = Base.getfield(%11, :x)::NamedTuple{(:y, :z),Tuple{Int64,Int64}}
│   %13 = Base.getfield(%12, :z)::Int64
└──       return %13
) => Int64

but

julia> @btime get($x,$ℓ)
  1.162 ns (0 allocations: 0 bytes)
14

julia> @btime f($x)
  0.020 ns (0 allocations: 0 bytes)
14

Your last measured time is completely spurious; 0.02 ns would imply you have a CPU with at least a 50 GHz clock and that the whole operation takes a single cycle. Try using the setup argument of @btime.
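For reference, setup is passed like this (a sketch only; y is just a placeholder name, and x and f are the definitions from the question above):

```julia
using BenchmarkTools

# `setup` runs before each sample, so the input is a fresh local
# rather than an interpolated compile-time constant:
@btime f(y) setup = (y = deepcopy(x))
```

This doesn’t guarantee the compiler won’t still fold the computation, but it removes one common source of artificially small timings.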


What do you mean by spurious? The results are consistent over many runs.

It’s really common for me to see 0.01ns or 0.02ns when everything just compiles away. I know at such small values we can’t really assume high accuracy, but we still see

julia> judge(median(@benchmark get($x,$ℓ)), median(@benchmark f($x)))
BenchmarkTools.TrialJudgement: 
  time:   +5860.00% => regression (5.00% tolerance)
  memory: +0.00% => invariant (1.00% tolerance)

What would you suggest including in the setup?

So you are probably seeing benchmarking artifacts. ~1 ns is roughly the overhead of a function call, and what you are trying to measure is below the resolution of BenchmarkTools.

The difference in time you are seeing probably comes down to a difference in how the benchmarking harness got compiled.
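One common trick (a sketch, not something from this thread) to keep the compiler from constant-folding the whole call away is to hide the argument behind a Ref and unwrap it inside the benchmarked expression:

```julia
using BenchmarkTools

# The load through the Ref is opaque to constant folding,
# so the getfield chain actually has to run at benchmark time:
@btime f($(Ref(x))[])
@btime get($(Ref(x))[], $ℓ)
```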


Nice thread to learn from. I do reproduce the OP’s benchmarks, but indeed, if we do a larger benchmark by hand, the differences go away:

julia> function bench(f,x...)
         s = 0
         for i in 1:10000
           s = s + f(x...)
         end
         s
       end
bench (generic function with 1 method)

julia> @btime bench(get,$x,$ℓ)
  1.749 ns (0 allocations: 0 bytes)
140000

julia> @btime bench(f,$x)
  1.749 ns (0 allocations: 0 bytes)
140000


Still, I am being tricked by compiler optimizations, since the time does not depend on the number of evaluations.


Your benchmark is flawed though, because the loop gets compiled away:

julia> @code_native bench(f, x)
	.text
; ┌ @ REPL[16]:3 within `bench'
	imulq	$10000, 104(%rdi), %rax # imm = 0x2710
; │ @ REPL[16]:6 within `bench'
	retq
	nopl	(%rax)
; └

You can use this instead

function bench(f, x...)
    s = 0
    for s in 1:1_000_000
        s ^= f(x...)
    end
    s
end

With this I see

julia> @btime bench(f, x)
  45.023 ms (2968238 allocations: 152.10 MiB)
0

julia> @btime bench(get, x, ℓ)
  39.292 ms (1968238 allocations: 30.03 MiB)
0

The difference in timing is purely noise, because you can verify that both @code_native bench(f, x) and @code_native bench(get, x, ℓ) generate the same native code (and they retain the loop).


Note those allocations. They appear because compiler heuristics cause bench not to specialize on its function and Vararg arguments.
We can get rid of them via

using Setfield
function bench(f::F, x::Vararg{Any,K}) where {F,K}
    s = 0
    for s in 1:1_000_000
        s ^= f(x...)
    end
    s
end
x = (a=1,b=(c=2,d=(e=3,f=(g=4,h=(i=5,j=(k=6,l=(m=7,n=(o=8,p=(q=9,r=(s=10,t=(u=11,v=(w=12,x=(y=13,z=14)))))))))))))

ℓ = @lens _.b.d.f.h.j.l.n.p.r.t.v.x.z
f(x) = x.b.d.f.h.j.l.n.p.r.t.v.x.z

@btime bench(f, $x)
@btime bench(get, $x, $ℓ)

yielding

julia> @btime bench(f, $x)
  2.821 ms (0 allocations: 0 bytes)
0

julia> @btime bench(get, $x, $ℓ)
  2.985 ms (0 allocations: 0 bytes)
0

Ah yes, thanks. I often forget about this. If I understand correctly, the following should specialize too, right?

function bench(f::Any, x::Vararg{Any})