What's the benefit of inlining?

#1

https://docs.julialang.org/en/stable/stdlib/base/#Base.@inline

I wonder why we would want to inline a function with @inline, or tacitly allow the compiler to do so automatically. Is there any benefit to having a function inlined? Is there any difference between the inlined and non-inlined versions of a function from the user's point of view? Thanks.


#2

You can see an example here. Copying and pasting foobar_lv2’s example:

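# Both return a tuple holding n and an uninitialized length-n vector.
@inline foo_in(n) = (n, Vector{Int}(undef, n))
@noinline foo_ni(n) = (n, Vector{Int}(undef, n))

# Calls the inlined version in a loop.
function ft_in(n)
    s = 0
    for i = 1:n
        jj, v = foo_in(i)
        s += sum(v)
    end
    s
end

# Calls the non-inlined version in a loop.
function ft_ni(n)
    s = 0
    for i = 1:n
        jj, v = foo_ni(i)
        s += sum(v)
    end
    s
end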

@time ft_in(1000)
  0.001948 seconds (1.00 k allocations: 3.962 MiB)

@time ft_ni(1000)
  0.002083 seconds (2.00 k allocations: 3.992 MiB)

That example was meant to demonstrate the difference in allocations (which will hopefully be fixed). If you want an example of the effect on run time:

julia> using BenchmarkTools

julia> @noinline ni(a, b, c) = a * b + c
ni (generic function with 1 method)

julia> @inline fin(a, b, c) = a * b + c
fin (generic function with 1 method)

julia> function muladd_fni(va, vb, vc)
           out = zero(eltype(va))
           @assert length(va) == length(vb) == length(vc)
           @inbounds @simd for i ∈ eachindex(va)
               out += ni(va[i], vb[i], vc[i])
           end
           out
       end
muladd_fni (generic function with 1 method)

julia> function muladd_fin(va, vb, vc)
           out = zero(eltype(va))
           @assert length(va) == length(vb) == length(vc)
           @inbounds @simd for i ∈ eachindex(va)
               out += fin(va[i], vb[i], vc[i])
           end
           out
       end
muladd_fin (generic function with 1 method)

julia> muladd_fin(va, vb, vc)
155.9768456827954

julia> muladd_fni(va, vb, vc)
155.97684568279539

julia> @benchmark muladd_fni($va, $vb, $vc)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     898.750 ns (0.00% GC)
  median time:      905.795 ns (0.00% GC)
  mean time:        925.898 ns (0.00% GC)
  maximum time:     2.154 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     44

julia> @benchmark muladd_fin($va, $vb, $vc)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     43.738 ns (0.00% GC)
  median time:      44.548 ns (0.00% GC)
  mean time:        46.434 ns (0.00% GC)
  maximum time:     79.827 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     990

@simd requires inlining: the loop can only be vectorized if the function called in its body is inlined.
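
To make concrete what inlining does in a loop like this: it is roughly as if the body of the callee had been pasted into the loop by hand. Here is a minimal sketch of that idea. The names muladd_call and muladd_manual are made up for illustration, and the inputs are an assumption (Float64 vectors of length 256), since the thread does not show how va, vb and vc were created.

```julia
# Small callee that the compiler is asked to inline at each call site.
@inline fin(a, b, c) = a * b + c

# Version that goes through a function call in the source.
function muladd_call(va, vb, vc)
    @assert length(va) == length(vb) == length(vc)
    out = zero(eltype(va))
    @inbounds @simd for i in eachindex(va)
        out += fin(va[i], vb[i], vc[i])
    end
    return out
end

# Hand-inlined version: the body of `fin` is written directly in the loop.
function muladd_manual(va, vb, vc)
    @assert length(va) == length(vb) == length(vc)
    out = zero(eltype(va))
    @inbounds @simd for i in eachindex(va)
        out += va[i] * vb[i] + vc[i]
    end
    return out
end

# Assumed inputs, not the ones used for the benchmarks in this thread.
va, vb, vc = rand(256), rand(256), rand(256)

# Compare with ≈ rather than ==, because @simd is allowed to reorder
# the floating-point additions of the reduction.
same_result = muladd_call(va, vb, vc) ≈ muladd_manual(va, vb, vc)
```

From the caller's point of view the two versions behave the same; only the generated code differs. The permitted reordering of the additions under @simd is also a plausible reason why the two sums printed above differ in the last digits.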


#3

Thanks for the examples! I guess it should be in(va[i], vb[i], vc[i]) in muladd_in?

If so, muladd_in and muladd_ni do not differ much, so I wonder why the benchmark statistics are so different.


#4

This has good info: https://en.wikipedia.org/wiki/Inline_expansion#Effect_on_performance


#5

Thanks for the pointer!


#6

Oops. I have no idea what was going on there @_@.
With my mistake (now fixed!), the two functions should have been exactly the same code.

The difference is now dramatic.


#7

Thanks, buddy. Now I’m impressed by inline.


#8

Something odd seems to be going on on my machine: the first defined instance of a function is sometimes slower than subsequent instances. I’m tired, so I’ll look into that tomorrow.
I add that disclaimer because I updated the times to reflect the redefinition.
(The @code_lowered looks exactly the same.)

I make this comment to point out that you don’t need to write @inline everywhere:

julia> let_julia_decide(a, b, c) = a * b + c
let_julia_decide (generic function with 1 method)

julia> function muladd_ljd(va, vb, vc)
           out = zero(eltype(va))
           @assert length(va) == length(vb) == length(vc)
           @inbounds @simd for i ∈ eachindex(va)
               out += let_julia_decide(va[i], vb[i], vc[i])
           end
           out
       end
muladd_ljd (generic function with 1 method)

julia> muladd_ljd(va, vb, vc)
155.72913245066408

julia> @benchmark muladd_ljd($va, $vb, $vc)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     43.738 ns (0.00% GC)
  median time:      44.275 ns (0.00% GC)
  mean time:        45.947 ns (0.00% GC)
  maximum time:     72.803 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     990

Julia will inline small functions automatically.
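
If you want to check what the compiler decided, one option is to look at the optimized typed code of a caller and see whether a call is still there. Below is a rough sketch, assuming Julia 1.x, where a call that was not inlined usually shows up as an :invoke expression. The names plain, forced_out, use_plain, use_forced_out and has_invoke are made up for illustration, and the IR layout is an implementation detail that may change between versions.

```julia
# No annotation: the compiler decides on its own whether to inline.
plain(a, b, c) = a * b + c

# Explicitly prevented from being inlined.
@noinline forced_out(a, b, c) = a * b + c

# Small callers whose optimized code we inspect.
use_plain(a, b, c) = plain(a, b, c) + 1.0
use_forced_out(a, b, c) = forced_out(a, b, c) + 1.0

# Returns true if the optimized code of `f` for the given argument types
# still contains a statically dispatched call (an `:invoke` expression).
function has_invoke(f, argtypes)
    ci, _ = first(code_typed(f, argtypes; optimize=true))
    return any(stmt -> stmt isa Expr && stmt.head === :invoke, ci.code)
end

argtypes = (Float64, Float64, Float64)
plain_still_called = has_invoke(use_plain, argtypes)
forced_out_still_called = has_invoke(use_forced_out, argtypes)
```

I would expect plain_still_called to be false (the small function got inlined without any annotation) and forced_out_still_called to be true. @code_typed at the REPL shows the same information in readable form.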