What's the benefit of inlining?

https://docs.julialang.org/en/stable/stdlib/base/#Base.@inline

I wonder why we would want to inline a function with `@inline`, or tacitly allow the compiler to do so automatically. What is the benefit of having a function inlined? Is there any difference between an inlined version of a function and its un-inlined version from the user's point of view? Thanks.

You can see an example here. Copying and pasting foobar_lv2’s example:

@inline foo_in(n) = (n, Vector{Int}(undef, n))
@noinline foo_ni(n) = (n, Vector{Int}(undef, n))

function ft_in(n)
    s = 0
    for i = 1:n
        jj, v = foo_in(i)
        s += sum(v)
    end
    s
end

function ft_ni(n)
    s = 0
    for i = 1:n
        jj, v = foo_ni(i)
        s += sum(v)
    end
    s
end

@time ft_in(1000)
  0.001948 seconds (1.00 k allocations: 3.962 MiB)

@time ft_ni(1000)
  0.002083 seconds (2.00 k allocations: 3.992 MiB)
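If you want to measure the per-call allocation difference directly rather than through `@time` on a loop, `@allocated` works too. A minimal sketch, assuming `zeros` instead of an uninitialized vector so the results are deterministic (`use_in`/`use_ni` are hypothetical helper names, not from the example above):

```julia
@inline foo_in(n) = (n, zeros(Int, n))    # zeros instead of undef so sums compare
@noinline foo_ni(n) = (n, zeros(Int, n))

use_in(n) = sum(foo_in(n)[2])
use_ni(n) = sum(foo_ni(n)[2])

use_in(1); use_ni(1)            # warm up: compile before measuring
println(@allocated use_in(10))  # allocation for the vector itself
println(@allocated use_ni(10))  # may show extra bytes if the returned tuple gets boxed
```

Whether the non-inlined version shows the extra tuple allocation depends on the Julia version; the example above was from a release where it did.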

That example was to demonstrate allocations (which will hopefully be fixed). If you want an example for runtime:

julia> using BenchmarkTools

julia> @noinline ni(a, b, c) = a * b + c
ni (generic function with 1 method)

julia> @inline fin(a, b, c) = a * b + c
fin (generic function with 1 method)

julia> function muladd_fni(va, vb, vc)
           out = zero(eltype(va))
           @assert length(va) == length(vb) == length(vc)
           @inbounds @simd for i ∈ eachindex(va)
               out += ni(va[i], vb[i], vc[i])
           end
           out
       end
muladd_fni (generic function with 1 method)

julia> function muladd_fin(va, vb, vc)
           out = zero(eltype(va))
           @assert length(va) == length(vb) == length(vc)
           @inbounds @simd for i ∈ eachindex(va)
               out += fin(va[i], vb[i], vc[i])
           end
           out
       end
muladd_fin (generic function with 1 method)

julia> muladd_fin(va, vb, vc)
155.9768456827954

julia> muladd_fni(va, vb, vc)
155.97684568279539

julia> @benchmark muladd_fni($va, $vb, $vc)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     898.750 ns (0.00% GC)
  median time:      905.795 ns (0.00% GC)
  mean time:        925.898 ns (0.00% GC)
  maximum time:     2.154 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     44

julia> @benchmark muladd_fin($va, $vb, $vc)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     43.738 ns (0.00% GC)
  median time:      44.548 ns (0.00% GC)
  mean time:        46.434 ns (0.00% GC)
  maximum time:     79.827 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     990

`@simd` requires inlining: the compiler can only vectorize the loop if the call in the loop body is inlined into it.
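As an aside, for this particular `a * b + c` pattern Base already provides `muladd`, which is inlined and lets the compiler emit a fused multiply-add where the hardware supports it. A sketch (the function name `muladd_base` is made up for illustration):

```julia
# Base.muladd(a, b, c) computes a*b + c, allowing a fused multiply-add.
function muladd_base(va, vb, vc)
    out = zero(eltype(va))
    @assert length(va) == length(vb) == length(vc)
    @inbounds @simd for i in eachindex(va)
        out += muladd(va[i], vb[i], vc[i])
    end
    out
end

va, vb, vc = rand(100), rand(100), rand(100)
# ≈ rather than == because @simd may reassociate the sum
println(muladd_base(va, vb, vc) ≈ sum(va .* vb .+ vc))  # true
```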

Thanks for the examples! I guess it should be `fin(va[i], vb[i], vc[i])` in `muladd_fin`?

If so, `muladd_fin` and `muladd_fni` are not much different. Then I wonder why the benchmark statistics are so different.

This has good info: Inline expansion - Wikipedia

Thanks for the pointer!

Oops. I have no idea what was going on there @_@.
Because of my mistake (now fixed!), it should have been the exact same code.

The difference is now dramatic.

Thanks, buddy. Now I’m impressed by inlining.

For some reason, something odd seems to be going on on my machine, where the first defined instance of a function is sometimes slower than subsequent ones. I’m tired, so I’ll look into that tomorrow.
I add that disclaimer because I updated the times to reflect the redefinition.
(The @code_lowered looks exactly the same.)

I make this comment to point out that you don’t need to write `@inline` everywhere:

julia> let_julia_decide(a, b, c) = a * b + c
let_julia_decide (generic function with 1 method)

julia> function muladd_ljd(va, vb, vc)
           out = zero(eltype(va))
           @assert length(va) == length(vb) == length(vc)
           @inbounds @simd for i ∈ eachindex(va)
               out += let_julia_decide(va[i], vb[i], vc[i])
           end
           out
       end
muladd_ljd (generic function with 1 method)

julia> muladd_ljd(va, vb, vc)
155.72913245066408

julia> @benchmark muladd_ljd($va, $vb, $vc)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     43.738 ns (0.00% GC)
  median time:      44.275 ns (0.00% GC)
  mean time:        45.947 ns (0.00% GC)
  maximum time:     72.803 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     990

Julia will inline small functions automatically.
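You can also check Julia's decision directly: if a call was inlined, the caller's typed IR contains the callee's body rather than an `:invoke` of it. A minimal sketch, assuming hypothetical names `small` and `caller`:

```julia
small(a, b) = a * b + 1   # no @inline annotation; the cost model decides
caller(x) = small(x, 2)

ci = code_typed(caller, (Int,))[1].first
# If `small` was inlined, no statement in the typed IR is an :invoke expression
inlined = !any(st -> Meta.isexpr(st, :invoke), ci.code)
println(inlined)  # true: a function this small is inlined automatically
```

The same check applied to an `@noinline` callee would print `false`, since the `:invoke` of the callee remains in the caller's IR.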