I wonder why we would want to inline a function with @inline, or tacitly allow the compiler to do so automatically. What is the benefit of having a function inlined? Is there any difference between the inlined and un-inlined versions of a function from the user's point of view? Thanks.
You can see an example here. Copying and pasting foobar_lv2’s example:
@inline foo_in(n) = (n, Vector{Int}(undef, n))
@noinline foo_ni(n) = (n, Vector{Int}(undef, n))
function ft_in(n)
    s = 0
    for i = 1:n
        jj, v = foo_in(i)
        s += sum(v)
    end
    s
end
function ft_ni(n)
    s = 0
    for i = 1:n
        jj, v = foo_ni(i)
        s += sum(v)
    end
    s
end
@time ft_in(1000)
0.001948 seconds (1.00 k allocations: 3.962 MiB)
@time ft_ni(1000)
0.002083 seconds (2.00 k allocations: 3.992 MiB)
That example was meant to demonstrate allocations: the extra 1.00 k allocations in ft_ni come from the tuple returned by foo_ni, which the compiler can't elide across a non-inlined call (something that will hopefully be fixed). If you want an example for runtime:
julia> using BenchmarkTools
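(Note: the benchmarks below assume three equal-length Float64 vectors va, vb, and vc have already been defined. The setup isn't shown here, so the length below is just an arbitrary choice; any three vectors of the same length will do, e.g.:)

julia> va = rand(256); vb = rand(256); vc = rand(256);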
julia> @noinline ni(a, b, c) = a * b + c
ni (generic function with 1 method)
julia> @inline fin(a, b, c) = a * b + c
fin (generic function with 1 method)
julia> function muladd_fni(va, vb, vc)
           out = zero(eltype(va))
           @assert length(va) == length(vb) == length(vc)
           @inbounds @simd for i ∈ eachindex(va)
               out += ni(va[i], vb[i], vc[i])
           end
           out
       end
muladd_fni (generic function with 1 method)
julia> function muladd_fin(va, vb, vc)
           out = zero(eltype(va))
           @assert length(va) == length(vb) == length(vc)
           @inbounds @simd for i ∈ eachindex(va)
               out += fin(va[i], vb[i], vc[i])
           end
           out
       end
muladd_fin (generic function with 1 method)
julia> muladd_fin(va, vb, vc)
155.9768456827954
julia> muladd_fni(va, vb, vc)
155.97684568279539
julia> @benchmark muladd_fni($va, $vb, $vc)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 898.750 ns (0.00% GC)
median time: 905.795 ns (0.00% GC)
mean time: 925.898 ns (0.00% GC)
maximum time: 2.154 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 44
julia> @benchmark muladd_fin($va, $vb, $vc)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 43.738 ns (0.00% GC)
median time: 44.548 ns (0.00% GC)
mean time: 46.434 ns (0.00% GC)
maximum time: 79.827 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 990
For some reason, something odd is going on on my machine: the first instance of a function I define is sometimes slower than subsequent redefinitions. I'm tired, so I'll look into that tomorrow.
I add that disclaimer because I updated the times above after redefining the functions.
(The @code_lowered looks exactly the same before and after the redefinition.)
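If you want to see where the big runtime gap between the two versions comes from, @code_llvm is more informative than @code_lowered, since inlining happens later, during optimization. Comparing the two (output omitted; details will vary by machine and Julia version) should show the non-inlined version making a call on every iteration, while the inlined version gets vectorized:

julia> @code_llvm muladd_fni(va, vb, vc)

julia> @code_llvm muladd_fin(va, vb, vc)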
I make this comment to point out that you don’t need to write @inline everywhere:
julia> let_julia_decide(a, b, c) = a * b + c
let_julia_decide (generic function with 1 method)
julia> function muladd_ljd(va, vb, vc)
           out = zero(eltype(va))
           @assert length(va) == length(vb) == length(vc)
           @inbounds @simd for i ∈ eachindex(va)
               out += let_julia_decide(va[i], vb[i], vc[i])
           end
           out
       end
muladd_ljd (generic function with 1 method)
julia> muladd_ljd(va, vb, vc)
155.72913245066408
julia> @benchmark muladd_ljd($va, $vb, $vc)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 43.738 ns (0.00% GC)
median time: 44.275 ns (0.00% GC)
mean time: 45.947 ns (0.00% GC)
maximum time: 72.803 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 990
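One way to confirm that Julia chose to inline let_julia_decide on its own is to look at the optimized IR: if the call was inlined, no invoke of let_julia_decide appears in the output (a sketch; the exact printout depends on the Julia version):

julia> @code_typed muladd_ljd(va, vb, vc)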