You can see an example here. Copying and pasting foobar_lv2’s example:
# Uninitialized vectors; Julia 0.7 and later require the explicit `undef` argument.
@inline foo_in(n) = (n, Vector{Int}(undef, n))
@noinline foo_ni(n) = (n, Vector{Int}(undef, n))
function ft_in(n)
    s = 0
    for i = 1:n
        jj, v = foo_in(i)
        s += sum(v)
    end
    s
end
function ft_ni(n)
    s = 0
    for i = 1:n
        jj, v = foo_ni(i)
        s += sum(v)
    end
    s
end
@time ft_in(1000)
0.001948 seconds (1.00 k allocations: 3.962 MiB)
@time ft_ni(1000)
0.002083 seconds (2.00 k allocations: 3.992 MiB)
That example demonstrates the difference in allocations (which will hopefully be fixed). If you want an example of the effect on runtime:
julia> using BenchmarkTools
julia> @noinline ni(a, b, c) = a * b + c
ni (generic function with 1 method)
julia> @inline fin(a, b, c) = a * b + c
fin (generic function with 1 method)
julia> function muladd_fni(va, vb, vc)
           out = zero(eltype(va))
           @assert length(va) == length(vb) == length(vc)
           @inbounds @simd for i ∈ eachindex(va)
               out += ni(va[i], vb[i], vc[i])
           end
           out
       end
muladd_fni (generic function with 1 method)
julia> function muladd_fin(va, vb, vc)
           out = zero(eltype(va))
           @assert length(va) == length(vb) == length(vc)
           @inbounds @simd for i ∈ eachindex(va)
               out += fin(va[i], vb[i], vc[i])
           end
           out
       end
muladd_fin (generic function with 1 method)
julia> muladd_fin(va, vb, vc)
155.9768456827954
julia> muladd_fni(va, vb, vc)
155.97684568279539
julia> @benchmark muladd_fni($va, $vb, $vc)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 898.750 ns (0.00% GC)
median time: 905.795 ns (0.00% GC)
mean time: 925.898 ns (0.00% GC)
maximum time: 2.154 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 44
julia> @benchmark muladd_fin($va, $vb, $vc)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 43.738 ns (0.00% GC)
median time: 44.548 ns (0.00% GC)
mean time: 46.434 ns (0.00% GC)
maximum time: 79.827 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 990
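@simd requires inlining: when the call to ni is left as an opaque function call, the loop cannot be vectorized, which accounts for much of the roughly 20x difference in the benchmarks above.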
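If you want to check the vectorization directly rather than infer it from timings, one option is to look for vector instructions in the optimized LLVM IR. The following is only a rough sketch under some assumptions: muladd_sum and has_vector_ops are hypothetical helper names introduced here, the input vectors are made up (the definitions of va, vb, vc are not shown above), and searching the IR text for "x double>" is a crude heuristic whose result depends on your CPU and Julia version.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

@noinline ni(a, b, c) = a * b + c
@inline fin(a, b, c) = a * b + c

# Same reduction loop as in the post, parameterized on the kernel `f`.
# Julia specializes on `f`, so each kernel gets its own compiled method.
function muladd_sum(f, va, vb, vc)
    @assert length(va) == length(vb) == length(vc)
    out = zero(eltype(va))
    @inbounds @simd for i in eachindex(va)
        out += f(va[i], vb[i], vc[i])
    end
    out
end

# Hypothetical inputs: the post does not show how va, vb, vc were built.
va, vb, vc = rand(256), rand(256), rand(256)

# Heuristic: true if the optimized IR mentions a vector-of-double type
# such as <4 x double>, which suggests the loop was vectorized.
function has_vector_ops(f)
    io = IOBuffer()
    argtypes = (typeof(f), Vector{Float64}, Vector{Float64}, Vector{Float64})
    code_llvm(io, muladd_sum, argtypes)
    occursin("x double>", String(take!(io)))
end

println("inlined kernel shows vector ops:     ", has_vector_ops(fin))
println("non-inlined kernel shows vector ops: ", has_vector_ops(ni))
```

Both kernels should still return approximately the same sum; only the generated code differs. Small differences in the last digits, like those in the results above, are expected because @simd permits reassociating the floating-point additions.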