Suppose I write a function that operates over (simple) arrays, even one that just takes floats as inputs and constructs the arrays internally. Can Julia optimize the whole thing down to a simple scalar expression?

function f(a::Float64, b::Float64)
    [a b] * [a+1; b+1]
end

versus

function g(a::Float64, b::Float64)
    a * (a+1) + b * (b+1)
end

This gist shows an example.

Would I need to use a more specialized matrix-computation compiler library, or is this something we might see the Julia compiler do someday? Or is it something that could happen only at the JIT level?
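For context, one way to check what the compiler actually does with these two versions (a sketch; requires InteractiveUtils, which is loaded by default in the REPL) is to inspect the generated LLVM IR:

```julia
function f(a::Float64, b::Float64)
    [a b] * [a+1; b+1]
end

function g(a::Float64, b::Float64)
    a * (a+1) + b * (b+1)
end

# Note the versions don't even return the same type:
# f builds heap-allocated Arrays and calls generic matmul,
# returning a 1-element Vector, while g returns a plain Float64.
f(3.0, 4.0)  # [32.0]
g(3.0, 4.0)  # 32.0

# g lowers to a handful of scalar float ops; f's IR is full of
# allocation and library calls.
@code_llvm debuginfo=:none g(3.0, 4.0)
```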

You can even use MVectors if the calls get inlined:

julia> using StaticArrays, BenchmarkTools
julia> function f(a,b)
           [a, b]' * [a+1, b+1]
       end
f (generic function with 1 method)
julia> function g(a,b)
           @SVector([a, b])' * @SVector([a+1, b+1])
       end
g (generic function with 1 method)
julia> function h(a,b)
           @MVector([a, b])' * @MVector([a+1, b+1])
       end
h (generic function with 1 method)
julia> @btime f(3,4)
53.826 ns (2 allocations: 192 bytes)
32
julia> @btime g(3,4)
  0.020 ns (0 allocations: 0 bytes)
32

(Sub-nanosecond timings like this usually mean the benchmark was constant-folded away entirely; interpolating the arguments through a Ref, e.g. @btime g($(Ref(3))[], $(Ref(4))[]), gives a more realistic measurement of the call itself.)
julia> @btime h(3,4)
15.013 ns (3 allocations: 80 bytes)
32
julia> @inline function mdot(a::AbstractVector{T}, b::AbstractVector{T}) where {T}
           out = zero(T)
           @inbounds for n ∈ eachindex(a,b)
               out += a[n] * b[n]
           end
           out
       end
mdot (generic function with 1 method)
julia> function h2(a,b)
           mdot(@MVector([a, b]), @MVector([a+1, b+1]))
       end
h2 (generic function with 1 method)
julia> @btime h2(3,4)
2.725 ns (0 allocations: 0 bytes)
32

I would guess that the inlining heuristics will eventually become smarter about figuring that out on their own.
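To confirm the SVector version really does collapse to scalar arithmetic, a minimal sketch (reusing the g definition from the session above; gscalar is a name I'm introducing for the hand-written scalar version):

```julia
using StaticArrays

# SVector version from the session above: stack-allocated, fully inlined.
g(a, b) = @SVector([a, b])' * @SVector([a+1, b+1])

# Hand-written scalar equivalent for comparison.
gscalar(a, b) = a * (a+1) + b * (b+1)

g(3, 4) == gscalar(3, 4)  # true: both give 32

# SVectors live on the stack, so the call should not heap-allocate
# once it has been compiled (the call above triggers compilation):
@allocated(g(3, 4))  # expect 0
```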