I’m having trouble replicating the performance of SVector{3, Float64} with my own custom Vec type. Here’s the code:
using LinearAlgebra
using StaticArrays
using BenchmarkTools

struct Vec
    x::Float64
    y::Float64
    z::Float64
end

Vec(u) = Vec(u[1], u[2], u[3])

function Base.iterate(v::Vec, state=1)
    if state == 1
        (v.x, 2)
    elseif state == 2
        (v.y, 3)
    elseif state == 3
        (v.z, 4)
    else
        nothing
    end
end

Base.length(v::Vec) = 3
Base.eltype(::Type{Vec}) = Float64
Base.IteratorSize(::Type{Vec}) = Base.HasLength()
Base.IteratorEltype(::Type{Vec}) = Base.HasEltype()
vs = [SVector{3}(rand(3)) for _ in 1:10000]
ps = [Vec(svec) for svec in vs]
x = SVector(2.0, 3.0, 4.0)
y = Vec(2, 3, 4)
function foo(vs, x)
    dot.(vs, Ref(x))
end
@btime foo($vs, $x);
@btime foo($ps, $y);
And here are the benchmark results:
julia> @btime foo($vs, $x);
2.674 μs (2 allocations: 78.17 KiB)
julia> @btime foo($ps, $y);
16.559 μs (2 allocations: 78.17 KiB)
Am I doing something wrong? Is there anything easy I can do to improve the performance of my Vec type? One of the guiding principles of the Julia language is that users should be able to get the same performance from custom types that they would get from the built-in ones. That doesn’t seem to be the case here, though (I know StaticArrays isn’t in Base, but it’s the same idea): StaticArrays seems to have some extra magic to squeeze the maximum performance out of SVector. Or is it just that the compiler is better at optimizing NTuples?
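The only easy thing I can think of is to skip the iteration protocol for this operation entirely and specialize dot for Vec. Here’s a minimal sketch of what I mean (I’m assuming the broadcast is currently going through LinearAlgebra’s generic iterator-based dot fallback, and I haven’t benchmarked this):

# My own workaround sketch, not anything from StaticArrays:
# specialize dot for Vec so foo doesn't hit the generic
# iterator-based fallback in LinearAlgebra.
using LinearAlgebra

LinearAlgebra.dot(a::Vec, b::Vec) = a.x * b.x + a.y * b.y + a.z * b.z

But that feels like papering over the real issue, so I’d still like to understand where the slowdown in the generic path comes from.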