How to optimize a struct for speed?

I am defining a new struct for a project, and it needs to be fast. The struct contains only one field, an `MVector` (from the `StaticArrays.jl` package). However, operations on the struct are 4x slower than the same operations on the `MVector` directly.

Here is the definition of the struct:

``````
struct Foo{N}
    dev::MVector{N}
    Foo(v::A) where A <: Union{Vector, SVector, MVector} = new{length(v)}(MVector{length(v)}(v))
end

Base.copy(foo::Foo) = Foo(foo.dev)

function Base.:+(a::Foo, b::Foo)
    return Foo(a.dev + b.dev)
end
``````

And the benchmarks:

``````julia> mvec = MVector{100}(rand(100))
[...]
julia> x = Foo(rand(100))
[...]
julia> @benchmark copy($mvec)
BenchmarkTools.Trial: 10000 samples with 984 evaluations.
 Range (min … max):  43.874 ns … 323.259 ns  ┊ GC (min … max): 0.00% … 71.34%
 Time  (median):     62.593 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   67.954 ns ±  24.152 ns  ┊ GC (mean ± σ):  3.85% ±  8.97%

  [histogram omitted]
  43.9 ns       Histogram: log(frequency) by time       221 ns <

 Memory estimate: 816 bytes, allocs estimate: 1.

julia> @benchmark copy($x)
BenchmarkTools.Trial: 10000 samples with 822 evaluations.
 Range (min … max):  149.393 ns … 527.189 ns  ┊ GC (min … max): 0.00% … 65.09%
 Time  (median):     161.423 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   165.675 ns ±  33.197 ns  ┊ GC (mean ± σ):  2.35% ±  7.32%

  [histogram omitted]
  149 ns        Histogram: log(frequency) by time        417 ns <

 Memory estimate: 832 bytes, allocs estimate: 2.

julia> @benchmark +($mvec, $mvec)
BenchmarkTools.Trial: 10000 samples with 987 evaluations.
 Range (min … max):  37.670 ns … 386.473 ns  ┊ GC (min … max): 0.00% … 46.21%
 Time  (median):     60.061 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   62.614 ns ±  24.071 ns  ┊ GC (mean ± σ):  4.30% ±  9.31%

  [histogram omitted]
  37.7 ns       Histogram: log(frequency) by time       217 ns <

 Memory estimate: 816 bytes, allocs estimate: 1.

julia> @benchmark +($x, $x)
BenchmarkTools.Trial: 10000 samples with 492 evaluations.
 Range (min … max):  223.604 ns … 915.809 ns  ┊ GC (min … max): 0.00% … 48.79%
 Time  (median):     240.069 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   248.639 ns ±  56.365 ns  ┊ GC (mean ± σ):  2.50% ±  7.61%

  [histogram omitted]
  224 ns        Histogram: log(frequency) by time        607 ns <

 Memory estimate: 1.61 KiB, allocs estimate: 3.
``````

Is there something I am doing wrong? How can I fix this?


It is likely due to the missing type parameter of `MVector`. It actually has two type parameters, `MVector{S, T}`, where `S` is the size and `T` is the element type. Annotating the field as `MVector{N}` leaves `T` free, so the field is abstractly typed and every access goes through dynamic dispatch.
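
A quick way to spot this kind of problem is `isconcretetype`: a field whose declared type is not concrete cannot be stored inline and forces boxing. For example:

```julia
using StaticArrays

# MVector{3} still has a free element-type parameter, so it is not concrete:
println(isconcretetype(MVector{3}))           # false
println(isconcretetype(MVector{3, Float64}))  # true
```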

This should help:

``````
struct Foo{N, T}
    dev::MVector{N, T}
    Foo(v::A) where A <: Union{Vector, SVector, MVector} =
        new{length(v), eltype(A)}(MVector{length(v)}(v))
end
``````

This was exactly what was missing. Now the performance is identical. Thanks!

You can also consider a definition more like

``````
struct Foo{M<:MVector} # or just `struct Foo{M<:AbstractVector}`
    dev::M
end
``````

where the vector type `M` itself is parameterized. I often find this more convenient if I donβt need to dispatch further on the field typeβs parameters.
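
A minimal sketch of how this parameterization behaves (using the hypothetical name `Bar` so it does not clash with `Foo` above):

```julia
using StaticArrays

# The wrapper is parameterized by the vector type itself,
# so the field type is always as concrete as the wrapped vector:
struct Bar{M<:AbstractVector}
    dev::M
end

b = Bar(MVector{3}(1.0, 2.0, 3.0))
println(typeof(b))  # Bar{MVector{3, Float64}}
```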

Going further, Iβll suggest a few more changes:

``````
struct Foo{M<:MVector} # or just `struct Foo{M<:AbstractVector}`
    dev::M
    # no inner constructor that forces a copy of the input vector
end
Foo(v::AbstractVector) = Foo(MVector{length(v)}(v)) # this constructor creates an MVector copy of `v`
# if the input to `Foo(v)` is already an `MVector`, no copy is made

Base.copy(foo::Foo) = Foo(copy(foo.dev)) # need to make a copy now because the inner constructor no longer does
``````

Notice that I also changed your inner constructor to an outer constructor. This design may be slightly more ergonomic and is worth considering, provided it doesn’t conflict with your other usage. Note that with your old definitions, `Base.:+` creates a new `MVector` for `a.dev + b.dev`, then copies that result into yet another new `MVector` inside the `Foo` call’s inner constructor. Not having a constructor that can work without copying will often lead to performance blockers like this, if performance is something you care about.
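
With the no-copy constructor, `+` can wrap its result directly. A sketch under these definitions (renamed to the hypothetical `Foo2` here so it stands alone):

```julia
using StaticArrays

struct Foo2{M<:MVector}
    dev::M
end
Foo2(v::AbstractVector) = Foo2(MVector{length(v)}(v))  # copies only non-MVector inputs

# a.dev + b.dev already produces a fresh MVector, so wrapping it makes no second copy:
Base.:+(a::Foo2, b::Foo2) = Foo2(a.dev + b.dev)

x = Foo2(rand(3))
y = x + x  # one MVector allocated for the sum, none for the wrapper
```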

Since you were happy to copy every input anyway, I’ll also make a plug for `SVector` over `MVector`. The only thing you lose is the ability to mutate entries. But for modest sizes it’s equally fast (and often faster, depending on context) to just make an entirely new `SVector` with one value changed (see `setindex`, not the usual `setindex!`, for a convenient function for this) than to mutate an `MVector`. This is because the immutable nature of `SVector` allows additional compiler optimizations. Also, `MVector`s usually require heap allocations to create, while `SVector`s almost never do, which is another point in `SVector`’s favor.
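
For instance, `Base.setindex` (no `!`) returns a new `SVector` with one entry replaced, leaving the original untouched:

```julia
using StaticArrays

s = SVector(1.0, 2.0, 3.0)
s2 = setindex(s, 10.0, 2)  # new SVector with entry 2 set to 10.0; `s` is unchanged

s2 == SVector(1.0, 10.0, 3.0)  # true
```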

I use `SVector`s all the time but I donβt think Iβve used an `MVector` more than once or twice. Itβs possible Iβve never actually used one in βrealβ code.


Trying to understand why the solution proposed by @Dan using the function `Tables.columntype(sch, :Close)` is much faster than the others, I managed to narrow the difference down to a use of structs similar (if I understand correctly) to what was discussed here.

``````julia> using CSV, Tables, DataFrames, Dates, TSFrames

julia> tsdf=TSFrame(df);

julia> sch = Tables.schema(tsdf)
Tables.Schema:
:Index   Date
:Open    Float64
:High    Float64
:Low     Float64
:Close   Float64
:Volume  Float64

julia> using BenchmarkTools

julia> n,t= sch.names,sch.types
((:Index, :Open, :High, :Low, :Close, :Volume), (Date, Float64, Float64, Float64, Float64, Float64))

julia> @btime     t[findfirst(==(:Close),$n)]
191.888 ns (0 allocations: 0 bytes)
Float64

julia> n,t= sch.names,Tuple{sch.types...}
((:Index, :Open, :High, :Low, :Close, :Volume), Tuple{Date, Vararg{Float64, 5}})

julia> @btime     fieldtype(t, findfirst(==(:Close),$n))
10.010 ns (0 allocations: 0 bytes)
Float64

julia> @btime Tables.columntype(sch, :Close)
12.813 ns (0 allocations: 0 bytes)
Float64
``````

What I couldnβt delve into(*) is why the `fieldtype(Tuple{t...},i)` function is so much faster than `getindex(t,i)`.

(*)

```
julia> @edit fieldtype(t, findfirst(==(:Close),n))
ERROR: could not determine location of method definition
Stacktrace:
[...]
```