Configurations.jl Option Type with StaticArrays.jl

I’m trying to use Configurations.jl to load a TOML config file into an option structure whose fields are either SVectors or Float64s. I’ve tried two approaches so far, but both have terrible performance.

Approach 1: Multiple Dispatch on Base.convert

Consider the following option structure:

using Configurations, StaticArrays, BenchmarkTools

const FloatOrSVector = Union{
    Float64,SVector{2,Float64},SVector{3,Float64},SVector{4,Float64},SVector{5,Float64}
}

@option struct MyStruct
    a::FloatOrSVector
    b::FloatOrSVector
end

Here the parameters a and b can each be either a Float64 or an SVector (up to length 5). I also have a test TOML file (delete.toml) as follows:

a = 1.0
b = 2.0

I would like the possibility to change a parameter to a static vector by entering in the .toml file, for example, a = [1.0, 2.0, 3.0].

Now consider the following simple benchmarking function (its implementation is irrelevant) and a method that converts a Vector{Float64} to FloatOrSVector:

Base.convert(::Type{FloatOrSVector}, x::Vector{Float64}) = SVector{length(x)}(x)

function main()
    x = from_toml(MyStruct, "delete.toml")
    sum = x.a .+ x.b
    for i in 1:1000000
        sum = sum .+ x.a .+ x.b
    end
    return x
end

and benchmark using @benchmark main():

julia> @benchmark main()
BenchmarkTools.Trial: 14 samples with 1 evaluation.
 Range (min … max):  373.811 ms … 381.526 ms  ┊ GC (min … max): 4.23% … 4.19%
 Time  (median):     377.048 ms               ┊ GC (median):    4.40%
 Time  (mean ± σ):   377.177 ms ±   2.316 ms  ┊ GC (mean ± σ):  4.45% ± 0.26%

  █           ▁▁▁  ▁   ▁        ▁▁   ▁      ▁  ▁▁             ▁  
  █▁▁▁▁▁▁▁▁▁▁▁███▁▁█▁▁▁█▁▁▁▁▁▁▁▁██▁▁▁█▁▁▁▁▁▁█▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  374 ms           Histogram: frequency by time          382 ms <

 Memory estimate: 320.44 MiB, allocs estimate: 7000161.

Approach 2: Separate Config Structure

In this approach, I parameterise MyStruct with T1 and T2, and then make a new structure called MyStructConfig with parameters a and b of types Float64 or Vector{Float64}. I then define a convert function to convert MyStructConfig to MyStruct. See below:

struct MyStruct{T1,T2}
    a::T1
    b::T2
end

@option struct MyStructConfig
    a::Union{Float64,Vector{Float64}}
    b::Union{Float64,Vector{Float64}}
end

function convert(cfg::MyStructConfig)
    a = typeof(cfg.a) === Vector{Float64} ? SVector{length(cfg.a)}(cfg.a) : cfg.a
    b = typeof(cfg.b) === Vector{Float64} ? SVector{length(cfg.b)}(cfg.b) : cfg.b
    return MyStruct(a, b)
end

function main()
    xcfg = from_toml(MyStructConfig, "delete.toml")
    x = convert(xcfg)
    sum = x.a .+ x.b
    for i in 1:1000000
        sum = sum .+ x.a .+ x.b
    end
    return x
end

Benchmarking this, we get:

@benchmark main()
BenchmarkTools.Trial: 12 samples with 1 evaluation.
 Range (min … max):  418.077 ms … 425.742 ms  ┊ GC (min … max): 4.54% … 4.45%
 Time  (median):     421.507 ms               ┊ GC (median):    4.53%
 Time  (mean ± σ):   421.509 ms ±   2.308 ms  ┊ GC (mean ± σ):  4.68% ± 0.29%

  ▁  ▁      ▁      ▁       ▁▁▁ █           ▁            ▁     ▁  
  █▁▁█▁▁▁▁▁▁█▁▁▁▁▁▁█▁▁▁▁▁▁▁███▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁█ ▁
  418 ms           Histogram: frequency by time          426 ms <

 Memory estimate: 320.44 MiB, allocs estimate: 7000161.

So overall, approach 1 took 377 ms (mean) and approach 2 took 422 ms (mean). Is this the best performance possible for what I’m trying to do? I’m new to Julia, so I don’t understand what’s going on behind the scenes. If it’s impossible to speed up these examples because the size of the SVector is not known at compile time, let me know.

Any insight is appreciated. Cheers.

Note: If I set the parameters a and b to the concrete type Float64 (so vectors in the config file are not possible), I get the following benchmark (approx. 100,000x faster):

julia> @btime main()
  4.522 μs (70 allocations: 5.77 KiB)

@Roger-luo

For your second example try using a function barrier:

function _main(x)
    sum = x.a .+ x.b
    for i in 1:1000000
        sum = sum .+ x.a .+ x.b
    end
    return x,sum
end

function main()
    xcfg = from_toml(MyStructConfig, "delete.toml")
    x = convert(xcfg)
    return _main(x)[1]
end

I chose to return sum as well because I fear that Julia might otherwise optimize the loop away.

This way you isolate the type instability to main(), and the computation inside _main is type-stable and will be fast.


That’s the issue. The code will be fast if the type of x contains all the information about the concrete types of its fields. That’s why using abstract field types or unions is generally not recommended in performance-critical places (like the loop in your example).
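To make the point about field types concrete, here is a minimal sketch using only Base. The types Loose and Tight are hypothetical stand-ins for the two versions of MyStruct, and plain tuples stand in for SVectors so that no packages are needed:

```julia
# Hypothetical types for illustration only; tuples stand in for SVectors.
struct Loose
    a::Union{Float64,NTuple{2,Float64},NTuple{3,Float64}}
    b::Union{Float64,NTuple{2,Float64},NTuple{3,Float64}}
end

struct Tight{T1,T2}
    a::T1
    b::T2
end

loose = Loose(1.0, (1.0, 2.0))
tight = Tight(1.0, (1.0, 2.0))

# typeof(loose) is just Loose: it does not say which Union member each field holds.
loose_concrete = all(isconcretetype, fieldtypes(typeof(loose)))

# typeof(tight) is Tight{Float64,Tuple{Float64,Float64}}: every field type is pinned down.
tight_concrete = all(isconcretetype, fieldtypes(typeof(tight)))

println((loose_concrete, tight_concrete))
```

With the parametric struct, code that receives the value as an argument can be compiled for exactly those field types; with the Union fields, the compiler has to account for every combination.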

Like @abraemer suggested, your second approach together with a function barrier should do the trick. You need the wrapper type to have something with concrete field types, but you also need the barrier because otherwise the code in main still only sees some x which comes from some xcfg and could be one of multiple types – in every loop iteration, the types of x.a and x.b have to be checked. Once you plug x into the “barrier function”, that function will only see x and its actual type and the loop can run much faster, since x.a and x.b have a known type from the start.
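The same pattern can be sketched without any packages. Everything below is an assumption made for illustration: Params, to_field, kernel and run_config are hypothetical names, tuples replace SVectors, and the raw arguments stand in for what from_toml would return:

```julia
# Hypothetical wrapper with concrete field types once constructed.
struct Params{T1,T2}
    a::T1
    b::T2
end

# Type-unstable step: the tuple length depends on a runtime value,
# just like SVector{length(x)}(x) in the original post.
to_field(v::Float64) = v
to_field(v::Vector{Float64}) = Tuple(v)

# The barrier function: it is compiled separately for each concrete
# type of p, so p.a and p.b have known types inside the loop.
function kernel(p)
    acc = p.a .+ p.b
    for _ in 1:1_000_000
        acc = acc .+ p.a .+ p.b
    end
    return acc
end

function run_config(raw_a, raw_b)
    # The concrete type of p is only known at runtime here...
    p = Params(to_field(raw_a), to_field(raw_b))
    # ...so this call is dispatched dynamically, but only once.
    return kernel(p)
end

result = run_config([1.0, 2.0, 3.0], 2.0)
```

To see the difference in your own code, @code_warntype on the outer and inner functions should show where the inferred types stop being concrete.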


Thank you! That definitely increased the speed:

julia> @benchmark main()
BenchmarkTools.Trial: 6292 samples with 1 evaluation.
 Range (min … max):  778.137 μs … 993.571 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     786.001 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   792.781 μs ±  16.623 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██                                                             
  ██▇▄▅▆▆▅▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁ ▂
  778 μs           Histogram: frequency by time          843 μs <

 Memory estimate: 6.55 KiB, allocs estimate: 88.

Using barrier functions is an interesting idea that I hadn’t thought of.

Thanks again!


Cheers, yes it makes sense why the barrier function speeds things up now.


Nice!

Here is an overview of performance tips and pitfalls (including function barriers) which might be useful.