Type stability and iterating over operations

Hi, in my project I have to define a sequence of operations that need to be performed on some data. Two ways of capturing this in a struct came to mind: either store them as a vector of operations (FooVec), which shows some type-inference issues in my @code_warntype output, or use a recursive definition (FooRec).

I found that in my toy example FooVec is much quicker, even though some types in the function are not inferred. I was wondering: are both of my recursive methods of computing with FooRec inefficient, or should I expect the vector version to be much quicker regardless? And in that case, is there a way to improve type inference, or is this unavoidable?

using BenchmarkTools

abstract type OpType end

struct Op1 <: OpType end
run_op!(x, ::Op1) = sqrt(x)

struct Op2 <: OpType end
run_op!(x, ::Op2) = (x+Ο€)^2

struct FooVec{T <: OpType}
    op_vec::Vector{T}
end
abstract type FooRecType end
struct FooRecEnd <: FooRecType end
struct FooRec{T<:OpType, U<:FooRecType} <: FooRecType
    op::T
    remainder::U
end

function MakeFooRec(op_vec::Vector{T}) where T<:OpType
    op_rec = FooRecEnd()
    for op in reverse(op_vec)
        op_rec = FooRec(op, op_rec)
    end
    return op_rec
end

function run_ops(foo_vec::FooVec)
    x = 0.
    for op in foo_vec.op_vec
        x = run_op!(x, op)
    end
    return x
end

function run_ops(foo_rec::FooRec)
    x = 0.
    return run_ops_inner!(x, foo_rec)
end

run_ops_inner!(x, ::FooRecEnd) = x
function run_ops_inner!(x, foo_rec::FooRec)
    x = run_op!(x, foo_rec.op)
    run_ops_inner!(x, foo_rec.remainder)
end

function run_ops_alt(foo_rec::FooRec)
    x = 0.
    while true
        x = run_op!(x, foo_rec.op)
        foo_rec = foo_rec.remainder
        if foo_rec isa FooRecEnd
            break
        end
    end
    return x
end

function RunTestsBasic()
    Op1(); #Force compilation of Op1
    Op2(); #Force compilation of Op2
end

function RunTests_Vec(N; verbose=false)
    op_vec = [n % 2 == 0 ? Op1() : Op2() for n in 1:N]
    foo_vec_timed = @timed FooVec(op_vec)
    verbose && display("foo_vec took $(foo_vec_timed.time) seconds to compile and create the object. ")
    foo_vec_compute = @timed run_ops(foo_vec_timed.value);
    verbose && @code_warntype run_ops(foo_vec_timed.value)
    verbose && display("foo_vec took $(foo_vec_compute.time) seconds to compute. ")
    verbose && display(foo_vec_compute.value)
end

function RunTests_Rec(N; verbose=false)
    op_vec = [n % 2 == 0 ? Op1() : Op2() for n in 1:N]
    foo_rec_timed = @timed MakeFooRec(op_vec)
    verbose && display("foo_rec took $(foo_rec_timed.time) seconds to compile and create the object. ")

    foo_rec_compute = @timed run_ops(foo_rec_timed.value);
    verbose && @code_warntype run_ops(foo_rec_timed.value)
    verbose && display("foo_rec took $(foo_rec_compute.time) seconds to compute. ")
    verbose && display(foo_rec_compute.value)
end

function RunTests_Rec2(N; verbose=false)
    op_vec = [n % 2 == 0 ? Op1() : Op2() for n in 1:N]
    foo_rec = MakeFooRec(op_vec)
    verbose && display("FooRec didn't need compilation")
    foo_rec_compute = @timed run_ops_alt(foo_rec);
    verbose && @code_warntype run_ops_alt(foo_rec)
    verbose && display("foo_rec alt took $(foo_rec_compute.time) seconds to compute. ")
    verbose && display(foo_rec_compute.value)
end

RunTestsBasic()

N = 5
RunTests_Vec(N, verbose=true)
RunTests_Rec(N, verbose=true)
RunTests_Rec2(N, verbose=true)

N_glob = 400
RunTests_Vec(N_glob)
RunTests_Rec(N_glob)
RunTests_Rec2(N_glob)

bench1 = @benchmark RunTests_Vec($N_glob)
display(bench1)
bench2 = @benchmark RunTests_Rec($N_glob)
display(bench2)
bench3 = @benchmark RunTests_Rec2($N_glob)
display(bench3)

The non-benchmark lines give the following output; I have marked the lines that @code_warntype prints in red with # RED comments:

"foo_vec took 2.5e-7 seconds to compile and create the object. "
MethodInstance for run_ops(::FooVec{OpType})
  from run_ops(foo_vec::FooVec) @ Main ~/Documents/ConOptStop_Julia_v0.1/test_compilation_speeds.jl:29
Arguments
  #self#::Core.Const(Main.run_ops)
  foo_vec::FooVec{OpType}
Locals
  @_3::Union{Nothing, Tuple{OpType, Int64}} # RED
  x::Float64
  op::OpType # RED
Body::Float64
1 ─       (x = 0.0)
β”‚   %2  = Base.getproperty(foo_vec, :op_vec)::Vector{OpType}
β”‚         (@_3 = Base.iterate(%2))
β”‚   %4  = @_3::Union{Nothing, Tuple{OpType, Int64}} # RED
β”‚   %5  = (%4 === nothing)::Bool
β”‚   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 β”„ %8  = @_3::Tuple{OpType, Int64}
β”‚         (op = Core.getfield(%8, 1)) # RED
β”‚   %10 = Core.getfield(%8, 2)::Int64
β”‚   %11 = x::Float64
β”‚   %12 = op::OpType # RED
β”‚         (x = Main.run_op!(%11, %12))
β”‚         (@_3 = Base.iterate(%2, %10))
β”‚   %15 = @_3::Union{Nothing, Tuple{OpType, Int64}} # RED
β”‚   %16 = (%15 === nothing)::Bool
β”‚   %17 = Base.not_int(%16)::Bool
└──       goto #4 if not %17
3 ─       goto #2
4 β”„ %20 = x::Float64
└──       return %20

"foo_vec took 6.375e-6 seconds to compute. "
88.82643960980423
"foo_rec took 0.000125958 seconds to compile and create the object. "
MethodInstance for run_ops(::FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}})
  from run_ops(foo_rec::FooRec) @ Main ~/Documents/ConOptStop_Julia_v0.1/test_compilation_speeds.jl:37
Arguments
  #self#::Core.Const(Main.run_ops)
  foo_rec::Core.Const(FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}}(Op2(), FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}(Op1(), FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}(Op2(), FooRec{Op1, FooRec{Op2, FooRecEnd}}(Op1(), FooRec{Op2, FooRecEnd}(Op2(), FooRecEnd()))))))
Locals
  x::Float64
Body::Float64
1 ─      (x = 0.0)
β”‚   %2 = x::Core.Const(0.0)
β”‚   %3 = Main.run_ops_inner!(%2, foo_rec)::Core.Const(88.82643960980423)
└──      return %3

"foo_rec took 0.002928167 seconds to compute. "
88.82643960980423
"FooRec didn't need compilation"
MethodInstance for run_ops_alt(::FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}})
  from run_ops_alt(foo_rec::FooRec) @ Main ~/Documents/ConOptStop_Julia_v0.1/test_compilation_speeds.jl:48
Arguments
  #self#::Core.Const(Main.run_ops_alt)
  foo_rec@_2::Core.Const(FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}}(Op2(), FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}(Op1(), FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}(Op2(), FooRec{Op1, FooRec{Op2, FooRecEnd}}(Op1(), FooRec{Op2, FooRecEnd}(Op2(), FooRecEnd()))))))
Locals
  x::Float64
  foo_rec@_4::Any # RED
Body::Float64
1 ─       (foo_rec@_4 = foo_rec@_2)
└──       (x = 0.0)
2 β”„       goto #6 if not true
3 ─ %4  = Main.run_op!::Core.Const(Main.run_op!)
β”‚   %5  = x::Float64
β”‚   %6  = foo_rec@_4::Any # RED
β”‚   %7  = Base.getproperty(%6, :op)::Any # RED
β”‚         (x = (%4)(%5, %7))
β”‚   %9  = foo_rec@_4::Any # RED
β”‚         (foo_rec@_4 = Base.getproperty(%9, :remainder))
β”‚   %11 = foo_rec@_4::Any # RED
β”‚   %12 = (%11 isa Main.FooRecEnd)::Bool
└──       goto #5 if not %12
4 ─       goto #6
5 ─       goto #2
6 β”„ %16 = x::Float64
└──       return %16

The benchmarking steps give:

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.829 ΞΌs … 949.221 ΞΌs  β”Š GC (min … max): 0.00% … 99.53%
 Time  (median):     2.054 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   2.320 ΞΌs Β±  12.976 ΞΌs  β”Š GC (mean Β± Οƒ):  7.88% Β±  1.41%

  β–ƒβ–…β–„β–ƒβ–ƒβ–†β–ˆβ–ˆβ–‡β–†β–„β–„β–‚β–‚β–β–‚β–ƒβ–ƒβ–‚β–β–                                       β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–ˆβ–‡β–ˆβ–ˆβ–‡β–‡β–…β–†β–†β–†β–„β–…β–†β–…β–…β–…β–…β–„β–…β–„β–‚β–„β–„β–ƒβ–…β–… β–ˆ
  1.83 ΞΌs      Histogram: log(frequency) by time      3.59 ΞΌs <

 Memory estimate: 3.83 KiB, allocs estimate: 18.
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  70.166 ΞΌs … 204.625 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     71.500 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   72.862 ΞΌs Β±   4.999 ΞΌs  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–„β–‡β–ˆβ–‡β–†β–…β–ƒβ–β–β–‚β–ƒβ–‚β–β–                                               β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–…β–†β–‡β–‡β–‡β–‡β–‡β–†β–†β–†β–†β–‡β–‡β–ˆβ–‡β–‡β–†β–†β–…β–…β–‡β–‡β–†β–†β–†β–…β–†β–†β–…β–†β–…β–†β–…β–†β–…β–…β–…β–†β–…β–„β–…β–„β–… β–ˆ
  70.2 ΞΌs       Histogram: log(frequency) by time      95.5 ΞΌs <

 Memory estimate: 7.00 KiB, allocs estimate: 22.
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  206.583 ΞΌs … 568.875 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     213.250 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   216.658 ΞΌs Β±  12.829 ΞΌs  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–‚β–…β–†β–†β–‡β–ˆβ–‡β–†β–…β–…β–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β–β–                                     β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–ˆβ–‡β–†β–‡β–†β–‡β–†β–‡β–†β–‡β–†β–†β–†β–‡β–…β–…β–…β–†β–…β–…β–…β–…β–…β–…β–…β–„β–…β–…β–†β–†β–…β–…β–… β–ˆ
  207 ΞΌs        Histogram: log(frequency) by time        276 ΞΌs <

 Memory estimate: 6.45 KiB, allocs estimate: 9.

Your benchmark seems a little confusing, because you called @benchmark on a function that itself runs a benchmark, calls @code_warntype, etc. If you benchmark the call directly, you get:

julia> op_vec = [n % 2 == 0 ? Op1() : Op2() for n in 1:400];

julia> foo_vec = FooVec(op_vec);

julia> @btime run_ops($foo_vec)
  1.004 ΞΌs (0 allocations: 0 bytes)
628.3185307179571

Which, for this array with only two element types, is as good as it can be. If you have more types you may start to get dynamic dispatch, and then you can use something like LightSumTypes.jl (https://github.com/JuliaDynamics/LightSumTypes.jl), which provides easy-to-use sum types in Julia.
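For illustration, a minimal sketch of what that could look like here (assuming the @sumtype/variant API of LightSumTypes v4; the exact syntax may differ between versions):

using LightSumTypes

# Wrap the concrete operation types in one sum type, so the vector
# gets a concrete element type instead of the abstract OpType:
@sumtype OpSum(Op1, Op2)

op_vec = [n % 2 == 0 ? OpSum(Op1()) : OpSum(Op2()) for n in 1:400]

function run_ops_sum(op_vec)
    x = 0.0
    for op in op_vec
        # variant unwraps to the underlying Op1/Op2, after which
        # the usual run_op! dispatch applies:
        x = run_op!(x, variant(op))
    end
    return x
end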

Thank you for your reply! The function itself runs the benchmarking steps only if verbose = true, while the @benchmark calls are made with verbose = false (the default), so I think there shouldn't be a problem there, but I could be seeing it wrong. In reality I do have quite a lot of different operation types, which is why I thought I might use a recursive definition.

On Julia 1.11, can I use a Union type instead of a light sumtype, or would you expect the light sum type to still be much better?
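Concretely, I mean just giving the vector a small Union element type instead of the abstract one, something like this (a sketch; as far as I understand, union splitting only applies to small unions):

# Concretely typed small-Union vector instead of Vector{OpType};
# Julia can union-split the dispatch over Union{Op1, Op2}:
op_vec = Union{Op1, Op2}[n % 2 == 0 ? Op1() : Op2() for n in 1:400]
foo_vec = FooVec(op_vec)  # FooVec{Union{Op1, Op2}}
run_ops(foo_vec)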

I’m not sure. In this thread there is a benchmark: [ANN] LightSumTypes.jl v4 - #15 by Tortar

which suggests that on 1.11 the difference is small, or even in favor of Union types over sum types. But I'm not following these developments closely.

(personally I would prefer to use the package, to guarantee that it is fast on 1.10, which is the current LTS)

Yes, as the developer of the package I want to say that @Imiq is right in his analysis, but I have seen cases where a sum type still performs better (2x in a macrobenchmark) on 1.11; this seems to be the case when iterators are involved, AFAICT. On 1.10 and below, sum types perform much better in almost all cases. You can find benchmarks updated to 1.11 at https://juliadynamics.github.io/Agents.jl/stable/performance_tips/#multi_vs_union

Would FunctionWrappers.jl be useful in your case?
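For example, something along these lines (a rough sketch, assuming every operation can be given the same fixed Float64 -> Float64 signature):

using FunctionWrappers: FunctionWrapper

# A wrapper with a fixed, concrete signature; a vector of these has
# a concrete element type, so calls through it are type-stable:
const FloatOp = FunctionWrapper{Float64, Tuple{Float64}}

ops = FloatOp[FloatOp(sqrt), FloatOp(x -> (x + Ο€)^2)]

function run_wrapped(ops)
    x = 0.0
    for f in ops
        x = f(x)
    end
    return x
end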

Thank you all for your suggestions; just using Vector{T} where {T<:OpType} cuts the compilation time in my problem from 31 seconds to 2 seconds and roughly halves the runtime, which is a good improvement already. I'll check out the effect of sum types later on; I'm already happy that the large compilation times are gone.

In my use case, the functions will sometimes need to act on Int, sometimes on Float64, and sometimes on Dual (for autodiff), so it feels easier to just dispatch on OpType than to wrap everything in a FunctionWrapper, but I will think about this a bit more!
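For illustration, this is what I mean (a hypothetical snippet; Dual refers to e.g. ForwardDiff.Dual):

# Plain dispatch stays generic over the argument type:
run_op!(2, Op1())    # Int argument
run_op!(2.0, Op2())  # Float64 argument
# A FunctionWrapper{Float64, Tuple{Float64}}, by contrast, converts
# its argument to Float64, so Dual numbers would need a separate
# wrapper (or a separately wrapped vector) per concrete number type.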
