Type stability and iterating over operations

Hi, in my project I have to define a sequence of operations that need to be performed on some data. Two ways of capturing this in a struct came to mind: either store them as a vector of operations (FooVec), which shows some type-inference issues in my @code_warntype output, or use a recursive definition (FooRec).

I found that in my toy example FooVec is much quicker, even though some types in the function are not inferred. I was wondering: are both of my recursive methods of computing with FooRec inefficient, or should I expect the vector version to be much quicker regardless? And in that case, is there a way to improve type inference, or is this unavoidable?

using BenchmarkTools

abstract type OpType end

struct Op1 <: OpType end
run_op!(x, ::Op1) = sqrt(x)

struct Op2 <: OpType end
run_op!(x, ::Op2) = (x+Ο€)^2

struct FooVec{T <: OpType}
    op_vec::Vector{T}
end
abstract type FooRecType end
struct FooRecEnd <: FooRecType end
struct FooRec{T<:OpType, U<:FooRecType} <: FooRecType
    op::T
    remainder::U
end

function MakeFooRec(op_vec::Vector{T}) where T<:OpType
    op_rec = FooRecEnd()
    for op in reverse(op_vec)
        op_rec = FooRec(op, op_rec)
    end
    return op_rec
end

function run_ops(foo_vec::FooVec)
    x = 0.
    for op in foo_vec.op_vec
        x = run_op!(x, op)
    end
    return x
end

function run_ops(foo_rec::FooRec)
    x = 0.
    return run_ops_inner!(x, foo_rec)
end

run_ops_inner!(x, ::FooRecEnd) = x
function run_ops_inner!(x, foo_rec::FooRec)
    x = run_op!(x, foo_rec.op)
    run_ops_inner!(x, foo_rec.remainder)
end

function run_ops_alt(foo_rec::FooRec)
    x = 0.
    while true
        x = run_op!(x, foo_rec.op)
        foo_rec = foo_rec.remainder
        if foo_rec isa FooRecEnd
            break
        end
    end
    return x
end

function RunTestsBasic()
    Op1(); #Force compilation of Op1
    Op2(); #Force compilation of Op2
end

function RunTests_Vec(N; verbose=false)
    op_vec = [n % 2 == 0 ? Op1() : Op2() for n in 1:N]
    foo_vec_timed = @timed FooVec(op_vec)
    verbose && display("foo_vec took $(foo_vec_timed.time) seconds to compile and create the object. ")
    foo_vec_compute = @timed run_ops(foo_vec_timed.value);
    verbose && @code_warntype run_ops(foo_vec_timed.value)
    verbose && display("foo_vec took $(foo_vec_compute.time) seconds to compute. ")
    verbose && display(foo_vec_compute.value)
end

function RunTests_Rec(N; verbose=false)
    op_vec = [n % 2 == 0 ? Op1() : Op2() for n in 1:N]
    foo_rec_timed = @timed MakeFooRec(op_vec)
    verbose && display("foo_rec took $(foo_rec_timed.time) seconds to compile and create the object. ")

    foo_rec_compute = @timed run_ops(foo_rec_timed.value);
    verbose && @code_warntype run_ops(foo_rec_timed.value)
    verbose && display("foo_rec took $(foo_rec_compute.time) seconds to compute. ")
    verbose && display(foo_rec_compute.value)
end

function RunTests_Rec2(N; verbose=false)
    op_vec = [n % 2 == 0 ? Op1() : Op2() for n in 1:N]
    foo_rec = MakeFooRec(op_vec)
    verbose && display("FooRec didn't need compilation")
    foo_rec_compute = @timed run_ops_alt(foo_rec);
    verbose && @code_warntype run_ops_alt(foo_rec)
    verbose && display("foo_rec alt took $(foo_rec_compute.time) seconds to compute. ")
    verbose && display(foo_rec_compute.value)
end

RunTestsBasic()

N = 5
RunTests_Vec(N, verbose=true)
RunTests_Rec(N, verbose=true)
RunTests_Rec2(N, verbose=true)

N_glob = 400
RunTests_Vec(N_glob)
RunTests_Rec(N_glob)
RunTests_Rec2(N_glob)

bench1 = @benchmark RunTests_Vec($N_glob)
display(bench1)
bench2 = @benchmark RunTests_Rec($N_glob)
display(bench2)
bench3 = @benchmark RunTests_Rec2($N_glob)
display(bench3)

The non-benchmark lines give the following output; I have marked the lines that @code_warntype prints in red with # RED comments:

"foo_vec took 2.5e-7 seconds to compile and create the object. "
MethodInstance for run_ops(::FooVec{OpType})
  from run_ops(foo_vec::FooVec) @ Main ~/Documents/ConOptStop_Julia_v0.1/test_compilation_speeds.jl:29
Arguments
  #self#::Core.Const(Main.run_ops)
  foo_vec::FooVec{OpType}
Locals
  @_3::Union{Nothing, Tuple{OpType, Int64}} # RED
  x::Float64
  op::OpType # RED
Body::Float64
1 ─       (x = 0.0)
β”‚   %2  = Base.getproperty(foo_vec, :op_vec)::Vector{OpType}
β”‚         (@_3 = Base.iterate(%2))
β”‚   %4  = @_3::Union{Nothing, Tuple{OpType, Int64}} # RED
β”‚   %5  = (%4 === nothing)::Bool
β”‚   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 β”„ %8  = @_3::Tuple{OpType, Int64}
β”‚         (op = Core.getfield(%8, 1)) # RED
β”‚   %10 = Core.getfield(%8, 2)::Int64
β”‚   %11 = x::Float64
β”‚   %12 = op::OpType # RED
β”‚         (x = Main.run_op!(%11, %12))
β”‚         (@_3 = Base.iterate(%2, %10))
β”‚   %15 = @_3::Union{Nothing, Tuple{OpType, Int64}} # RED
β”‚   %16 = (%15 === nothing)::Bool
β”‚   %17 = Base.not_int(%16)::Bool
└──       goto #4 if not %17
3 ─       goto #2
4 β”„ %20 = x::Float64
└──       return %20

"foo_vec took 6.375e-6 seconds to compute. "
88.82643960980423
"foo_rec took 0.000125958 seconds to compile and create the object. "
MethodInstance for run_ops(::FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}})
  from run_ops(foo_rec::FooRec) @ Main ~/Documents/ConOptStop_Julia_v0.1/test_compilation_speeds.jl:37
Arguments
  #self#::Core.Const(Main.run_ops)
  foo_rec::Core.Const(FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}}(Op2(), FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}(Op1(), FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}(Op2(), FooRec{Op1, FooRec{Op2, FooRecEnd}}(Op1(), FooRec{Op2, FooRecEnd}(Op2(), FooRecEnd()))))))
Locals
  x::Float64
Body::Float64
1 ─      (x = 0.0)
β”‚   %2 = x::Core.Const(0.0)
β”‚   %3 = Main.run_ops_inner!(%2, foo_rec)::Core.Const(88.82643960980423)
└──      return %3

"foo_rec took 0.002928167 seconds to compute. "
88.82643960980423
"FooRec didn't need compilation"
MethodInstance for run_ops_alt(::FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}})
  from run_ops_alt(foo_rec::FooRec) @ Main ~/Documents/ConOptStop_Julia_v0.1/test_compilation_speeds.jl:48
Arguments
  #self#::Core.Const(Main.run_ops_alt)
  foo_rec@_2::Core.Const(FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}}(Op2(), FooRec{Op1, FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}}(Op1(), FooRec{Op2, FooRec{Op1, FooRec{Op2, FooRecEnd}}}(Op2(), FooRec{Op1, FooRec{Op2, FooRecEnd}}(Op1(), FooRec{Op2, FooRecEnd}(Op2(), FooRecEnd()))))))
Locals
  x::Float64
  foo_rec@_4::Any # RED
Body::Float64
1 ─       (foo_rec@_4 = foo_rec@_2)
└──       (x = 0.0)
2 β”„       goto #6 if not true
3 ─ %4  = Main.run_op!::Core.Const(Main.run_op!)
β”‚   %5  = x::Float64
β”‚   %6  = foo_rec@_4::Any # RED
β”‚   %7  = Base.getproperty(%6, :op)::Any # RED
β”‚         (x = (%4)(%5, %7))
β”‚   %9  = foo_rec@_4::Any # RED
β”‚         (foo_rec@_4 = Base.getproperty(%9, :remainder))
β”‚   %11 = foo_rec@_4::Any # RED
β”‚   %12 = (%11 isa Main.FooRecEnd)::Bool
└──       goto #5 if not %12
4 ─       goto #6
5 ─       goto #2
6 β”„ %16 = x::Float64
└──       return %16

The benchmarking steps give:

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.829 ΞΌs … 949.221 ΞΌs  β”Š GC (min … max): 0.00% … 99.53%
 Time  (median):     2.054 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   2.320 ΞΌs Β±  12.976 ΞΌs  β”Š GC (mean Β± Οƒ):  7.88% Β±  1.41%

  β–ƒβ–…β–„β–ƒβ–ƒβ–†β–ˆβ–ˆβ–‡β–†β–„β–„β–‚β–‚β–β–‚β–ƒβ–ƒβ–‚β–β–                                       β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–ˆβ–‡β–ˆβ–ˆβ–‡β–‡β–…β–†β–†β–†β–„β–…β–†β–…β–…β–…β–…β–„β–…β–„β–‚β–„β–„β–ƒβ–…β–… β–ˆ
  1.83 ΞΌs      Histogram: log(frequency) by time      3.59 ΞΌs <

 Memory estimate: 3.83 KiB, allocs estimate: 18.
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  70.166 ΞΌs … 204.625 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     71.500 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   72.862 ΞΌs Β±   4.999 ΞΌs  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–„β–‡β–ˆβ–‡β–†β–…β–ƒβ–β–β–‚β–ƒβ–‚β–β–                                               β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–…β–†β–‡β–‡β–‡β–‡β–‡β–†β–†β–†β–†β–‡β–‡β–ˆβ–‡β–‡β–†β–†β–…β–…β–‡β–‡β–†β–†β–†β–…β–†β–†β–…β–†β–…β–†β–…β–†β–…β–…β–…β–†β–…β–„β–…β–„β–… β–ˆ
  70.2 ΞΌs       Histogram: log(frequency) by time      95.5 ΞΌs <

 Memory estimate: 7.00 KiB, allocs estimate: 22.
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  206.583 ΞΌs … 568.875 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     213.250 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   216.658 ΞΌs Β±  12.829 ΞΌs  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–‚β–…β–†β–†β–‡β–ˆβ–‡β–†β–…β–…β–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β–β–                                     β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–ˆβ–‡β–†β–‡β–†β–‡β–†β–‡β–†β–‡β–†β–†β–†β–‡β–…β–…β–…β–†β–…β–…β–…β–…β–…β–…β–…β–„β–…β–…β–†β–†β–…β–…β–… β–ˆ
  207 ΞΌs        Histogram: log(frequency) by time        276 ΞΌs <

 Memory estimate: 6.45 KiB, allocs estimate: 9.

Your benchmark seems a little confusing, because you called @benchmark on a function that itself runs a benchmark, calls @code_warntype, etc. If you benchmark the call directly, you get:

julia> op_vec = [n % 2 == 0 ? Op1() : Op2() for n in 1:400];

julia> foo_vec = FooVec(op_vec);

julia> @btime run_ops($foo_vec)
  1.004 ΞΌs (0 allocations: 0 bytes)
628.3185307179571

Which, for this array with only two element types, is as good as it can be. If you have more types you may start to get dynamic dispatch, and then you can use something like LightSumTypes.jl (https://github.com/JuliaDynamics/LightSumTypes.jl), which provides easy-to-use sum types in Julia.
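For illustration, a minimal sketch of what that could look like here (assuming the @sumtype/variant API of LightSumTypes v4; the exact syntax may differ between versions):

using LightSumTypes

# Wrap the concrete operation types in one sum type, so the vector
# gets a concrete element type instead of the abstract OpType:
@sumtype OpSum(Op1, Op2)

op_vec = [n % 2 == 0 ? OpSum(Op1()) : OpSum(Op2()) for n in 1:400]

function run_ops_sum(op_vec)
    x = 0.0
    for op in op_vec
        # variant unwraps to the underlying Op1/Op2, after which
        # the usual run_op! dispatch applies:
        x = run_op!(x, variant(op))
    end
    return x
end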

Thank you for your reply! The function itself runs the benchmarking steps only if verbose = true, while the @benchmark calls are made with verbose = false (the default), so I think there shouldn't be a problem there, but I could be seeing it wrong. In reality I do have quite a lot of different operation types, which is why I thought I might use a recursive definition.

On Julia 1.11, can I use a Union type instead of a light sumtype, or would you expect the light sum type to still be much better?
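Concretely, I mean just giving the vector a small Union element type instead of the abstract one, something like this (a sketch; as far as I understand, union splitting only applies to small unions):

# Concretely typed small-Union vector instead of Vector{OpType};
# Julia can union-split the dispatch over Union{Op1, Op2}:
op_vec = Union{Op1, Op2}[n % 2 == 0 ? Op1() : Op2() for n in 1:400]
foo_vec = FooVec(op_vec)  # FooVec{Union{Op1, Op2}}
run_ops(foo_vec)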

I’m not sure. In this thread there is a benchmark: [ANN] LightSumTypes.jl v4 - #15 by Tortar

which suggests that on 1.11 the difference is small, or even in favor of Union types over sum types. But I'm not following these developments closely.

(personally I would prefer to use the package, to guarantee that it is fast on 1.10, which is the current LTS)

Yes, as the developer of the package I want to say that @Imiq is right in his analysis, but I have seen cases where a sum type still performs better (2x in a macrobenchmark) on 1.11; this seems to be the case when iterators are involved, AFAICT. On 1.10 and below, sum types perform much better in almost all cases. You can find benchmarks updated to 1.11 at https://juliadynamics.github.io/Agents.jl/stable/performance_tips/#multi_vs_union

Would FunctionWrappers.jl be useful in your case?
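For example, something along these lines (a rough sketch, assuming every operation can be given the same fixed Float64 -> Float64 signature):

using FunctionWrappers: FunctionWrapper

# A wrapper with a fixed, concrete signature; a vector of these has
# a concrete element type, so calls through it are type-stable:
const FloatOp = FunctionWrapper{Float64, Tuple{Float64}}

ops = FloatOp[FloatOp(sqrt), FloatOp(x -> (x + Ο€)^2)]

function run_wrapped(ops)
    x = 0.0
    for f in ops
        x = f(x)
    end
    return x
end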

Thank you all for your suggestions; just using Vector{T} where {T<:OpType} cuts the compilation time in my problem from 31 seconds to 2 seconds and roughly halves the runtime, which is a good improvement already. I'll check out the effect of sum types later on; I'm already happy that the large compilation times are gone.

In my use case, the functions will sometimes need to act on Int, sometimes on Float64, and sometimes on Dual (for autodiff), so it feels easier to just dispatch on OpType than to wrap everything in a FunctionWrapper, but I will think about this a bit more!
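For illustration, this is what I mean (a hypothetical snippet; Dual refers to e.g. ForwardDiff.Dual):

# Plain dispatch stays generic over the argument type:
run_op!(2, Op1())    # Int argument
run_op!(2.0, Op2())  # Float64 argument
# A FunctionWrapper{Float64, Tuple{Float64}}, by contrast, converts
# its argument to Float64, so Dual numbers would need a separate
# wrapper (or a separately wrapped vector) per concrete number type.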
