Unusual non-deterministic benchmark results

I’m getting some quite unexpected non-deterministic benchmarking results on a toy problem.

abstract type Shape end
area(::Shape) = 0.0

struct Square <: Shape
    side::Float64
end
area(s::Square) = s.side * s.side
    
struct Rectangle <: Shape
    width::Float64
    height::Float64
end
area(r::Rectangle) = r.width * r.height
    
struct Triangle <: Shape
    base::Float64
    height::Float64
end
area(t::Triangle) = 1.0/2.0 * t.base * t.height

struct Circle <: Shape
    radius::Float64
end
area(c::Circle) = Ο€ * c.radius^2

I generate one million shapes, and here’s a screenshot I captured showing this particular β€œunusual” benchmarking result:

main1 is defined as main1(shapes) = sum(area.(shapes)), and @benchmark averages 16 ms for Vector{Any} but 37 ms for Vector{Shape}.
I would expect these two to perform similarly.
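For context, simplified down to two shape kinds, the comparison looks roughly like this (the variable names here are illustrative; the linked notebook has the full version):

```julia
# Simplified, self-contained sketch of the setup (only two shape kinds
# for brevity; the linked notebook has the full code).
abstract type Shape end
struct Square <: Shape; side::Float64; end
struct Circle <: Shape; radius::Float64; end
area(s::Square) = s.side * s.side
area(c::Circle) = Ο€ * c.radius^2

random_shape() = rand(Bool) ? Square(rand()) : Circle(rand())

shapes_any   = Any[random_shape() for _ in 1:1_000_000]    # eltype Any
shapes_shape = Shape[random_shape() for _ in 1:1_000_000]  # eltype Shape (abstract)

main1(shapes) = sum(area.(shapes))

main1(shapes_any); main1(shapes_shape)  # warm up so both are compiled
@time main1(shapes_any)
@time main1(shapes_shape)
```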

However, this is not reliably reproducible: if I run it again, I get the results I expect:

I get the unexpected results roughly once every 10 to 15 executions, and it appears to be completely random.

I’ve linked to a commit of a notebook with the full code where I happened to reproduce this issue again.

I’m using Julia 1.10.2

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 10 Γ— Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_PROJECT = @.

This has happened enough times randomly that I’m worried I’m doing something wrong. Can anyone else reproduce this? Am I using benchmarking tools incorrectly or missing something else?

You should interpolate globals when benchmarking: @benchmark main1($sorted_shapes_shape). Maybe that solves the issue?

edit: I now see that the two containers have different element types, so maybe some difference is expected, but I’m not sure why.

As an aside, maybe a package I built (MixedStructTypes.jl) could be of help to you:

using MixedStructTypes, BenchmarkTools

abstract type AbstractShape end

@sum_structs Shape <: AbstractShape begin
    @kwdef struct Square
        side::Float64 = rand()
    end   
    @kwdef struct Rectangle
        width::Float64 = rand()
        height::Float64 = rand()
    end
    @kwdef struct Triangle
        base::Float64 = rand()
        height::Float64 = rand()
    end
    @kwdef struct Circle
        radius::Float64 = rand()
    end
end

function area(sh)
    if kindof(sh) === :Square
        area_square(sh)
    elseif kindof(sh) === :Rectangle
        area_rectangle(sh)
    elseif kindof(sh) === :Triangle
        area_triangle(sh)
    elseif kindof(sh) === :Circle
        area_circle(sh)
    else
        area_default(sh)
    end
end
   
area_square(s) = s.side * s.side
area_rectangle(r) = r.width * r.height
area_triangle(t) = 1.0/2.0 * t.base * t.height
area_circle(c) = Ο€ * c.radius^2
area_default(::AbstractShape) = 0.0
main1(shapes) = sum(area.(shapes))

count = 1_000_000 
shapes = [rand((Square,Rectangle,Triangle,Circle))() for _ in 1:count];

which results in

julia> @benchmark main1($shapes)
BenchmarkTools.Trial: 725 samples with 1 evaluation.
 Range (min … max):  6.211 ms … 10.326 ms  β”Š GC (min … max): 0.00% … 10.52%
 Time  (median):     6.246 ms              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   6.887 ms Β±  1.168 ms  β”Š GC (mean Β± Οƒ):  1.56% Β±  4.32%

  β–ˆβ–…β–                  β–‚                              β–ƒ  β–‚    
  β–ˆβ–ˆβ–ˆβ–‡β–†β–β–β–„β–β–β–β–β–β–β–β–β–β–β–β–β–β–ˆβ–ˆβ–ˆβ–„β–β–„β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–„β–ˆβ–ˆβ–†β–ˆβ–ˆβ–ˆ β–‡
  6.21 ms      Histogram: log(frequency) by time     9.49 ms <

 Memory estimate: 7.63 MiB, allocs estimate: 2.

It is still a bit annoying that you need an if-else chain inside the area function. I’m planning to lift this limitation and instead provide a macro which does everything by itself, with a syntax like this:

@dispatch area(s::Square) = s.side * s.side
@dispatch area(r::Rectangle) = r.width * r.height
@dispatch area(t::Triangle) = 1.0/2.0 * t.base * t.height
@dispatch area(c::Circle) = Ο€ * c.radius^2
@dispatch area(::AbstractShape) = 0.0

which I think would make the package almost trivial to use.

Actually, I’m also seeing some strange patterns :thinking:

julia> using Chairmarks

julia> shapes = Shape[rand((Square,Rectangle,Triangle,Circle))() for _ in 1:count];

julia> for i in 1:20 t = @b main1($shapes); println(t.time) end
0.152007459
0.158316896
0.1540718
0.16130086400000002
0.156448573
0.158393229
0.15222005900000002
0.150562149
0.109738663
0.08966570500000001
0.08639044200000001
0.091231459
0.08552548800000001
0.089771054
0.0895214
0.08954263900000001
0.089699627
0.086503645
0.091225606
0.08580834400000001

julia> shapes = Any[rand((Square,Rectangle,Triangle,Circle))() for _ in 1:count];

julia> for i in 1:20 t = @b main1($shapes); println(t.time) end
0.08459066800000001
0.077453509
0.076593295
0.077546024
0.079647928
0.077983406
0.077728751
0.07846008900000001
0.07840199
0.045150448
0.052501182
0.052321902
0.052907855000000004
0.053115739
0.05305312
0.053056116
0.052989439000000006
0.052936709000000005
0.052743011000000006
0.052789139000000006

Thanks for looking into this!

That’s a neat package! I had seen SumTypes.jl before but didn’t know about MixedStructTypes.jl. I haven’t had the chance to use them yet!

Maybe something to do with the GC? That’s the only explanation I have. Since the function is already compiled at this point, it shouldn’t really behave differently across runs.

I also find it odd that the deviations within each benchmark are small. If it is a GC issue, it only kicks in between benchmarking runs, not during the 100-300 samples within a run.
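One way to probe the GC hypothesis would be to force a full collection before each run and see whether the slow/fast split persists. A rough sketch (with main1 and shapes redefined minimally here so the snippet stands alone):

```julia
# Force a full GC before each timing to check whether the slow/fast split
# correlates with collector state. One shape kind is enough for the sketch.
abstract type Shape end
struct Square <: Shape; side::Float64; end
area(s::Square) = s.side * s.side
main1(shapes) = sum(area.(shapes))
shapes = Shape[Square(rand()) for _ in 1:1_000_000]

main1(shapes)  # compile first
for _ in 1:10
    GC.gc()                          # full collection before each run
    println(@elapsed main1(shapes))  # wall-clock seconds per run
end
```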


Hmm, one thing I noticed from the benchmark graphs you posted is that allocations also decreased on your fast run. That’s strange.

Side note: this would probably be more efficient as sum(area, shapes), which applies the function in the first argument to every element of the second argument while summing the results. That removes the need to allocate an intermediate array for the area values.
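For example (with the types redefined minimally so the snippet stands alone):

```julia
abstract type Shape end
struct Square <: Shape; side::Float64; end
area(s::Square) = s.side * s.side
shapes = Shape[Square(rand()) for _ in 1:1_000]

main1(shapes) = sum(area.(shapes))  # allocates a temporary Vector{Float64}
main2(shapes) = sum(area, shapes)   # folds area into the sum, no temporary

main1(shapes) β‰ˆ main2(shapes)  # same total either way
```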


Neat suggestion!

Weirdly, that form seems to be slower than what I had previously when using Vector{Shape} or Vector{Any}.

It is faster, with fewer allocations, when running sum(main1, (arr1, arr2, arr3, arr4)) with each array having a concrete element type, so you are right in that regard.
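Concretely, the grouped variant I tried looks something like this (illustrative names, with the types redefined minimally so the snippet stands alone):

```julia
abstract type Shape end
struct Square <: Shape; side::Float64; end
struct Circle <: Shape; radius::Float64; end
area(s::Square) = s.side * s.side
area(c::Circle) = Ο€ * c.radius^2
main1(shapes) = sum(area.(shapes))

squares = [Square(rand()) for _ in 1:500_000]  # Vector{Square} (concrete)
circles = [Circle(rand()) for _ in 1:500_000]  # Vector{Circle} (concrete)

# Each inner call operates on a concretely typed array, so `area`
# dispatches statically and the broadcast yields a Vector{Float64}:
total = sum(main1, (squares, circles))
```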

The performance tips section of the manual says the following:

If you cannot avoid containers with abstract value types, it is sometimes better to parametrize with Any to avoid runtime type checking. E.g. IdDict{Any, Any} performs better than IdDict{Type, Vector}

Maybe what we are seeing here is related to that?