2 identical versions of the code: one allocates the other not, when accessing type-unstable tuple. Why?

I am aware that the two versions have to be different somewhere. But where? I Can’t see it. Also, please excuse the “Not Minimal Enough” Minimal Working Example


Summary

So I am writing a very nice example that highlights the strength of Julia and multiple dispatch. This will be used in conjuction to this discourse post: Idea: repository for examples highlighting Julia's strengths

In essense I am making a smaller version of DynamicalBilliards. In this tutorial, a billiard bd is a tuple of obstacles, that are of different type. When I am accessing this tuple with some index , e.g. o = bd[i], Julia allocates as this is a type unstable operation. It is not known in advance what type is obstacle bd[i]. However, with some magic that I don’t understand, it is actually possible to make this operation non allocating. Because, in DynamicalBilliards this is actually non-allocating, see below.

MWE

# barebone definitions
using StaticArrays
const SV = SVector{2,Float64}

mutable struct Particle; pos::SV; vel::SV; end
Particle(x0, y0, φ0) = Particle(SV(x0, y0), SV(cos(φ0), sin(φ0)))

struct Wall; sp::SV; ep::SV; normal::SV; end
struct Disk; c::SV; r::Float64; end

collisiontime(p::Particle, w::Wall) = cos(p.pos[1])
collisiontime(p::Particle, w::Disk) = sin(p.pos[1])

struct Billiard{L, O<:Tuple}
    obstacles::O
end
Billiard(stuff::O) where {O<:Tuple} = Billiard{length(stuff), O}(stuff)

@inline Base.getindex(bd::Billiard, i) = bd.obstacles[i]
@inline Base.eachindex(bd::Billiard) = eachindex(bd.obstacles)
@inline Base.iterate(bd::Billiard, state = 1) = iterate(bd.obstacles, state)
@inline Base.length(bd::Billiard{L}) where {L} = L

# only relevant function
@generated function next_collision(p::Particle, bd::Billiard{L}) where {L}
    out = :(j = 0; ct = Inf)
    for i=1:L
        push!(out.args, quote
                            let o = bd[$i]
                                t = collisiontime(p, o)
                                if t < ct
                                  ct = t
                                  j = $i
                                end
                            end
                        end
                        )
    end
    push!(out.args, :(return j, ct))
    return out
end

# function where the allocation actually happens
@inline function bounce!(p::Particle, bd::Billiard)
    i::Int, ct::Float64 = next_collision(p, bd)
    o = bd[i] # <------ ALLOCATION HERE!
    x = collisiontime(p, o) # <- irrelevant line, just so that `o` is used.
    return i, x
end

##### Instantiate and benchmark:
x, y, r = 1.0, 1.0, 0.3
sp = [0.0,y]; ep = [0.0, 0.0]; n = [x,0.0]
leftw = Wall(sp, ep, n)
sp = [x,0.0]; ep = [x, y]; n = [-x,0.0]
rightw = Wall(sp, ep, n)
sp = [x,y]; ep = [0.0, y]; n = [0.0,-y]
topw = Wall(sp, ep, n)
sp = [0.0,0.0]; ep = [x, 0.0]; n = [0.0,y]
botw = Wall(sp, ep, n)
disk = Disk([x/2, y/2], r)

bd = Billiard((botw, rightw, topw, leftw, disk))
p = Particle(0.2, 0.1, 2π*rand())

using BenchmarkTools

@btime bounce!($p, $bd)
@profiler for i in 1:1000000; bounce!(p, bd); end
  83.746 ns (2 allocations: 256 bytes)

The profiler results show that half of the function time is spent in the line o = bd[i].

Okay, I can understand that, it is a type unstable operation after all.

Same code, no allocation.

Exactly the process described above, is done in the DynamicalBilliards package. The function next_collision is identical (link) and the individual struct and collisiontime differences are irrelevant. To the best of my knowledge even the use of @inline is identical. The only differences are:

  1. The Billiard struct has an extra but irrelevant field. link
  2. After the line o = bd[i], bounce! calls some other functions that use o in a meaningful way. link

Yet, how can it be that (in Julia 1.0):

julia> using BenchmarkTools, DynamicalBilliards

julia> bd = billiard_sinai(); p = randominside(bd);

julia> @btime bounce!($p, $bd);
  34.408 ns (0 allocations: 0 bytes)

??? (I promise you the line bd = billiard_sinai(); p = randominside(bd); replicates identical setting as the MWE). Doing profile on the code version of DynamicalBilliards shows that there is nearly 0 time spend on the line o = bd[i].

Why is this happening? :S :S I am super confused.

I’d really appreciate some help, as I have finished my nice example since two weeks ago, but I can’t post it since I don’t know why is this happening… :frowning:

p.s.: This has nothing to do with the MWE not being inside a module. I have already checked that. It also has nothing to do with collisiontime not using the identical functions as in DynamicalBilliards, I have also checked that. (in fact I just modified the MWE to not use them to save space)

I think the question boils down to why

dothings(::Float64) = 1
dothings(::Int) = 2
example(t, i) = dothings(t[i])

@benchmark example(t, i) setup = (t = (1, 2.0); i = 1)
#   memory estimate:  32 bytes

allocates. Shouldn’t the compiler employ small union optimization tricks here?

1 Like

That is a valid point and definitely related to the answer to my post (if any).

However, to be specific, it is not my question. My question is why one version allocates while the other does not. From the developer’s point of view, the two versions are identical in any sense.

Learning why these two versions are different would also allow me to consciously change the MWE to be non-allocating.

The compiler does indeed split the Union{Disk,Wall}.
(and it also inlines the correct version of collisiontime depending on the type)

julia> @code_typed bounce!(p, bd)
CodeInfo(
45 1 ─ %1  = invoke Main.next_collision(_2::Particle, _3::Billiard{5,Tuple{Wall,Wall,Wall,Wall,Disk}})::Tuple{Int64,Float64}                                                                                    │
   │   %2  = (Base.getfield)(%1, 1, true)::Int64                                                                                                                                                                │╻╷  indexed_iterate
46 │   %3  = (Base.getfield)(bd, :obstacles)::Tuple{Wall,Wall,Wall,Wall,Disk}                                                                                                                                   │╻╷  getindex
   │   %4  = (Base.getfield)(%3, %2, true)::Union{Disk, Wall}                                                                                                                                                   ││╻   getindex
47 │   %5  = (isa)(%4, Disk)::Bool                                                                                                                                                                              │
   └──       goto #3 if not %5                                                                                                                                                                                  │
   2 ─ %7  = (Base.getfield)(p, :pos)::SArray{Tuple{2},Float64,1,2}                                                                                                                                             │╻   collisiontime
   │   %8  = (Base.getfield)(%7, :data)::Tuple{Float64,Float64}                                                                                                                                                 ││╻   getindex
   │   %9  = (Base.getfield)(%8, 1, true)::Float64                                                                                                                                                              │││╻   getindex
   │   %10 = invoke Main.sin(%9::Float64)::Float64                                                                                                                                                              ││
   └──       goto #6                                                                                                                                                                                            │
   3 ─ %12 = (isa)(%4, Wall)::Bool                                                                                                                                                                              │
   └──       goto #5 if not %12                                                                                                                                                                                 │
   4 ─ %14 = (Base.getfield)(p, :pos)::SArray{Tuple{2},Float64,1,2}                                                                                                                                             ││╻   getproperty
   │   %15 = (Base.getfield)(%14, :data)::Tuple{Float64,Float64}                                                                                                                                                │││╻   getproperty
   │   %16 = (Base.getfield)(%15, 1, true)::Float64                                                                                                                                                             │││╻   getindex
   │   %17 = invoke Main.cos(%16::Float64)::Float64                                                                                                                                                             ││
   └──       goto #6                                                                                                                                                                                            │
   5 ─       (Core.throw)(ErrorException("fatal error in type inference (type bound)"))::Union{}                                                                                                                │
   └──       $(Expr(:unreachable))::Union{}                                                                                                                                                                     │
   6 ┄ %21 = φ (#2 => %10, #4 => %17)::Float64                                                                                                                                                                  │
48 │   %22 = (Core.tuple)(%2, %21)::Tuple{Int64,Float64}                                                                                                                                                        │
   └──       return %22                                                                                                                                                                                         │
) => Tuple{Int64,Float64}

The simplest thing that I could come up with is this:

julia> tup = (1, 2.0)
(1, 2.0)

julia> typeof(tup)
Tuple{Int64,Float64}

julia> i = rand(1:2)
1

julia> @btime $(tup)[$i];
  20.383 ns (1 allocation: 32 bytes)

julia> @btime $(tup)[1];
  1.812 ns (0 allocations: 0 bytes)