Testing performance of functions that are 'parametric over input collections'

I have the following MWE for two methods: addUnits1 is parametric in the type of an input vector, while addUnits2 gives the parameters of the individual elements of the input vector.

struct Myunit{X,Y}
    x::X
    y::Y
end

function addUnits1(d::Vector,k::Int64,l::Int64)
  tab = zeros(k,l)
  for i in 1:length(d)
    tab[d[i].y,d[i].x] += 1
  end
  tab
end

function addUnits2(d::Vector{Myunit{X,Y}},k::Int64,l::Int64) where {X,Y}
  tab = zeros(k,l)
  for i in 1:length(d)
    tab[d[i].y,d[i].x] += 1
  end
  tab
end

First, I tested whether there was a performance difference given an array of parametric type Myunit:

N = 2^20;
k = 3;
l = 100;
xs = rand(1:l,N);
ys = rand(1:k,N);
d = Myunit.(xs,ys);
@btime addUnits1(d,k,l);
@btime addUnits2(d,k,l);

1.147 ms (1 allocation: 2.50 KiB)
1.136 ms (1 allocation: 2.50 KiB)

As expected, the difference is negligible, and @code_warntype output for both is identical since the compiler knows what Myunit has without it being supplied to addUnits1:

@code_warntype addUnits1(d,k,l)

Variables
#self#::Core.Compiler.Const(addUnits1, false)
d::Array{Myunit{Int64,Int64},1}
k::Int64
l::Int64
tab::Array{Float64,2}
@_6::Union{Nothing, Tuple{Int64,Int64}}
i::Int64

Body::Array{Float64,2}
1 ─ (tab = Main.zeros(k, l))
β”‚ %2 = Main.length(d)::Int64
β”‚ %3 = (1:%2)::Core.Compiler.PartialStruct(UnitRange{Int64}, Any[Core.Compiler.Const(1, false), Int64])
β”‚ (@_6 = Base.iterate(%3))
β”‚ %5 = (@_6 === nothing)::Bool
β”‚ %6 = Base.not_int(%5)::Bool
└── goto #4 if not %6
2 β”„ %8 = @_6::Tuple{Int64,Int64}::Tuple{Int64,Int64}
β”‚ (i = Core.getfield(%8, 1))
β”‚ %10 = Core.getfield(%8, 2)::Int64
β”‚ %11 = Base.getindex(d, i)::Myunit{Int64,Int64}
β”‚ %12 = Base.getproperty(%11, :y)::Int64
β”‚ %13 = Base.getindex(d, i)::Myunit{Int64,Int64}
β”‚ %14 = Base.getproperty(%13, :x)::Int64
β”‚ %15 = Base.getindex(tab, %12, %14)::Float64
β”‚ %16 = (%15 + 1)::Float64
β”‚ Base.setindex!(tab, %16, %12, %14)
β”‚ (@_6 = Base.iterate(%3, %10))
β”‚ %19 = (@_6 === nothing)::Bool
β”‚ %20 = Base.not_int(%19)::Bool
└── goto #4 if not %20
3 ─ goto #2
4 β”„ return tab

@code_warntype addUnits2(d,k,l)

Variables
#self#::Core.Compiler.Const(addUnits2, false)
d::Array{Myunit{Int64,Int64},1}
k::Int64
l::Int64
tab::Array{Float64,2}
@_6::Union{Nothing, Tuple{Int64,Int64}}
i::Int64

Body::Array{Float64,2}
1 ─ (tab = Main.zeros(k, l))
β”‚ %2 = Main.length(d)::Int64
β”‚ %3 = (1:%2)::Core.Compiler.PartialStruct(UnitRange{Int64}, Any[Core.Compiler.Const(1, false), Int64])
β”‚ (@_6 = Base.iterate(%3))
β”‚ %5 = (@_6 === nothing)::Bool
β”‚ %6 = Base.not_int(%5)::Bool
└── goto #4 if not %6
2 β”„ %8 = @_6::Tuple{Int64,Int64}::Tuple{Int64,Int64}
β”‚ (i = Core.getfield(%8, 1))
β”‚ %10 = Core.getfield(%8, 2)::Int64
β”‚ %11 = Base.getindex(d, i)::Myunit{Int64,Int64}
β”‚ %12 = Base.getproperty(%11, :y)::Int64
β”‚ %13 = Base.getindex(d, i)::Myunit{Int64,Int64}
β”‚ %14 = Base.getproperty(%13, :x)::Int64
β”‚ %15 = Base.getindex(tab, %12, %14)::Float64
β”‚ %16 = (%15 + 1)::Float64
β”‚ Base.setindex!(tab, %16, %12, %14)
β”‚ (@_6 = Base.iterate(%3, %10))
β”‚ %19 = (@_6 === nothing)::Bool
β”‚ %20 = Base.not_int(%19)::Bool
└── goto #4 if not %20
3 ─ goto #2
4 β”„ return tab

Now I expected that passing the same data to addUnits1 but without the known structure of Myunit would decrease performance, but the following turned out to be the fastest of the three:

d2 = vec(mapslices(z->(x=z[1],y=z[2]),hcat(xs,ys)',dims=1));
@btime addUnits1(d2,k,l);
1.131 ms (1 allocation: 2.50 KiB)
@code_warntype addUnits1(d2,k,l)

Variables
#self#::Core.Compiler.Const(addUnits1, false)
d::Array{NamedTuple{(:x, :y),Tuple{Int64,Int64}},1}
k::Int64
l::Int64
tab::Array{Float64,2}
@_6::Union{Nothing, Tuple{Int64,Int64}}
i::Int64

Body::Array{Float64,2}
1 ─ (tab = Main.zeros(k, l))
β”‚ %2 = Main.length(d)::Int64
β”‚ %3 = (1:%2)::Core.Compiler.PartialStruct(UnitRange{Int64}, Any[Core.Compiler.Const(1, false), Int64])
β”‚ (@_6 = Base.iterate(%3))
β”‚ %5 = (@_6 === nothing)::Bool
β”‚ %6 = Base.not_int(%5)::Bool
└── goto #4 if not %6
2 β”„ %8 = @_6::Tuple{Int64,Int64}::Tuple{Int64,Int64}
β”‚ (i = Core.getfield(%8, 1))
β”‚ %10 = Core.getfield(%8, 2)::Int64
β”‚ %11 = Base.getindex(d, i)::NamedTuple{(:x, :y),Tuple{Int64,Int64}}
β”‚ %12 = Base.getproperty(%11, :y)::Int64
β”‚ %13 = Base.getindex(d, i)::NamedTuple{(:x, :y),Tuple{Int64,Int64}}
β”‚ %14 = Base.getproperty(%13, :x)::Int64
β”‚ %15 = Base.getindex(tab, %12, %14)::Float64
β”‚ %16 = (%15 + 1)::Float64
β”‚ Base.setindex!(tab, %16, %12, %14)
β”‚ (@_6 = Base.iterate(%3, %10))
β”‚ %19 = (@_6 === nothing)::Bool
β”‚ %20 = Base.not_int(%19)::Bool
└── goto #4 if not %20
3 ─ goto #2
4 β”„ return tab

Somehow this is not working as expected.

Can someone please comment/confirm/deny any of the following?

  1. parametric types are named tuples and all of the above do the same thing
  2. neither function is behaving as a parametric method for some reason
  3. this MWE is just not β€˜deep enough’ to see any benefit from parametric types and methods
  4. something else I completely missed

-edit: corrected addUnits1 suggested by @jlapeyre

The compiler knows the structure of d2 just as well as of d, and it’s running the exact same code on both. Maybe you can explain why you expect a performance difference?

1 Like

There is no performance difference between addUnits1 and

function addUnits0(d, k, l)
  tab = zeros(k,l)
  for i in 1:length(d)
    tab[d[i].y,d[i].x] += 1
  end
  tab
end

This is because the compiler always knows the type of an argument passed to a function. (Also, in general, you should omit the type parameter T in the method definition since it is not used in the body of the method, so it serves no purpose.)

1 Like

Thanks @DNF, that makes sense.

I was expecting a difference because I assumed the structure of d2 was unknown, since a tuple can be anything.

Thanks @jlapeyre, I have corrected the example.
When (if ever) would there be a difference between addUnits0 and addUnits2?

julia> typeof(d2)
Array{NamedTuple{(:x, :y),Tuple{Int64,Int64}},1}

The compiler knows the type of what is passed, so it has this information for optimization. If you put the tuples in a more generic container, then the compiler would not know type of the elements at compile time.

There is no performance difference.
The compiler already knows the type of what you pass, so the information in the parameter list is redundant. Putting type information in the function parameter list is used for dispatch, to choose which method to call. Or for β€œsafety”, to raise an error if the appropriate method does not exist. Or for documentation.