Testing performance of functions that are 'parametric over input collections'

mkarikom · September 7, 2020, 4:25pm

I have the following MWE with two methods: addUnits1 accepts any Vector, while addUnits2 spells out the type parameters of the vector's elements in its signature.

struct Myunit{X,Y}
    x::X
    y::Y
end

function addUnits1(d::Vector,k::Int64,l::Int64)
  tab = zeros(k,l)
  for i in 1:length(d)
    tab[d[i].y,d[i].x] += 1
  end
  tab
end

function addUnits2(d::Vector{Myunit{X,Y}},k::Int64,l::Int64) where {X,Y}
  tab = zeros(k,l)
  for i in 1:length(d)
    tab[d[i].y,d[i].x] += 1
  end
  tab
end

First, I tested whether there was a performance difference given an array of parametric type Myunit:

using BenchmarkTools  # provides @btime

N = 2^20;
k = 3;
l = 100;
xs = rand(1:l,N);
ys = rand(1:k,N);
d = Myunit.(xs,ys);
@btime addUnits1(d,k,l);
@btime addUnits2(d,k,l);

1.147 ms (1 allocation: 2.50 KiB)
1.136 ms (1 allocation: 2.50 KiB)

As expected, the difference is negligible, and the @code_warntype output for both is identical: the compiler knows the concrete type of d (a Vector of Myunit{Int64,Int64}) even though addUnits1 does not spell it out in its signature:

@code_warntype addUnits1(d,k,l)

Variables
#self#::Core.Compiler.Const(addUnits1, false)
d::Array{Myunit{Int64,Int64},1}
k::Int64
l::Int64
tab::Array{Float64,2}
@_6::Union{Nothing, Tuple{Int64,Int64}}
i::Int64

Body::Array{Float64,2}
1 ─ (tab = Main.zeros(k, l))
│ %2 = Main.length(d)::Int64
│ %3 = (1:%2)::Core.Compiler.PartialStruct(UnitRange{Int64}, Any[Core.Compiler.Const(1, false), Int64])
│ (@_6 = Base.iterate(%3))
│ %5 = (@_6 === nothing)::Bool
│ %6 = Base.not_int(%5)::Bool
└── goto #4 if not %6
2 ┄ %8 = @_6::Tuple{Int64,Int64}::Tuple{Int64,Int64}
│ (i = Core.getfield(%8, 1))
│ %10 = Core.getfield(%8, 2)::Int64
│ %11 = Base.getindex(d, i)::Myunit{Int64,Int64}
│ %12 = Base.getproperty(%11, :y)::Int64
│ %13 = Base.getindex(d, i)::Myunit{Int64,Int64}
│ %14 = Base.getproperty(%13, :x)::Int64
│ %15 = Base.getindex(tab, %12, %14)::Float64
│ %16 = (%15 + 1)::Float64
│ Base.setindex!(tab, %16, %12, %14)
│ (@_6 = Base.iterate(%3, %10))
│ %19 = (@_6 === nothing)::Bool
│ %20 = Base.not_int(%19)::Bool
└── goto #4 if not %20
3 ─ goto #2
4 ┄ return tab

@code_warntype addUnits2(d,k,l)

Variables
#self#::Core.Compiler.Const(addUnits2, false)
d::Array{Myunit{Int64,Int64},1}
k::Int64
l::Int64
tab::Array{Float64,2}
@_6::Union{Nothing, Tuple{Int64,Int64}}
i::Int64

Body::Array{Float64,2}
1 ─ (tab = Main.zeros(k, l))
│ %2 = Main.length(d)::Int64
│ %3 = (1:%2)::Core.Compiler.PartialStruct(UnitRange{Int64}, Any[Core.Compiler.Const(1, false), Int64])
│ (@_6 = Base.iterate(%3))
│ %5 = (@_6 === nothing)::Bool
│ %6 = Base.not_int(%5)::Bool
└── goto #4 if not %6
2 ┄ %8 = @_6::Tuple{Int64,Int64}::Tuple{Int64,Int64}
│ (i = Core.getfield(%8, 1))
│ %10 = Core.getfield(%8, 2)::Int64
│ %11 = Base.getindex(d, i)::Myunit{Int64,Int64}
│ %12 = Base.getproperty(%11, :y)::Int64
│ %13 = Base.getindex(d, i)::Myunit{Int64,Int64}
│ %14 = Base.getproperty(%13, :x)::Int64
│ %15 = Base.getindex(tab, %12, %14)::Float64
│ %16 = (%15 + 1)::Float64
│ Base.setindex!(tab, %16, %12, %14)
│ (@_6 = Base.iterate(%3, %10))
│ %19 = (@_6 === nothing)::Bool
│ %20 = Base.not_int(%19)::Bool
└── goto #4 if not %20
3 ─ goto #2
4 ┄ return tab

Next, I expected that passing the same data to addUnits1 as named tuples, without the known structure of Myunit, would be slower. Instead, the following turned out to be the fastest of the three:

d2 = vec(mapslices(z->(x=z[1],y=z[2]),hcat(xs,ys)',dims=1));
@btime addUnits1(d2,k,l);
1.131 ms (1 allocation: 2.50 KiB)

@code_warntype addUnits1(d2,k,l)

Variables
#self#::Core.Compiler.Const(addUnits1, false)
d::Array{NamedTuple{(:x, :y),Tuple{Int64,Int64}},1}
k::Int64
l::Int64
tab::Array{Float64,2}
@_6::Union{Nothing, Tuple{Int64,Int64}}
i::Int64

Body::Array{Float64,2}
1 ─ (tab = Main.zeros(k, l))
│ %2 = Main.length(d)::Int64
│ %3 = (1:%2)::Core.Compiler.PartialStruct(UnitRange{Int64}, Any[Core.Compiler.Const(1, false), Int64])
│ (@_6 = Base.iterate(%3))
│ %5 = (@_6 === nothing)::Bool
│ %6 = Base.not_int(%5)::Bool
└── goto #4 if not %6
2 ┄ %8 = @_6::Tuple{Int64,Int64}::Tuple{Int64,Int64}
│ (i = Core.getfield(%8, 1))
│ %10 = Core.getfield(%8, 2)::Int64
│ %11 = Base.getindex(d, i)::NamedTuple{(:x, :y),Tuple{Int64,Int64}}
│ %12 = Base.getproperty(%11, :y)::Int64
│ %13 = Base.getindex(d, i)::NamedTuple{(:x, :y),Tuple{Int64,Int64}}
│ %14 = Base.getproperty(%13, :x)::Int64
│ %15 = Base.getindex(tab, %12, %14)::Float64
│ %16 = (%15 + 1)::Float64
│ Base.setindex!(tab, %16, %12, %14)
│ (@_6 = Base.iterate(%3, %10))
│ %19 = (@_6 === nothing)::Bool
│ %20 = Base.not_int(%19)::Bool
└── goto #4 if not %20
3 ─ goto #2
4 ┄ return tab

Somehow this is not working as expected.

Can someone please comment/confirm/deny any of the following?

parametric types are named tuples and all of the above do the same thing
neither function is behaving as a parametric method for some reason
this MWE is just not ‘deep enough’ to see any benefit from parametric types and methods
something else I completely missed

Edit: corrected addUnits1 as suggested by @jlapeyre.

DNF · September 7, 2020, 4:40pm

The compiler knows the structure of d2 just as well as of d, and it’s running the exact same code on both. Maybe you can explain why you expect a performance difference?

jlapeyre · September 7, 2020, 4:45pm

There is no performance difference between addUnits1 and

function addUnits0(d, k, l)
  tab = zeros(k,l)
  for i in 1:length(d)
    tab[d[i].y,d[i].x] += 1
  end
  tab
end

This is because the compiler always knows the type of an argument passed to a function. (Also, in general, you should omit the type parameter T in the method definition since it is not used in the body of the method, so it serves no purpose.)
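One way to check this claim is sketched below, assuming the Test standard library is available. `@inferred` throws an error if the compiler cannot infer the concrete return type of the call, so both calls passing suggests that the unannotated method is specialized for each argument type. The three-element vectors are small hypothetical stand-ins for the thread's `d` and `d2`.

```julia
using Test

struct Myunit{X,Y}
    x::X
    y::Y
end

# No type annotations at all: Julia still compiles a specialized version
# for each concrete combination of argument types it is called with.
function addUnits0(d, k, l)
    tab = zeros(k, l)
    for i in 1:length(d)
        tab[d[i].y, d[i].x] += 1
    end
    tab
end

d  = [Myunit(1, 2), Myunit(3, 1), Myunit(1, 2)]        # Vector of structs
d2 = [(x = 1, y = 2), (x = 3, y = 1), (x = 1, y = 2)]  # Vector of NamedTuples

# @inferred errors if the return type cannot be inferred as concrete.
r1 = @inferred addUnits0(d, 2, 3)
r2 = @inferred addUnits0(d2, 2, 3)
```

Both calls should return the same 2×3 count table, since the two inputs hold the same (x, y) pairs.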

mkarikom · September 7, 2020, 5:07pm

Thanks @DNF, that makes sense.

I was expecting a difference because I assumed the structure of d2 was unknown, since a tuple can be anything.

mkarikom · September 7, 2020, 5:14pm

Thanks @jlapeyre, I have corrected the example.
When (if ever) would there be a difference between addUnits0 and addUnits2?

jlapeyre · September 7, 2020, 5:21pm

julia> typeof(d2)
Array{NamedTuple{(:x, :y),Tuple{Int64,Int64}},1}

The compiler knows the type of what is passed, so it has this information for optimization. If you put the tuples in a more generic container, then the compiler would not know the type of the elements at compile time.
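As a rough illustration of the "more generic container" case, the sketch below copies the same elements into a `Vector{Any}`. The name `dAny` and the sizes are assumptions made for this example, not part of the original post. The results are identical, but the element type visible to the compiler is now `Any`, so each `d[i].x` has to be resolved at run time.

```julia
struct Myunit{X,Y}
    x::X
    y::Y
end

function addUnits0(d, k, l)
    tab = zeros(k, l)
    for i in 1:length(d)
        tab[d[i].y, d[i].x] += 1
    end
    tab
end

xs = rand(1:100, 1000)
ys = rand(1:3, 1000)

d    = Myunit.(xs, ys)    # concrete element type, known at compile time
dAny = Vector{Any}(d)     # same elements, but the element type is Any

concrete_elt = isconcretetype(eltype(d))     # true for the struct vector
abstract_elt = eltype(dAny) === Any          # true for the generic container

# Same answer either way; only the generated code differs.
same_result = addUnits0(d, 3, 100) == addUnits0(dAny, 3, 100)
```

Timing `addUnits0(dAny, 3, 100)` with `@btime` and inspecting it with `@code_warntype` would be the natural next step to see the effect of the abstract element type.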

jlapeyre · September 7, 2020, 5:28pm

There is no performance difference.
The compiler already knows the type of what you pass, so the information in the parameter list is redundant. Putting type information in the function parameter list is used for dispatch, to choose which method to call. Or for “safety”, to raise an error if the appropriate method does not exist. Or for documentation.
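A minimal sketch of the dispatch and "safety" roles follows. The function name `describe` is hypothetical and chosen only for illustration. The annotations select which method runs and cause a `MethodError` when nothing matches; they do not make either method faster.

```julia
# Two methods of one function, distinguished only by the argument type.
describe(d::Vector{<:NamedTuple}) = "vector of named tuples"
describe(d::Vector{<:Number})     = "vector of numbers"

a = describe([(x = 1, y = 2)])   # dispatches to the NamedTuple method
b = describe([1.0, 2.0])         # dispatches to the Number method

# No method accepts a Vector{String}, so the call raises a MethodError.
threw = try
    describe(["a"])
    false
catch err
    err isa MethodError
end
```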
