Integer parametric typed struct much slower than concrete struct

Lucas_Liu · May 26, 2020, 7:14am

I need help with the performance of a struct with integer type parameters.

I have the following piece of code for an AD package I am developing.

PGrad is a struct for tracking gradients w.r.t. N_v variables in N_c cells.

PVar is a struct for variables.

using StaticArrays
using BenchmarkTools

struct PGrad{Nv, Nc}
    ind::SVector{Nc, Int}
    grad::SVector{Nc, SVector{Nv,Float64}}
end

struct PVar{T<:PGrad}
    val::Float64
    grad::T
end

Construction of one PVar{PGrad{2,1}} takes

pg1 = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]])
@btime PVar{PGrad{2,1}}(1.0, pg1)

 70.569 ns (1 allocation: 48 bytes)

If I explicitly write

struct CGrad
    ind::SVector{1, Int}
    grad::SVector{1, SVector{2,Float64}}
end

struct CVar
    val::Float64
    grad::CGrad
end

Construction becomes much faster

cg1 = CGrad(SA[1], SA[SA[1.0, 0.0]])
@btime CVar(1.0, cg1)

7.000 ns (0 allocations: 0 bytes)

If I annotate the type of grad at the call site, the time for both cases drops below ~5 ns

ug1 = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]])
@btime PVar{PGrad{2,1}}(1.0, ug1::PGrad{2,1})
4.800 ns (0 allocations: 0 bytes)

cg1 = CGrad(SA[1], SA[SA[1.0, 0.0]])
@btime CVar(1.0, cg1::CGrad)
4.500 ns (0 allocations: 0 bytes)

Any idea what causes the difference in timing and number of allocations?

Maybe I am not using integer type parameters the correct way.

mauro3 · May 26, 2020, 7:24am

You need to interpolate the global variables with $ to get accurate timings. If I do this, I get the same timings for both:

julia> @btime PVar{PGrad{2,1}}(1.0, $pg1)
  0.017 ns (0 allocations: 0 bytes)
PVar{PGrad{2,1}}(1.0, PGrad{2,1}([1], SArray{Tuple{2},Float64,1,2}[[1.0, 0.0]]))

julia> @btime CVar(1.0, $cg1)
  0.017 ns (0 allocations: 0 bytes)
CVar(1.0, CGrad([1], SArray{Tuple{2},Float64,1,2}[[1.0, 0.0]]))

ettersi · May 26, 2020, 7:24am

It’s simply an issue with your benchmark:

# Your benchmark
julia> @btime PVar{PGrad{2,1}}(1.0, pg1);
  59.793 ns (1 allocation: 48 bytes)

# Fixed benchmark
julia> @btime PVar{PGrad{2,1}}(1.0, $pg1);
  0.028 ns (0 allocations: 0 bytes)

The extra $ in the second benchmark interpolates the value of pg1 into the benchmark expression, so the compiler may treat pg1 as a constant of known type instead of an untyped global variable. This allows the compiler to figure out precisely which functions to call and to inline those calls, so that in the end there is nothing left to do (hence the <1 ns runtime).
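To make the point about globals concrete, here is a minimal sketch that times the same constructor call three ways. It reuses the PGrad and PVar definitions from the first post; the name pg1_const is hypothetical and only there for illustration. The assumption is that the slow timing comes from pg1 being a non-const global, whose type is not known when the benchmark body is compiled. Exact numbers will vary by machine and by Julia and BenchmarkTools version, so none are shown.

```julia
using StaticArrays
using BenchmarkTools

struct PGrad{Nv, Nc}
    ind::SVector{Nc, Int}
    grad::SVector{Nc, SVector{Nv, Float64}}
end

struct PVar{T<:PGrad}
    val::Float64
    grad::T
end

# Non-const global: its type may change, so code that reads it cannot assume one.
pg1 = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]])

# Hypothetical const global (not in the original post): the type is fixed.
const pg1_const = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]])

# 1. Non-const global referenced directly: the timing likely includes
#    dynamic dispatch and boxing of the result.
@btime PVar{PGrad{2,1}}(1.0, pg1);

# 2. Interpolated value: the type is known inside the benchmark.
@btime PVar{PGrad{2,1}}(1.0, $pg1);

# 3. Const global: the type is also known, no interpolation needed.
@btime PVar{PGrad{2,1}}(1.0, pg1_const);
```

If the assumption holds, only the first call should report an allocation. All three build the same value, so the difference is in what the benchmark measures and not in PVar itself.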

Lucas_Liu · May 26, 2020, 7:45am

Thank you both!

jishnub · May 26, 2020, 11:26am

Note the warning about sub-nanosecond timings. They are not real: the compiler is cheating by doing the work at compile time.

baggepinnen · May 26, 2020, 11:43am

Try @btime PVar{PGrad{2,1}}(1.0, $(Ref(pg1))[]);
The Ref prevents the compiler from doing too much at compile time.
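A minimal sketch of the two interpolation styles side by side, reusing the PGrad and PVar definitions from the first post. The comments describe the behaviour reported in this thread; how large the gap is may depend on the Julia and BenchmarkTools versions, so no timings are shown.

```julia
using StaticArrays
using BenchmarkTools

struct PGrad{Nv, Nc}
    ind::SVector{Nc, Int}
    grad::SVector{Nc, SVector{Nv, Float64}}
end

struct PVar{T<:PGrad}
    val::Float64
    grad::T
end

pg1 = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]])

# Plain interpolation: the value may be treated as a compile-time constant,
# so the whole constructor call can be folded away (sub-nanosecond result).
@btime PVar{PGrad{2,1}}(1.0, $pg1);

# Interpolate a Ref and dereference it inside the benchmark: the type is
# still known, but the value has to be loaded at run time.
@btime PVar{PGrad{2,1}}(1.0, $(Ref(pg1))[]);
```

Both calls construct the same PVar; the Ref version is simply harder for the compiler to evaluate ahead of time, so its timing is closer to the cost of the construction itself.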

Lucas_Liu · May 26, 2020, 6:48pm

I get

pg1 = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]])
@btime PVar{PGrad{2,1}}(1.0, $(Ref(pg1))[])
1.400 ns (0 allocations: 0 bytes)

and

cg1 = CGrad(SA[1], SA[SA[1.0, 0.0]])
@btime CVar(1.0, $(Ref(cg1))[])
1.200 ns (0 allocations: 0 bytes)
