Struct with integer type parameters much slower than concrete struct

I need help with the performance of a struct that has integer type parameters.

I have the following piece of code for an AD package I am developing.

PGrad is a struct for tracking gradients w.r.t. N_v variables in N_c cells (the type parameters Nv and Nc below).

PVar is a struct for variables.

using StaticArrays
using BenchmarkTools

struct PGrad{Nv, Nc}
    ind::SVector{Nc, Int}
    grad::SVector{Nc, SVector{Nv,Float64}}
end

struct PVar{T<:PGrad}
    val::Float64
    grad::T
end

Constructing one PVar{PGrad{2,1}} takes about 70 ns and allocates:

pg1 = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]])
@btime PVar{PGrad{2,1}}(1.0, pg1)
 70.569 ns (1 allocation: 48 bytes) 

If I instead write out the concrete types explicitly

struct CGrad
    ind::SVector{1, Int}
    grad::SVector{1, SVector{2,Float64}}
end

struct CVar
    val::Float64
    grad::CGrad
end

construction becomes much faster:

cg1 = CGrad(SA[1], SA[SA[1.0, 0.0]])
@btime CVar(1.0, cg1)
7.000 ns (0 allocations: 0 bytes)

If I annotate the type of the grad argument at the call site, the time for both cases drops below ~5 ns:

ug1 = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]])
@btime PVar{PGrad{2,1}}(1.0, ug1::PGrad{2,1})
4.800 ns (0 allocations: 0 bytes)
cg1 = CGrad(SA[1], SA[SA[1.0, 0.0]])
@btime CVar(1.0, cg1::CGrad)
4.500 ns (0 allocations: 0 bytes)

Any idea what causes the difference in timing and number of allocations?

Maybe I am not using integer type parameters the correct way.

You need to interpolate the global variables with $ to make the timings accurate. If I do this, I get the same timings for both:

julia> @btime PVar{PGrad{2,1}}(1.0, $pg1)
  0.017 ns (0 allocations: 0 bytes)
PVar{PGrad{2,1}}(1.0, PGrad{2,1}([1], SArray{Tuple{2},Float64,1,2}[[1.0, 0.0]]))

julia> @btime CVar(1.0, $cg1)
  0.017 ns (0 allocations: 0 bytes)
CVar(1.0, CGrad([1], SArray{Tuple{2},Float64,1,2}[[1.0, 0.0]]))

It’s simply an issue with your benchmark:

# Your benchmark
julia> @btime PVar{PGrad{2,1}}(1.0, pg1);
  59.793 ns (1 allocation: 48 bytes)

# Fixed benchmark
julia> @btime PVar{PGrad{2,1}}(1.0, $pg1);
  0.028 ns (0 allocations: 0 bytes)

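The extra $ in the second benchmark interpolates the value of pg1 into the benchmark expression, so the compiler may treat it as a constant rather than as a non-constant global variable whose type is unknown at compile time. Without the $, the call goes through dynamic dispatch and allocates, which is what your original timing measured. With it, the compiler can figure out precisely which functions to call and inline these calls, such that in the end there is nothing left to do (hence the <1 ns runtime).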
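
To make the cause concrete, here is a minimal sketch (not from the posts above) that passes the same value in three ways. The name pg1_const is made up for illustration; everything else reuses the definitions from the question. If the explanation above is right, only the first call should show the allocation.

```julia
using StaticArrays
using BenchmarkTools

struct PGrad{Nv, Nc}
    ind::SVector{Nc, Int}
    grad::SVector{Nc, SVector{Nv, Float64}}
end

struct PVar{T<:PGrad}
    val::Float64
    grad::T
end

# Non-const global: its binding could be reassigned to any type,
# so the compiler cannot specialize code that reads it.
pg1 = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]])

# Const global (hypothetical name): the type is fixed and known.
const pg1_const = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]])

@btime PVar{PGrad{2,1}}(1.0, pg1)        # reads an untyped global
@btime PVar{PGrad{2,1}}(1.0, $pg1)       # value interpolated by BenchmarkTools
@btime PVar{PGrad{2,1}}(1.0, pg1_const)  # const global, no $ needed
```

If the last two agree, the parametric struct itself is not the problem. In real code the value would normally be a function argument or a local variable, where its type is known anyway.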

Thank you both!

Note the warning about sub-nanosecond timings. They are not real; the compiler is cheating by doing the work at compile time.

Try @btime PVar{PGrad{2,1}}(1.0, $(Ref(pg1))[]);
The Ref prevents the compiler from doing too much at compile time.

I get

pg1 = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]])
@btime PVar{PGrad{2,1}}(1.0, $(Ref(pg1))[])
1.400 ns (0 allocations: 0 bytes)

and

cg1 = CGrad(SA[1], SA[SA[1.0, 0.0]])
@btime CVar(1.0, $(Ref(cg1))[])
1.200 ns (0 allocations: 0 bytes)
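
Another option that should have a similar effect to the Ref trick is the setup keyword of @btime, which builds the argument outside the timed expression. This is only a sketch: the local name g is made up for illustration, and I am assuming that a value created in setup reaches the timed code as an ordinary runtime value rather than a compile-time constant.

```julia
using StaticArrays
using BenchmarkTools

struct PGrad{Nv, Nc}
    ind::SVector{Nc, Int}
    grad::SVector{Nc, SVector{Nv, Float64}}
end

struct PVar{T<:PGrad}
    val::Float64
    grad::T
end

# g (hypothetical name) is created in the setup phase, not in the timed
# expression, so it does not need $ and is not read from a global.
@btime PVar{PGrad{2,1}}(1.0, g) setup=(g = PGrad{2,1}(SA[1], SA[SA[1.0, 0.0]]))
```

If this also lands in the 1 to 2 ns range, it supports the conclusion that the parametric and the concrete struct cost the same to construct.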