Why is closure slower than handmade callable struct?

I’m having a hard time understanding why, in the following code, a closure is slower than an equivalent hand-made callable `struct`.

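```julia
using BenchmarkTools
using Interpolations
using LinearAlgebra: norm
using QuadGK
using StaticArrays

# speed along the curve: norm of the spline's derivative at t
# (hypothetical definition; `speed` was not shown in the original post)
speed(spline, t) = norm(Interpolations.gradient(spline, t)[1])

# closure version

length_closure(spline) = quadgk(t -> speed(spline, t), 0, length(spline))

# callable struct version

struct LenIntegrand{S}
    spline::S
end

(li::LenIntegrand)(t) = speed(li.spline, t)

length_struct(spline) = quadgk(LenIntegrand(spline), 0, length(spline))

# benchmarking code

θs = range(0, 2π, length=25)[1:end-1]
xs, ys = 2cos.(θs), 0.5sin.(θs)
vec = [SA[x,y] for (x,y) in zip(xs, ys)]
spl = extrapolate(interpolate(vec, BSpline(Cubic(Periodic(OnCell())))), Periodic())

@benchmark length_closure($spl) # ~ 65 μs
@benchmark length_struct($spl)  # ~ 45 μs
```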

I observe the same behavior in other similar examples, where a hand-made `struct` always beats a closure for fixing some arguments, whereas I expected the two implementations to be pretty much equivalent. `@code_warntype` does not seem to help me understand the underlying issue here.
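For context, a closure such as `t -> speed(spline, t)` is lowered to an instance of a compiler-generated callable struct whose fields are the captured variables, so expecting the two versions to be equivalent is reasonable. The minimal sketch below (the names `AddN` and `make_closure` are hypothetical, chosen only for illustration) shows the one difference that turns out to matter in this thread: the generated closure type is a subtype of `Function`, while a hand-made callable struct is not.

```julia
# Hypothetical hand-made callable struct that fixes one argument
struct AddN{T}
    n::T
end
(a::AddN)(x) = x + a.n

# Closure that fixes the same argument; `n` is captured as a field
make_closure(n) = x -> x + n

c = make_closure(1)
s = AddN(1)

c(2), s(2)              # both calls return 3
c isa Function          # true: closure types subtype Function
s isa Function          # false: AddN is a plain struct
fieldnames(typeof(c))   # the captured variable `n` appears as a field
```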


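It could be related to the “Be aware of when Julia avoids specializing” section of the Julia Performance Tips: closures are subtypes of `Function`, and Julia does not automatically specialize a method on a `Function`-typed argument that is only passed through to another function rather than used inside the method. A hand-made callable struct is not a `Function`, so that heuristic never applies to it.

Changing some instances of `quadgk(f, segs...; kws...)` in QuadGK to `quadgk(f::F, segs...; kws...) where {F}` would confirm that.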
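The suggested change can be sketched on a toy pair of wrappers (`apply_inner`, `apply_nospec`, and `apply_spec` are hypothetical names, not QuadGK functions). In `apply_nospec`, `f` is only passed through, so Julia may compile a single method instance for `f::Function` instead of one per closure type; the `where {F}` type parameter in `apply_spec` forces specialization on the concrete type of `f`.

```julia
# Inner function that actually calls `f`
apply_inner(f, x) = f(x)

# `f` is only passed through here, so Julia's heuristic may skip specializing
# this method on the concrete type of `f` when `f isa Function`
apply_nospec(f, x) = apply_inner(f, x)

# The `where {F}` type parameter forces specialization on typeof(f)
apply_spec(f::F, x) where {F} = apply_inner(f, x)

g = x -> 2x
apply_nospec(g, 3)   # 6
apply_spec(g, 3)     # 6: same result, only the compiled code can differ
```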

Ah, great catch! I always try to watch out for missed `::Function` specialization in my own code, but it didn’t occur to me in this case.

One thing still puzzles me, though. After running both versions, `methods(quadgk)[1].specializations` shows that `quadgk` got specialized three times:

• for `::LenIntegrand{Interpolations.Extrapolation{SVector{2, Float64}, 1, Interpolations.BSplineInterpolation{SVector{2, Float64}, 1, Vector{SVector{2, Float64}}, BSpline{Cubic{Periodic{OnCell}}}, Tuple{Base.OneTo{Int64}}}, BSpline{Cubic{Periodic{OnCell}}}, Periodic{Nothing}}}` (the hand-made callable struct)
• for `::Function`
• for `::var"#1#2"{Interpolations.Extrapolation{SVector{2, Float64}, 1, Interpolations.BSplineInterpolation{SVector{2, Float64}, 1, Vector{SVector{2, Float64}}, BSpline{Cubic{Periodic{OnCell}}}, Tuple{Base.OneTo{Int64}}}, BSpline{Cubic{Periodic{OnCell}}}, Periodic{Nothing}}}` (the closure).

Why did `quadgk` get specialized at all, in apparent contradiction to its signature and https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing?
Given that `quadgk` got specialized anyway, why is there still a performance difference between the two versions?

Actually, I think this has already been fixed on the master branch of QuadGK, probably by commit 298f76e in JuliaMath/QuadGK.jl (“Allow passing preallocated segsbuf for alloc reuse (#59)”), which added some `where F` annotations.

Using the master branch, I get the same performance with both an anonymous function and a callable struct.
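For anyone stuck on a QuadGK release without that fix, one workaround consistent with the benchmarks in the question is to wrap the closure in a generic callable struct, which is not a `Function` and so is not subject to the no-specialization heuristic. This is only a sketch; `Specialize` is a hypothetical name, not part of QuadGK or Base.

```julia
# Hypothetical generic wrapper: a plain struct (not a subtype of Function),
# so Julia specializes on its concrete type when it is passed through.
struct Specialize{F}
    f::F
end

# Forward single-argument calls to the wrapped function
(s::Specialize)(x) = s.f(x)

sq = Specialize(x -> x^2)
sq(3)                  # 9
sq isa Function        # false

# With QuadGK loaded, the closure version could then be written as:
# length_wrapped(spline) = quadgk(Specialize(t -> speed(spline, t)), 0, length(spline))
```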