Why is closure slower than handmade callable struct?

I’m having a hard time understanding why, in the following code, a closure is slower than an equivalent hand-made callable struct.

using BenchmarkTools
using Interpolations
using LinearAlgebra: norm
using StaticArrays
using QuadGK

speed(spline, t) = norm(Interpolations.gradient1(spline, t))

# closure version

length_closure(spline) = quadgk(t -> speed(spline, t), 0, length(spline))

# hand-made struct version

struct LenIntegrand{S}
    spline::S
end

(li::LenIntegrand)(t) = speed(li.spline, t)

length_struct(spline) = quadgk(LenIntegrand(spline), 0, length(spline))
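For reference, a closure is itself lowered to a hidden callable struct whose fields are the captured variables, which is why one would expect the two versions to behave the same. A minimal sketch (independent of the spline code above):

```julia
# A closure lowers to a hidden callable struct; each captured variable
# becomes a field of that struct.
capture_demo(x) = t -> (x, t)   # closure capturing `x`

f = capture_demo(42)
fieldnames(typeof(f))           # the captured variable shows up as a field: (:x,)
isstructtype(typeof(f))         # true: the closure's type is an ordinary struct type
```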

# benchmarking code

θs = range(0, 2π, length=25)[1:end-1]
xs, ys = 2cos.(θs), 0.5sin.(θs)
vec = [SA[x,y] for (x,y) in zip(xs, ys)]
spl = extrapolate(interpolate(vec, BSpline(Cubic(Periodic(OnCell())))), Periodic())

@benchmark length_closure($spl) # ~ 65 μs
@benchmark length_struct($spl)  # ~ 45 μs

I can observe the same behavior in other similar examples: a hand-made struct always beats a closure when used to fix some arguments, whereas I expected the two implementations to be pretty much equivalent. `@code_warntype` does not seem to help me understand the underlying issue here.


It could be related to the Performance Tips section of the Julia manual (“Be aware of when Julia avoids specializing”).

Changing some instances of `quadgk(f, segs...; kws...)` to `quadgk(f::F, segs...; kws...) where {F}` would confirm that.
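For context, the heuristic at play, in a minimal sketch with hypothetical function names: Julia may avoid specializing on a function-typed argument that is only passed through to another call (never called directly), while a `where {F}` annotation forces a fresh specialization per concrete function type.

```julia
# Hypothetical names for illustration. `f` is only passed along, never
# called in this method body, so Julia's heuristic may compile a single
# generic instance for it:
pass_through(f, x) = apply(f, x)

# The `where {F}` annotation forces specialization on the concrete
# function type, mirroring the suggested change to quadgk's signature:
pass_through_forced(f::F, x) where {F} = apply(f, x)

apply(f, x) = f(x)

pass_through(sin, 1.0)
pass_through_forced(sin, 1.0)

# The compiled instances can be inspected as done later in this thread:
# methods(pass_through)[1].specializations
```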

Ah, great catch! I always try to watch out for missed `::Function` specialization in my code, but it didn’t occur to me in this case.

I’m left with a doubt though. After running both versions, methods(quadgk)[1].specializations shows that quadgk got specialized three times:

  • for ::LenIntegrand{Interpolations.Extrapolation{SVector{2, Float64}, 1, Interpolations.BSplineInterpolation{SVector{2, Float64}, 1, Vector{SVector{2, Float64}}, BSpline{Cubic{Periodic{OnCell}}}, Tuple{Base.OneTo{Int64}}}, BSpline{Cubic{Periodic{OnCell}}}, Periodic{Nothing}}} (the hand-made callable struct)
  • for ::Function
  • for ::var"#1#2"{Interpolations.Extrapolation{SVector{2, Float64}, 1, Interpolations.BSplineInterpolation{SVector{2, Float64}, 1, Vector{SVector{2, Float64}}, BSpline{Cubic{Periodic{OnCell}}}, Tuple{Base.OneTo{Int64}}}, BSpline{Cubic{Periodic{OnCell}}}, Periodic{Nothing}}} (the closure).

Why did quadgk get specialized at all, in apparent contradiction to its signature and https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing?
Given that quadgk got specialized anyway, why is there still a performance difference between the two versions?

Actually, I think this has already been fixed on the master branch of QuadGK. Probably by “Allow passing preallocated segsbuf for alloc reuse” (JuliaMath/QuadGK.jl#59, commit 298f76e), which added some `where F` annotations.

Using the master branch I get the same performance with both an anonymous function and a callable struct.
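For anyone wanting to reproduce this before the fix is released, the development branch can be installed with the standard Pkg API (the URL is the repository linked above):

```julia
# Install the master branch of QuadGK (sketch; standard Pkg.add keywords):
using Pkg
Pkg.add(url="https://github.com/JuliaMath/QuadGK.jl", rev="master")
```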