They don’t need to. Since they are implemented with type-generic Julia code (presumably just recurrence relations), ForwardDiff.jl can differentiate them just fine:
julia> using LegendrePolynomials, ForwardDiff
julia> Plm(0.5, 3, 2) # associated Legendre polynomial
5.624999999999997
julia> Plm′(x, ℓ, m) = ForwardDiff.derivative(x -> Plm(x, ℓ, m), x) # its derivative
Plm′ (generic function with 1 method)
julia> Plm′(0.5, 3, 2)
3.7499999999999973
As you’d hope, evaluating the derivative is basically the same cost as evaluating the function:
julia> using BenchmarkTools
julia> @btime Plm(0.5, 3, 2);
132.144 ns (0 allocations: 0 bytes)
julia> @btime Plm′(0.5, 3, 2);
134.959 ns (0 allocations: 0 bytes)
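The reason the derivative is nearly free is the mechanism forward-mode AD uses: it runs the *same* code once on a "dual number" that carries a derivative component alongside the value, so every arithmetic operation updates both at a small constant overhead. A minimal illustration of the mechanism (constructing a `Dual` by hand is not something you normally do, but it shows what happens under the hood):

```julia
using ForwardDiff

# A dual number holds a value plus a derivative "seed"; arithmetic on it
# propagates the derivative by the chain rule alongside the value.
d = ForwardDiff.Dual(0.5, 1.0)   # value x = 0.5, seed dx/dx = 1

# Any type-generic Julia code (like a polynomial recurrence) works unchanged:
y = d^2 + 3d

ForwardDiff.value(y)        # f(0.5)  = 0.5^2 + 3*0.5 = 1.75
ForwardDiff.partials(y, 1)  # f'(0.5) = 2*0.5 + 3     = 4.0
```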
However, LegendrePolynomials.jl seems to be leaving some performance (and possibly accuracy) on the table; see issue #18 (“slow compared to AssociatedLegendrePolynomials.jl and to GSL.jl”) on the jishnub/LegendrePolynomials.jl GitHub repository. AssociatedLegendrePolynomials.jl is faster:
julia> import AssociatedLegendrePolynomials
julia> @btime AssociatedLegendrePolynomials.Plm(3, 2, 0.5)
34.540 ns (6 allocations: 96 bytes)
5.625
julia> @btime ForwardDiff.derivative(x -> AssociatedLegendrePolynomials.Plm(3, 2, x), 0.5)
54.234 ns (6 allocations: 192 bytes)
3.75
but this is also losing some performance because it allocates, even if you use its “in-place” API:
julia> P = fill(NaN); # 0-dimensional array to hold the result
julia> @btime AssociatedLegendrePolynomials.Plm!($P, 3, 2, 0.5)
30.920 ns (5 allocations: 80 bytes)
0-dimensional Array{Float64, 0}:
5.625
GSL.jl is currently slightly faster than AssociatedLegendrePolynomials.Plm!:
julia> import GSL
julia> @btime GSL.sf_legendre_Plm(3, 2, 0.5)
27.088 ns (0 allocations: 0 bytes)
5.625
They are probably all using the same recurrences, but LegendrePolynomials.jl hides them behind an iterator-based implementation that, I'm guessing, is failing to inline or specialize properly?
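To make the "type-generic recurrence" point concrete, here is a minimal sketch (my own code, *not* the LegendrePolynomials.jl implementation) of the Bonnet three-term recurrence for the ordinary Legendre polynomial Pₗ(x). Because nothing in it assumes `x::Float64`, ForwardDiff's dual numbers flow straight through it:

```julia
using ForwardDiff

# Bonnet recurrence: k*P_k(x) = (2k-1)*x*P_{k-1}(x) - (k-1)*P_{k-2}(x).
# Written generically in the type of x, so it is automatically differentiable.
function legendreP(ℓ::Integer, x)
    Pm2, Pm1 = one(x), x        # P₀(x) = 1, P₁(x) = x
    ℓ == 0 && return Pm2
    for k in 2:ℓ
        Pm2, Pm1 = Pm1, ((2k - 1) * x * Pm1 - (k - 1) * Pm2) / k
    end
    return Pm1
end

legendreP(3, 0.5)                                  # P₃(0.5) = (5x³−3x)/2 = -0.4375
ForwardDiff.derivative(x -> legendreP(3, x), 0.5)  # P₃′(0.5) = (15x²−3)/2 = 0.375
```

The hand-rolled loop is also allocation-free, which is what you'd hope the package internals look like once the iterator layer inlines away.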