How do I write a macro for evaluating polynomial derivatives?

I am working with splines, and I would like to evaluate polynomials along with their derivatives using all sorts of nice tricks like @evalpoly and @SArray. The function below gives an idea. It does not look great, though. What are my options to turn this into something where I can be sure the resulting code can be nicely inlined and optimized, with no “if dev==…” after compilation, and also avoiding code duplication? It would also be cool to make it generic on the polynomial order, for instance. Is this a situation where macros or @generated might help me? Or is this code perhaps already the best conceivably possible?

using StaticArrays

function calc_spline(t, a, dev=0)
    N = size(a, 2)
    i = convert(Int, min(max(1, floor(t)), N)) :: Int
    u = t - i
    p = a[:, i]

    if dev==0
        @evalpoly(u, p[1], p[2], p[3], p[4])
    elseif dev==1
        @SArray [
            @evalpoly(u, p[1], p[2], p[3], p[4]),
            @evalpoly(u, p[2], 2 * p[3], 3 * p[4])
        ]
    elseif dev==2
        @SArray [
            @evalpoly(u, p[1], p[2], p[3], p[4]),
            @evalpoly(u, p[2], 2 * p[3], 3 * p[4]),
            @evalpoly(u, 2 * p[3], 2 * (3 * p[4]))
        ]
    elseif dev==3
        @SArray [
            @evalpoly(u, p[1], p[2], p[3], p[4]),
            @evalpoly(u, p[2], 2 * p[3], 3 * p[4]),
            @evalpoly(u, 2 * p[3], 2 * (3 * p[4])),
            2 * (3 * p[4])
        ]
    end
end

julia> a = 0.10 * reshape(1:40, 4,:)
4×10 Array{Float64,2}:
 0.1  0.5  0.9  1.3  1.7  2.1  2.5  2.9  3.3  3.7
 0.2  0.6  1.0  1.4  1.8  2.2  2.6  3.0  3.4  3.8
 0.3  0.7  1.1  1.5  1.9  2.3  2.7  3.1  3.5  3.9
 0.4  0.8  1.2  1.6  2.0  2.4  2.8  3.2  3.6  4.0

julia> calc_spline(3.4, a, 0)
1.5527999999999997

julia> calc_spline(3.4, a, 1)
2-element SArray{Tuple{2},Float64,1,2}:
 1.5527999999999997
 2.4559999999999995

julia> calc_spline(3.4, a, 2)
3-element SArray{Tuple{3},Float64,1,3}:
 1.5527999999999997
 2.4559999999999995
 5.08              

julia> calc_spline(3.4, a, 3)
4-element SArray{Tuple{4},Float64,1,4}:
 1.5527999999999997
 2.4559999999999995
 5.08              
 7.200000000000001 
1 Like

Yes this is a good use for generated functions. As a simple example, here’s a version of evalpoly for StaticVectors, invoking @evalpoly to do the hard work:

@generated function evalpoly(x, v::StaticVector{N,T}) where {N,T}
    vs = [:(v[$i]) for i=1:N] 
    quote  
        Base.Math.@evalpoly x $(vs...)
    end    
end
1 Like

First, a couple of suggestions:
The line i = convert(Int, min(max(1, floor(t)), N)) :: Int could be written more concisely as i = min(max(1, floor(Int, t)), N). ::Int is not needed, as Julia is smart enough to infer the correct output type.
Is there a special reason, you chose to use StaticArrays instead of regular Arrays? It doesn’t make much sense here, because the compiler can’t infer the length of the Array anyway, so there wouldn’t be much of a performance benefit.
You also want to be careful with respect to type stability. It’s probably not a good idea to return a scalar if dev is 0 and otherwise an Array. You should return an Array with one entry. This will make it easier to handle the result afterwards, as you don’t have to always handle this cases separately and also makes the function more type stable. Also, instead of just ignoring values of dev not between 0 and 3, it would be a good idea to throw an error instead because otherwise, your function returns nothing in these cases, which can make your code harder to debug and also means your function isn’t as type stable.
Now back to your original question:
You can certainly make the code more generic and precise, but I’m not sure it really makes sense to use macros and generated functions here. @c42f’s generated evalpoly function won’t work in your example, because p is not a StaticVector, but I wonder if there’s really a performance benefit here over just a regular function like this:

function evalpoly(x, p)
    res = 0
    for i in keys(p)
        res += p[i] * x^(i-1)
    end
    return res
end

For the differentiation I would write a helper function like this:

diffpoly(p) = p[2:end] .* (1:length(p)-1)

You can then write your function like this:

function calc_spline(t, a, dev=0)
    N = size(a, 2)
    i = min(max(1, floor(Int, t)), N)
    u = t - i
    p = a[:, i]

    res = Vector{promote_type(eltype(p),typeof(u))}(undef, dev+1)
    
    for i in 1:dev
        res[i] = evalpoly(u, p)
        p = diffpoly(p)
    end
    
    res[end] = evalpoly(u, p)
    
    return res
end

Some quick benchmarks show, this is actually slower than you’re example, but you could probably still optimize it quite a bit. Maybe someone else has an idea.

3 Likes

To get rid of the dev portion, define a new struct dev{D} end then dispatch on it, for example

function calc_spline(t, a, ::dev{D}) where D

then use D for your if-then-else construct, then at compile time only the relevant branch will be included

1 Like

Why not just use dual numbers?

The way I’d model a real solution to this problem is to have a type Poly and another type PolyDerivs{D} for all derivatives up to D. Here’s one way to do that.

A few quick spot checks seems to indicate that this generates quite reasonable code. Please benchmark it though!

module Polys

export Poly, PolyDerivs, diffpoly

using StaticArrays

"""
    Poly(a,b,c, ...)
"""
struct Poly{N,T}
    coeffs::SVector{N,T}
end

Poly(args...) = Poly(SVector(args))

Base.show(io::IO, p::Poly) = join(io, ["$(c)x^$(i-1)" for (i,c) in enumerate(p.coeffs)], " + ")

# Implementations below generate good code when Poly coeffs are StaticArrays.

# evalpoly via recursive horner implementation. This would be bad if the length
# of the coefficients were unknown but it's fully expanded by the compiler for a
# StaticVector. Could use `foldr` if StaticArrays implemented it.
_evalpoly(x, c) = length(c) == 1 ? c[1] : muladd(x, _evalpoly(x, popfirst(c)), c[1])

_diffpoly(c) = map(*, popfirst(c), 1:length(c)-1)


(p::Poly)(x) = _evalpoly(x, p.coeffs)
diffpoly(p::Poly) = Poly(_diffpoly(p.coeffs))


"""
    PolyDerivs{D}(p)

Derivatives of polynomial p of orders `0:D`.
"""
struct PolyDerivs{D,P<:Poly}
    p::P
end

PolyDerivs{D}(p) where {D} = PolyDerivs{D,typeof(p)}(p)

Base.show(io::IO, pd::PolyDerivs{D}) where {D} = print(io, "PolyDerivs{$D}(", pd.p, ")")

# I got lazy here and used a generated function to iterate the calculation of
# the derivatives based on `diffpoly` above, rather than doing something more
# sensible involving iterating the terms numerically. LLVM produces reasonable
# code at a quick glance, but it should be possible to do much better.
# Also should be possible to do it without a generated function, but that's
# also an exercise for the reader ;-)
@generated function (d::PolyDerivs{D})(x) where {D}
    exs = [:(p0 = d.p)]
    vals = [:(p0(x))]
    for i = 1:D
        p_i1 = Symbol("p$(i-1)")
        p_i  = Symbol("p$i")
        push!(exs, :($p_i = diffpoly($p_i1)))
        push!(vals, :($p_i(x)))
    end
    quote
        $(exs...)
        SVector(($(vals...)))
    end
end

end

You can use individual Poly and PolyDerivs objects as functions:

julia> p = Polys.Poly(1.0, 2.0, 3.0, 4.0, 5.0)
1.0x^0 + 2.0x^1 + 3.0x^2 + 4.0x^3 + 5.0x^4

julia> pd = Polys.PolyDerivs{2}(p)
PolyDerivs{2}(1.0x^0 + 2.0x^1 + 3.0x^2 + 4.0x^3 + 5.0x^4)

julia> p(1.0)
15.0

julia> pd(1.0)
3-element StaticArrays.SArray{Tuple{3},Float64,1,3}:
 15.0
 40.0
 90.0

You can very easily put them together into a spline. In practice I’d suggest you introduce a Spline type for this rather than needing a new calc_spline function. And also noting that there’s packages for this kind of thing which already exist…

julia> spline = [Polys.Poly(1.0, 2.0, i) for i = 1:5]
5-element Array{Main.Polys.Poly{3,Float64},1}:
 1.0x^0 + 2.0x^1 + 1.0x^2
 1.0x^0 + 2.0x^1 + 2.0x^2
 1.0x^0 + 2.0x^1 + 3.0x^2
 1.0x^0 + 2.0x^1 + 4.0x^2
 1.0x^0 + 2.0x^1 + 5.0x^2

julia> spline_derivs = Polys.PolyDerivs{2}.(spline)
5-element Array{Main.Polys.PolyDerivs{2,Main.Polys.Poly{3,Float64}},1}:
 PolyDerivs{2}(1.0x^0 + 2.0x^1 + 1.0x^2)
 PolyDerivs{2}(1.0x^0 + 2.0x^1 + 2.0x^2)
 PolyDerivs{2}(1.0x^0 + 2.0x^1 + 3.0x^2)
 PolyDerivs{2}(1.0x^0 + 2.0x^1 + 4.0x^2)
 PolyDerivs{2}(1.0x^0 + 2.0x^1 + 5.0x^2)

julia> function calc_spline(t, spline)
           i = clamp(floor(Int,t), 1, length(spline))
           u = t - i
           @inbounds spline[i](u)
       end
calc_spline (generic function with 1 method)

julia> calc_spline(3.4, spline)
2.28

julia> calc_spline(3.4, spline_derivs)
3-element StaticArrays.SArray{Tuple{3},Float64,1,3}:
 2.28              
 4.3999999999999995
 6.0               
4 Likes

I have an implementation for evaluating a polynomial and its derivative in my StaticPolynomials package here:

https://github.com/JuliaAlgebra/StaticPolynomials.jl/blob/098d9888da1bcd06e79058d7d5e2a44e28368f2d/src/evalpoly.jl#L79

3 Likes

I actually recently registered StaticUnivariatePolynomials.jl because among the many polynomial libraries, I couldn’t find one with a polynomial type backed by a Tuple.

Demo:

julia> using StaticUnivariatePolynomials

julia> import StaticUnivariatePolynomials: derivative, integral # unexported to avoid conflicts with other packages

julia> p = Polynomial(1, 2, 3) # 1 + 2x + 3x^2
Polynomial{3,Int64}((1, 2, 3))

julia> p′ = derivative(p)
Polynomial{2,Int64}((2, 6))

Here’s how I implemented derivative:

https://github.com/tkoolen/StaticUnivariatePolynomials.jl/blob/f7f8454e21d25a4e6f2235a43ea28d3078d33570/src/StaticUnivariatePolynomials.jl#L87-L92

2 Likes

@c42f, I did something very similar to your PolyDerivs type here:

https://github.com/tkoolen/QPControl.jl/blob/6e4a1c82890fabaea89bbb4f3d95bbcac00b8d21/src/trajectories/interpolation.jl#L89-L107

(@simeonschaub, there might be some useful utilities for you in that directory)

but I’m storing the derivatives to optimize evaluating the polynomial and its derivatives at multiple points.

I guess I could just add that a version of this to StaticUnivariatePolynomials.

Thanks for the suggestion. The polynomial size is known at compile time.

Thanks a lot, this looks great!

I am not just interested in evaluating these polynomials, I am also trying to understand macros and generated functions better. This here is a real problem I had while working with other languages, and I feel Julia might offer a great solution using macros. I could of course just write down all the expressions in a way I think the compiler will like, and doing a lot of code duplication. But the point is to exercise the high-level capabilities of the language. What disadvantage do you see in this approach?

This looks nice, using tuples is fine by me. I’ll try all these options and see if there’s any difference.

Cool :slight_smile: In the above I was storing only the polynomial, but of course you could precalculate coefficients of all the derivatives. StaticUnivariatePolynomials looks like a neat little package very applicable to this problem.

No real disadvantage. The julia compiler really excels at letting you write generic code for this kind of stuff and get great performance. Macros are great, but when you’ve got small data structures which are parameterized by size (eg, polynomial order) you often won’t know the size based on syntax alone, in which case macros won’t help you. In those cases you may need generated functions which let you generate syntax based on the type of the arguments. Even better if you can get the same results with normal function calls, carefully arranged and this has been getting a lot easier with improvements to the compiler.

It’s quite interesting, I wrote these two versions here: one just puts all the calculations into a tuple and the other is based on your make_poly_derivs, except I evaluate each polynomial straight away (and use your library). The second version actually gets compiled into slightly fewer instructions, even with -O3! Curious now to find out why is there any difference at all in the result. Numerically they don’t exactly match either.

In any case, this recursive inlining with tuple splatting and the use of Val looks quite nice to me, and I think I’ll be sticking to that. Thanks!

@inline function polydevs(u, p::NTuple{4,T}) where T
    (@evalpoly(u, p[1], p[2], p[3], p[4]),
     @evalpoly(u, p[2], 2*p[3], 3*p[4]),
     @evalpoly(u, 2*p[3], 6*p[4]))
end

@code_native polydevs(u, p)

using StaticUnivariatePolynomials

calc_poly_derivs(u, p::Polynomial, ::Val{0}) = (p(u),)
@inline function calc_poly_derivs(u, p::Polynomial, ::Val{num_derivs}) where num_derivs
    (p(u), calc_poly_derivs(u, StaticUnivariatePolynomials.derivative(p), Val(num_derivs - 1))...)
end

function polydevs2(u, p::NTuple{4,T}) where T
    pp = Polynomial(p)
    calc_poly_derivs(u, pp, Val{2}())
end

@code_native polydevs2(u, p)
julia> @code_native polydevs(u, p)
        .text
        vmovsd  24(%rsi), %xmm1         # xmm1 = mem[0],zero
        vmovsd  8(%rsi), %xmm2          # xmm2 = mem[0],zero
        vmovsd  16(%rsi), %xmm3         # xmm3 = mem[0],zero
        movabsq $140395050138792, %rax  # imm = 0x7FB045177CA8
        vmulsd  (%rax), %xmm0, %xmm4
        vunpcklpd       %xmm4, %xmm0, %xmm4 # xmm4 = xmm0[0],xmm4[0]
        vmovddup        %xmm0, %xmm5    # xmm5 = xmm0[0,0]
        movabsq $140395050138800, %rax  # imm = 0x7FB045177CB0
        vmulsd  (%rax), %xmm0, %xmm6
        vfmadd213sd     %xmm3, %xmm1, %xmm0
        vaddsd  %xmm3, %xmm3, %xmm3
        vunpcklpd       %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[0]
        vunpcklpd       %xmm3, %xmm2, %xmm2 # xmm2 = xmm2[0],xmm3[0]
        vfmadd231pd     %xmm0, %xmm4, %xmm2
        vfmadd213pd     (%rsi), %xmm2, %xmm5
        vfmadd213sd     %xmm3, %xmm1, %xmm6
        vmovupd %xmm5, (%rdi)
        vmovsd  %xmm6, 16(%rdi)
        movq    %rdi, %rax
        retq
julia> @code_native polydevs2(u, p)
        .text
        vmovsd  8(%rsi), %xmm1          # xmm1 = mem[0],zero
        vmovsd  16(%rsi), %xmm2         # xmm2 = mem[0],zero
        vmulsd  24(%rsi), %xmm0, %xmm3
        vaddsd  %xmm2, %xmm3, %xmm4
        vaddsd  %xmm2, %xmm2, %xmm2
        vunpcklpd       %xmm3, %xmm0, %xmm5 # xmm5 = xmm0[0],xmm3[0]
        movabsq $140395050140592, %rax  # imm = 0x7FB0451783B0
        vblendpd        $2, (%rax), %xmm4, %xmm4 # xmm4 = xmm4[0],mem[1]
        vunpcklpd       %xmm2, %xmm1, %xmm1 # xmm1 = xmm1[0],xmm2[0]
        vfmadd231pd     %xmm4, %xmm5, %xmm1
        vmovddup        %xmm0, %xmm0    # xmm0 = xmm0[0,0]
        vfmadd213pd     (%rsi), %xmm1, %xmm0
        movabsq $140395050140608, %rax  # imm = 0x7FB0451783C0
        vfmadd231sd     (%rax), %xmm3, %xmm2
        vmovupd %xmm0, (%rdi)
        vmovsd  %xmm2, 16(%rdi)
        movq    %rdi, %rax
        retq