Performant piecewise function evaluation

I’m working on developing a solver that requires piecewise-defined polynomial functions to be evaluated very frequently. The easiest way to define such a function would be something like

function piecewise_hardcoded(x)
    if x < -1
        return 0.0
    elseif x < 0
        return 1.0 - x^2
    elseif x < 1
        return x^2 -1.0
    else
        return 0.0
    end
end

This performs very well:

julia> @btime piecewise_hardcoded(x) setup=(x=4*rand()-2)
2.499 ns (0 allocations: 0 bytes)

However, I’d like to be able to take arbitrary derivatives and integrals of such functions as well without rewriting the whole function, so I want to be able to store the functions and breakpoints in a struct like so:

using Polynomials
struct PiecewisePolynomial{P<:AbstractPolynomial, T<:Real} <: Function
    polynomials::Vector{P}
    breakpoints::Vector{T}
end

function evalpiecewise(functions, breakpoints, x)
    for (i, b) in enumerate(breakpoints)
        if b < x
            return functions[i](x)
        end
    end
    return functions[end](x)
end

(p::PiecewisePolynomial)(x) = evalpiecewise(p.polynomials, p.breakpoints, x)

const polynomials = [
    Polynomial([0.0]), Polynomial([1.0, 0.0, -1.0]), Polynomial([-1.0, 0.0, 1.0]), Polynomial([0.0])
]

const breakpoints = [-1.0, 0.0, 1.0]

const mypiecewise = PiecewisePolynomial(polynomials, breakpoints)

This is reasonably fast, but still 5-10 times slower than the hardcoded version

julia> @btime mypiecewise(x) setup=(x=4*rand()-2)
17.734 ns (0 allocations: 0 bytes)

My next idea was to compute a closure for the evaluation and store that in the struct, in the hopes that the compiler could do some constant propagation on evalfunc

struct PiecewisePolynomial_2{F<:Function, P<:AbstractPolynomial, T<:Real} <: Function
    polynomials::Vector{P}
    breakpoints::Vector{T}
    evalfunc::F
    function PiecewisePolynomial_2(polynomials::Vector{P}, breakpoints::Vector{T}) where {P, T}
        # define closure
        evalfunc = let p = polynomials, b = breakpoints
            x -> evalpiecewise(p, b, x)
        end
        return new{typeof(evalfunc), P, T}(polynomials, breakpoints, evalfunc)
    end
end

(p::PiecewisePolynomial_2)(x) = p.evalfunc(x)
const mypiecewise_withclosure = PiecewisePolynomial_2(polynomials, breakpoints)

However, this did not measurably impact performance

julia> @btime mypiecewise_withclosure(x) setup=(x=4*rand()-2)
17.734 ns (0 allocations: 0 bytes)

The last thing I tried was using NTuples instead of vectors so that the number of segments is inferrable from the type:

struct PiecewisePolynomial_Tuples{F<:Function, P<:AbstractPolynomial, T<:Real, M} <: Function
    polynomials::NTuple{N, P} where N
    breakpoints::NTuple{M, T}
    evalfunc::F
        
    function PiecewisePolynomial_Tuples(polynomials::NTuple{N, P}, breakpoints::NTuple{M, T}) where {P, N, T, M}
        @assert N == M + 1
        @assert issorted(breakpoints)
        evalfunc = let p = polynomials, b = breakpoints
           x -> evalpiecewise(polynomials, breakpoints, x)
        end
        return new{typeof(evalfunc), P, T, M}(polynomials, breakpoints, evalfunc) 
    end
end

const poly_tuple = (
    Polynomial([0.0]), Polynomial([1.0, 0.0, -1.0]), Polynomial([-1.0, 0.0, 1.0]), Polynomial([0.0])
)
const breakpoint_tuple = (-1.0, 0.0, 1.0)
const mypiecewise_tuples = PiecewisePolynomial_Tuples(poly_tuple, breakpoint_tuple)

and that got me another 5 ns over the vectors, but it’s still 6 times slower than the hardcoded version.

julia> @btime mypiecewise_tuples(x) setup=(x=4*rand()-2)
12.512 ns (0 allocations: 0 bytes)

Does anyone have any ideas for further speedups?

1 Like

N should be a parameter of your struct; otherwise you have an abstractly typed field.

2 Likes

Unfortunately no change in performance when I add N as a parameter of the struct, and I feel it overspecifies things as N should be M + 1

12.512 ns (0 allocations: 0 bytes)

You have basically two nested loops: one iterating through breakpoints and one iterating through the coefficients of each polynomial. You can improve performance if you unroll these loops in your code. This can be done with the @generated macro.

using BenchmarkTools

struct Polynomial_Tuple{N, T <: Real}
	p::NTuple{N, T}
end
(poly::Polynomial_Tuple)(x) = evalpoly(x, poly.p)

struct PiecewisePolynomial{N, P<:Polynomial_Tuple, T<:Real} <: Function
    polynomials::Vector{P}
    breakpoints::Vector{T}
	function PiecewisePolynomial{N}(polynomials::Vector{P}, breakpoints::Vector{T}) where {N, P <: Polynomial_Tuple, T <: Real}
		@assert length(polynomials) == N + 1
		@assert length(breakpoints) == N
		@assert issorted(breakpoints)
		new{N, P, T}(polynomials, breakpoints)
	end
end
(p::PiecewisePolynomial{N})(x) where N = evalpiecewise(Val(N), p.polynomials, p.breakpoints, x)

function evalpiecewise(::Val{N}, functions, breakpoints, x) where N
	if @generated
		generator = (:(if x < breakpoints[$k]
			return functions[$k](x)
		end) for k = 1 : N)
		quote
			@inbounds begin
				$(generator...)
				return functions[$(N + 1)](x)
			end
		end
	else
		ind = searchsortedfirst(breakpoints, x)
		return functions[ind](x)
	end
end

polynomials = [
    Polynomial_Tuple((0., 0., 0.)), Polynomial_Tuple((1.0, 0.0, -1.0)), Polynomial_Tuple((-1.0, 0.0, 1.0)), Polynomial_Tuple((0., 0., 0.))
]
breakpoints = [-1.0, 0.0, 1.0]
mypiecewise = PiecewisePolynomial{3}(polynomials, breakpoints)

@btime $mypiecewise($(4rand() - 2))

I get 1.499 ns both with the code above and with your original piecewise_hardcoded function.

Note that:

  • You probably meant x < b in your original evalpiecewise function.
  • The if @generated line allows the compiler to choose whether to use the generated function or not.
  • The searchsortedfirst function is optimized to do the lookup in your evalpiecewise function, especially when N is large.
  • The Val type is used to pass the compile-time constant N to the function evalpiecewise.
  • The Base function evalpoly implicitly unrolls the polynomial evaluation for you, since the polynomial is a Tuple.
  • The ImmutablePolynomial type of the Polynomials package is similar to the Polynomial_Tuple type defined here. I have defined a new type only because I want polynomials to be a concrete container (in this case of quadratic polynomials), while the leading coefficient of ImmutablePolynomials must be nonzero.
  • Even though PiecewisePolynomial has a type parameter N to define the number of breakpoints, I still define its fields to be Vectors. You could change that to Tuples if you want, but this could result in large compile times if N is large.

This last point makes me wonder if there is a way to condition the use of a @generated function on the value of N (using something like if @generated() && N < 10)…

6 Likes

This works great! I’m fairly new to the metaprogramming aspect of julia and had toyed with using a generated function before, but couldn’t figure out to make it work for me. I think the Val aspect of it was definitely something I was missing.

I did try out searchsortedfirst initially, but I’m expecting to be working with smaller piecewise (less than 6 or so) so it didn’t offer too many advantages over the more naive search.

Thank you so much!