# Performant piecewise function evaluation

I’m working on developing a solver that requires piecewise-defined polynomial functions to be evaluated very frequently. The easiest way to define such a function would be something like

``````function piecewise_hardcoded(x)
if x < -1
return 0.0
elseif x < 0
return 1.0 - x^2
elseif x < 1
return x^2 -1.0
else
return 0.0
end
end
``````

This performs very well:

``````julia> @btime piecewise_hardcoded(x) setup=(x=4*rand()-2)
2.499 ns (0 allocations: 0 bytes)
``````

However, I’d like to be able to take arbitrary derivatives and integrals of such functions as well without rewriting the whole function, so I want to be able to store the functions and breakpoints in a struct like so:

``````using Polynomials
struct PiecewisePolynomial{P<:AbstractPolynomial, T<:Real} <: Function
polynomials::Vector{P}
breakpoints::Vector{T}
end

function evalpiecewise(functions, breakpoints, x)
for (i, b) in enumerate(breakpoints)
if b < x
return functions[i](x)
end
end
return functions[end](x)
end

(p::PiecewisePolynomial)(x) = evalpiecewise(p.polynomials, p.breakpoints, x)

const polynomials = [
Polynomial([0.0]), Polynomial([1.0, 0.0, -1.0]), Polynomial([-1.0, 0.0, 1.0]), Polynomial([0.0])
]

const breakpoints = [-1.0, 0.0, 1.0]

const mypiecewise = PiecewisePolynomial(polynomials, breakpoints)
``````

This is reasonably fast, but still 5-10 times slower than the hardcoded version

``````julia> @btime mypiecewise(x) setup=(x=4*rand()-2)
17.734 ns (0 allocations: 0 bytes)
``````

My next idea was to compute a closure for the evaluation and store that in the struct, in the hopes that the compiler could do some constant propagation on `evalfunc`

``````struct PiecewisePolynomial_2{F<:Function, P<:AbstractPolynomial, T<:Real} <: Function
polynomials::Vector{P}
breakpoints::Vector{T}
evalfunc::F
function PiecewisePolynomial_2(polynomials::Vector{P}, breakpoints::Vector{T}) where {P, T}
# define closure
evalfunc = let p = polynomials, b = breakpoints
x -> evalpiecewise(p, b, x)
end
return new{typeof(evalfunc), P, T}(polynomials, breakpoints, evalfunc)
end
end

(p::PiecewisePolynomial_2)(x) = p.evalfunc(x)
const mypiecewise_withclosure = PiecewisePolynomial_2(polynomials, breakpoints)
``````

However, this did not measurably impact performance

``````julia> @btime mypiecewise_withclosure(x) setup=(x=4*rand()-2)
17.734 ns (0 allocations: 0 bytes)
``````

The last thing I tried was using NTuples instead of vectors so that the number of segments is inferrable from the type:

``````struct PiecewisePolynomial_Tuples{F<:Function, P<:AbstractPolynomial, T<:Real, M} <: Function
polynomials::NTuple{N, P} where N
breakpoints::NTuple{M, T}
evalfunc::F

function PiecewisePolynomial_Tuples(polynomials::NTuple{N, P}, breakpoints::NTuple{M, T}) where {P, N, T, M}
@assert N == M + 1
@assert issorted(breakpoints)
evalfunc = let p = polynomials, b = breakpoints
x -> evalpiecewise(polynomials, breakpoints, x)
end
return new{typeof(evalfunc), P, T, M}(polynomials, breakpoints, evalfunc)
end
end

const poly_tuple = (
Polynomial([0.0]), Polynomial([1.0, 0.0, -1.0]), Polynomial([-1.0, 0.0, 1.0]), Polynomial([0.0])
)
const breakpoint_tuple = (-1.0, 0.0, 1.0)
const mypiecewise_tuples = PiecewisePolynomial_Tuples(poly_tuple, breakpoint_tuple)
``````

and that got me another 5 ns over the vectors, but it’s still 6 times slower than the hardcoded version.

``````julia> @btime mypiecewise_tuples(x) setup=(x=4*rand()-2)
12.512 ns (0 allocations: 0 bytes)
``````

Does anyone have any ideas for further speedups?

1 Like

`N` should be a parameter of your `struct`; otherwise you have an abstractly typed field.

2 Likes

Unfortunately no change in performance when I add N as a parameter of the struct, and I feel it overspecifies things as N should be M + 1

``````12.512 ns (0 allocations: 0 bytes)
``````

You have basically two nested loops: one iterating through `breakpoints` and one iterating through the coefficients of each polynomial. You can improve performance if you unroll these loops in your code. This can be done with the `@generated` macro.

``````using BenchmarkTools

struct Polynomial_Tuple{N, T <: Real}
p::NTuple{N, T}
end
(poly::Polynomial_Tuple)(x) = evalpoly(x, poly.p)

struct PiecewisePolynomial{N, P<:Polynomial_Tuple, T<:Real} <: Function
polynomials::Vector{P}
breakpoints::Vector{T}
function PiecewisePolynomial{N}(polynomials::Vector{P}, breakpoints::Vector{T}) where {N, P <: Polynomial_Tuple, T <: Real}
@assert length(polynomials) == N + 1
@assert length(breakpoints) == N
@assert issorted(breakpoints)
new{N, P, T}(polynomials, breakpoints)
end
end
(p::PiecewisePolynomial{N})(x) where N = evalpiecewise(Val(N), p.polynomials, p.breakpoints, x)

function evalpiecewise(::Val{N}, functions, breakpoints, x) where N
if @generated
generator = (:(if x < breakpoints[\$k]
return functions[\$k](x)
end) for k = 1 : N)
quote
@inbounds begin
\$(generator...)
return functions[\$(N + 1)](x)
end
end
else
ind = searchsortedfirst(breakpoints, x)
return functions[ind](x)
end
end

polynomials = [
Polynomial_Tuple((0., 0., 0.)), Polynomial_Tuple((1.0, 0.0, -1.0)), Polynomial_Tuple((-1.0, 0.0, 1.0)), Polynomial_Tuple((0., 0., 0.))
]
breakpoints = [-1.0, 0.0, 1.0]
mypiecewise = PiecewisePolynomial{3}(polynomials, breakpoints)

@btime \$mypiecewise(\$(4rand() - 2))
``````

I get 1.499 ns both with the code above and with your original `piecewise_hardcoded` function.

Note that:

• You probably meant `x < b` in your original `evalpiecewise` function.
• The `if @generated` line allows the compiler to choose whether to use the generated function or not.
• The `searchsortedfirst` function is optimized to do the lookup in your `evalpiecewise` function, especially when `N` is large.
• The `Val` type is used to pass the compile-time constant `N` to the function `evalpiecewise`.
• The `Base` function `evalpoly` implicitly unrolls the polynomial evaluation for you, since the polynomial is a `Tuple`.
• The `ImmutablePolynomial` type of the `Polynomials` package is similar to the `Polynomial_Tuple` type defined here. I have defined a new type only because I want `polynomials` to be a concrete container (in this case of quadratic polynomials), while the leading coefficient of `ImmutablePolynomial`s must be nonzero.
• Even though `PiecewisePolynomial` has a type parameter `N` to define the number of breakpoints, I still define its fields to be `Vector`s. You could change that to `Tuple`s if you want, but this could result in large compile times if `N` is large.

This last point makes me wonder if there is a way to condition the use of a `@generated` function on the value of `N` (using something like `if @generated() && N < 10`)…

6 Likes

This works great! I’m fairly new to the metaprogramming aspect of julia and had toyed with using a generated function before, but couldn’t figure out to make it work for me. I think the Val aspect of it was definitely something I was missing.

I did try out searchsortedfirst initially, but I’m expecting to be working with smaller piecewise (less than 6 or so) so it didn’t offer too many advantages over the more naive search.

Thank you so much!