# Allocation-efficient calculation of Hermite (and Laguerre) polynomials

un-XY: I’m working on a project that involves a lot of simulation, for which I need to compute basis-functions based on the simulated quantities repeatedly. I’m writing code that allows me to flexibly mix-and-match various types of basis-functions and interactions between them, and I’m trying to do this in a way that minimises memory and allocation.

Concrete problem:
How can I improve the function `ComputeBasis!` below, to reduce the number of allocations? I currently get 12 allocations for the entire function call, but doing a similar thing with monomials (x, x^2, x^3, etc) takes substantially fewer allocations.

``````using BenchmarkTools;

struct HermiteOperation
init_domain::Int # What column in X to use to compute Hermite polynomials in
range::Vector{Int} # What columns in MX to write the Hermite polynomials to
order::Int # What order of Hermite polynomial to consider
end

function ComputeBasis!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::HermiteOperation)
tmp = X[:, op.init_domain]
MX[:, op.range[1]] .= tmp
MX[:, op.range[2]] .= muladd.(tmp, tmp, -1)
@views for i in 3:op.order
tmp .= MX[:, op.range[i - 1]]
MX[:, op.range[i]] .= muladd.(MX[:, op.init_domain], tmp , - (i-1) * MX[:, op.range[i - 2]])
end
return MX
end

struct PolynomialOperation
init_domain::Int
range::Vector{Int}
order::Int
end

function ComputeBasis!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::PolynomialOperation)
tmp = X[:, op.init_domain]
MX[:, op.range[1]] = tmp
@views for i in 2:op.order
tmp .*= X[:, op.init_domain]
MX[:, op.range[i]] .= tmp
end
return MX
end

function RunTest()
X = [-2.;0;;-5;0]
K = 10
MX = Matrix{Float64}(undef, size(X, 1), K)
op = HermiteOperation(1, 1:K, K)
#op = PolynomialOperation(1, 1:K, K)
ComputeBasis!(MX, X, op)
end

@benchmark RunTest()
``````

Any suggestions to reduce allocations/memory use when using `PolynomialOperation` are also very welcome, of course!

Further motivation: I will be writing analogous functions that act on vectors rather than matrices (where vectors correspond in my setting to 1 simulation path), and I need to call `ComputeBasis!` repeatedly within an `optimize` call, which I need to run for many simulated paths.

Let me just add that I have the same problem for my implementation of Laguerre polynomials, which are given by

``````struct LaguerreOperation <: BasisOperation
init_domain::Int
range::Vector{Int}
order::Int
α::Float64
end

function ComputeBasis!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::LaguerreOperation)
tmp = ones(size(X, 1))
@views MX[:, op.range[1]] .= 1 + op.α .- X[:, op.init_domain]
@views for i in 2:op.order
MX[:, op.range[i]] .= muladd.(2 * i + op.α - 1 .- X[:, op.init_domain], MX[:, op.range[i - 1]], -(i - 1 + op.α)*tmp)/i
if i < op.order
tmp .= MX[:, op.range[i - 1]]
end
end
return MX
end
``````

I think your code is efficient, we can add `@inbounds` to disable bounds checking and there is no need for `temp` variable in the for loop

``````function ComputeBasis11!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::HermiteOperation)
@views begin
tmp = X[:, op.init_domain]
MX[:, op.range[1]] .= tmp
MX[:, op.range[2]] .= muladd.(tmp, tmp, -1)
for i in 3:op.order
@inbounds MX[:, op.range[i]] .= muladd.(tmp, MX[:, op.range[i - 1]], - (i-1) * MX[:, op.range[i - 2]])
end
end
return MX
end
``````
2 Likes

thank you! I’m guessing the same probably applies for my Laguerre polynomial code?

After some tests, we can write code with just loops only

``````function ComputeBasis_No_Vec!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::HermiteOperation)

tmp = X[:, op.init_domain]
@views for (i, x) in enumerate(eachcol(MX))
for j in eachindex(x)
if i == 1
x[j] = tmp[j]
continue
end
if i == 2
x[j] = tmp[j] * tmp[j] - 1
continue
end
x[j] = tmp[j] * MX[j, op.range[i-1]] - (i-1) * MX[j, op.range[i - 2]]
end
end
MX
end
``````

which is nearly 2X with my benchmarks .

``````julia> @btime RunTest()
2.699 μs (17 allocations: 1.33 KiB)

julia> @btime RunTest_No_Vec()
1.287 μs (10 allocations: 800 bytes)
``````
2 Likes

That’s amazing, thank you! (I’ve added an @views at the line where `tmp` is defined, to cut out another allocation). I ended up using the following, as it allowed me to consistently generalise the methods to other choices of basis functions:

``````function ComputeBasisInner!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::HermiteOperation)
@views tmp = X[:, op.init_domain]
@views for (k, i) in enumerate(op.range)
for j in eachindex(tmp)
if k == 1
@inbounds MX[j, i] = tmp[j]
continue
end
if i == 2
@inbounds MX[j, i] = muladd(tmp[j], tmp[j], -1)
continue
end
@inbounds MX[j, i] = muladd(tmp[j], MX[j, op.range[k - 1]], - (i-1) * MX[j, op.range[k - 2]])
end
end
return MX
end
``````
1 Like