Allocation-efficient calculation of Hermite (and Laguerre) polynomials

un-XY: I’m working on a project that involves a lot of simulation, for which I need to repeatedly compute basis functions of the simulated quantities. I’m writing code that lets me flexibly mix and match various types of basis functions and interactions between them, and I’m trying to do this in a way that minimises memory use and allocations.

Concrete problem:
How can I improve the function ComputeBasis! below to reduce the number of allocations? I currently get 12 allocations for the entire function call, whereas doing a similar thing with monomials (x, x^2, x^3, etc.) takes substantially fewer allocations.

using BenchmarkTools;

struct HermiteOperation
    init_domain::Int # What column in X to use to compute Hermite polynomials in
    range::Vector{Int} # What columns in MX to write the Hermite polynomials to
    order::Int # What order of Hermite polynomial to consider
end

function ComputeBasis!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::HermiteOperation)
    tmp = X[:, op.init_domain]
    MX[:, op.range[1]] .= tmp
    MX[:, op.range[2]] .= muladd.(tmp, tmp, -1)
    @views for i in 3:op.order
        tmp .= MX[:, op.range[i - 1]]
        MX[:, op.range[i]] .= muladd.(X[:, op.init_domain], tmp, -(i - 1) * MX[:, op.range[i - 2]])
    end
    return MX
end

struct PolynomialOperation
    init_domain::Int
    range::Vector{Int}
    order::Int
end

function ComputeBasis!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::PolynomialOperation)
    tmp = X[:, op.init_domain]
    MX[:, op.range[1]] = tmp
    @views for i in 2:op.order
        tmp .*= X[:, op.init_domain]
        MX[:, op.range[i]] .= tmp
    end
    return MX
end

function RunTest()
    X = [-2.0; 0.0;; -5.0; 0.0] # 2×2 matrix with columns [-2, 0] and [-5, 0]
    K = 10
    MX = Matrix{Float64}(undef, size(X, 1), K)
    op = HermiteOperation(1, 1:K, K)
    #op = PolynomialOperation(1, 1:K, K)
    ComputeBasis!(MX, X, op)
end

@benchmark RunTest()

Any suggestions to reduce allocations/memory use when using PolynomialOperation are also very welcome, of course!
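
For concreteness, here is one sketch of what an allocation-lighter PolynomialOperation method could look like: it skips the tmp copy entirely and builds each power in place from the previous output column. (ColPowBasis! is a made-up name for illustration; this is a sketch, not a definitive implementation.)

```julia
struct PolynomialOperation          # same definition as above
    init_domain::Int
    range::Vector{Int}
    order::Int
end

# Sketch only: view the input column instead of copying it, and compute
# x^i as x^(i-1) * x directly between output columns.
function ColPowBasis!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::PolynomialOperation)
    @views begin
        x = X[:, op.init_domain]        # view, no copy
        MX[:, op.range[1]] .= x         # x^1
        for i in 2:op.order
            MX[:, op.range[i]] .= MX[:, op.range[i - 1]] .* x
        end
    end
    return MX
end
```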

Further motivation: I will be writing analogous functions that act on vectors rather than matrices (where vectors correspond in my setting to 1 simulation path), and I need to call ComputeBasis! repeatedly within an optimize call, which I need to run for many simulated paths.
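
To make that concrete, a minimal sketch of what such a vector method could look like (assuming the same HermiteOperation fields; here mx and x play the roles of one row of MX and X, so everything is scalar arithmetic and nothing allocates):

```julia
struct HermiteOperation             # same definition as above
    init_domain::Int
    range::Vector{Int}
    order::Int
end

# Hypothetical vector variant of ComputeBasis! for a single simulation path.
function ComputeBasis!(mx::AbstractVector{Float64}, x::AbstractVector{Float64}, op::HermiteOperation)
    t = x[op.init_domain]
    mx[op.range[1]] = t                  # He_1 = x
    mx[op.range[2]] = muladd(t, t, -1)   # He_2 = x^2 - 1
    for i in 3:op.order
        # He_i = x * He_{i-1} - (i - 1) * He_{i-2}
        mx[op.range[i]] = muladd(t, mx[op.range[i - 1]], -(i - 1) * mx[op.range[i - 2]])
    end
    return mx
end
```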

Let me just add that I have the same problem for my implementation of Laguerre polynomials, which are given by

struct LaguerreOperation <: BasisOperation
    init_domain::Int
    range::Vector{Int}
    order::Int
    α::Float64
end

function ComputeBasis!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::LaguerreOperation)
    tmp = ones(size(X, 1))
    @views MX[:, op.range[1]] .= 1 + op.α .- X[:, op.init_domain]
    @views for i in 2:op.order
        MX[:, op.range[i]] .= muladd.(2i + op.α - 1 .- X[:, op.init_domain], MX[:, op.range[i - 1]], -(i - 1 + op.α) .* tmp) ./ i
        if i < op.order
            tmp .= MX[:, op.range[i - 1]]
        end
    end
    return MX
end
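
For reference, the recurrence the code above implements can be spot-checked in isolation with a small scalar helper (laguerre_scalar is a made-up name, not part of the code above):

```julia
# Scalar form of the generalised Laguerre recurrence used above:
#   L_0 = 1,  L_1 = 1 + α - x,
#   i * L_i = (2i + α - 1 - x) * L_{i-1} - (i - 1 + α) * L_{i-2}
function laguerre_scalar(order::Int, α::Float64, x::Float64)
    Lprev, L = 1.0, 1 + α - x
    vals = Float64[L]
    for i in 2:order
        Lprev, L = L, ((2i + α - 1 - x) * L - (i - 1 + α) * Lprev) / i
        push!(vals, L)
    end
    return vals
end

# e.g. for α = 0, x = 1: L_1(1) = 0 and L_2(1) = (x^2 - 4x + 2)/2 = -0.5
```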

I think your code is efficient. We can add @inbounds to disable bounds checking, and there is no need for the temp variable in the for loop:

function ComputeBasis11!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::HermiteOperation)
    @views begin
        tmp = X[:, op.init_domain]
        MX[:, op.range[1]] .= tmp
        MX[:, op.range[2]] .= muladd.(tmp, tmp, -1)
        for i in 3:op.order
            @inbounds MX[:, op.range[i]] .= muladd.(tmp, MX[:, op.range[i - 1]], -(i - 1) * MX[:, op.range[i - 2]])
        end
    end
    return MX
end

Thank you! I’m guessing the same probably applies to my Laguerre polynomial code?

After some tests, it turns out we can write the code with plain loops only:

function ComputeBasis_No_Vec!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::HermiteOperation)
    tmp = X[:, op.init_domain]
    for (i, x) in enumerate(eachcol(MX))
        for j in eachindex(x)
            if i == 1
                x[j] = tmp[j]
                continue
            end
            if i == 2
                x[j] = tmp[j] * tmp[j] - 1
                continue
            end
            x[j] = tmp[j] * MX[j, op.range[i - 1]] - (i - 1) * MX[j, op.range[i - 2]]
        end
    end
    return MX
end

which is nearly 2× faster in my benchmarks:

julia> @btime RunTest()
  2.699 μs (17 allocations: 1.33 KiB)

julia> @btime RunTest_No_Vec()
  1.287 μs (10 allocations: 800 bytes)

That’s amazing, thank you! (I’ve added an @views at the line where tmp is defined, to cut out another allocation). I ended up using the following, as it allowed me to consistently generalise the methods to other choices of basis functions:

function ComputeBasisInner!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::HermiteOperation)
    @views tmp = X[:, op.init_domain]
    for (k, i) in enumerate(op.range)
        for j in eachindex(tmp)
            if k == 1
                @inbounds MX[j, i] = tmp[j]
                continue
            end
            if k == 2
                @inbounds MX[j, i] = muladd(tmp[j], tmp[j], -1)
                continue
            end
            @inbounds MX[j, i] = muladd(tmp[j], MX[j, op.range[k - 1]], -(k - 1) * MX[j, op.range[k - 2]])
        end
    end
    return MX
end
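
For completeness, here is a sketch of how the same enumerate(op.range) pattern could carry over to the Laguerre case, only spot-checked, not benchmarked. (The struct is repeated here without the BasisOperation supertype so the snippet is self-contained.)

```julia
struct LaguerreOperation            # fields as shown earlier in the thread
    init_domain::Int
    range::Vector{Int}
    order::Int
    α::Float64
end

function ComputeBasisInner!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::LaguerreOperation)
    @views x = X[:, op.init_domain]
    for (k, i) in enumerate(op.range)
        for j in eachindex(x)
            if k == 1
                @inbounds MX[j, i] = 1 + op.α - x[j]    # L_1
                continue
            end
            # L_0 = 1 is never stored, so k == 2 uses the literal 1
            Lkm2 = k == 2 ? 1.0 : MX[j, op.range[k - 2]]
            @inbounds MX[j, i] = ((2k + op.α - 1 - x[j]) * MX[j, op.range[k - 1]] -
                                  (k - 1 + op.α) * Lkm2) / k
        end
    end
    return MX
end
```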