un-XY: I’m working on a project that involves a lot of simulation, for which I need to compute basis-functions based on the simulated quantities repeatedly. I’m writing code that allows me to flexibly mix-and-match various types of basis-functions and interactions between them, and I’m trying to do this in a way that minimises memory and allocation.
Concrete problem:
How can I improve the function ComputeBasis!
below, to reduce the number of allocations? I currently get 12 allocations for the entire function call, but doing a similar thing with monomials (x, x^2, x^3, etc) takes substantially fewer allocations.
using BenchmarkTools;
struct HermiteOperation
init_domain::Int # What column in X to use to compute Hermite polynomials in
range::Vector{Int} # What columns in MX to write the Hermite polynomials to
order::Int # What order of Hermite polynomial to consider
end
function ComputeBasis!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::HermiteOperation)
tmp = X[:, op.init_domain]
MX[:, op.range[1]] .= tmp
MX[:, op.range[2]] .= muladd.(tmp, tmp, -1)
@views for i in 3:op.order
tmp .= MX[:, op.range[i - 1]]
MX[:, op.range[i]] .= muladd.(MX[:, op.init_domain], tmp , - (i-1) * MX[:, op.range[i - 2]])
end
return MX
end
struct PolynomialOperation
init_domain::Int
range::Vector{Int}
order::Int
end
function ComputeBasis!(MX::AbstractMatrix{Float64}, X::AbstractMatrix{Float64}, op::PolynomialOperation)
tmp = X[:, op.init_domain]
MX[:, op.range[1]] = tmp
@views for i in 2:op.order
tmp .*= X[:, op.init_domain]
MX[:, op.range[i]] .= tmp
end
return MX
end
function RunTest()
X = [-2.;0;;-5;0]
K = 10
MX = Matrix{Float64}(undef, size(X, 1), K)
op = HermiteOperation(1, 1:K, K)
#op = PolynomialOperation(1, 1:K, K)
ComputeBasis!(MX, X, op)
end
@benchmark RunTest()
Any suggestions to reduce allocations/memory use when using PolynomialOperation
are also very welcome, of course!
Further motivation: I will be writing analogous functions that act on vectors rather than matrices (where vectors correspond in my setting to 1 simulation path), and I need to call ComputeBasis!
repeatedly within an optimize
call, which I need to run for many simulated paths.