Create interpolation function from file data during module initialization

I have a module that is supposed to perform some numerical tasks, including random sampling. These tasks, in particular the sampling, depend on using functions for which there is no closed-form expression, so I brute-forced a calculation for a big range of parameters with decent granularity to obtain a table that I then use to approximate these functions by linear interpolation. The resulting values have been stored in an HDF5 file that is read by the module.

I want the module to read the HDF5 file on runtime and to define the corresponding interpolation function that is based on that file’s tabular data. I currently have it defined in a function, but I can’t figure out how to avoid multiple instances of reading the HDF5 file and creating the interpolating function, which happens for each random number generation (without defining a global variable that is particular instance of this function). Keep in mind I defined a rand(rng::AbstractRNG, d::MyContinuousUnivariateDistributionType) method in the style of Distributions.jl, which then allows me to use the inplace and the array version of this method, and I wish to keep that functionality.

I’ve tried RuntimeGeneratedFunctions.jl, but it didn’t work, which I suspect is because this is akin to a closure. I have yet to try GeneralizedGenerated.jl.

Here’s a sketch of what is going on (doesn’t run, I will have to write a script to produce an artificial file if necessary):

module MyModule

import HDF5
import Interpolations


function read_truncations_fromfile()
    truncfile = HDF5.h5open(dirname(@__DIR__) * "/data/rng_truncations.h5")
    # println("reading from truncations table") # used this to detect multiple reads
    grids = truncfile["grids"]
    nodes = (HDF5.read(grids["alpha"]),
             HDF5.read(grids["c"]),
             HDF5.read(grids["theta"])
             )
    itp = Interpolations.interpolate(nodes,
                                     HDF5.read(truncfile["truncations"]),
                                     Interpolations.Gridded(Interpolations.Linear())
                                     )
    HDF5.close(truncfile)
    
    return itp
end

end

If I define

function truncation(α::Real, c::Real, θ::Real)
    itp = read_truncations_fromfile()
    return itp(α, c, θ)
end

It will read the file and construct the interpolating function every time truncation is called, so this seems useless.

If I simply declare a variable that is an instance of the function in the middle of the module

truncation =  read_truncations_fromfile()

It seems to work, but I am wary of doing this.

Any suggestions?

I just tried using @memoize from Memoization.jl and I think it solved the issue. I just put the macro at the function definition and I stopped seeing readings of the file on multiple calls.

That’s perfectly fine. I would just put const in front of it. (In fact, the interpolation object will then just get precompiled into your module.)

(Note that if your function is smooth and you can choose the sampling points, then you can get much better accuracy than linear interpolation with a much smaller number of points, e.g. using FastChebInterp.jl.)

That’s perfectly fine. I would just put const in front of it.

I think I did that and it blocked the instance as an Array instead of a Interpolations.GriddedInterpolation, so I had to leave the const out.

(Note that if your function is smooth and you can choose the sampling points, then you can get much better accuracy than linear interpolation with a much smaller number of points, e.g. using FastChebInterp.jl.)

Theoretically I would assume this function to be smooth, as it is the 0.01 percentile of a given distribution without closed-form density but I plotted it and found out it seems to have some irregularities. It’s a distribution only defined by its characteristic function, and I believe continuinty in its parameters depends on the specific parametrization being used. The one I chose might not be the one that provides that, but it is the one that makes the most sense within my context. I will keep your suggestion in mind for the future, thanks!

I can’t reproduce:

module Foo
using Interpolations
init_interp() = linear_interpolation(1:0.2:5, log.(1:0.2:5))
const itp = init_interp()
end

makes itp an interpolation object as expected.

I just went back and did it again and you’re right, it works as expected. I must’ve been hallucinating…