Using Numerical Integration and ApproxFun in Turing.jl

I’m trying to write some code to “show how it works” when it comes to imposing high quality prior information on a Bayesian regression problem. I’m running into implementation issues and could use some thoughts @stevengj in particular might have some ideas because of ApproxFun, but also any of the Turing crew or anyone else please feel free to chip in.

Suppose you want to fit a function to a particular data set, and you have some theoretical reason to believe the function should have some properties. Let’s for the moment assume we’re fitting a function on the standard Chebyshev domain [1,1] because it avoids some complications.

Here are some examples of the kinds of information we might have:

  1. We think the function near the left endpoint should be not far from say 0.0, and on the right endpoint not far from 1.0.
  2. The function is strictly positive
  3. The function has non-negative derivative
  4. The function should be near 0.3 at 0.0

I’m representing this function as a 5th order Chebyshev series using ApproxFun, and I can impose the conditions in 1 and 4 no problem with addlogprob!

coefs ~ MvNormal(zeros(6),1000.0.*I(6))
f = Fun(Chebyshev(),coefs)
Turing.addlogprob!(logpdf(Gamma(3,.01/2),f(-1.0)) # function near -1 is near 0
Turing.addlogprob!(logpdf(Gamma(20,1.0/19),f(1.0)) # function near 1 is near 1.0
Turing.addlogprob!(logpdf(Gamma(20,0.3/19),f(0.0)) # function near 0 is near 0.30

Ideally what I’d like to do for condition (2) and (3) is to integrate a function like:

lpp(x) = 10*(logistic((x-0.01)/.01)-1)

Which is kind of a smooth step function, it goes very negative for negative quantities, and is near 0 for positive quantities. I’d then add this to the log probability and down-weight any functions that spend any time with negative values…

val,prec = quadgk(x->lpp(f(x)),-1,1)
Turing.@addlogprob!(val) ## imposing condition 2 approximately

But so far that doesn’t work right

What’s the best way to impose these kind of integral based conditions in Turing? There are a number of issues:

  1. Interaction with gradients in Turing?
  2. Numerical stability?
  3. Type compatibility.

I’ve tried a number of ideas, and would be happy to give details of how it failed, but first I thought maybe I’d see if anyone has some idea of the “canonical” way to utilize numerical integration results inside Turing?

See e.g. numerical methods - How can I find a non-negative interpolation function? - Mathematics Stack Exchange

In particular, it’s easy to constrain a function f(x) to be nonnegative by expressing it as f(x) = p(x)^2 where p(x) is your ApproxFun (or whatever) function.

Yes, I certainly have thought about modeling the logarithm for example. But that doesn’t really help with the non-negative derivative condition. I could model the logarithm of the derivative, but then I need to integrate to evaluate the function…

And in general, I could imagine a constraint that doesn’t look like “f > 0” at all. Like

\frac{1}{q} \int_0^q (f(x) - g(x))^2 dx = 1

One thing I tried was to fit an ApproxFun:

foo = Fun(x->lpp(f(x)),Chebyshev())
ifoo = Integrate()*foo


But this ran into some issues with the autodiff, it seemed that ApproxFun probably didn’t know what to do with the specialized types for differentiation.

I might be able to sample with a different sampler, but without NUTS it could be extremely slow to get good results.

Then I tried doing quadgk on the ApproxFun, but got some kind of NAN issues. It works in the context of the REPL but inside Turing again interaction with autodiff seems problematic. I can provide some errors if needed.

ERROR: DomainError with 1.5:
integrand produced TrackedReal<Atz>(NaN, 0.0, Aaq, ---) in the interval (0.501, 2.499)
  [1] evalrule(f::var"#153#155"{Float64, Fun{Chebyshev{IntervalSets.ClosedInterval{Float64}, Float64}, ReverseDiff.TrackedReal{Float64, Float64, ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}}}, ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}}}}, a::Float64, b::Float64, x::Vector{Float64}, w::Vector{Float64}, gw::Vector{Float64}, nrm::typeof(norm))
    @ QuadGK ~/.julia/packages/QuadGK/XqIlh/src/evalrule.jl:37
  [2] #2
    @ ~/.julia/packages/QuadGK/XqIlh/src/adapt.jl:10 [inlined]

Note that MH sampling (diffusive without derivatives) seems to work well together with hcubature numerical integration. It’s slow but there aren’t type related errors etc. So it seems that the main issue is somehow dealing with autodiff types.

Specifically, the following code works with diffusive samplers, but fails with NUTS which requires gradients:

    f = Fun(Chebyshev(Interval(0.5,2.5)),coefs)
    df = Derivative(Chebyshev(Interval(0.5,2.5)),1)*f

    notneg,prec = hquadrature(x -> probneg(f(x)), .501,2.499)
    notnegderiv,prec = hquadrature(x -> probneg(df(x)), .501,2.499)

Giving an error like:

ERROR: MethodError: *(::BandedMatrices.BandedMatrix{Float64, Matrix{Float64}, Base.OneTo{Int64}}, ::ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}}) is ambiguous.

  *(x::AbstractMatrix, y::ReverseDiff.TrackedArray{V, D, 1}) where {V, D}
    @ ReverseDiff ~/.julia/packages/ReverseDiff/7pHoq/src/derivatives/linalg/arithmetic.jl:214
  *(x::AbstractArray, y::ReverseDiff.TrackedArray{V, D, 1}) where {V, D}
    @ ReverseDiff ~/.julia/packages/ReverseDiff/7pHoq/src/derivatives/linalg/arithmetic.jl:214
  *(x::AbstractMatrix, y::ReverseDiff.TrackedArray{V, D}) where {V, D}
    @ ReverseDiff ~/.julia/packages/ReverseDiff/7pHoq/src/derivatives/linalg/arithmetic.jl:214
  *(x::AbstractArray, y::ReverseDiff.TrackedArray{V, D}) where {V, D}
    @ ReverseDiff ~/.julia/packages/ReverseDiff/7pHoq/src/derivatives/linalg/arithmetic.jl:214
  *(A::ArrayLayouts.LayoutMatrix, B::AbstractVector)
    @ ArrayLayouts ~/.julia/packages/ArrayLayouts/zY7r9/src/mul.jl:211
  *(A::AbstractMatrix{T}, x::AbstractVector{S}) where {T, S}
    @ LinearAlgebra /var/local/dlakelan/julia/julia-1.9.0/share/julia/stdlib/v1.9/LinearAlgebra/src/matmul.jl:54

Possible fix, define
  *(::ArrayLayouts.LayoutMatrix, ::ReverseDiff.TrackedArray{V, D, 1}) where {V, D}

  [1] mul_coefficients(A::ApproxFunBase.SubOperator{Float64, ApproxFunBase.ConcreteDerivative{Chebyshev{IntervalSets.ClosedInterval{Float64}, Float64}, Int64, Float64}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, Tuple{Int64, Int64}, Tuple{Int64, Int64}}, b::ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}})
    @ ApproxFunBase ~/.julia/packages/ApproxFunBase/RQAwE/src/Operators/SubOperator.jl:382
  [2] _mulcoeff_maybeconvert
    @ ~/.julia/packages/ApproxFunBase/RQAwE/src/Operators/general/algebra.jl:716 [inlined]
  [3] mulcoeff_maybeconvert
    @ ~/.julia/packages/ApproxFunBase/RQAwE/src/Operators/general/algebra.jl:719 [inlined]
  [4] mul_coefficients(::ApproxFunBase.ConcreteDerivative{Chebyshev{IntervalSets.ClosedInterval{Float64}, Float64}, Int64, Float64}, ::ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}})
    @ ApproxFunBase ~/.julia/packages/ApproxFunBase/RQAwE/src/Operators/general/algebra.jl:698
  [5] *(A::ApproxFunBase.ConcreteDerivative{Chebyshev{IntervalSets.ClosedInterval{Float64}, Float64}, Int64, Float64}, b::Function)
    @ ApproxFunBase ~/.julia/packages/ApproxFunBase/RQAwE/src/Operators/general/algebra.jl:735
  [6] fracfat(__model__::DynamicPPL.Model{typeof(fracfat), (:adip, :ffat), (), (), Tuple{Vector{Float64}, Vector{Float64}}, Tuple{}, DynamicPPL.DefaultContext}, __varinfo__::DynamicPPL.ThreadSafeVarInfo{DynamicPPL.TypedVarInfo{NamedTuple{(:coefs,), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:coefs, Setfield.IdentityLens}, Int64}, Vector{DiagNormal}, Vector{AbstractPPL.VarName{:coefs, Setfield.IdentityLens}}, ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}}, Vector{Set{DynamicPPL.Selector}}}}}, ReverseDiff.TrackedReal{Float64, Float64, ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}}}}, Vector{Base.RefValue{ReverseDiff.TrackedReal{Float64, Float64, ReverseDiff.TrackedArray{Float64, Float64, 1, Vector{Float64}, Vector{Float64}}}}}}, __context__::DynamicPPL.SamplingContext{DynamicPPL.Sampler{NUTS{Turing.Essential.ReverseDiffAD{false}, (), AdvancedHMC.DiagEuclideanMetric}}, DynamicPPL.DefaultContext, TaskLocalRNG}, adip::Vector{Float64}, ffat::Vector{Float64})
    @ Main ~/Consulting/LakelandAppliedSciLLC/MathModelsBook/functionsandapproximation.qmd:14
  [7] _evaluate!!

Are you attached to Chebyshev functions? Considering your constraints, they don’t seem like the easiest choice. I have used B-splines for modeling monotonic functions in some projects and find it pretty pleasant, although I don’t use Bayesian methods like you’re describing and so maybe there is some issue that I’m not aware of.

I am attached to Chebyshev functions for various reasons. The monotonicity here is not really the issue though. In the general case, it’s more “we have some prior information expressable as an integral”. Although there’s nothing wrong with splines really, I don’t think splines make it easier to do numerical integration inside a Bayesian model?

Why not model the derivative as f'(x) = p(x)^2 \ge 0, where p(x) is a polynomial? Then f(x) = \int^x p(x)^2 dx \ge 0 is the integral of a polynomial, which you can obtain easily as another polynomial (e.g. ApproxFun can do this) that can be evaluated quickly? (Logarithms seem harder to work with than squared polynomials.)

1 Like

Yeah, squared polynomials do seem easier, and that’s an interesting tactic I will try in this problem.

Unfortunately the monotonicity and non-negativeness in this example doesn’t really get at my underlying question, which is how to best do numerical integration in a differentiable manner inside a Turing model because in the general case I want to express probabilistic constraints like the one I mentioned:

I should mention, this is an educational example for the purpose of teaching how to use path-integration to provide a prior over functions. So it’s not like I need to solve this particular problem for an application or something. Still, the problem of positivity as a constraint comes up a lot, so I maybe will incorporate the p(x)^2 idea in the educational material.

Hmm… thinking about it, I’m guessing if I’m working with ApproxFun anyway, I could do almost all the integration using ApproxFun, which is going to be faster than numerical integration anyway. Let me do some experiments and I’ll report back!

Ok, that was fit with:

    coefs ~ MvNormal(zeros(6),(2.0)^2 .*I(6))
    sig ~ Gamma(10,1.0/9)
    sqtdf = Fun(Chebyshev(Interval(0.5,2.5)),coefs)
    df = sqtdf * sqtdf
    f = Integral()*df
    f = f - f(.5)

That first addlogprob! is redundant now.

And the problem doesn’t require any path integration… :frowning: so now I have to come up with another example problem.

I don’t know about what effect the Bayesian model has on integration, but B-splines are just piecewise polynomials, so I assume that those integrals are easily available in closed form with a little scratch pad work. BSplineKit.jl has an integral function, for example, although it seems plenty doable to just write one’s own.

The f'(x) = p(x)^2 approach also seems perfectly fine if you’re okay with the derivative actually touching (and bouncing off of) zero in the way that p(x)^2 will for most coefficients. Just floating another option.

Yes, though from a computing perspective few people write out the expressions and hand-program them, the usual method to fit splines is to have some computer function to evaluate the spline right?


If there’s a computer function to construct the integral, that’s great though.

I haven’t actually followed this code to the end, but here is a link to the integral function in BSplineKit.jl. Since the standard scaling of a third order B-spline is a piecewise polynomial supported on [0,3], for example, I assume this code eventually works down to evaluating that integral with a change of variable formula or something.

Anyways, just food for thought there. @stevengj’s f'(x) = p(x)^2 suggestion would probably also work well. If you end up playing around and trying both and one works meaningfully better or worse than the other, I’d be curious to know!

In this case see above, the fit was accomplished extremely easily using ApproxFun and chebyshev polynomials… In fact, so easily that now for my real problem I need to invent a new problem where path integration is actually required :rofl:

Here’s a fit with an additional constraint on the function in the vicinity of x=1.0

1 Like