Using nlsolve inside Turing.jl- ForwardDiff.Dual type error

Hi, I’m trying to assess whether Turing is a good fit for a particular dynamic-pricing-type problem I have,
and I’m wondering if it’s possible to use nlsolve inside a model. I’m sure it is, but I can’t figure it out.
The code below returns `ERROR: TypeError: in typeassert, expected Float64, got a value of type ForwardDiff.Dual{Nothing, Float64, 3}`
from inside the nlsolve call.

I found the following example, but frankly didn’t understand it.
I also tried to add the @adjoint definition from this issue,
but I don’t think that’s the problem.
I did notice that this question returns the exact same error type, but it’s not obvious to me how to fix it (perhaps by moving the function passed to nlsolve out as a separate, type-annotated function?).
Here’s the attempt:

```julia
using Random
using Zygote
using NLsolve
using ForwardDiff
using Distributions
using StatsPlots
using Turing

Random.seed!(42);

# dummy supply function
function supplyFunc(p, baseSupply)
    p > -1 ? baseSupply + (p + 1)^2 : 0
end

# dummy demand function
function demandFunc(p, baseDemand)
    p < 20 ? baseDemand - (p + 1)^2 : 0
end

# estimate supply and demand from prices, single datapoint to start with
@model supplyDemandFromPrices(price) = begin
    sigma ~ truncated(Normal(0, 100), 0, Inf)
    supply ~ truncated(Normal(0, 100), 0, Inf)
    demand ~ truncated(Normal(0, 100), 0, Inf)
    # The offending call: try to estimate the clearing price.
    clearingPrice = nlsolve(
        x -> [supplyFunc(x[1], supply) - demandFunc(x[1], demand)],
        zeros(1),
        method = :anderson,
        m = 10,
        autodiff = :forward,
    ).zero[1]
    # assume the observed price is normally distributed around the 'market' price
    price ~ Normal(clearingPrice, sigma)
end;

modelP = supplyDemandFromPrices(6.0)
chainP = sample(modelP, NUTS(0.65), 100)  # throws an error
```

Unrelated, but I think that Turing’s `TruncatedNormal(...)` might be faster than `truncated(Normal(...), ...)`.


There are a couple of issues with the above:

1. When calling `nlsolve` you provide `zeros(1)` as the initial point. This creates a `Vector{Float64}`, which `nlsolve` re-uses and likely also uses to infer the `eltype` of its internal buffers, since type information makes things faster. Then, when you call `sample` with `NUTS`, Turing tries to compute the gradient of the model. There are different approaches to this in Julia, and ForwardDiff.jl is one of them. ForwardDiff.jl uses a special `Dual` number type to keep track of the derivatives wrt. each input, so when we differentiate the model wrt. its parameters, `supply` and `demand` will be `Dual{..., Float64, ...}` rather than simply `Float64`. This in turn means that you eventually call `nlsolve` with an initial value of type `Vector{Float64}` while the output of your residual function is of type `Dual` (since `supply` and `demand` are of this type), which is not what `nlsolve` expects. Hence the error. Solution: use `zeros(eltype(supply), 1)` to construct an initial value with the correct element type.
2. It doesn’t seem like `nlsolve` supports ForwardDiff.jl differentiation through it, so even if we address (1) we’ll still run into errors. This is an issue for NLsolve.jl, though, not Turing.jl.
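The element-type mismatch in (1) can be seen without Turing or nlsolve at all. A minimal sketch, where the `Dual` is constructed by hand purely for illustration (during actual sampling, ForwardDiff.jl creates these for you):

```julia
using ForwardDiff

# A toy "parameter" as ForwardDiff.jl would see it during gradient
# computation: a Dual number tracking one partial derivative.
supply = ForwardDiff.Dual(3.0, 1.0)

x0_bad  = zeros(1)                  # eltype Float64 — clashes with Dual-valued outputs
x0_good = zeros(eltype(supply), 1)  # eltype matches the Dual parameter

eltype(x0_bad)   # Float64
eltype(x0_good)  # a ForwardDiff.Dual type
```

Any arithmetic mixing `supply` with the elements of `x0_bad` produces `Dual`s, which `nlsolve` then tries (and fails) to store back into a `Float64` buffer.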

With that being said, there’s a way to address (2) using reverse-mode AD: Reverse differentiation through nlsolve · Issue #205 · JuliaNLSolvers/NLsolve.jl · GitHub. It’s easy to define adjoints (rules for computing gradients in reverse mode) using ChainRulesCore.jl, and these are picked up by Zygote.jl, i.e. rules defined with ChainRulesCore.jl propagate to Zygote.jl. You can do the same for other reverse-mode AD backends, so you’d have to check the specific one you want to use. I’m going to use Zygote.jl, though, as it’s the go-to for reverse mode these days.
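To illustrate the mechanism in isolation (a toy function, not part of the solution below): an `rrule` defined with ChainRulesCore.jl is automatically used by Zygote.jl, which is exactly how the `nlsolve` rule further down works.

```julia
using ChainRulesCore
using Zygote

# A toy function with a hand-written reverse-mode rule.
mytwice(x) = 2x

function ChainRulesCore.rrule(::typeof(mytwice), x)
    y = mytwice(x)
    # Pullback: given the output cotangent ȳ, return cotangents
    # w.r.t. (the function itself, x).
    mytwice_pullback(ȳ) = (NoTangent(), 2ȳ)
    return y, mytwice_pullback
end

# Zygote picks up the rule without any extra wiring:
Zygote.gradient(mytwice, 3.0)  # (2.0,)
```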

```julia
module TuringTest

using Random
using Zygote
using NLsolve
using ForwardDiff
using Distributions
using Turing

# New dependencies.
using ChainRulesCore
using IterativeSolvers
using LinearMaps

Random.seed!(42);

# https://github.com/JuliaNLSolvers/NLsolve.jl/issues/205#issuecomment-865856826
# Required to make it possible to differentiate through `nlsolve`.
function ChainRulesCore.rrule(config::RuleConfig{>:HasReverseMode}, ::typeof(nlsolve), f, x0; kwargs...)
    result = nlsolve(f, x0; kwargs...)
    function nlsolve_pullback(Δresult)
        Δx = Δresult[].zero
        x = result.zero
        _, f_pullback = rrule_via_ad(config, f, x)
        JT(v) = f_pullback(v)[2]  # w.r.t. x
        # solve JT * Δfx = -Δx
        L = LinearMap(JT, length(x0))
        Δfx = gmres(L, -Δx)
        ∂f = f_pullback(Δfx)[1]  # w.r.t. f itself (implicitly closed-over variables)
        return (NoTangent(), ∂f, ZeroTangent())
    end
    return result, nlsolve_pullback
end

# Switch to Zygote.jl, which supports differentiation through `nlsolve`
# thanks to the `rrule` we defined above.
Turing.setadbackend(:zygote)

# dummy supply function
supplyFunc(p, baseSupply) = p > -1 ? baseSupply + (p + 1)^2 : 0

# dummy demand function
demandFunc(p, baseDemand) = p < 20 ? baseDemand - (p + 1)^2 : 0

# estimate supply and demand from prices, single datapoint to start with
@model supplyDemandFromPrices(price) = begin
    sigma ~ truncated(Normal(0, 100), 0, Inf)
    supply ~ truncated(Normal(0, 100), 0, Inf)
    demand ~ truncated(Normal(0, 100), 0, Inf)
    # The previously offending call: estimate the clearing price.
    clearingPrice = nlsolve(
        x -> [supplyFunc(x[1], supply) - demandFunc(x[1], demand)],
        zeros(1),
        method = :anderson,
        m = 10,
        autodiff = :forward,
    ).zero[1]
    # assume the observed price is normally distributed around the 'market' price
    price ~ Normal(clearingPrice, sigma)

    # Useful for debugging.
    return (; sigma, supply, demand, clearingPrice, price)
end;

model = supplyDemandFromPrices(6.0)

# Execute the model once without any differentiation just to make
# sure it works.
sigma, supply, demand, clearingPrice, price = model()

# Now we sample.
chains = sample(model, NUTS(), 1000)

end # module TuringTest
```

Now, this may take a while to sample from because:

1. `nlsolve` and the computations required for its adjoint aren’t necessarily cheap. They’re not prohibitively expensive, of course, but we’re going to have to call them a lot. `NUTS` has a max tree depth setting which effectively specifies the maximum number of steps (and thus gradient calls) per iteration. By default this is 10, which means we’ll never take more than 2^10 = 1024 steps per iteration. That is still a lot of steps per iteration, though.
2. We’re using an iterative solver for a non-linear problem, so the answer is not exact. Inexact gradients will confuse the sampler, likely leading to saturation of the max tree depth, i.e. we’ll probably be taking as many steps as allowed (1024, as mentioned above) on every iteration.
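If this becomes a problem, one knob to try is lowering the maximum tree depth. A sketch, assuming the `max_depth` keyword of Turing.jl’s `NUTS` constructor (check the docstring for your Turing.jl version):

```julia
using Turing

# Cap the tree depth at 5, i.e. at most 2^5 = 32 leapfrog steps
# (and thus gradient calls) per iteration, instead of the default 2^10 = 1024.
sampler = NUTS(0.65; max_depth = 5)

# Then sample as before, e.g.:
# chains = sample(model, sampler, 1000)
```

This trades exploration per iteration for a hard bound on cost per iteration, which can make a slow, depth-saturating model tractable enough to diagnose.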