Recommended way to extract logprob and build gradients in Turing.jl?

Hi everyone,

What is the recommended way to extract the log density and build gradients from a Turing model? (DynamicPPL)

Use cases: e.g., feeding it to something like Pathfinder.jl (if it didn’t already have a Turing integration) or to novel samplers like ZigZagBoomerang.jl.

I have gone through the Turing tutorials, docs, and Discourse; either there is no reference or the answers are outdated. I’m hoping for someone to say “This is the way!” :slight_smile:


These are the best options I have found, but the trade-offs aren’t clear to me:

#1 From Discourse via Turing.gradient_logp – Removed

## from https://discourse.julialang.org/t/gradient-of-latent-space-variable-in-turing/60941/2?u=svilupp
∇logp(x) = Turing.gradient_logp(Turing.ZygoteAD(), x, Turing.VarInfo(cond_model), cond_model) # no longer works
# `gradient_logp` was removed in Aug-22, see https://github.com/TuringLang/Turing.jl/pull/1877

#2 À la TuringBenchmarking.jl with LogDensityProblems

# Code from Tor's benchmarking suite, https://github.com/torfjelde/TuringBenchmarking.jl
ad_backend = ForwardDiffAD{40}()
vi_orig = DynamicPPL.VarInfo(cond_model)
spl = DynamicPPL.SampleFromPrior()
# re-wrap the VarInfo so it carries the parameter values from vi_orig
vi_current = DynamicPPL.VarInfo(DynamicPPL.VarInfo(cond_model), spl, vi_orig[spl])
# DynamicPPL.link!(vi_current, spl) # unclear when to use it?
f = LogDensityProblems.ADgradient(
    ad_backend,
    Turing.LogDensityFunction(vi_current, cond_model, spl, DynamicPPL.DefaultContext())
)
theta = vi_current[spl]
logp(x) = LogDensityProblems.logdensity(f, x) # log density only
∇logp(x) = LogDensityProblems.logdensity_and_gradient(f, x)[2] # gradient
@info "LogDensityFunction way:" logp(theta) ∇logp(theta)

#3 via SciML Optimization Problem

# From Pathfinder.jl code base: https://github.com/mlcolab/Pathfinder.jl/blob/main/src/optimize.jl and https://github.com/mlcolab/Pathfinder.jl/blob/main/src/integration/turing.jl
# requires the AbstractDifferentiation package (imported as AD)
prob = Turing.optim_problem(cond_model, Turing.MAP(); constrained=false, init_theta=theta)
# prob.prob is the underlying SciML OptimizationProblem; its objective is the
# negative log density, so negate it to recover logp
ad_backend = AD.ForwardDiffBackend()
logp(x) = -prob.prob.f.f(x, nothing)
∇logp(x) = only(AD.gradient(ad_backend, logp, x))
@info "OptimizationProblem way:" logp(theta) ∇logp(theta)

Setup code

using Random
using Turing
using LazyArrays
using LogDensityProblems
using Turing.Essential: ForwardDiffAD
using AbstractDifferentiation # needed for option #3

# Generate mock data
Random.seed!(1) # seed for reproducible mock data
X = hcat(ones(200), randn(200, 3))
betas = randn(4)
@assert length(betas) == size(X, 2)
y = (X * betas) .+ randn(200)

# create a toy model and condition it
@model function linear_model(X)
    dim_X = size(X, 2)
    betas ~ filldist(Normal(), dim_X)
    sigma ~ Exponential(1)
    # LazyArray + @~ builds the per-observation Normals lazily
    y ~ arraydist(LazyArray(@~ Normal.(X * betas, sigma)))
end;
cond_model = linear_model(X) | (; y);
theta = ones(5) # 4 betas + sigma

It depends on what you want to use it for.

If you’re sampling, then generally you want Approach #2, with a call to
link! if you’re working with a model whose support is bounded, e.g.
s ~ InverseGamma(2, 3) has support only on (0, ∞), while the sampler
assumes the support is unbounded, e.g. HMC requires unbounded support.
Calling link! will ensure two properties:

  1. You’re now working in unbounded space.
  2. The computation of the logjoint now involves a (Jacobian) correction such that you’re still targeting the original model.

In contrast, when you’re doing optimization, as in Approach #3, you don’t actually need Property (2), but only Property (1).
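
Here's a minimal sketch with your model (newer DynamicPPL versions spell this link!!):

vi = DynamicPPL.VarInfo(cond_model)
spl = DynamicPPL.SampleFromPrior()
DynamicPPL.link!(vi, spl) # Property (1): sigma is now stored as log(sigma)
# evaluations through the linked VarInfo include the log-Jacobian term,
# so you still target the original model (Property (2)):
f_linked = Turing.LogDensityFunction(vi, cond_model, spl, DynamicPPL.DefaultContext())
LogDensityProblems.logdensity(f_linked, vi[spl])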

Does that help? And yes, Approach #1 is generally now discouraged.


That’s a great explanation! I hadn’t appreciated that nuance. Thank you!
