Turing.jl for geophysical applications

Hello everyone.

I am writing this post in the hope that it helps people with a background in physics who want to learn probabilistic programming with Turing.jl. My background is in geophysics, and I have only spent a couple of weeks learning how to use Turing.jl. I’ll share my progress so far, hoping that people with more experience can leave comments (performance tips, conceptual mistakes, etc.), and that the geophysics community can use this as a guide to start probabilistic programming with Julia and learn from my experience. Unfortunately, there is little information online about using Turing.jl for geophysical applications, so I’ve been more or less guessing for the last couple of weeks.

I am solving a 1D magnetotelluric (MT) problem: I have a profile of resistivities in the subsurface (a vector), and a forward operator MT_forward that predicts field observations from this model. In classical notation:

data = MT_forward(model)
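I won’t paste my full MT_forward here, but for readers who want something runnable end to end, a minimal stand-in based on the standard 1D impedance recursion looks roughly like this (the layer thicknesses, frequencies, and constants are placeholders, not my actual survey setup):

# Hypothetical stand-in for MT_forward (my real code is not shown here).
# Standard 1D magnetotelluric impedance recursion over a layered half-space.
const μ0 = 4π * 1e-7  # vacuum permeability [H/m]

function MT_forward(rho, n_layers;
                    thicknesses = fill(100.0, n_layers - 1),    # [m], placeholder
                    freqs = 10.0 .^ range(-3, 3, length = 20))  # [Hz], placeholder
    app_res = zeros(eltype(rho), length(freqs))  # eltype(rho) keeps AD dual numbers working
    for (i, f) in enumerate(freqs)
        ω = 2π * f
        # Intrinsic impedance of the bottom half-space
        k = sqrt(im * ω * μ0 / rho[end])
        Z = im * ω * μ0 / k
        # Recurse upward through the layers
        for j in (n_layers - 1):-1:1
            kj = sqrt(im * ω * μ0 / rho[j])
            Zj = im * ω * μ0 / kj
            t = tanh(kj * thicknesses[j])
            Z = Zj * (Z + Zj * t) / (Zj + Z * t)
        end
        app_res[i] = abs2(Z) / (ω * μ0)  # apparent resistivity [Ω·m]
    end
    return app_res
end

With that stand-in, data = MT_forward(rho_true, n_layers) produces a synthetic sounding curve to invert.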

The probabilistic model I built is the following:

using Turing

@model function mt_inference(data, n_layers, λ)
    # Prior distribution for the possible values of resistivities
    rho ~ filldist(Uniform(0, 2000), n_layers)

    # Calculate the expected value (using the MT forward model)
    exp_data = MT_forward(rho, n_layers)

    # Likelihood: a function of the L2 norm of the misfit and the L2 norm of the model
    l2_norm_misfit = sqrt(sum((data - exp_data) .^ 2))
    l2_norm_model = sqrt(sum(rho .^ 2))
    J = l2_norm_misfit + λ * l2_norm_model
    for i in eachindex(data)
        data[i] ~ Normal(J, 10)
    end
end;

Then this model is sampled using:

model = mt_inference(data, n_layers, λ)  # instantiate the model with the observations
chain = sample(model, Turing.NUTS(), 100);

As mentioned before, I have only been using Turing.jl for a couple of weeks, and researching Bayesian inference for a couple of months, so I might have conceptual errors, but my interpretation of the above model is:

I have an uninformative prior on the possible resistivity values that my layers can take (a uniform distribution from 0 to 2000 ohm·m). This uniform distribution is repeated for the n layers of my model.

As I understand it, the forward model can be any valid Julia code (some comments about this later).

The likelihood will be highest for models that minimize the misfit with the observed data and that have minimum norm (weighted by a regularization term λ).
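In equation form (writing the observed data as d and the forward operator as F), the objective inside the likelihood is the familiar Tikhonov-style functional:

$$ J(\rho) = \lVert d - F(\rho) \rVert_2 + \lambda \lVert \rho \rVert_2 $$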

What I’ve found so far:

  • NUTS is the best sampler (so far): I tried MH, SMC, HMC, HMCDA, and PG, and the only sampler that gets close to the expected results is NUTS. I actually think there is some problem with my model, because I have no other explanation for the poor results I get with all the other samplers (there are several cases in the literature of successful MT probabilistic inversions using MH).

  • The relationship between the number of samples and the compute time is not linear: I’ve been drawing 100 samples from my chains, which takes around 80 s. However, increasing the number of samples to 300 makes the running time extremely long.

  • Changing the AD backend: I changed the AD backend to :reversediff and it started giving problems with my forward operator. The code would crash in basic functions such as sqrt (which I had to change to .^0.5) and the hyperbolic tangent (which I had to redefine using exponentials; see the sketch just after this list). After making these adjustments to my forward operator, I could not spot any major difference compared with :forwarddiff.

  • Tricks with multivariate distributions in priors and likelihood: I’ve found mixed advice about how to define the multivariate distributions (with a loop, filldist, LazyArrays, etc.; a side-by-side sketch follows this list) and I still can’t decide which is best. In terms of results I am apparently getting similar answers either way; however, using:

data ~ arraydist(LazyArray(@~ Normal.(J, 10)))

fails, which I wasn’t expecting, because I’ve seen it used in other models, although with Bernoulli distributions.
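For reference, the tanh workaround mentioned in the :reversediff bullet above was along these lines (a sketch of the idea, not my exact code):

# tanh rewritten with exponentials, since ReverseDiff choked on the builtin
my_tanh(x) = (exp(2x) - 1) / (exp(2x) + 1)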
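And these are the likelihood variants I’ve been comparing, side by side (these lines go inside the @model body, and only one variant would be used at a time; the MvNormal form needs I from the LinearAlgebra standard library). My best guess for the LazyArray failure, and it is only a guess, is that J is a scalar here, so Normal.(J, 10) produces a single distribution rather than an array of them, unlike the Bernoulli examples, which broadcast over a vector of probabilities:

# (a) scalar loop over the observations
for i in eachindex(data)
    data[i] ~ Normal(J, 10)
end

# (b) filldist: the same Normal repeated length(data) times
data ~ filldist(Normal(J, 10), length(data))

# (c) MvNormal with isotropic covariance (requires: using LinearAlgebra)
data ~ MvNormal(fill(J, length(data)), 10^2 * I)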

So far the code is running, and I am getting the expected results. However, I find it quite slow (the fastest run is 80 seconds to infer only 40 values). I know this is probably my fault, so I’m open to suggestions: I’m happy to run diagnostics and test any change that any of you may suggest.

Thanks!

Have you tried, as the Turing.jl docs suggest:

When using ReverseDiff, to compile the tape only once and cache it for later use, the user has to call Turing.setrdcache(true).

A much better, more optimized approach is what I do in these Turing.jl tutorials:

y ~ MvNormal(J, 10^2 * I)
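(Note that I here is the identity from the LinearAlgebra standard library, so the model needs a using LinearAlgebra, and MvNormal expects its first argument to be a mean vector of the same length as y.)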

Thanks for your comments!!

I replaced the for loop with:

 a = fill(J, length(data))
 data ~ MvNormal(a, 100 * I)

and I went from this:

 85.706096 seconds (576.86 M allocations: 117.183 GiB, 20.49% gc time, 12.25% compilation time: 100% of which was recompilation)

To this:

 48.475718 seconds (367.69 M allocations: 73.926 GiB, 17.72% gc time, 21.26% compilation time: 100% of which was recompilation)

That nearly halves the runtime! Thank you so much for the tip!

On top of that improvement, I tried changing the AD backend again, adding:

Turing.setadbackend(:reversediff)
Turing.setrdcache(true)

before sampling. Initially I got this message:

 Warning: failed to find valid initial parameters in 10 tries; consider providing explicit initial parameters using the `init_params` keyword
└ @ Turing.Inference ~/.julia/packages/Turing/UCuzt/src/mcmc/hmc.jl:172

And the sampling never starts.

So I tried to add an initial model in the following way:

chain = sample(model, Turing.NUTS(), it, init_params = rho_init);

where rho_init is just a vector of constant values. I get the following warning several times:

 Warning: Incorrect ϵ = NaN; ϵ_previous = 0.07500000000000001 is used instead.
└ @ AdvancedHMC.Adaptation ~/.julia/packages/AdvancedHMC/dgxuI/src/adaptation/stepsize.jl:131

And my final result is basically the initial model that I gave in init_params.

In conclusion: so far the MvNormal distribution helped a lot, but changing the AD backend did not. Thanks.

By the way, thank you so much for all your tutorials and YouTube videos!


Thanks! Glad you liked it.

ReverseDiff with tape compilation and caching enabled (setrdcache(true)) can be finicky about certain functions and operations.
Try to find other discussions of ReverseDiff and Turing here and see if you need to tweak any function or computation in your model.

Hi @Storopoli

I’m having the exact same issue as above with ReverseDiff, with a very different model (I’m implementing the PVL-Delta model in Turing). Specifically, if I don’t use Optim to estimate the parameters, I get the Warning: failed to find valid initial parameters in 10 tries message and sampling hangs indefinitely (I let it run for a day); if I do use Optim (which converges successfully), I get the Warning: Incorrect ϵ = NaN; ϵ_previous = ... message and the sampling finishes, but far too quickly.

Try to find other discussions on ReverseDiff and Turing here and see if you need to tweak any function or computation in your model.

As @Alejandro_Quiaro_San hasn’t returned since, I’m hoping they found a successful solution, but perhaps you could point me in the right direction with some search terms? This thread and a couple of GitHub issues are the only places I can find that mention the Incorrect ϵ = NaN warning, and as far as I can tell my model isn’t doing anything ridiculous; I’m losing my mind refactoring things over and over with no improvement.

Note: the sampler hanging (with no initial_params) happens with both ReverseDiff and ForwardDiff, but ForwardDiff is much slower, I suppose due to the number of parameters.

Thanks 🙂

I don’t know exactly what’s up in these scenarios, but my immediate worry in the original post is the line

exp_data = MT_forward(rho,n_layers)

and how this combines with ReverseDiff.jl in compiled mode.

ReverseDiff.jl in compiled mode cannot handle value-dependent control flow such as if statements, so if you have something like that in your forward model, you’ll have a bad time.
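A made-up illustration (not taken from the model above): whichever branch is taken when the tape is compiled gets baked in and silently replayed for every later input:

function f(x)
    if x[1] > 0               # this branch is recorded at tape-compile time...
        return sum(abs2, x)
    else                      # ...and the other path is never re-taken afterwards
        return -sum(abs2, x)
    end
end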

To make this entire process a bit simpler, you might want to look at TuringBenchmarking.jl with the kwarg check=true (this will compare the gradient computations between the different AD backends before benchmarking).
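If I remember the API correctly, the entry point is benchmark_model; treat the exact names below as a sketch and double-check the package README:

using TuringBenchmarking

# Benchmark the model’s log-density and its gradient across AD backends,
# first checking that the gradients agree between backends (check = true)
benchmark_model(model; check = true)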