Slow Turing.jl sampling compared to Python PyMC

I have a spatial model on a (currently) 5x5 grid, for two variables (total size 3737), and 483 observations of one of the variables. In this simple setting I want to work in the full space (no dimensionality reduction) and model my system as a multivariate normal distribution (I then want to validate the output against an ensemble Kalman filter before making the model more complex). Here is a minimal, working example of my Turing.jl model:

using Turing
using Distributions
using LinearAlgebra

n = 3737
N = 50
nobs = 483

# synthetic ensemble used to build a mean and covariance for the field
ensemble = reshape(rand(filldist(Normal(0, 1), n*N)), n, N)
mean_ = mean(ensemble, dims=2)[:, 1]
cov_ = cov(ensemble, dims=2)
# Cholesky factor of the covariance, with a small jitter for numerical stability
L = cholesky(cov_ + Diagonal(1e-6 * ones(n))).L
# synthetic observations and their uncertainties
sst = randn(nobs)
sst_err = abs.(randn(nobs)*0.1) .+ 0.1
idx = collect(1:nobs)

@model function simplemodel(sst)
    # n, mean_, L, idx, and sst_err are the globals defined above
    random_scales ~ filldist(Normal(0, 1), n)
    full_field = mean_ .+ L * random_scales
    sst_field = full_field[idx]
    sst ~ MvNormal(sst_field, sst_err)
end

chain = sample(simplemodel(sst), NUTS(), 1000)

Unfortunately, it seems to take forever without the sampling even starting. On the other hand, an equivalent PyMC implementation takes about 9 min to complete (with real data), and about 20 min when called from Julia via PyCall. Here is the PyMC model called via PyCall:

using PyCall

function run_pymc_model(mean, chol, idx, sst_err, observed_sst)
    py"""
    import pymc as pm
    import numpy as np

    def build_and_run_model(mean, chol, idx, sst_err, observed_sst):
        # Define the PyMC model
        with pm.Model() as model:
            random_scales = pm.Normal('random_scales', mu=0, sigma=1, shape=mean.shape[0])
            full_field = mean + chol @ random_scales
            sst_observable = full_field[idx]
            pm.Normal('sst', mu=sst_observable, sigma=sst_err, observed=observed_sst)
            trace = pm.sample()
        return trace
    """

    build_and_run_model = pyimport("__main__").build_and_run_model

    trace = build_and_run_model(
        mean,
        chol,
        idx .- 1,  # Adjust for Python's 0-based indexing
        sst_err,
        observed_sst)

    return trace
end

trace = run_pymc_model(mean_, L, idx, sst_err, sst)

Calling Python from Julia for performance seems to defeat the purpose of using Julia in the first place. Hence my question: is there a better way of implementing my Julia model so that it matches the PyMC performance?

Hi @mahe,

If you are simply interested in geospatial Gaussian processes, take a look at GeoStats.jl. It has efficient simulation methods for grids with millions of cells.

Thanks @juliohm, I’ll take a look. I’m still interested in solving this, though.

My first idea would be to pass the global variables (mean_, L, …) as arguments to simplemodel. That should provide some speedup. There might be other optimizations you could do with Turing, but that is the first thing I would try.

Thanks @garrett, I’m now trying:

@model function simplemodel(sst::Vector{Float64}, mean_::Vector{Float64}, L::LowerTriangular{Float64, Matrix{Float64}}, idx::Vector{Int64})
    n = length(mean_)
    random_scales ~ filldist(Normal(0, 1), n)
    full_field = mean_ .+ L * random_scales
    sst_field = full_field[idx]
    sst ~ MvNormal(sst_field, sst_err)  # note: sst_err is still captured as a global here
end

chain = sample(simplemodel(sst, mean_, L, idx), NUTS(), 1000)

let’s see…

I should have started with the Turing Performance Tips.
I am now using NUTS(adtype=AutoReverseDiff(true)) for the sampling and defining the model argument types precisely (the latter, as suggested by @garrett, was not enough on its own to get the sampling started):

import ReverseDiff

@model function simplemodel(sst::Vector{Float64}, mean_::Vector{Float64}, L::LowerTriangular{Float64, Matrix{Float64}}, idx::Vector{Int64})
    n = length(mean_)
    random_scales ~ filldist(Normal(0, 1), n)
    full_field = mean_ .+ L * random_scales
    sst_field = full_field[idx]
    sst ~ MvNormal(sst_field, sst_err)
end

chain = sample(simplemodel(sst, mean_, L, idx), NUTS(adtype=AutoReverseDiff(true)), 1000)  # `true` enables the compiled ReverseDiff tape

finally kick-starts the sampling. The toy example takes 4 min. I’ll try it on my real problem, but if that holds, it would be faster than Python.

EDIT: that works: 3 min 17 s for one chain. Perfect. NOTE: the comparison with PyMC is beside the point (for that I’d need a proper linkage to the BLAS library, or other, faster backends such as JAX).
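As an aside, something like the following can be used to check and adjust how many threads the BLAS backend uses from Julia (a sketch using the standard LinearAlgebra interface; it only affects the dense linear algebra such as the L * random_scales product, not the sampler itself):

using LinearAlgebra

BLAS.get_num_threads()   # how many threads the loaded BLAS library currently uses
BLAS.set_num_threads(8)  # e.g. allow 8 threads for the dense matrix-vector products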

EDIT2: surprisingly, sampling the prior is now causing me trouble:
prior_chain = sample(simplemodel(sst, ...), Prior(), 1000, progress=true) takes a fraction of a second to complete the sampling, but a full 6 minutes to return a result.

EDIT3: for the prior sampling issue, I ended up adding return full_field to my model, and I now do priormodel = simplemodel(missing, ...); prior_sst_samples = hcat([priormodel() for _ in 1:1000]...) instead of sampling via sample(...). That takes 3 seconds instead of 6 minutes with sample.
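Spelled out, the workaround looks roughly like this (a sketch: the type annotations are dropped so that missing can be passed for sst, and sst_err is still the global defined above):

@model function simplemodel(sst, mean_, L, idx)
    n = length(mean_)
    random_scales ~ filldist(Normal(0, 1), n)
    full_field = mean_ .+ L * random_scales
    sst_field = full_field[idx]
    sst ~ MvNormal(sst_field, sst_err)
    return full_field  # returned so that calling the model yields a prior draw of the field
end

# draw 1000 prior samples by calling the model directly instead of sample(..., Prior(), ...)
priormodel = simplemodel(missing, mean_, L, idx)
prior_sst_samples = hcat([priormodel() for _ in 1:1000]...)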

Kind of strange with the prior sampling, but I’m glad to hear the posterior sampling is working better!

I wrote an issue with a full reproducible example…