Errors in inference in upgraded Julia version (1.6.3 vs. 1.7.3)


I’ve been running a parallelized (4 chains) Bayesian inference code using the NUTS sampler on Julia v1.6.3 (CentOS, HPC cluster), and it produced the expected results without any warnings during inference. I’ve now started using Julia v1.7.3 on a different machine (also CentOS, HPC cluster), and when I run the same code, one or two chains stay stuck at the initial sample value.

The plot below shows the histogram of one of the parameters. The green chain is stuck at the initial sample, while the posteriors from the other three chains match the v1.6.3 results perfectly.

Weirdly, I do not get any warnings while the inference code runs. The package versions are as follows:

Julia v1.6.3

(@v1.6) pkg> status
      Status `~/.julia/environments/v1.6/Project.toml`
  [5b7e9947] AdvancedMH v0.6.7
  [6e4b80f9] BenchmarkTools v1.3.1
  [336ed68f] CSV v0.10.4
  [a93c6f00] DataFrames v1.3.3
  [0c46a032] DifferentialEquations v7.1.0
  [31c24e10] Distributions v0.25.53
  [ced4e74d] DistributionsAD v0.6.38
  [61744808] DynamicalSystems v2.3.0
  [5789e2e9] FileIO v1.13.0
  [f6369f11] ForwardDiff v0.10.27
  [28b8d3ca] GR v0.64.2
  [a98d9a8b] Interpolations v0.13.6
  [033835bb] JLD2 v0.4.22
  [c7f686f2] MCMCChains v5.1.1
  [ce233488] MCMCTempering v0.1.1
  [6fafb56a] Memoization v0.1.14
  [961ee093] ModelingToolkit v8.8.0
  [429524aa] Optim v1.6.2
  [f0f68f2c] PlotlyJS v0.18.8
  [91a5bcdd] Plots v1.27.6
  [21216c6a] Preferences v1.3.0
  [37e2e3b7] ReverseDiff v1.13.0
  [cf7bdac0] SIAN v1.1.7
  [90137ffa] StaticArrays v1.4.4
  [2913bbd2] StatsBase v0.33.16
  [f3b207a7] StatsPlots v0.14.33
  [220ca800] StructuralIdentifiability v0.3.9
  [fce5fe82] Turing v0.20.4
  [fdbf4ff8] XLSX v0.7.9
  [8ba89e20] Distributed
  [37e2e46d] LinearAlgebra
  [9a3f8284] Random

Julia v1.7.3

(@v1.7) pkg> status
      Status `~/.julia/environments/v1.7/Project.toml`
  [5b7e9947] AdvancedMH v0.6.7
  [6e4b80f9] BenchmarkTools v1.3.1
  [336ed68f] CSV v0.10.4
  [a93c6f00] DataFrames v1.3.4
  [0c46a032] DifferentialEquations v7.1.0
  [31c24e10] Distributions v0.25.60
  [ced4e74d] DistributionsAD v0.6.40
  [61744808] DynamicalSystems v2.3.0
  [5789e2e9] FileIO v1.14.0
  [f6369f11] ForwardDiff v0.10.30
  [28b8d3ca] GR v0.64.3
  [a98d9a8b] Interpolations v0.13.6
  [c7f686f2] MCMCChains v5.3.0
  [6fafb56a] Memoization v0.1.14
  [961ee093] ModelingToolkit v8.11.0
  [429524aa] Optim v1.7.0
  [f0f68f2c] PlotlyJS v0.18.8
  [91a5bcdd] Plots v1.29.0
  [21216c6a] Preferences v1.3.0
  [90137ffa] StaticArrays v1.4.4
  [2913bbd2] StatsBase v0.33.16
  [f3b207a7] StatsPlots v0.14.34
  [220ca800] StructuralIdentifiability v0.3.9
  [fce5fe82] Turing v0.21.2
  [fdbf4ff8] XLSX v0.7.10
  [8ba89e20] Distributed
  [37e2e46d] LinearAlgebra
  [9a3f8284] Random

Can someone please help me out? I really don’t know what the problem is as no warnings are cropping up, let alone errors. :neutral_face:

Not that I necessarily can help, but it is usually helpful to post a MWE.
I assume that’ll help you get some more replies and answers.

Sorry for the dumb question, but what is an MWE? I really have no idea… :sweat_smile:

Minimum Working Example - see

Thank you! Here is an MWE and an explanation:

Section 1: Importing packages and required files

using Distributed
using Turing
num_chains = 4;

@everywhere using DifferentialEquations, Interpolations, XLSX, DataFrames, StatsPlots
@everywhere using Distributions, DistributionsAD, MCMCChains, Turing
@everywhere using LinearAlgebra, Random
@everywhere Random.seed!(18431);
@everywhere using ForwardDiff, Preferences
@everywhere Turing.setadbackend(:forwarddiff)
@everywhere set_preferences!(ForwardDiff, "nansafe_mode" => true)

@everywhere include("multi_variant_model.jl")
# File containing the ODE model
@everywhere include("init_conds_functions.jl")
# File containing the initial conditions vector
@everywhere include("inference_data_modifier_functions.jl")
# File containing function required to modify the ODE solution
# as per the requirement of inference code

@everywhere tot_weeks = 20          # Number of time points
@everywhere N = 1000000             # Total population
@everywhere V = 1                   # Number of mutants
@everywhere tspan = (1, tot_weeks)
@everywhere tspan = Float64.(tspan) # Time period for simulation of ODEs

# Reading the observation data
@everywhere truth_data = DataFrame(XLSX.readtable("trueData.xlsx","Sheet1"; infer_eltypes = true)...)

Section 2: Define the observation model

@everywhere @model function truth_data_fitting!(data, ODEtspan, num_mutants, tot_pop)

    # Priors of the parameters - Structural model
    β₁ ~ LogNormal(log(2.8), 0.5)
    α₁ ~ LogNormal(log(7/6.5), 0.5)
    θ₁ ~ LogNormal(log(7/3.2), 0.5)
    γᵢ₁ ~ LogNormal(log(7/10), 0.5)
    γₙ₁ ~ LogNormal(log(7/8), 0.5)
    ϕᵣ₁ ~ Beta(5.0, 1.5)
    ϕᵢ₁ ~ Beta(1.5, 5.0)
    ϕₖ₁ ~ Beta(1.5, 5.0)

    # Priors of parameters - Observation model
    σ₁ ~ Gamma(1.0, 5.0) # Infected - 1ˢᵗ variant
    σ₃ ~ Gamma(1.0, 5.0) # Total hospitalizations
    σ₄ ~ Gamma(1.0, 5.0) # Critical hospitalizations
    σ₅ ~ Gamma(1.0, 5.0) # Deceased

    # Creating a dataframe of parameter priors
    param_prior_df = DataFrame(
        :β => fill(β₁, 1),
        :α => fill(α₁, 1),
        :θ => fill(θ₁, 1),
        :γᵢ => fill(γᵢ₁, 1),
        :γₙ => fill(γₙ₁, 1),
        :ϕᵣ => fill(ϕᵣ₁, 1),
        :ϕᵢ => fill(ϕᵢ₁, 1),
        :ϕₖ => fill(ϕₖ₁, 1),
    )

    # Final parameter container
    const_params = [num_mutants, tot_pop]
    final_params = [const_params; param_prior_df]

    # Defining the initial conditions and the problem
    I0₁ ~ Gamma(2.0, 30.0)
    u0 = zeros(eltype(I0₁), 5*num_mutants+5)
    u0[1] = tot_pop - I0₁
    u0[num_mutants+2] = I0₁
    u0[4*num_mutants+4] = I0₁
    inference_problem = ODEProblem(our_model_without_vaccination!, u0, ODEtspan, final_params)

    # Solve the model
    inference_solution = solve(inference_problem, AutoVern7(Rodas5()), saveat = 1.0, save_everystep=false)

    # Modify the solution to get the inference data
    inference_data = inferenceDataModifier_var2_noVaccine!(num_mutants, inference_solution)

    # Inference using multivariate student's t-distributions
    data[:,1] ~ arraydist(LocationScale.(inference_data[:,1], σ₁.*sqrt.(data[:,1]), TDist.(4)))
    data[:,3] ~ arraydist(LocationScale.(inference_data[:,2], σ₃.*sqrt.(data[:,3]), TDist.(4)))
    data[:,4] ~ arraydist(LocationScale.(inference_data[:,3], σ₄.*sqrt.(data[:,4]), TDist.(4)))
    data[:,5] ~ arraydist(LocationScale.(inference_data[:,4], σ₅.*sqrt.(data[:,5]), TDist.(4)))
end

Section 3: Run the inference procedure and save the chains

@everywhere final_model = truth_data_fitting!(truth_data, tspan, V, N)
@everywhere chains = sample(final_model, NUTS(2000, 0.65), MCMCDistributed(), 5000, num_chains; progress=true)
write("chainData_truthFitting.jls", chains);

This code tries to run four chains in parallel. To reiterate, the code doesn’t raise any errors in either version of Julia. But in Julia v1.7.3, one chain doesn’t move from the initial estimate, which doesn’t happen in v1.6.3.
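To make "stuck" concrete, here is a rough sketch of how such a chain could be flagged programmatically. It assumes the draws for one parameter have already been pulled out into a plain iterations × chains matrix; the function name `stuck_chains` and the synthetic data are made up for illustration and are not part of the inference code above.

```julia
# Return the indices of chains whose draws never move away from the first value.
# `draws` is an iterations × chains matrix for a single parameter.
function stuck_chains(draws::AbstractMatrix{<:Real})
    return [c for c in axes(draws, 2) if all(==(draws[1, c]), view(draws, :, c))]
end

# Synthetic illustration (not real output): chain 3 repeats its initial value.
draws = hcat(randn(100), randn(100), fill(2.8, 100), randn(100))
stuck = stuck_chains(draws)
println("Stuck chains: ", stuck)
```

A check like this only confirms the symptom; it says nothing about why the sampler rejects every proposal in that chain.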

Sorry but I think this fails on two of the three requirements of an MWE:

  • Minimal - it’s quite a lot of code
  • Working - it relies on files defined elsewhere, so it can’t run

I know it can be hard to get help on complex pieces of code, but at the same time, narrowing things down to the smallest possible bit of code that reproduces your problem often leads you to the issue by itself.

One thing I would say, though, is that you seem to be installing all packages in your default environment, which is never a good idea. It looks like your code actually uses only 8 packages, so those are the packages that should be in your environment. Step 1 could therefore be to create a fresh environment containing only the packages you need, and to run the code in that environment on both 1.6 and 1.7.
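A minimal sketch of that step, under a few assumptions: the helper name `setup_inference_env`, the folder name, and the `install` switch are hypothetical, the package list is simply read off the `using` lines in the posted script (trim it to what the script really needs), and pinning Turing to 0.20.4 is just one way to hold that version fixed across both Julia versions. Whether that release resolves cleanly on 1.7 is not something I have checked.

```julia
using Pkg

# Packages loaded by the posted script. Standard libraries (Distributed,
# LinearAlgebra, Random) ship with Julia and need no install.
const SCRIPT_PACKAGES = [
    "DifferentialEquations", "Interpolations", "XLSX", "DataFrames",
    "StatsPlots", "Distributions", "DistributionsAD", "MCMCChains",
    "ForwardDiff", "Preferences",
]

# Activate a project in `dir` and, unless `install = false`, add the packages.
# Returns the path of the Project.toml that is now active.
function setup_inference_env(dir::AbstractString; turing_version = "0.20.4", install = true)
    Pkg.activate(dir)
    if install
        # Pin Turing so both Julia versions use the same release.
        Pkg.add(name = "Turing", version = turing_version)
        Pkg.add(SCRIPT_PACKAGES)
    end
    return Base.active_project()
end
```

Calling `setup_inference_env("inference_env")` once under each Julia version and then starting the script with `julia --project=inference_env` keeps the default environment out of the comparison.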

Your current environments differ in many ways: some differences could obviously cause the discrepancy (e.g. a different Turing version), some might (e.g. a different Distributions version), and some we might not be able to see (as there might be differences between transitive dependencies).
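To see the direct-dependency differences at a glance, the two `status` listings can be compared mechanically. The sketch below uses an excerpt of the listings posted above; the helper names `parse_status` and `version_diff` are made up for illustration. It only covers what `status` prints, so transitive dependencies would still need the manifest view (`pkg> status -m`).

```julia
# Parse `pkg> status` lines such as "[fce5fe82] Turing v0.20.4" into name => version.
function parse_status(text::AbstractString)
    versions = Dict{String,String}()
    for m in eachmatch(r"\[[0-9a-f]{8}\]\s+(\S+)\s+v(\S+)", text)
        versions[String(m[1])] = String(m[2])
    end
    return versions
end

# Collect every package whose version differs or that is missing on one side.
function version_diff(old::Dict{String,String}, new::Dict{String,String})
    changed = Dict{String,Tuple{String,String}}()
    for name in union(keys(old), keys(new))
        a = get(old, name, "absent")
        b = get(new, name, "absent")
        a == b || (changed[name] = (a, b))
    end
    return changed
end

# Excerpt of the two listings from the original post.
v16 = """
  [a93c6f00] DataFrames v1.3.3
  [31c24e10] Distributions v0.25.53
  [ced4e74d] DistributionsAD v0.6.38
  [f6369f11] ForwardDiff v0.10.27
  [37e2e3b7] ReverseDiff v1.13.0
  [90137ffa] StaticArrays v1.4.4
  [fce5fe82] Turing v0.20.4
"""
v17 = """
  [a93c6f00] DataFrames v1.3.4
  [31c24e10] Distributions v0.25.60
  [ced4e74d] DistributionsAD v0.6.40
  [f6369f11] ForwardDiff v0.10.30
  [90137ffa] StaticArrays v1.4.4
  [fce5fe82] Turing v0.21.2
"""

d = version_diff(parse_status(v16), parse_status(v17))
for name in sort!(collect(keys(d)))
    println(rpad(name, 18), d[name][1], " -> ", d[name][2])
end
```

Packages with identical versions on both sides (StaticArrays in this excerpt) are left out, so the printout is exactly the list of candidates worth ruling out.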


Sincere apologies! :pray: I knew the code wouldn’t run as posted because of how I wrote it for the problem I was working on, but I tried to make it as minimal as possible. Moreover, since there were no problems in v1.6, I thought the issue wouldn’t be in my code, which is why I shared the package details. Nevertheless, I will keep MWE etiquette in mind from now on.

I knew the package versions were different but wasn’t expecting such huge discrepancies in the output. Sure, I will eliminate these differences and try again. :slight_smile: