Question about Bayesian Survival Analysis with Cox (proportional hazards) regression method in Julia with Turing

Dear All,
I am trying to implement Bayesian Survival Analysis with Cox (proportional hazards) regression method in Julia based on the PyMC3 code written by Austin Rochford https://austinrochford.com/posts/2015-10-05-bayes-survival.html.
Here are my codes:

using RDatasets,DataFrames,Plots,StatsPlots,Turing,Distributions,FillArrays
HSAUR = dataset("HSAUR","mastectomy");
HSAUR[!,:CMetastized]=levelcode.(HSAUR[!,:Metastized]).-1;

interval_length = 3
interval_bounds =0:interval_length:maximum(HSAUR[!,:Time])+interval_length+1
n_intervals =length(interval_bounds)-1
n_patients=nrow(HSAUR)
patients=1:n_intervals
death =zeros(Int64, n_patients, n_intervals)
exposure =zeros(Int64, n_patients, n_intervals)
last_period =floor.(Int64,HSAUR[!,:Time] ./ interval_length);

I don’t know if Julia has a similar function like greater_equal.outer in NumPy, so I just simply write the following code to covert:

for i in 1:n_patients
    death[i,last_period[i]]=HSAUR[i,:Event]
    for j in 1:n_intervals
       if HSAUR[i,:Time] >= interval_bounds[j]
          exposure[i,j]=1*interval_length
        else
          exposure[i,j]=0
        end   
    end
    exposure[i,last_period[i]]=HSAUR[i,:Time]-interval_bounds[last_period[i]]
end

Turing model and sampling code:

@model function cox_regression(death,exposure,x)
    λ0 ~Gamma(0.01, 0.01)
   # σ  ~Uniform(0.0, 10.0)
   # τ  =  σ^2
   # μβ  ~ Normal(0.0, 100)
   # β  ~ Normal(μβ,τ)
     β  ~ Normal(0.0,100)
    λ = exp.(β .* x).* fill(λ0,n_intervals)'
    μ  = exposure .*λ    
    for i in 1:length(death)
      death[i] ~ Poisson(μ[i])
    end
end

chain = sample(cox_regression(death,exposure,HSAUR[!,:CMetastized]), NUTS(0.65),1000)

I am new to Julia and Turing, I am not sure what is wrong that causes very long sampling time (seems never stop)… Could experts here advise me to solve the issue?

Thanks,

Ryan

1 Like

Sorry it’s taken a while to get a response, I’m guessing it’s because the overlap of folks in biology and probabilistic programming in this community is not (yet!) Very large - I’m going to post the link in some channels on slack and zulip to see if there’s anyone that can help. Hang in there!

Hi Ryan

Is this the exact code you tried to run?

I get an error

ERROR: ArgumentError: column name :CMetastized not found in the data frame; existing most similar names are: :Metastized

and then after correcting that

MethodError: no method matching *(::Float64, ::CategoricalArrays.CategoricalValue{String, UInt8})

These are all quick little fixes, but it would be much easier if you have a working MWE. one quick thing I noticed is that you use global variables in the model like n_intervals. In Julia you will tend to get much better performance if you pass those variables into the actual function.

in terms of the second error, try

x = HSAUR[!,:Metastized]  .== "yes"

This will creat a Bool vector wher yes = 1 and no = 0
then pass in x instead of HSAUR[!,:Metastized]

this atleast runs

using RDatasets,DataFrames,Plots,StatsPlots,Turing,Distributions,FillArrays
HSAUR = dataset("HSAUR","mastectomy");
HSAUR[!,:CMetastized]=levelcode.(HSAUR[!,:Metastized]).-1;

interval_length = 3
interval_bounds =0:interval_length:maximum(HSAUR[!,:Time])+interval_length+1
n_intervals =length(interval_bounds)-1
n_patients=nrow(HSAUR)
patients=1:n_intervals
death =zeros(Int64, n_patients, n_intervals)
exposure =zeros(Int64, n_patients, n_intervals)
last_period =floor.(Int64,HSAUR[!,:Time] ./ interval_length);

for i in 1:n_patients
    death[i,last_period[i]]=HSAUR[i,:Event]
    for j in 1:n_intervals
       if HSAUR[i,:Time] >= interval_bounds[j]
          exposure[i,j]=1*interval_length
        else
          exposure[i,j]=0
        end   
    end
    exposure[i,last_period[i]]=HSAUR[i,:Time]-interval_bounds[last_period[i]]
end

x = HSAUR[!,:Metastized]  .== "yes"
@model function cox_regression(death,exposure,x, n_intervals,nm = size(death))
    n, m = nm
    λ0 ~Gamma(0.01, 0.01)
     β  ~ Normal(0.0,100)
    λ = exp.(β .* x) .* fill(λ0,n_intervals)'
    μ  = exposure .*λ    
    for j in 1:m
        for i in 1:n
            death[i,j] ~ Poisson(μ[i,j])
        end
    end
end

chain = sample(cox_regression(death,exposure .+0.1,x,n_intervals), NUTS(0.65),1000)

I had to add 0.1 to your exposure array, (which I’m sure is not what you want, but I did it just to check that the model will run with valid inputs). The issue seems to be that there are instances with 0 exposure, which means 0 μ as well. and logpdf(Poisson(0),any_number_other_than_zero) = -Inf i.e. given your inputs, the posterior probability of any parameter values at all will always be zero and you’ll get flooded with warnings.

Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true)

I’d check that the calculations for exposure are equivalent to the python code. I’ve never used PyMC3, but essentially this is not a bug, you should not be able to sample from a model where the likelihood encounters y ~ Poisson(0) where y ≠ 0 on each iteration.

Hi Kevbonham,

Thanks for your help.

Best Regards,

Ryan

Hi EvoArt:

Thank you very much for looking into it. I did have the code to categorized CMetastized:
HSAUR[!,:CMetastized]=levelcode.(HSAUR[!,:Metastized]).-1;

I forgot to post that I used CategoricalArrays.jl . But your method works easier and faster.
And thanks again for your explanation. I agree with you, since the exposure has many 0, that sampler would be struggled with… the formula is: μ = exposureλ = exposureλ0 exp.(β . x) that’s why posterior is Poisson. shift by 0.1 should not affect the result. I don’t know how PyMC3 samples such a scenario and R also has a package, I will see if I can find what posterior other people used from paper. I tried to run the PyMC3 code, but it has many package dependency issues.

Best Regards,

Ryan