# Question about Bayesian Survival Analysis with Cox (proportional hazards) regression method in Julia with Turing

Dear All,
I am trying to implement Bayesian Survival Analysis with Cox (proportional hazards) regression method in Julia based on the PyMC3 code written by Austin Rochford https://austinrochford.com/posts/2015-10-05-bayes-survival.html.
Here are my codes:

``````using RDatasets,DataFrames,Plots,StatsPlots,Turing,Distributions,FillArrays
HSAUR = dataset("HSAUR","mastectomy");
HSAUR[!,:CMetastized]=levelcode.(HSAUR[!,:Metastized]).-1;

interval_length = 3
interval_bounds =0:interval_length:maximum(HSAUR[!,:Time])+interval_length+1
n_intervals =length(interval_bounds)-1
n_patients=nrow(HSAUR)
patients=1:n_intervals
death =zeros(Int64, n_patients, n_intervals)
exposure =zeros(Int64, n_patients, n_intervals)
last_period =floor.(Int64,HSAUR[!,:Time] ./ interval_length);

``````

I don’t know if Julia has a similar function like greater_equal.outer in NumPy, so I just simply write the following code to covert:

``````for i in 1:n_patients
death[i,last_period[i]]=HSAUR[i,:Event]
for j in 1:n_intervals
if HSAUR[i,:Time] >= interval_bounds[j]
exposure[i,j]=1*interval_length
else
exposure[i,j]=0
end
end
exposure[i,last_period[i]]=HSAUR[i,:Time]-interval_bounds[last_period[i]]
end
``````

Turing model and sampling code:

``````@model function cox_regression(death,exposure,x)
λ0 ~Gamma(0.01, 0.01)
# σ  ~Uniform(0.0, 10.0)
# τ  =  σ^2
# μβ  ~ Normal(0.0, 100)
# β  ~ Normal(μβ,τ)
β  ~ Normal(0.0,100)
λ = exp.(β .* x).* fill(λ0,n_intervals)'
μ  = exposure .*λ
for i in 1:length(death)
death[i] ~ Poisson(μ[i])
end
end

chain = sample(cox_regression(death,exposure,HSAUR[!,:CMetastized]), NUTS(0.65),1000)
``````

I am new to Julia and Turing, I am not sure what is wrong that causes very long sampling time (seems never stop)… Could experts here advise me to solve the issue?

Thanks,

Ryan

1 Like

Sorry it’s taken a while to get a response, I’m guessing it’s because the overlap of folks in biology and probabilistic programming in this community is not (yet!) Very large - I’m going to post the link in some channels on slack and zulip to see if there’s anyone that can help. Hang in there!

Hi Ryan

Is this the exact code you tried to run?

I get an error

``````ERROR: ArgumentError: column name :CMetastized not found in the data frame; existing most similar names are: :Metastized
``````

and then after correcting that

``````MethodError: no method matching *(::Float64, ::CategoricalArrays.CategoricalValue{String, UInt8})
``````

These are all quick little fixes, but it would be much easier if you have a working MWE. one quick thing I noticed is that you use global variables in the model like `n_intervals`. In Julia you will tend to get much better performance if you pass those variables into the actual function.

in terms of the second error, try

``````x = HSAUR[!,:Metastized]  .== "yes"
``````

This will creat a Bool vector wher yes = 1 and no = 0
then pass in `x` instead of `HSAUR[!,:Metastized]`

this atleast runs

``````using RDatasets,DataFrames,Plots,StatsPlots,Turing,Distributions,FillArrays
HSAUR = dataset("HSAUR","mastectomy");
HSAUR[!,:CMetastized]=levelcode.(HSAUR[!,:Metastized]).-1;

interval_length = 3
interval_bounds =0:interval_length:maximum(HSAUR[!,:Time])+interval_length+1
n_intervals =length(interval_bounds)-1
n_patients=nrow(HSAUR)
patients=1:n_intervals
death =zeros(Int64, n_patients, n_intervals)
exposure =zeros(Int64, n_patients, n_intervals)
last_period =floor.(Int64,HSAUR[!,:Time] ./ interval_length);

for i in 1:n_patients
death[i,last_period[i]]=HSAUR[i,:Event]
for j in 1:n_intervals
if HSAUR[i,:Time] >= interval_bounds[j]
exposure[i,j]=1*interval_length
else
exposure[i,j]=0
end
end
exposure[i,last_period[i]]=HSAUR[i,:Time]-interval_bounds[last_period[i]]
end

x = HSAUR[!,:Metastized]  .== "yes"
@model function cox_regression(death,exposure,x, n_intervals,nm = size(death))
n, m = nm
λ0 ~Gamma(0.01, 0.01)
β  ~ Normal(0.0,100)
λ = exp.(β .* x) .* fill(λ0,n_intervals)'
μ  = exposure .*λ
for j in 1:m
for i in 1:n
death[i,j] ~ Poisson(μ[i,j])
end
end
end

chain = sample(cox_regression(death,exposure .+0.1,x,n_intervals), NUTS(0.65),1000)
``````

I had to add 0.1 to your `exposure` array, (which I’m sure is not what you want, but I did it just to check that the model will run with valid inputs). The issue seems to be that there are instances with 0 exposure, which means 0 μ as well. and `logpdf(Poisson(0),any_number_other_than_zero) = -Inf` i.e. given your inputs, the posterior probability of any parameter values at all will always be zero and you’ll get flooded with warnings.

``````Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true)
``````

I’d check that the calculations for exposure are equivalent to the python code. I’ve never used PyMC3, but essentially this is not a bug, you should not be able to sample from a model where the likelihood encounters `y ~ Poisson(0)` where y ≠ 0 on each iteration.

Hi Kevbonham,

Best Regards,

Ryan

Hi EvoArt:

Thank you very much for looking into it. I did have the code to categorized CMetastized:
HSAUR[!,:CMetastized]=levelcode.(HSAUR[!,:Metastized]).-1;

I forgot to post that I used CategoricalArrays.jl . But your method works easier and faster.
And thanks again for your explanation. I agree with you, since the exposure has many 0, that sampler would be struggled with… the formula is: μ = exposureλ = exposureλ0 exp.(β . x) that’s why posterior is Poisson. shift by 0.1 should not affect the result. I don’t know how PyMC3 samples such a scenario and R also has a package, I will see if I can find what posterior other people used from paper. I tried to run the PyMC3 code, but it has many package dependency issues.

Best Regards,

Ryan