Cox Proportional Hazards Modelling in Julia

The following code of mine fails because the Surv function is not found. Can someone help me out with this?

using DataFrames, SurvivalAnalysis, Survival

# Load example data
df = DataFrame(duration = [5, 3, 9, 8, 7, 4, 3, 2, 1],
                event = [1, 1, 1, 0, 0, 0, 0, 1, 1],
                x1 = randn(9),
                x2 = randn(9))

# Fit the proportional hazards model
fit = coxph(Surv(df.duration, df.event) ~ df.x1 + df.x2, df)

# Summarize the results

I can’t reproduce this:

julia> coxph(Surv(df.duration, df.event) ~ df.x1 + df.x2, df)
ERROR: MethodError: no method matching ~(::SurvivalAnalysis.IntSurv, ::Vector{Float64})

That said, I’m not sure I understand what you are doing: you seem to be mixing SurvivalAnalysis with Survival, two packages that implement similar (and overlapping) functionality but, as far as I know, aren’t meant to be used together. SurvivalAnalysis doesn’t currently implement a Cox-PH model, and Surv is a type defined in the SurvivalAnalysis package (hence the SurvivalAnalysis.IntSurv in the error above), not in Survival.

In Survival, a Cox-PH model is fit as follows:

julia> df.event = EventTime.(df.duration, df.event .== 1);

julia> coxph(@formula(event ~ x1 + x2), df)
StatsModels.TableRegressionModel{CoxModel{Float64}, Matrix{Float64}}

event ~ x1 + x2

     Estimate  Std.Error    z value  Pr(>|z|)
x1   0.474264   0.590107   0.803691    0.4216
x2  -0.310167   0.497345  -0.623646    0.5329

See the docs here:

In SurvivalAnalysis, the syntax is:

f = kaplan_meier(@formula(Surv(Y, D) ~ 1), data)

However, as I said above, SurvivalAnalysis has no coxph method.

Also note that your syntax for calling the fitting procedure is off: the first argument should be a formula, which you create either with the @formula macro or by putting together Term objects yourself, not by passing the actual data vectors (df.duration, df.x1, ...) directly.
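For illustration, here is a minimal sketch of both ways of building the formula with StatsModels. The `covariates` vector is a hypothetical name; the Term-based version is handy when the covariate names are only known at run time. I load StatsModels explicitly rather than assuming Survival re-exports `@formula`. The resulting object should be usable wherever the `@formula` result is, e.g. `coxph(f_terms, df)` in Survival.

```julia
using StatsModels

# Formula written with the macro: column names appear literally.
f_macro = @formula(event ~ x1 + x2)

# The same formula assembled from Term objects (hypothetical covariate list).
covariates = [:x1, :x2]
f_terms = term(:event) ~ sum(term.(covariates))

# Both refer to columns by name; neither contains any data.
println(f_macro)
println(f_terms)
```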

I recommend you read the docs of both Survival and SurvivalAnalysis to understand the correct syntax and the functionality of each package, and maybe the docs of StatsModels as well to understand how @formula works.

That is really helpful; thanks for clarifying that I should use either the Survival or the SurvivalAnalysis package. Could you also point me to a reference for performing a train-test split, training the model with survival regression, and making predictions with it, in a similar style to what you showed?

Splitting the data into train and test samples is elementary data processing and has nothing to do with either of those packages. Just make sure that, if you have panel data (several rows per subject), you randomly split on observation IDs rather than on rows, so the same subject never ends up in both samples.
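A minimal sketch of splitting on IDs with plain DataFrames and Random. The `id` column, the toy panel data and the `split_by_id` helper are all hypothetical; adapt the names to your own data.

```julia
using DataFrames, Random

# Hypothetical panel data: two rows per subject `id`.
df = DataFrame(id       = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
               duration = [5, 3, 9, 8, 7, 4, 3, 2, 1, 6],
               event    = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0])

# Assign whole subjects (not individual rows) to train or test,
# so that no subject appears in both samples.
function split_by_id(df::AbstractDataFrame, idcol::Symbol;
                     train_frac::Real = 0.7,
                     rng::AbstractRNG = Random.default_rng())
    ids = shuffle(rng, unique(df[!, idcol]))          # random order of subject IDs
    ntrain = round(Int, train_frac * length(ids))      # number of subjects in train
    train_ids = Set(ids[1:ntrain])
    in_train = in.(df[!, idcol], Ref(train_ids))       # row-wise membership mask
    return df[in_train, :], df[.!in_train, :]
end

train, test = split_by_id(df, :id; rng = MersenneTwister(42))
```

With one row per subject, this reduces to an ordinary random row split.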

By “training” the model I assume you mean just fitting it - I’ve shown how to do this above. I don’t think Survival implements a predict function, so you’ll probably have to roll your own predictions from the estimated coefficients.
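A minimal sketch of the simplest hand-rolled prediction. For a Cox model, the per-subject quantity you can get directly from the coefficients is the linear predictor η = xβ; exp(η) is then the hazard relative to a subject whose covariates are all zero. The β values below are just the ones printed in the example fit (which came from random data), and `newdata` is hypothetical. With a real fit, you would take β from `coef(model)`, assuming it returns the coefficients in the same order as the formula terms.

```julia
using DataFrames

# Coefficients for (x1, x2) as printed in the example fit; normally β = coef(model).
β = [0.474264, -0.310167]

# Hypothetical new observations to score.
newdata = DataFrame(x1 = [0.0, 1.0, -0.5], x2 = [0.0, 0.0, 2.0])

X = Matrix(newdata[:, [:x1, :x2]])   # covariates in the same column order as β
η = X * β                            # linear predictor (log relative hazard)
rel_hazard = exp.(η)                 # hazard relative to a subject with x1 = x2 = 0
```

This only ranks subjects by risk; turning it into survival probabilities additionally needs an estimate of the baseline hazard.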

Could you please help me work out how to make predictions from the estimated coefficients, in the style of the code in your explanation?
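One possible approach, sketched under assumptions: to get predicted survival probabilities S(t | x) rather than just relative hazards, you can pair the coefficients with the Breslow estimator of the baseline cumulative hazard H₀(t), computed on the training data, and then use S(t | x) = exp(-H₀(t) · exp(xβ)). Breslow is a standard choice that I am swapping in here; I am not claiming Survival provides it. The helper names are hypothetical. In the usage lines, η_train is set to zero (i.e. β = 0) as a sanity check, because H₀ then reduces to the Nelson–Aalen estimator. With a fitted model you would use η_train = Matrix(df[:, [:x1, :x2]]) * coef(model), and compute η for new subjects the same way.

```julia
# Breslow estimate of the baseline cumulative hazard H₀ at each distinct event time.
# `η` is the linear predictor Xβ of the *training* observations.
function breslow_cumhaz(times::AbstractVector{<:Real}, events::AbstractVector{Bool},
                        η::AbstractVector{<:Real})
    risk = exp.(η)
    event_times = sort(unique(times[events]))
    H = similar(event_times, Float64)
    acc = 0.0
    for (k, t) in enumerate(event_times)
        d = count(i -> times[i] == t && events[i], eachindex(times))      # events at t
        denom = sum(risk[i] for i in eachindex(times) if times[i] >= t)   # risk set at t
        acc += d / denom
        H[k] = acc
    end
    return event_times, H
end

# S(t | x) = exp(-H₀(t) * exp(η_new)), with H₀ a right-continuous step function.
function predict_survival(event_times, H, η_new::Real, t::Real)
    k = searchsortedlast(event_times, t)   # index of the last event time ≤ t
    H0 = k == 0 ? 0.0 : H[k]
    return exp(-H0 * exp(η_new))
end

# Usage with the data from the original question (η_train = 0 as a sanity check).
times   = [5, 3, 9, 8, 7, 4, 3, 2, 1]
events  = [1, 1, 1, 0, 0, 0, 0, 1, 1] .== 1
η_train = zeros(length(times))
ets, H  = breslow_cumhaz(times, events, η_train)
S = predict_survival(ets, H, 0.474264, 4.0)   # hypothetical new subject with η = 0.474264
```

Ties are handled the Breslow way (d events divided by the whole risk set). The η values for new subjects must come from the same β and the same, uncentred covariates used for η_train.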