Conditional logit regression -solved using cox regression

I noticed on here a thread which can’t now find about conditional/multinomial logit. Actually, there is no need to implement anything new - this is trivial if cox proportional hazards regression exists and allows for left-censoring. All that needs to be done is to define the ‘start/stop’ times for the various strata as being non-overlapping time intervals but with the same intervals for every member of the same stratum. Then the solution is identical to the conditional logit.

The only difficulty I have is the the required package EventHistory does not seem to be compatible with my Julia 1.2 (I get the dreaded “Unsatisfiable requirements detected” error when I try to use it).
[I have a standard installation of Julia on the latest MacOSX - I guess Julia hasn’t caught up yet]

EventHistory isn’t maintained AFAICT. You can try Survival.jl, which should work on Julia 1.2.

1 Like


Has mlogit and polr models. I plan to add some of the conditional models, but if you can contribute the Cox variant I would be happy to include it.

1 Like

Hi Jose,

Thanks for that.   At this point, all I could do is add an expanded

description because
one of the required packages (EventHistory) for the cox proportional
hazards regression does
not work for Julia on my Mac. However, I have lots of experience
with this trick, and
it has one huge advantage, which is to allow for ties (like in a
race, where two competitors
cannot be separated).

Perhaps if I get access to another computer where Julia runs

properly I may be able to write
and test a script.

Regards,

DavidE

Thank you - that is very useful to know.

I have now loaded Survival.jl, but I can’t seem to find any examples or useful documentation of that package - the very terse documentation for it on the github site is not very helpful for a ‘newbie’ such as myself. Do you know of anywhere I can find an example of someone using this for cox regression in Julia? If I were to have this, I do think I could offer a useful contribution as suggested above.
Thanks in advance for any help.

I agree that the documentation could do with some improvement - and this being open source, you might be the perfect person to do it, once you’ve worked out how to get the results you need!

Fortunately the package has a test suite, so this gives an example of the Cox regression in action:

Thank you again for that. It looks like my idea won’t work with this version.
The (apparently unsupported) previous EventHistory had allowed for left-censoring, which was crucial
for applying my idea; It appears that the more recent (and supported) ‘EventTime’ as used here does not allow for that.

There are many reasons why left-censoring is quite essential in a cox regression package,
and my application is a case-in-point. Or perhaps there is another way to run the Cox model using
this package which allows for left-censoring? If not, I would suggest that it be placed high on the
‘to do’ list in order to make this package more widely applicable. If I am missing something here and left-censoring is allowed by the current version then I would love to know how that works. In any case, I am grateful for any info or help on this.

Do you have any resources or good inspiration for an implementation based on mlogit/polr? The infrastructure has been developed for those so I suppose it would only take some flavor syntax,

@formula(response | options ~ id | explanatory variables)

Maybe something like that where the table looks like (for the cmlogit)

outcome option id price income
true    train  1  $50   $60,000
false   flight 1  $500  $60,000

Then we would only have to adapt the objective function (conditional log-likelihood) and compute the gradient. For ologit, I had to use a trick for the AD to work properly (inspired on the MASS implementation).

I have another (really wacky but effective) approximate (but extremely accurate) solution to this using ordinary logistic regression. Assume the strata are 1,2,…m, and imagine a binary encoding of the first m integers (there will be approximately log(m) base 2 variables required, which is typically not large in relation to n in practice). Assign these to the strata, making factors of them, and then include them with the other input variables when running logistic regression. The resulting model will be nearly identical to the conditional logit, as will the inference, provided the factors are treated as nuisance variables. With slight modification (see below) this circumvents having to write a new routine for conditional logit. As noted above, typically log(m) will be much smaller than n (the sample size) which is important to ensure the numerical stability of the optimisation. [Construction of the binary code is done easily using the base 2 logarithm of each integer and just using the digits]. The above approximation becomes exact if the link function of logit is replaced by the logarithm.
I hope this makes sense and helps. [At least until someone adds left-censoring to the cox regression]

The approach I was considering was Stata’s clogit (page 15 on methods for estimation). Any thoughts on that? The package already has variable absorption so high dimensional features are not an issue. However, I will need to check how code up the gradient of the conditional log-likelihood ("The derivatives of the conditional log-likelihood can also be computed recursively by taking derivatives of the recursive formula f", basically homework). If it is in any of the textbooks like (Stata cites, Greene 2018, chap. 18) it might be easier to find it.

Same for McFadden’s.

If you are talking about the derivatives of the negative log-likelihood, they are quite easy. All you have to do is center the inputs by their probability-weighted averages (as in, estimated probs.), call this matrix x. If the binary output variable is y, then the gradient is merely x’y and the Hessian is given by x’Diag( p )x, where p denote the estimated probabilities (i.e., exp(x b) renormalised by strata).
Does this help? And (by the way) something like R’s clogit would be very useful.

I would have to check if the thresholds do not need to be fitted in that approach. For example, the Hessian in the polr is blockwise where one block is parameters (à la way you described), then there is the parameters cross thresholds, and lastly the last block is the thresholds cross thresholds…
\partial \beta \partial \beta \partial \theta\partial\beta
\partial \theta \partial \beta \partial \theta\partial\theta
Those I could not get manually at all (thanks for AD)… still the gradient also have two components \partial \beta \ \partial \theta.
See the code in for polr.