How to make this AFT regression stable?

I have adapted this excellent post by @aaowens to fit a right-censored AFT regression model, but for some reason the likelihood is not compatible with ForwardDiff.jl.

using Distributions, Optim

function AFT(pvec, data)
    function l(theta)
        α = exp(theta[1])    # Weibull shape, kept positive via exp
        θ = exp(theta[2])    # Weibull scale, kept positive via exp
        wdist = Weibull(α, θ)
        -sum(logpdf(wdist, theta[3:end]' * d[3:end] - d[2])^(d[1]) +
             logccdf(wdist, theta[3:end]' * d[3:end] - d[2])^(1 - d[1]) for d in data)
    end
    l(pvec)    # one plain evaluation as a sanity check
    optimize(l, pvec, BFGS(), autodiff = :forward)
end

a = 2.0
b = 2.0
n = 100
beta = [0.1, 0.3, 0.8]
wb = Weibull(a,b)
X = rand(n,3)
delta = rand([0,1],n) # delta[i] = 1 iff Y[i] censored
Y = X * beta
for i in 1:n
    if delta[i] == 1
        Y[i] = Y[i] - Y[i]*rand()
    end
end

paramInit = fill(0.1,5)
resp = AFT(paramInit, collect(eachrow([delta Y X])))

julia> resp = AFT(paramInit, collect(eachrow([delta Y X])))
 * Status: failure (line search failed)

 * Candidate solution
    Final objective value:     Inf

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = 1.15e+02 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    1
    f(x) calls:    1
    ∇f(x) calls:   1

I am not really sure, but my guess is that your l function cannot handle the dual numbers of ForwardDiff.jl. In short, theta will be a dual number and the output of the function needs to be one too.

As an example, see cell 4 of this notebook. In particular, see how s2 is set up so it can host dual numbers.
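
For intuition, here is a tiny sketch of my own (not from the notebook) of what a dual number is and how a likelihood typically fails to handle one:

using ForwardDiff

# A dual number carries a value together with derivative information.
x = ForwardDiff.Dual(2.0, 1.0)    # value 2.0, one derivative seed
y = exp(x)                        # still a Dual: value exp(2.0), partial exp(2.0)

# The usual failure mode: a Float64 buffer cannot store a Dual.
s = zeros(2)                      # Vector{Float64}
# s[1] = x                        # MethodError: no method matching Float64(::Dual)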

Thanks @Paul_Soderlind, can you explain what it is about the explicit allocation of s2 or the subsequent updates that ensures LL_t will be a dual?

    s2    = zeros(typeof(alpha),T)                 #works with ForwardDiff 
    s2[1] = omega + alpha*s2_0 + beta1*s2_0        #simple, but slow approach
    for t = 2:T                                    #using filter() is perhaps quicker
      s2[t] = omega + alpha*u[t-1]^2 + beta1*s2[t-1]
    end

This line doesn’t make sense:

-sum(logpdf(wdist,d[3:end]' * d[2:end])^(d[1]) * logccdf(wdist,d[3:end]' * d[2:end])^(1-d[1]) for d in data)

First, you are iterating over data which is a Vector{Float64}, so each d is an individual Float64, not a vector (this is the source of the indexing error you see). I suspect you meant for data to be a Vector{Vector{Float64}}, something like [rand(wb, 10) for _ in 1:10]? Or maybe some reshape of Y?

When you fix the shape of the data, d[3:end] is a different length than d[2:end], so that product will not work. Perhaps that should be d[3:end]' * d[2:end-1]?
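
If the data really is built as collect(eachrow([delta Y X])), as in the call you posted, then each d should be a 5-element row view. A quick check with toy numbers:

delta = [1.0, 0.0]
Y = [0.5, 0.7]
X = [0.1 0.2 0.3; 0.4 0.5 0.6]

data = collect(eachrow([delta Y X]))    # Vector of 5-element row views
d = first(data)
d[1], d[2], d[3:end]                    # (1.0, 0.5, [0.1, 0.2, 0.3])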

s2 = zeros(typeof(alpha),T) means that when alpha is a dual number, s2 will be one as well.

However, it is not the explicit allocation per se that is important. What matters is that the output (likelihood value) can be a dual number when the parameters are. You can easily check this by trying ForwardDiff.gradient(l,paramInit).

As far as I can see, the sum, pdf and cdf of the Weibull should survive this, so the problem might be elsewhere.
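
A minimal pair illustrating that check (the function names are mine, just for illustration):

using ForwardDiff

bad(theta)  = (s = zeros(3);                   s[1] = theta[1]; sum(s))
good(theta) = (s = zeros(typeof(theta[1]), 3); s[1] = theta[1]; sum(s))

ForwardDiff.gradient(good, [0.1, 0.2])    # works: returns [1.0, 0.0]
# ForwardDiff.gradient(bad, [0.1, 0.2])   # MethodError: bad's buffer is Vector{Float64}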

Fixed a bunch of bugs, thanks for all the suggestions.
The following seems to run fine, but it has identifiability issues: while the relative magnitudes of the positive betas seem roughly correct, it can’t get the signs right.
I tried adding a ridge regularizer on just the linear coefficients (commented out below), but that led to stability issues as well.

using Revise, Distributions, Optim, DistributedTopicModels, Survival, LineSearches, Plots, Random, LinearAlgebra

n = 10000
# beta = [1,10]
beta = [10,1]
# beta = [1,-1]
# beta = [-1,1]
s = Random.seed!(1)
X = rand(s,n,2)
delta = rand([0,1],n) # delta[i] = 1 iff Y[i] censored
μ = X * beta

histogram(μ)

λ = 1.1
α = 0.1
wd = Weibull(λ,α)
s = Random.seed!(1)
ϵ = rand(s,wd,n)

Y = log.(0.1(μ) + ϵ)
# Y = log.(0.1(μ) + ϵ)*0.1

histogram(Y)

Ycen = [delta[i] == 1 ? Y[i]-rand()*Y[i] : Y[i] for i in 1:n]
paramInit = fill(0.1,3)

function tAFT(pvec, data)
    function f(w)    # standard (minimum) extreme-value density
        exp(w - exp(w))
    end
    function F(w)    # its survival function, despite the name
        exp(-exp(w))
    end
    function l(theta)    # theta[1] = scale, theta[2:end] = regression coefficients
        -sum(log(f((d[2] - d[3:end]'*theta[2:end])/theta[1])^d[1]) +
             log(F((d[2] - d[3:end]'*theta[2:end])/theta[1])^(1 - d[1])) for d in data)
        # ridge variant that caused the stability issues (note the penalty sits inside the sum, so it is added once per observation):
        # -sum(log(f((d[2]-d[3:end]'*theta[2:end])/theta[1])^d[1]) + log(F((d[2]-d[3:end]'*theta[2:end])/theta[1])^(1-d[1])) + 0.1*norm(theta[3:end])^2 for d in data)
    end
    optimize(l, pvec, BFGS(linesearch = LineSearches.BackTracking()), autodiff = :forward)
    # optimize(l, pvec, LBFGS(), autodiff = :forward)
end


resp = tAFT(paramInit, collect(eachrow([delta Y X])))    # note: this passes the uncensored Y; Ycen above is unused
resp.minimizer

julia> resp = tAFT(paramInit, collect(eachrow([delta Y X])))
 * Status: success

 * Candidate solution
    Final objective value:     8.489688e+03

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 4.88e-15 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.87e-15 ≰ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 7.78e-08 ≰ 1.0e-08

 * Work counters
    Seconds run:   1  (vs limit Inf)
    Iterations:    76
    f(x) calls:    150
    ∇f(x) calls:   77


julia> resp.minimizer
3-element Array{Float64,1}:
 2.6089785118820537
 2.0258941072666534
 0.362426844935407
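
Follow-up: one thing I notice in the commented ridge variant is that the penalty sits inside the generator, so it is added once per observation (effectively scaled by n). Here is an untested sketch (tAFT2 and the ridge keyword are my own names) that applies the penalty once and keeps the scale positive via exp, as in my original Weibull version; theta[1] now parameterizes the log of the scale:

using Optim, LineSearches, LinearAlgebra

function tAFT2(pvec, data; ridge = 0.1)
    function l(theta)
        σ = exp(theta[1])                  # scale kept positive by construction
        β = theta[2:end]
        nll = -sum(begin
                w = (d[2] - d[3:end]' * β) / σ
                d[1] * (w - exp(w)) + (1 - d[1]) * (-exp(w))   # log f(w) and log F(w) in closed form
            end for d in data)
        nll + ridge * norm(β)^2            # penalty added once, not once per row
    end
    optimize(l, pvec, BFGS(linesearch = LineSearches.BackTracking()), autodiff = :forward)
end

resp2 = tAFT2(paramInit, collect(eachrow([delta Ycen X])))   # using the censored responses this time

The exp reparameterization removes the risk of the line search stepping into nonpositive scale values, which may be part of the instability.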