How to make this AFT regression stable?

I have adapted this excellent post by @aaowens to fit a right-censored AFT regression model, but for some reason the likelihood is not compatible with ForwardDiff.jl.

using Distributions, Optim

function AFT(pvec, data)
    function l(theta)
        α = exp(theta[1])    # Weibull shape, kept positive via exp
        θ = exp(theta[2])    # Weibull scale, kept positive via exp
        wdist = Weibull(α, θ)
        -sum(logpdf(wdist, theta[3:end]' * d[3:end] - d[2])^(d[1]) +
             logccdf(wdist, theta[3:end]' * d[3:end] - d[2])^(1 - d[1]) for d in data)
    end
    l(pvec)    # one plain evaluation as a sanity check
    optimize(l, pvec, BFGS(), autodiff = :forward)
end

a = 2.0
b = 2.0
n = 100
beta = [0.1, 0.3, 0.8]
wb = Weibull(a,b)
X = rand(n,3)
delta = rand([0,1],n) # delta[i] = 1 iff Y[i] censored
Y = X * beta
for i in 1:n
    if delta[i] == 1
        Y[i] = Y[i] - Y[i]*rand()
    end
end

paramInit = fill(0.1,5)
resp = AFT(paramInit, collect(eachrow([delta Y X])))

julia> resp = AFT(paramInit, collect(eachrow([delta Y X])))
 * Status: failure (line search failed)

 * Candidate solution
    Final objective value:     Inf

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = 1.15e+02 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    1
    f(x) calls:    1
    ∇f(x) calls:   1

I am not really sure, but my guess is that your l function cannot handle the dual numbers of ForwardDiff.jl. In short, theta will be a dual number and the output of the function needs to be one too.

As an example, see cell 4 of this notebook. In particular, see how s2 is set up so it can host dual numbers.
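
For intuition, here is a tiny sketch of my own (not from the notebook) of what a dual number is and how a likelihood typically fails to handle one:

using ForwardDiff

# A dual number carries a value together with derivative information.
x = ForwardDiff.Dual(2.0, 1.0)    # value 2.0, one derivative seed
y = exp(x)                        # still a Dual: value exp(2.0), partial exp(2.0)

# The usual failure mode: a Float64 buffer cannot store a Dual.
s = zeros(2)                      # Vector{Float64}
# s[1] = x                        # MethodError: no method matching Float64(::Dual)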

Thanks @Paul_Soderlind, can you explain what it is about the explicit allocation of s2 or the subsequent updates that ensures LL_t will be a dual?

    s2    = zeros(typeof(alpha),T)                 #works with ForwardDiff 
    s2[1] = omega + alpha*s2_0 + beta1*s2_0        #simple, but slow approach
    for t = 2:T                                    #using filter() is perhaps quicker
      s2[t] = omega + alpha*u[t-1]^2 + beta1*s2[t-1]
    end

This line doesn’t make sense:

-sum(logpdf(wdist,d[3:end]' * d[2:end])^(d[1]) * logccdf(wdist,d[3:end]' * d[2:end])^(1-d[1]) for d in data)

First, you are iterating over data which is a Vector{Float64}, so each d is an individual Float64, not a vector (this is the source of the indexing error you see). I suspect you meant for data to be a Vector{Vector{Float64}}, something like [rand(wb, 10) for _ in 1:10]? Or maybe some reshape of Y?

When you fix the shape of the data, d[3:end] is a different length than d[2:end], so that product will not work. Perhaps that should be d[3:end]' * d[2:end-1]?
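
If the data really is built as collect(eachrow([delta Y X])), as in the call you posted, then each d should be a 5-element row view. A quick check with toy numbers:

delta = [1.0, 0.0]
Y = [0.5, 0.7]
X = [0.1 0.2 0.3; 0.4 0.5 0.6]

data = collect(eachrow([delta Y X]))    # Vector of 5-element row views
d = first(data)
d[1], d[2], d[3:end]                    # (1.0, 0.5, [0.1, 0.2, 0.3])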

s2 = zeros(typeof(alpha),T) means that when alpha is a dual number, s2 will be one as well.

However, it is not the explicit allocation per se that is important. What matters is that the output (likelihood value) can be a dual number when the parameters are. You can easily check this by trying ForwardDiff.gradient(l,paramInit).

As far as I can see, the sum, pdf and cdf of the Weibull should survive this, so the problem might be elsewhere.
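
A minimal pair illustrating that check (the function names are mine, just for illustration):

using ForwardDiff

bad(theta)  = (s = zeros(3);                   s[1] = theta[1]; sum(s))
good(theta) = (s = zeros(typeof(theta[1]), 3); s[1] = theta[1]; sum(s))

ForwardDiff.gradient(good, [0.1, 0.2])    # works: returns [1.0, 0.0]
# ForwardDiff.gradient(bad, [0.1, 0.2])   # MethodError: bad's buffer is Vector{Float64}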

Fixed a bunch of bugs, thanks for all the suggestions.
The following seems to run fine, but it has identifiability issues: while the relative magnitudes of the positive betas seem roughly correct, it can’t get the signs right.
I tried adding a ridge regularizer on just the linear coefficients (commented out below), but that led to stability issues as well.

using Revise, Distributions, Optim, DistributedTopicModels, Survival, LineSearches, Plots, Random, LinearAlgebra

n = 10000
# beta = [1,10]
beta = [10,1]
# beta = [1,-1]
# beta = [-1,1]
s = Random.seed!(1)
X = rand(s,n,2)
delta = rand([0,1],n) # delta[i] = 1 iff Y[i] censored
μ = X * beta

histogram(μ)

λ = 1.1
α = 0.1
wd = Weibull(λ,α)
s = Random.seed!(1)
ϵ = rand(s,wd,n)

Y = log.(0.1(μ) + ϵ)
# Y = log.(0.1(μ) + ϵ)*0.1

histogram(Y)

Ycen = [delta[i] == 1 ? Y[i]-rand()*Y[i] : Y[i] for i in 1:n]
paramInit = fill(0.1,3)

function tAFT(pvec, data)
    function f(w)    # standard (minimum) extreme-value density
        exp(w - exp(w))
    end
    function F(w)    # its survival function, despite the name
        exp(-exp(w))
    end
    function l(theta)    # theta[1] = scale, theta[2:end] = regression coefficients
        -sum(log(f((d[2] - d[3:end]'*theta[2:end])/theta[1])^d[1]) +
             log(F((d[2] - d[3:end]'*theta[2:end])/theta[1])^(1 - d[1])) for d in data)
        # ridge variant that caused the stability issues (note the penalty sits inside the sum, so it is added once per observation):
        # -sum(log(f((d[2]-d[3:end]'*theta[2:end])/theta[1])^d[1]) + log(F((d[2]-d[3:end]'*theta[2:end])/theta[1])^(1-d[1])) + 0.1*norm(theta[3:end])^2 for d in data)
    end
    optimize(l, pvec, BFGS(linesearch = LineSearches.BackTracking()), autodiff = :forward)
    # optimize(l, pvec, LBFGS(), autodiff = :forward)
end


resp = tAFT(paramInit, collect(eachrow([delta Y X])))    # note: this passes the uncensored Y; Ycen above is unused
resp.minimizer

julia> resp = tAFT(paramInit, collect(eachrow([delta Y X])))
 * Status: success

 * Candidate solution
    Final objective value:     8.489688e+03

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 4.88e-15 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.87e-15 ≰ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 7.78e-08 ≰ 1.0e-08

 * Work counters
    Seconds run:   1  (vs limit Inf)
    Iterations:    76
    f(x) calls:    150
    ∇f(x) calls:   77


julia> resp.minimizer
3-element Array{Float64,1}:
 2.6089785118820537
 2.0258941072666534
 0.362426844935407
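
Follow-up: one thing I notice in the commented ridge variant is that the penalty sits inside the generator, so it is added once per observation (effectively scaled by n). Here is an untested sketch (tAFT2 and the ridge keyword are my own names) that applies the penalty once and keeps the scale positive via exp, as in my original Weibull version; theta[1] now parameterizes the log of the scale:

using Optim, LineSearches, LinearAlgebra

function tAFT2(pvec, data; ridge = 0.1)
    function l(theta)
        σ = exp(theta[1])                  # scale kept positive by construction
        β = theta[2:end]
        nll = -sum(begin
                w = (d[2] - d[3:end]' * β) / σ
                d[1] * (w - exp(w)) + (1 - d[1]) * (-exp(w))   # log f(w) and log F(w) in closed form
            end for d in data)
        nll + ridge * norm(β)^2            # penalty added once, not once per row
    end
    optimize(l, pvec, BFGS(linesearch = LineSearches.BackTracking()), autodiff = :forward)
end

resp2 = tAFT2(paramInit, collect(eachrow([delta Ycen X])))   # using the censored responses this time

The exp reparameterization removes the risk of the line search stepping into nonpositive scale values, which may be part of the instability.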