I have adapted this excellent post by @aaowens to fit a right-censored AFT regression model, but for some reason the likelihood is not compatible with ForwardDiff.jl.
using Distributions, Optim

function AFT(pvec, data)
    # each d in data is a row [delta, Y, X...]; delta = 1 means censored
    function l(theta)
        α = exp(theta[1])
        θ = exp(theta[2])
        wdist = Weibull(α, θ)
        -sum(logpdf(wdist, theta[3:end]' * d[3:end] - d[2])^(d[1]) +
             logccdf(wdist, theta[3:end]' * d[3:end] - d[2])^(1 - d[1]) for d in data)
    end
    l(pvec)
    optimize(l, pvec, BFGS(), autodiff = :forward)
end
a = 2.0
b = 2.0
n = 100
beta = [0.1, 0.3, 0.8]
wb = Weibull(a, b)
X = rand(n, 3)
delta = rand([0, 1], n)   # delta[i] = 1 iff Y[i] censored
Y = X * beta
for i in 1:n
    if delta[i] == 1
        Y[i] = Y[i] - Y[i] * rand()
    end
end
paramInit = fill(0.1, 5)
resp = AFT(paramInit, collect(eachrow([delta Y X])))
julia> resp = AFT(paramInit, collect(eachrow([delta Y X])))
* Status: failure (line search failed)
* Candidate solution
Final objective value: Inf
* Found with
Algorithm: BFGS
* Convergence measures
|x - x'| = 0.00e+00 ≤ 0.0e+00
|x - x'|/|x'| = 0.00e+00 ≤ 0.0e+00
|f(x) - f(x')| = NaN ≰ 0.0e+00
|f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
|g(x)| = 1.15e+02 ≰ 1.0e-08
* Work counters
Seconds run: 0 (vs limit Inf)
Iterations: 1
f(x) calls: 1
∇f(x) calls: 1
I am not really sure, but my guess is that your l function cannot handle the dual numbers of ForwardDiff.jl. In short, theta will be a dual number and the output of the function needs to be one too. As an example, see cell 4 of this notebook. In particular, see how s2 is set up so it can host dual numbers.
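For instance, a buffer pre-allocated as plain Float64 cannot store duals, while one allocated from the parameter's element type can. A minimal sketch of that point (not from the notebook, just for illustration):

using ForwardDiff

function bad(theta)
    s = zeros(3)                  # Float64 buffer; writing a Dual into it fails
    for i in 1:3
        s[i] = theta[1]^i
    end
    return sum(s)
end

function good(theta)
    s = zeros(eltype(theta), 3)   # buffer adopts the (possibly Dual) element type of theta
    for i in 1:3
        s[i] = theta[1]^i
    end
    return sum(s)
end

# ForwardDiff.gradient(bad, [2.0])   # throws a MethodError: a Dual cannot be converted to Float64
ForwardDiff.gradient(good, [2.0])    # works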
Thanks @Paul_Soderlind, can you explain what it is about the explicit allocation of s2, or the subsequent updates, that ensures LL_t will be a dual?
s2 = zeros(typeof(alpha), T)                # works with ForwardDiff
s2[1] = omega + alpha*s2_0 + beta1*s2_0     # simple, but slow approach
for t = 2:T                                 # using filter() is perhaps quicker
    s2[t] = omega + alpha*u[t-1]^2 + beta1*s2[t-1]
end
This line doesn’t make sense:

-sum(logpdf(wdist,d[3:end]' * d[2:end])^(d[1]) * logccdf(wdist,d[3:end]' * d[2:end])^(1-d[1]) for d in data)

First, you are iterating over data, which is a Vector{Float64}, so each d is an individual Float64, not a vector (this is the source of the indexing error you see). I suspect you meant for data to be a Vector{Vector{Float64}}, something like [rand(wb, 10) for _ in 1:10]? Or maybe some reshape of Y?

When you fix the shape of the data, d[3:end] is a different length than d[2:end], so that product will not work. Perhaps that should be d[3:end]' * d[2:end-1]?
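To see the length problem concretely, with a made-up 5-element row of the form [delta, Y, X...]:

d = [1.0, 0.5, 0.2, 0.3, 0.8]        # hypothetical row: [delta, Y, X1, X2, X3]
length(d[3:end]), length(d[2:end])   # (3, 4), so d[3:end]' * d[2:end] throws a DimensionMismatch
d[3:end]' * d[2:end-1]               # lengths match (3 and 3), so this returns a scalar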
s2 = zeros(typeof(alpha),T) means that when alpha is a dual number, then s2 will be as well.

However, it is not the explicit allocation per se that is important. What matters is that the output (the likelihood value) can be a dual number when the parameters are. You can easily check this by trying ForwardDiff.gradient(l, paramInit).

As far as I can see, the sum, pdf and cdf of the Weibull should survive this, so the problem might be elsewhere.
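For example (a sketch assuming the inner l and paramInit from your code are in scope):

using ForwardDiff

# If l can propagate dual numbers, this returns a plain Float64 gradient vector;
# otherwise it errors at whatever line cannot hold a Dual.
ForwardDiff.gradient(l, paramInit)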
Fixed a bunch of bugs, thanks for all the suggestions.

The following seems to run fine, but it has identifiability issues: while the relative magnitude of the positive betas seems roughly correct, it can't get the signs right. I tried adding a ridge regularizer on just the linear coefficients (commented out below), but that led to stability issues as well.
using Revise, Distributions, Optim, DistributedTopicModels, Survival, LineSearches, Plots, Random, LinearAlgebra

n = 10000
# beta = [1, 10]
beta = [10, 1]
# beta = [1, -1]
# beta = [-1, 1]
s = Random.seed!(1)
X = rand(s, n, 2)
delta = rand([0, 1], n)   # delta[i] = 1 iff Y[i] censored
μ = X * beta
histogram(μ)
λ = 1.1
α = 0.1
wd = Weibull(λ, α)
s = Random.seed!(1)
ϵ = rand(s, wd, n)
Y = log.(0.1(μ) + ϵ)
# Y = log.(0.1(μ) + ϵ) * 0.1
histogram(Y)
Ycen = [delta[i] == 1 ? Y[i] - rand()*Y[i] : Y[i] for i in 1:n]
paramInit = fill(0.1, 3)

function tAFT(pvec, data)
    # density f and survival function F of the standard extreme-value error distribution
    function f(w)
        exp(w - exp(w))
    end
    function F(w)
        exp(-exp(w))
    end
    function l(theta)
        -sum(log(f((d[2] - d[3:end]'*theta[2:end]) / theta[1])^d[1]) +
             log(F((d[2] - d[3:end]'*theta[2:end]) / theta[1])^(1 - d[1])) for d in data)
        # ridge penalty on the linear coefficients (led to stability issues):
        # -sum(log(f((d[2] - d[3:end]'*theta[2:end]) / theta[1])^d[1]) +
        #      log(F((d[2] - d[3:end]'*theta[2:end]) / theta[1])^(1 - d[1])) +
        #      0.1*norm(theta[3:end])^2 for d in data)
    end
    optimize(l, pvec, BFGS(linesearch = LineSearches.BackTracking()), autodiff = :forward)
    # optimize(l, pvec, LBFGS(), autodiff = :forward)
end

resp = tAFT(paramInit, collect(eachrow([delta Y X])))
resp.minimizer
julia> resp = tAFT(paramInit, collect(eachrow([delta Y X])))
* Status: success
* Candidate solution
Final objective value: 8.489688e+03
* Found with
Algorithm: BFGS
* Convergence measures
|x - x'| = 4.88e-15 ≰ 0.0e+00
|x - x'|/|x'| = 1.87e-15 ≰ 0.0e+00
|f(x) - f(x')| = 0.00e+00 ≤ 0.0e+00
|f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
|g(x)| = 7.78e-08 ≰ 1.0e-08
* Work counters
Seconds run: 1 (vs limit Inf)
Iterations: 76
f(x) calls: 150
∇f(x) calls: 77
julia> resp.minimizer
3-element Array{Float64,1}:
2.6089785118820537
2.0258941072666534
0.362426844935407