Hi,
I have a conceptually simple objective function (an MLE estimator) that has a lot of logs. This kind of objective function is very common in Statistics and Economics.
For example, consider the following objective function:
L(\theta)=\sum_{i=1}^{N} \sum_{j=0}^{J} y_{i j} \cdot \log \left(\sum_{r=1}^{R} \theta_{r} \cdot s_{i j}^{r}\right)
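Differentiating L(\theta) directly gives the gradient and the Hessian that the MWE below implements (the code additionally scales everything by 1/N and evaluates at abs.(θ)):
\frac{\partial L}{\partial \theta_{r}}=\sum_{i=1}^{N} \sum_{j=0}^{J} y_{i j} \cdot \frac{s_{i j}^{r}}{\sum_{p=1}^{R} \theta_{p} \cdot s_{i j}^{p}} \qquad \frac{\partial^{2} L}{\partial \theta_{r} \partial \theta_{q}}=-\sum_{i=1}^{N} \sum_{j=0}^{J} y_{i j} \cdot \frac{s_{i j}^{r} \cdot s_{i j}^{q}}{\left(\sum_{p=1}^{R} \theta_{p} \cdot s_{i j}^{p}\right)^{2}}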
Here s_{ij}^r are constants and \theta = (\theta_1,...,\theta_R)^\prime are the parameters. An MWE of the objective function, the gradient, and the Hessian follows:
using Random, LinearAlgebra, BenchmarkTools, FiniteDiff
# setup
const N = 2000 # observations
const J = 4 # options
const R = 20 # parameters
# constants
const s = [rand(J+1,R) for i=1:N] # s[i][j,r]
for i=1:N, r=1:R
    s[i][:,r] /= sum(s[i][:,r])
end
# weights
y = [Float64.(rand(30:60,J+1)) for i=1:N]
# params
θ = rand(R)
θ /= sum(θ)
# objective function
function obj(
    y::Vector{Vector{Float64}}, # N × J+1
    s::Vector{Matrix{Float64}}, # N × J+1 × R
    θ::Vector{Float64}          # R
)
    absθ = abs.(θ)
    N = length(s)
    R = length(θ)
    J = length(y[1])-1
    Nt = Threads.nthreads()
    llk = zeros(Float64, Nt)
    Threads.@threads for i = 1:N
        tid = Threads.threadid()
        ss = zeros(Float64, J+1)
        for r=1:R
            ss += absθ[r] .* s[i][:,r]
        end
        @inbounds @simd for j=1:J+1
            llk[tid] += y[i][j]*log(ss[j])
        end
    end
    return sum(llk)/N
end
# gradient
function grad(
    y::Vector{Vector{Float64}},
    s::Vector{Matrix{Float64}},
    θ::Vector{Float64}
)
    absθ = abs.(θ)
    N = length(s)
    R = length(θ)
    J = length(y[1])-1
    Nt = Threads.nthreads()
    dllk = zeros(Float64,R,Nt)
    Threads.@threads for i = 1:N
        tid = Threads.threadid()
        ss = zeros(Float64, J+1)
        for r=1:R
            ss += absθ[r] .* s[i][:,r]
        end
        @inbounds @simd for j=1:J+1
            @views dllk[:,tid] .+= y[i][j] .* s[i][j,:] ./ ss[j]
        end
    end
    return sum(dllk,dims=2)[:,1]/N
end
# hessian
function hess(
    y::Vector{Vector{Float64}},
    s::Vector{Matrix{Float64}},
    θ::Vector{Float64}
)
    absθ = abs.(θ)
    N = length(s)
    R = length(θ)
    J = length(y[1])-1
    Nt = Threads.nthreads()
    hllk = zeros(Float64,R,R,Nt)
    Threads.@threads for i = 1:N
        tid = Threads.threadid()
        ss = zeros(Float64, J+1)
        for r=1:R
            ss += absθ[r] .* s[i][:,r]
        end
        @inbounds @simd for j=1:J+1
            @views hllk[:,:,tid] .+= - y[i][j] .* (s[i][j,:] * s[i][j,:]') ./ ss[j]^2
        end
    end
    return sum(hllk,dims=3)[:,:,1]/N
end
# testing with numerical derivatives
function obj_wrapper(θ)
    return obj(y,s,θ)
end
test_grad = grad(y,s,θ)
num_grad = FiniteDiff.finite_difference_gradient(obj_wrapper,θ)
test_grad ≈ num_grad
test_hess = hess(y,s,θ)
num_hess = FiniteDiff.finite_difference_hessian(obj_wrapper,θ)
sum(abs2,test_hess .- num_hess)
Scaling up a little.
## scaling up
# setup
const N = 300000 # observations
const J = 4 # options
const R = 40 # parameters
# constants
const s = [rand(J+1,R) for i=1:N] # s[i][j,r]
for i=1:N, r=1:R
    s[i][:,r] /= sum(s[i][:,r])
end
y = [Float64.(rand(30:60,J+1)) for i=1:N]
# params
θ = rand(R)
θ /= sum(θ)
using BenchmarkTools
@btime obj($y,$s,$θ)
@btime grad($y,$s,$θ)
@btime hess($y,$s,$θ)
5.456 s (36300025 allocations: 4.33 GiB)    # obj
5.150 s (36300031 allocations: 4.33 GiB)    # grad
32.944 s (37800033 allocations: 22.39 GiB)  # hess
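For what it is worth, most of those allocations come from the column slice s[i][:,r] and the broadcast ss += ... inside the observation loop. A minimal single-threaded sketch with a preallocated buffer (a hypothetical obj_noalloc, same math as obj above) avoids them, but I am more interested in general approaches:
function obj_noalloc(y, s, θ)
    # hypothetical sketch: same likelihood as obj above, single-threaded,
    # accumulating into a reusable buffer instead of allocating temporaries
    absθ = abs.(θ)
    N = length(s)
    R = length(θ)
    J = length(y[1])-1
    ss = zeros(J+1)   # reused across observations
    llk = 0.0
    for i = 1:N
        fill!(ss, 0.0)
        @inbounds for r = 1:R, j = 1:J+1
            ss[j] += absθ[r] * s[i][j,r]
        end
        @inbounds for j = 1:J+1
            llk += y[i][j] * log(ss[j])
        end
    end
    return llk/N
end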
This is a very simple example, but the performance of real MLE applications can get very nasty, and the logs make it worse.
What packages and tricks can one use in Julia to speed up the optimization of functions like this?
Thank you all.