Hi all, I am new to Julia, coming from MATLAB. I am writing some optimization code that tries to find parameters by minimizing the difference between some already computed probabilities (s
) and model estimated probabilities (σ
) as a function of parameters (β
) using Optim.jl. The main part of the code (complete reproducible code all the way below) is:
f(β) = sum((compute_σ(β) - s).^2)
function g!(G, β)
σ = compute_σ(β)
Δ = compute_Δ(σ)
G .= 2 .* Δ' * (σ - s)
return G
end
res = optimize(f, g!, zeros(K), LBFGS())
Note, however, that compute_σ(β)
is being called both on f
and g!
. In practice, this means that the gradient f'
is only a function of y=f(x)
and not of x
directly. So one can compute the gradient using already estimated quantities. In the real code, compute_σ(β)
is a very expensive computation that I would prefer not to have to do twice. Is there any way to create the same code but avoiding calculating it twice?
The full code is below. For those interested, this is trying to implement a multinomial logit as in BLP, 1995.
using Optim, Random, ForwardDiff, BenchmarkTools
Random.seed!(1234)
function accumarray!(out, ids, val)
for i in eachindex(ids, val)
out[ids[i]] += val[i]
end
return out
end
accumarray(ids, val) = accumarray!(zeros(eltype(val), ids[end]), ids, val)
function compute_σ(β)
Xβ = X*β
exp_Xβ = exp.(Xβ)
sum_Xβ = accumarray(id_mkt, exp_Xβ)
σ = exp_Xβ ./ sum_Xβ[id_mkt]
return σ
end
function compute_Δ(σ)
mult_mat = σ .* X
deriv_wide = zeros(eltype(σ), maximum(id_mkt), size(X, 2))
for k in axes(X, 2)
deriv_wide[:, k] = accumarray(id_mkt, view(mult_mat,:,k))
end
Δ = mult_mat .- σ .* deriv_wide[id_mkt, :]
return Δ
end
N = 1000; # number of mkts
J = 30; # number of products per market
K = 4; # number of product characteristics
# Define product characteristics (and normalize one product to 0)
X = randn(Float64, N*(J+1), K);
X[1:J+1:N*(J+1), :] .= 0.0;
# Define mkt_ids equivalent to id_fd_mkt and id_fd_ind
id_mkt = [i for i in 1:(N) for j = 1:(J+1)];
# Define truth to estimate
β_true = randn(K)
s = compute_σ(β_true)
# Define objective function
f(β) = sum((compute_σ(β) - s).^2)
function g!(G, β)
σ = compute_σ(β)
Δ = compute_Δ(σ)
G .= 2 .* Δ' * (σ - s)
return G
end
# Optimize
res = optimize(f, g!, zeros(K), LBFGS())