Optim.jl can't optimize a function that includes BigFloat variables

This is a simple maximum likelihood estimation example from the Optim.jl tutorials. I just changed the type of log_σ from Float64 to BigFloat in the Log_Likelihood function; however, the error message says there is no method matching big. I know there's no error with the Float64 type, but sometimes a problem involves really big numbers, so I wonder: is there some way Julia can optimize a function that includes BigFloat?

using Optim, NLSolversBase, Random
using LinearAlgebra: diag
Random.seed!(0);                            # Fix random seed generator for reproducibility

n = 500                             # Number of observations
nvar = 2                            # Number of variables
β = ones(nvar) * 3.0                # True coefficients
x = [ones(n) randn(n, nvar - 1)]    # X matrix of
ε = randn(n) * 0.5                  # Error variance
y = x * β + ε;                      # Generate Data
function Log_Likelihood(X, Y, β, log_σ)
    σ = exp(big(log_σ))
    llike = -n/2*log(2π) - n/2*log(σ^2) - sum((Y - X * β).^2) / (2σ^2)
    return -llike
end
func = TwiceDifferentiable(vars -> Log_Likelihood(x, y, vars[1:nvar], vars[nvar + 1]),
                           ones(nvar+1); autodiff=:forward)
opt = optimize(func, ones(nvar+1))
parameters = Optim.minimizer(opt)
# ERROR: LoadError: MethodError: no method matching big(::ForwardDiff.Dual.....

Try passing big numbers in directly instead of calling big() inside the function. The error seems to be that ForwardDiff doesn't specialize big() for its Dual number type, even though it does have promotion rules and everything.


As an aside, do you really need BigFloat, or is something like DoubleFloats.jl sufficient for your purposes?

Other than that, passing log_σ as a BigFloat directly, instead of creating it inside your function, is going to be better, as @jling mentioned.
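That change can be sketched like this (a hypothetical variant of the function from the question; big() is gone from the body, and the caller chooses the precision via the argument types):

```julia
# Sketch: let the argument's type decide the precision instead of calling
# big() inside the function (names follow the example above).
function Log_Likelihood_big(X, Y, β, log_σ)
    σ = exp(log_σ)                    # BigFloat if log_σ is a BigFloat
    n = length(Y)
    llike = -n/2*log(2π) - n/2*log(σ^2) - sum((Y - X * β).^2) / (2σ^2)
    return -llike
end

x0 = big.(ones(3))                    # BigFloat starting point for optimize
```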


There is no point in converting a single term like σ to BigFloat if the rest of the computation is done in double precision. For example, log(2π) is computed in double precision, and that will generally limit your accuracy to double precision even if parts of the computation are BigFloat.
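A quick way to see that contamination (a small illustrative snippet, not from the thread):

```julia
# log(2π) is evaluated in Float64 first, so widening it afterwards cannot
# recover the lost digits; computing with big(pi) from the start can.
mixed = big(log(2π))          # Float64 result, widened to BigFloat
exact = log(2 * big(pi))      # computed in BigFloat throughout
mixed - exact                 # nonzero: off at roughly the 17th decimal
```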

(Aside: don’t use global variables like n in your functions if you can help it. In this case, you should get n = length(Y) from the arguments.)

In general, you should try to write code that works for any number type based on the types of the arguments. This can be a little tricky but the result is much more flexible code. For example:

function Log_Likelihood2(X, Y, β, log_σ)
    n = length(Y)                # get n from the arguments, not a global
    σ = exp(log_σ)
    # negative log-likelihood, written so the argument types set the precision
    llike = n * log_σ + sum((Y .- X * β).^2) / 2σ^2
    return llike + n * log(2*oftype(llike, π)) / 2
end

Note that it uses oftype(llike, π) to compute π in the same precision as llike, and rearranges the calculation to avoid n/2 (Int/Int division produces a double-precision result; although dividing an integer by 2 can technically be done exactly, it still changes the type and precision).
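For instance, oftype picks up whatever precision its first argument carries (a tiny illustration):

```julia
oftype(1.0, π)        # π rounded to Float64
oftype(big(1.0), π)   # π computed to full BigFloat precision
```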

Now you can compute in whatever precision you want — but you should probably pass all of the arguments in the same precision, as otherwise the accuracy will be limited by the least-precise argument.

This is also important for automatic differentiation, since that works by passing “just another number type” (ForwardDiff.Dual) that keeps track of derivatives.
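As a sketch (assuming the Log_Likelihood2 above is defined and the ForwardDiff package is available), the same generic code can be differentiated directly, with no big() call and hence no MethodError:

```julia
using ForwardDiff

# toy data, just to exercise the gradient
X = [1.0 0.0; 1.0 1.0]
Y = [1.0, 2.0]
g = ForwardDiff.gradient(v -> Log_Likelihood2(X, Y, v[1:2], v[3]),
                         [0.0, 0.0, 0.0])
# g is a length-3 Float64 gradient with respect to (β₁, β₂, log_σ)
```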


On a separate note (since this is not the right solution to your problem as noted above), it seems like big(::Dual) should work. I filed an issue: missing method big(::Dual) · Issue #537 · JuliaDiff/ForwardDiff.jl · GitHub

This would make an easy first PR if someone wants to take it on.


I’m wondering if this is why you think you need BigFloat precision? Your log(σ^2) expression will spuriously overflow and give Inf if log_σ > log(sqrt(floatmax())) ≈ 355. However, if you simply rewrite this to n * log_σ, as in my suggested code, the problem goes away and you should get an accurate, finite result even when log_σ is large.
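The overflow is easy to reproduce (a small illustrative snippet):

```julia
log_σ = 400.0
σ = exp(log_σ)      # ≈ 5.2e173, still a finite Float64
σ^2                 # Inf: e^800 overflows Float64
log(σ^2)            # Inf, even though log(σ^2) == 800 mathematically
2log_σ              # 800.0: the rearranged form stays finite
```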

(The other term, which divides by 2σ^2, will simply underflow to 0.0 when log_σ is large, so it isn’t a problem.)

While @jling has definitely answered your actual question, I can’t help but pile on with the people asking whether it is sensible to use extended precision here: assuming that you care about the uncertainty of your point estimate of \sigma, it probably won’t be so small that the extra decimal places you’d get from using big are actually helpful. Like, if your estimate looks like \hat{\sigma} \pm 0.05 or something, then using extended precision to get \hat{\sigma} to 100 decimal places doesn’t actually provide any additional information, in the sense that the additional digits change the actual likelihood by a negligible amount.

Again, though, not really what you asked, so sorry if this doesn’t actually have anything to do with your real application and you were just looking to make a simple example.
