Why is this optimization problem so slow in Julia? In R it took 60s

I copied the data and the code into a repo GitHub - xiaodaigh/optim-slow-julia

I am trying to estimate the strength of Go player ( or any other two player broad game where the only outcomes are win or loser).

My data is quite simple. It’s a data frame column the Integer ID of the winner and loser of each game. Like this

using DataFrames
using CSV

data = CSV.read("goplayers.csv", DataFrame)

const P1=Int.(data.p11)
const P2=Int.(data.p22)

function loglik1(w1, w2)
    x = w1-w2
    log(1/(1 + exp(-x)))

init_w = rand(length(unique(vcat(P1, P2))))

function neg_log_lik(weights)
    -mapreduce(((p1, p2),)->loglik1(weights[p1], weights[p2]), +, zip(P1, P2))

using RCall

@rput P1 P2

# finishes in about 60s
@time R"""
w = runif(max(c(P1, P2)))
system.time(m <- optim(w, function(w) {
  x  = w[P1] - w[P2]
  p = 1/(1 + exp(-x))
}, method="BFGS"))

ping = m$par

@rget ping

using Optim: optimize, BFGS
# take forever
@time opm = optimize(neg_log_lik, init_w, BFGS())

It took a lot of time. But the same task too only 60s in R. So I am wondering what I am doing wrong here.

For this random example, it’s quite ok but for the specific data that I have it takes a long time.

Can you show the R code? It seems hard to compare without seeing them side-by-side.

  1. Pass an explicit gradient, or it’ll do finite differences
  2. Use LBFGS, not BFGS

where a is the dataframe

system.time(m <- optim(w, function(w) {
  x  = w[a$p11] - w[a$p22]
  p = 1/(1 + exp(-x))
}, method="BFGS"))

But I do the same in R? Not sure why. It seems hard to reproduce without the data as Julia seems to be ok with random data.

> system.time(m <- optim(w, function(w) {
  x  = w[a$p11] - w[a$p22]
  p = 1/(1 + exp(-x))
}, method="BFGS"))
Error in optim(w, function(w) { : object 'w' not found
Timing stopped at: 0 0 0.001

Can you send a working code snippet so that it’s easier to compare?

Ok. U just need to initialize the w

I am outside atm. Let me make a new in one script with dat

For me (on a MacBook Pro), the original code takes

 28.235802 seconds (12.17 M allocations: 23.930 GiB, 7.51% gc time, 11.78% compilation time)

 23.435813 seconds (1.57 M allocations: 23.396 GiB, 9.50% gc time)

Improving the kernel (avoid allocations and using @vectorize from LoopVectorization.jl), i.e.

using LoopVectorization

function neg_log_lik_optim(init_w)
    s = 0.0
    @vectorize for i in 1:length(winner_ids)
        a = winner_ids[i]
        b = loser_ids[i]
        x = init_w[a] - init_w[b]
        s += log(1/(1+exp(-x)))
    return s

@time opm = optimize(neg_log_lik_optim, init_w, BFGS())

I get

19.492552 seconds (14.23 M allocations: 814.709 MiB, 1.25% gc time, 51.31% compilation time)

  9.360662 seconds (1.24 k allocations: 5.054 MiB)

(I don’t know any R so I can’t yet compare to it.)

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

(jl_DHogdh) pkg> st
      Status `/private/var/folders/1d/sbf5s24x6y306tgv_4hh06y00000gn/T/jl_DHogdh/Project.toml`
  [a93c6f00] DataFrames v1.1.1
  [bdcacae8] LoopVectorization v0.12.21
  [429524aa] Optim v1.3.0
1 Like

I have put the code and the data into a repo GitHub - xiaodaigh/optim-slow-julia

I can still replicate the issue.

R finishes in about 60s but Julia takes forever. Doesn’t seem to be terminating.

Note that generating the data randomly works doesn’t generate the issue. So seems to be something specific with the data.

1 Like

return -s

Just in case the question is not, how to make this approach faster but how to solve it: This is logistic regression and one can find the optimum with our GLM package.

1 Like

is it? I just happened to use the logistic transform though.

I need x = strength1-strength2
and that logit(x) = probability that player 1 wins.

How can you formulate that as a GLM?

I believe this time difference is caused simply by different default iteration limits in R and Julia. That is, neither of them reaches a convergence criterion, and both only stop when they reach a pre-specified number of function calls.


Yeah, note that the objective is invariant under addition of a constant to everybody, so convergence is an issue.

it will still converge though

On an side note, it’s a nice little puzzle to compute this function without running into accuracy problems and problems from overflow/underflow. Consider

julia> f(x) = log(1/(1+exp(-x)))

julia> f(100), Float64(f(big"100.0")) # spurious underflow: no correct digits
(0.0, -3.720075976020836e-44)

julia> f(-1000), Float64(f(big"-1000.0")) # spurious overflow
(-Inf, -1000.0)

julia> f(20), Float64(f(big"20.0"))  # only 8 correct digits
(-2.0611536942919273e-9, -2.061153620314381e-9)

One alternative is:

g(x) = x > 0 ? -log1p(exp(-x)) : x - log1p(exp(x))

Do you mean it converges according to your favorite criteria or that it actually stopped? Try turning on the trace.

I dont need to

@time opm = optimize(neg_log_lik, init_w, BFGS())

ping = opm.minimizer

@time opm = optimize(neg_log_lik, opm.minimizer, BFGS())

You can just run it twice like this. And notice that the values didn’t change after u run it again. It converges, but the constant is not uniquely pre-determined that’s all. Anyway, it’s easy to fix, just set one of the player’s strength to 0. It still converges either way whether u can someone to 0 or not…

When did this change occur? I can’t find anything in the Loopvectorization.jl readme or manual about the new name, nor any issue or pr pertaining to it.

I followed a discussion on Zulip about it. Maybe @avx is still working though.