Julia vs R vs Python: simple MLE problem

This is the draft of my new post (https://www.codementor.io/zhuojiadai/draft/gnqi4njro) on Julia vs R vs Python on a simple MLE problem. Unfortunately, Julia comes out slightly worse due to long compilation times.

Welcome suggestions on things to try and improve.

2 Likes

The formulas are not displaying correctly for me:

1 Like

17? That seems a little high. I just ran it 3 times:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.2 (2017-12-13 18:08 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-w64-mingw32

7.220850 seconds (9.44 M allocations: 505.088 MiB, 2.15% gc time)
0.000114 seconds (516 allocations: 23.469 KiB)
Julia has exited. Press Enter to start a new session.
7.285036 seconds (9.44 M allocations: 505.088 MiB, 2.17% gc time)
0.000107 seconds (516 allocations: 23.469 KiB)
Julia has exited. Press Enter to start a new session.
7.057593 seconds (9.44 M allocations: 505.076 MiB, 3.66% gc time)
0.000194 seconds (516 allocations: 23.469 KiB)


Also, the formulas aren’t displaying.

I wonder what message you are trying to convey with this post. Basically, for any trivial problem, compilation time ≫ runtime, so Julia is not going to do well. This is well known, but also pretty irrelevant, since for trivial problems you can use any language you already know and it won’t matter.

Write-ups for real-life projects would be much more informative, even if they are small: e.g. anything that runs over a minute (or, heaven forbid, 5 minutes) and has 200+ LOC.

In particular, Julia is very suitable for ML / MAP, since it has robust optimizers and automatic differentiation. But an interested user would not learn about this from your post.

9 Likes

Is Optim.jl using ForwardDiff by default here?

BFGS with forward mode AD would likely have been faster.
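For readers following along, a minimal sketch of what that might look like on the post’s truncated-normal likelihood (whether forward-mode AD differentiates cleanly through `logpdf` of a `Truncated` distribution depends on the Distributions/ForwardDiff versions in use):

```julia
using Distributions, Optim

# the post's hard-coded observations
odr = [0.10, 0.20, 0.15, 0.22, 0.15, 0.10, 0.08, 0.09, 0.12]
Q_t = quantile.(Normal(0, 1), odr)
maxx = maximum(Q_t)
negll(μσ) = -sum(logpdf.(Truncated(Normal(μσ[1], μσ[2]), -Inf, maxx), Q_t))

# without a method argument, optimize defaults to Nelder-Mead;
# this requests BFGS with ForwardDiff-generated gradients instead
res = optimize(negll, [0.0, 1.0], BFGS(); autodiff = :forward)
```

With exact gradients, BFGS typically needs far fewer objective evaluations than Nelder-Mead on a smooth two-parameter problem like this.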

I’m also having the same problem as @NaOH.

I think his point is that for very simple problems Julia is a bit annoying to use because of the compilation times, which seems like a fair assessment.

Only remark is that you maybe want to mention Optim.minimizer(mfit) to get the minimizer, which I think is the more robust interface.
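For anyone unfamiliar with that interface, a quick illustrative sketch (the toy fitting problem here is mine, not from the post):

```julia
using Distributions, Optim

# toy example: fit a Normal to simulated data by maximum likelihood
x = rand(Normal(1.0, 2.0), 1_000)
negll(μσ) = -sum(logpdf.(Normal(μσ[1], abs(μσ[2])), x))
mfit = optimize(negll, [0.0, 1.0])

est = Optim.minimizer(mfit)   # the [μ, σ] estimates
val = Optim.minimum(mfit)     # objective value at the minimizer
ok  = Optim.converged(mfit)   # convergence flag
```

The accessor functions work regardless of which algorithm produced the result, which is why they are the more robust interface than reaching into the result object’s fields.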

1 Like

It might be worth mentioning Revise.jl. This package allows you to modify code without the need to restart your Julia session and recompile everything anew. It solves a slightly different problem, but it is an annoying one that I have encountered quite a bit during development.
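The workflow is roughly the following (`MyPkg` here is a hypothetical package under development, so this is a sketch rather than runnable code):

```julia
using Revise          # load Revise before the code you want tracked
using MyPkg           # hypothetical package under development

MyPkg.fit_model()     # first call compiles as usual
# ...edit MyPkg's source files in your editor...
MyPkg.fit_model()     # Revise picks up the edits without a session
                      # restart; only changed methods are recompiled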

1 Like

Yes, but the way it’s written isn’t fair because it makes it sound like it’s harmful in any case other than running the same code once or twice:

If you just wish to run the code once or twice then Julia’s long compilation may be something you want to avoid.

That’s false. For one thing, compilation time is fixed. You don’t have to “solve a problem twice” to sidestep compilation time: if the calculation is long, compilation won’t cause any noticeable difference anyway. (I’ve encountered people who believe “the compilation run” is somehow slower: no, that’s not the case. All it does is compile first. There are no type instabilities in the first run or anything like that.)

Also, the problem is constrained to only the first call, even if a different optimization problem is solved. Here’s an example:

using Distributions, Optim

# hard-coded data/observations
odr = [0.10, 0.20, 0.15, 0.22, 0.15, 0.10, 0.08, 0.09, 0.12]
Q_t = quantile.(Normal(0, 1), odr)

# return a function that accepts [mu, sigma] as its parameter
function neglik_tn(Q_t)
    maxx = maximum(Q_t)
    f(μσ) = -sum(logpdf.(Truncated(Normal(μσ[1], μσ[2]), -Inf, maxx), Q_t))
    f
end

neglikfn = neglik_tn(Q_t)

# optimize! start searching
@time res = optimize(neglikfn, [mean(Q_t), std(Q_t)]) # 7 seconds
@time res = optimize(neglikfn, [mean(Q_t), std(Q_t)]) # 0.0001 seconds

# an identical but freshly defined closure: only the very first
# optimize call above paid the full compilation price
function neglik_tn2(Q_t)
    maxx = maximum(Q_t)
    f(μσ) = -sum(logpdf.(Truncated(Normal(μσ[1], μσ[2]), -Inf, maxx), Q_t))
    f
end
neglikfn2 = neglik_tn2(Q_t)
@time res = optimize(neglikfn2, [mean(Q_t), std(Q_t)]) # 0.06 seconds


A more accurate statement is:

If you wish to solve only a single optimization problem and that problem takes <10 seconds, then Julia’s long initial compilation is something you want to avoid. For long enough problems, or for solving multiple optimization problems, the compilation time is not noticeable.

Maybe explain via some examples. I would write it as:

If you want to solve a problem that takes 3000 seconds, the first run in a Julia session will take 3007 seconds, so total runtime matters far more than compilation. If you want to solve 100 ten-second optimization problems, the first will take 17 seconds and each subsequent call, free of compilation, roughly 10 seconds, for a total of about 1007 seconds; again, the individual problems matter more than the compilation time. If you want to solve a single 5-second optimization problem, it’ll take 12 seconds. So the compilation-time issue can be annoying for interactive use when you want to solve a single small problem, but it doesn’t affect performance-sensitive uses.

Julia v0.7 does have an interpreter, though, which promises to reduce compilation time in exchange for fewer optimizations, and this may be an interesting middle ground for the specific interactive use case. Additionally, tools like PackageCompiler.jl may become more common, limiting the “startup time” of package code that is noticed here.

So there is an issue, but IMO when you explain exactly what that issue is it becomes apparent that it’s due to how it’s being used, and whether that would be an issue for you is something for the reader to decide.

7 Likes

The “the code” in “If you just wish to run the code once or twice” clearly refers to “such a simple optimization problem”, not to any piece of code, but yes, the issue could maybe be explained in a bit more detail so people don’t get confused.

I have a function that does MLE for a user-specified likelihood function. One of the methods runs a simple example, which serves as documentation. In my .juliarc file, I call this simple example, so that ML estimation is warmed up and ready to use without compilation when I need it (except for the user-specified likelihood function).

The example method:

function mleresults()
    println("execute edit(mleresults,()) to examine the example code")
    x = randn(100, 3)
    β = rand(3)
    println("true betas: ")
    prettyprint(β)
    y = zeros(100)
    for i = 1:100
        y[i] = rand(Poisson(exp.(x[i, :]'β)))
    end
    model = β -> poisson(β, y, x)
    mleresults(model, β, "simple ML example");
end


the results:

julia> @time mleresults();
execute edit(mleresults,()) to examine the example code
true betas:
0.27935
0.55297
0.87107
************************************************************
simple ML example
MLE Estimation Results
BFGS convergence: Normal convergence
Average Log-L: -1.40224
Observations: 100

estimate     st. err      t-stat     p-value
1     0.36406     0.06912     5.26741     0.00000
2     0.65008     0.06001    10.83261     0.00000
3     0.87825     0.06666    13.17476     0.00000

Information Criteria
Crit.      Crit/n
CAIC    297.26347     2.97263
BIC    294.26347     2.94263
AIC    286.44796     2.86448
************************************************************
0.001996 seconds (1.67 k allocations: 490.592 KiB)

julia>


Notice that the time to run this is essentially zero! This is in https://github.com/mcreel/Econometrics.jl, by the way.

1 Like

I tried to do that about half a year ago or so, and didn’t quite figure it out. Maybe once 1.0 arrives I’ll make the effort to learn how to do it. But putting commonly used stuff in .juliarc with toy usages is a convenient and easy way to have things ready for interactive use.
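A minimal version of that .juliarc.jl trick might look like this (the toy likelihood here is illustrative, not the one from the package above):

```julia
# in ~/.juliarc.jl (Julia 0.6): warm up Optim and Distributions at
# startup by solving a tiny throwaway MLE problem
using Distributions, Optim

warmup = let x = randn(50)
    optimize(μσ -> -sum(logpdf.(Normal(μσ[1], abs(μσ[2])), x)),
             [0.0, 1.0])
end
```

After this runs at startup, subsequent interactive `optimize` calls on similar objectives skip most of the compilation hit.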

1 Like

I couldn’t have put it better myself, so I’ve quoted you wholesale.

On my computer it’s down to 7.5 seconds now; perhaps I had something running in the background. On the work computer it is still 12 seconds, though.

You really need to fix the MathJax issues there (the LaTeX isn’t rendering) before one can really read your writing.

What are the timings for each?

It’s fixed now. I am pretty sure it was just a bug on the codementor.io platform where they don’t display drafts properly.

Compile times. I don’t really think we (Optim.jl) can do so much about that currently. If we can, I’d be happy to help make it faster.

The real kicker is the following: your favorite computer scientist comes up with a new, cool way to do arbitrary precision really, really fast. What framework is most likely to work with this new number type (Julia already has several implemented)? Nelder-Mead in R, Python, or Julia? And if none of them supports it, who will have the easiest time patching their package?

Now I know that a lot of people use what people might know as double precision numbers, but the flexibility of Julia is really not present in either R or Python to the extent that I know those languages/frameworks.

Let me also frame the compile time in another light. How long did you spend from thinking about writing the blog post until you actually pressed enter on “optimize(…)”? A bit longer than 10 seconds, I guess. It is of course a problem if you’re running 1000 different optimization problems, say for variations of a model, but then just don’t run your “loop over models” outside of Julia. Instead of running 1000 scripts, run one script that runs 1000 models.
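That “one script that runs 1000 models” suggestion can be sketched as follows (sizes reduced, and the model is a stand-in; only the first `optimize` call in the session pays the compilation cost):

```julia
using Distributions, Optim

# fit the same model to 100 simulated datasets within one session:
# compilation happens once, on iteration 1, then every later fit
# runs at full speed
fits = map(1:100) do i
    x = randn(200)
    optimize(μσ -> -sum(logpdf.(Normal(μσ[1], abs(μσ[2])), x)),
             [0.0, 1.0])
end
```

Timing this loop with `@time` shows the per-fit cost quickly dropping to well under the first iteration’s total.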

Edit:

I hope I didn’t come off as too defensive btw. I really do appreciate you spreading the word about Optim. I can survive the compile times, but truth be told we could always use more eyes and more feedback!

7 Likes

I have been thinking about whether I would prefer to use Julia or R in this instance. For work, I have written code to solve a much more complicated version of the problem in R. That’s before I started using Julia seriously.

Now that I think about it, I would have preferred Julia over R for that problem. I might add that to the post once I find time to finish the Fannie Mae data analysis.