Optim returns slightly different result in unit testing environment

I’ve been scratching my head for the past few hours with no answer. I hope someone has a clue.

I added some unit test cases for the BoxCoxTrans.jl package (unregistered), but the test fails only in the unit testing environment, due to a numerical precision problem. If I run the exact same code in the REPL, the results match the test code exactly.

If anyone wants to replicate the problem, this branch exhibits the problem with atol=1e-9.

(v0.7) pkg> add https://github.com/tk3369/BoxCoxTrans.jl#tk/precision-issue

Are you running into this (from NEWS.md)?

  • isapprox(x,y) now tests norm(x-y) <= max(atol, rtol*max(norm(x), norm(y))) rather than norm(x-y) <= atol + ... , and rtol defaults to zero if an atol > 0 is specified (#22742).
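
A quick illustration of that change (made-up numbers, just to show the effect of specifying atol):

julia> isapprox(1.0, 1.0 + 1e-8)                # default rtol = √eps() ≈ 1.5e-8
true

julia> isapprox(1.0, 1.0 + 1e-8; atol = 1e-9)   # specifying atol sets rtol to zero
false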

This bit me a few days ago.

EDIT: maybe not, since this only surprised me when updating from Julia 0.6 to 0.7, and it looks like your package used 0.7 from the get-go.

No… A little more information below.

The log shows that Base.Test calculated a lambda of -0.9917203620435803.

Test Failed at /Users/tomkwong/.julia/dev/BoxCoxTrans/test/runtests.jl:32
  Expression: ≈(λ, -0.99172, atol=precision)
   Evaluated: -0.9917203620435803 ≈ -0.99172 (atol=1.0e-9) 

If I run it from the REPL, I get:

julia> lambda(𝐱)
-0.9917203225477127
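
For what it’s worth, the two computed values differ by only about 4e-8, but that is still well above the 1e-9 tolerance (and each of them is a few times 1e-7 away from the hard-coded -0.99172):

julia> abs(-0.9917203620435803 - (-0.9917203225477127)) > 1e-9
true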

I can confirm what you’re seeing, both the test failure and the REPL result. But maybe (probably) I’m missing something: wouldn’t the test also fail if the REPL result were used?

Yes, it fails when I use the REPL result in the test script.

For now, I’m working around the problem by using atol=1e-4 (see here), which is loose enough for the test to pass. But it bothers me, because the calculation shouldn’t differ depending on whether I run it from the REPL or not…
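
The workaround is simply a looser tolerance in the test itself; roughly this (a sketch of the relevant line, the real one lives in test/runtests.jl):

@test λ ≈ -0.99172 atol=1e-4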

Bounds checking being on or off might cause, e.g., SIMD to be used or not used, which can slightly change the answer.
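
(Pkg.test runs the test script with --check-bounds=yes, so @inbounds annotations are ignored there, which can be enough to block vectorization.) A generic illustration, not specific to this package, of how a different summation order moves the last bits:

x = rand(10_000)
sum(x) == foldl(+, x)   # usually false: pairwise/SIMD order vs strict left-to-right
sum(x) ≈ foldl(+, x)    # true: equal to within the default relative tolerance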


I agree with @kristoffer.carlsson: you should not expect exactly identical results in a different environment. Pick the tolerance based on what you asked your algorithm to achieve.

Specifically, consider passing an explicit tolerance in the call to optimize, possibly exposed to the user (with a default), and just test that the result is within that tolerance.
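
Something along these lines (a rough sketch, not BoxCoxTrans’ actual code; the function name estimate_lambda, the search interval, and the sample data are made up):

using Optim, Test

# Estimate the Box-Cox λ by maximizing the profile log-likelihood,
# with the optimizer tolerance exposed as a keyword argument.
function estimate_lambda(x; lower = -3.0, upper = 3.0, abs_tol = 1e-8)
    logx = log.(x)
    n = length(x)
    function negloglik(λ)
        y = λ == 0 ? logx : (x .^ λ .- 1) ./ λ
        σ² = sum(abs2, y .- sum(y) / n) / n
        -(-n / 2 * log(σ²) + (λ - 1) * sum(logx))   # negated for minimization
    end
    res = optimize(negloglik, lower, upper, Brent(); abs_tol = abs_tol)
    Optim.minimizer(res)
end

# Pick the test tolerance from the tolerance the optimizer was asked for,
# not from whatever digits a particular run happened to print.
x = exp.(randn(200))                         # positive sample data
λ_loose = estimate_lambda(x; abs_tol = 1e-6)
λ_tight = estimate_lambda(x; abs_tol = 1e-12)
@test λ_loose ≈ λ_tight atol=1e-5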