LinearAlgebra: unstable floating point operations?

Hello, I had some scripts running and at some point I noticed that some unexpected error was thrown:

ERROR: LoadError: TaskFailedException:
DomainError with 1.0000000000000002:
acos(x) not defined for |x| > 1
Stacktrace:
 [1] acos_domain_error(::Float64) at ./special/trig.jl:671
 [2] acos(::Float64) at ./special/trig.jl:701
 [3] (::Main.TIP4PIce.var"#myangle#8")(::Array{Float64,1}, ::Array{Float64,1}) at /home/riccardo/Uni/thesis/ICETIP4P/src/lib/WaterMolecule.jl:36
 [4] count_hbonds(::Array{Water,1}, ::Int64, ::Float64) at /home/riccardo/Uni/thesis/ICETIP4P/src/lib/WaterMolecule.jl:49
 [5] macro expansion at /home/riccardo/Uni/thesis/ICETIP4P/src/defects.jl:35 [inlined]
 [6] (::var"#19#threadsfor_fun#7"{String,Array{String,1}})(::Bool) at ./threadingconstructs.jl:61
 [7] (::var"#19#threadsfor_fun#7"{String,Array{String,1}})() at ./threadingconstructs.jl:28
Stacktrace:
 [1] wait(::Task) at ./task.jl:267
 [2] macro expansion at ./threadingconstructs.jl:69 [inlined]
 [3] main(::String) at /home/riccardo/Uni/thesis/ICETIP4P/src/defects.jl:30
 [4] top-level scope at /home/riccardo/Uni/thesis/ICETIP4P/src/defects.jl:57
 [5] include(::Module, ::String) at ./Base.jl:377
 [6] exec_options(::Base.JLOptions) at ./client.jl:288
 [7] _start() at ./client.jl:484
in expression starting at /home/riccardo/Uni/thesis/ICETIP4P/src/defects.jl:57

This error in the application of acos resulted from the usage of the simple function
myangle(u, v) = acos( dot(u, v) / ( norm(u)*norm(v) ) )
that should evaluate the angle between two vectors u and v(they are Vector{Float64} of size 3 with limited values between 0 and some finite positive number).
The functions dot and norm come from the LinearAlgebra package.

This exact same error happened 6 times out of various millions(maybe billions) applications of that function; it might not seem like much but, well, it should not happen I guess.

Sadly, since I did not really expect this error and had these computations running, unsupervised, for a long time on huge datasets, I’m gonna have a hard time finding exactly what values of u and v produced the errors, but I will update the post if/when I find them.

In the meantime, my questions are: should I have expected this, or is there something sneaky going on? what would you recommend to deal with this?

So I know this isn’t what you asked for, but have you seen https://github.com/JeffreySarnoff/AngleBetweenVectors.jl? @JeffreySarnoff put a lot of thought into making this robust.

that said, you could just do something like

my_acos(x) = x ≈ 1 ? zero(x) : x ≈ -1 ? one(x)*π : acos(x)

julia> my_acos(x) = x ≈ 1 ? zero(x) : x ≈ -1 ? one(x)*π : acos(x)
my_acos (generic function with 1 method)

julia> my_acos(1 + 1e-10)
0.0

julia> my_acos(1 + 1e-5)
ERROR: DomainError with 1.00001:
acos(x) not defined for |x| > 1
Stacktrace:
 [1] acos_domain_error(::Float64) at ./special/trig.jl:671
 [2] acos(::Float64) at ./special/trig.jl:701
 [3] my_acos(::Float64) at ./REPL[3]:1
 [4] top-level scope at REPL[5]:1

You can also adjust the tolerance you use for checking if |x| is approximately 1.

6 Likes

Yes, this is sort of expected, not because something is sneaky going on, but because of the reality of floating point computation. You lose precision with each multiplication and addition that occur in dot and norm (and the division). There is some chance of getting an argument greater than one for acos, which indeed appears to have happened.

There is also a second problem in acos, which is that it wouldn’t be very accurate near 1 and -1. It has infinite slope at those values, and would be quite sensitive to small inaccuracies in the argument. This shouldn’t error out, but will cause the angle to be suspect anyway.

One might wonder whether you really need an angle at all. Perhaps there is enough information in the dot product between two normalized numbers, which would eliminate need for acos altogether. It could be argued that acos adds no information and reduces accuracy.

If you do indeed need the angle, @Mason has provided two good ways to avoid the error.

7 Likes

Thanks to both of you.
As often happens, it turns out that simple is not synonymous with easy: although I do know that floating point operations do mess up sometimes, I just assumed, erroneously, that such a simple task would not need anything more than a naive direct calculation.

@apo383 you make a good point about the need for acos; in fact, I need the values of these angles, but since I only need SOME of them(the small ones!) I could as well work with their cos, filter them, and only then apply acos, without messing up the calculations.

3 Likes

I wouldn’t do that, this is 78% slower (for acos(0.5) at least):

I would change to (where I can’t time a difference):

@noinline my_slow_acos(x) = x ≈ 1 ? zero(x) : x ≈ -1 ? one(x)*π : acos(x)

my_acos(x) = abs(x) <= one(x) ? acos(x) : my_slow_acos(x)  # one simple test for the common case
2 Likes

I @btimed the two functions on a Vector{Float64} containing 10^5 random numbers and your proposal produces a ~50% speedup with respect to @Mason’s , but I do not understand why.
Would you please explain?
And further, what is the purpose of @noinline here?

Thank you

As I said (renamed to) my_slow_acos(x) is 78% slower for one specific number, while the speed or slowdown depends on the exact numbers, and also more factors, like if you have memory accesses:

julia> numbers = rand(-1.0:0.1:1.0, 10^5);

julia> @btime acos.(numbers);
  1.913 ms (5 allocations: 781.39 KiB)

julia> @btime my_acos.(numbers);
  1.994 ms (5 allocations: 781.39 KiB)

julia> my_slow_acos(x) = x ≈ 1 ? zero(x) : x ≈ -1 ? one(x)*π : acos(x)

julia> @btime my_slow_acos.(numbers);
  2.863 ms (5 allocations: 781.39 KiB)

julia> @btime my_acos.(numbers);  # now without @noinline
  1.962 ms (5 allocations: 781.39 KiB)

I just know this (function and) operator, is slow, with all of its assembly code:

julia> @code_native 1.0 ≈ 1.1

That’s why I used @noinline. You might not have seen an effect in micobenchmarking, with the extra code polluting your level 1 cache, had it actually been inlined. I just knew my code should rarely exercise that code path, so it seemed better to force the hand of the compiler. I see the compiler actually took that decision anyway, to not inline (but I’ve seen it inline code I know not needed most of the time, i.e. exception handling and code related to such unusual code paths).

1 Like

This is a great fix! Is there a way to extend this to work for functions of differential variables? I am using MOLFiniteDifference.jl to solve a system of 2-dim PDEs in (t,x). Within the PDEs, I have a call to a term that is a function of the form c_h(t,x) = …

I made it to this page because I was having the acos() domain error when the PDEs were being solved. With the change to my_acos(x) that you recommended, I can’t even build the equations now without receiving this error saying there’s a type error:
TypeError: non-boolean (Num) used in boolean context

Stacktrace:
[1] my_acos(x::Num)
@ Main .\In[77]:5
[2] c_h(t::Num, x::Num)
@ Main .\In[84]:83
[3] c_1_neg(t::Num, x::Num)
@ Main .\In[84]:117
[4] q_1_star(t::Num, x::Num)
@ Main .\In[84]:129
[5] top-level scope
@ In[84]:139
[6] eval
@ .\boot.jl:373 [inlined]
[7] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base .\loading.jl:1196

Is there a way to extend my_acos(x) to be used with Nums as well?

Thanks!

I’m not familiar with the MOLFiniteDifference.jl, but try using acos(clamp(x,-1,1)).

Thanks for the response, but it looks like that still won’t work well with symbolics/Nums. Here is the error:

TypeError: non-boolean (Num) used in boolean context

Stacktrace:
[1] clamp(x::Num, lo::Int64, hi::Int64)
@ Base.Math .\math.jl:67
[2] c_h(t::Num, x::Num)
@ Main .\In[25]:83
[3] c_1_neg(t::Num, x::Num)
@ Main .\In[25]:117
[4] q_1_star(t::Num, x::Num)
@ Main .\In[25]:129
[5] top-level scope
@ In[25]:139
[6] eval
@ .\boot.jl:373 [inlined]
[7] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base .\loading.jl:1196

All of the described workarounds rely doing something like

if x < somevalue
  # fix x
elseif x > anothervalue
  # fix x
end

TypeError: non-boolean (Num) used in boolean context seems to indicate that the comparisons (e.g., x < somevalue) are evaluating to Num types instead of Bool types. Thus, they can’t be used in control flow. You’ll need to look into MOLFiniteDifference.jl package’s documentation to figure out how (if it’s even possible) to get boolean results from comparisons.

A very kludgy way to sidestep your issue might be to instead use acos(nearlyone*x) where nearlyone = 1-1e14 or some other just-less-than-one value. This will introduce a little error in your calculation, but will also bring slightly-bigger-than-one values into the legal range.

I’m afraid that, beyond these suggestions, I have no further help I can offer.