A subtle bug?

I got a Float64 number from the p-value of a GLM model. I could NOT show the fitted model with the following error:

ERROR: p-values must be in [0; 1]
 [1] error(::String) at ./error.jl:33
 [2] StatsBase.PValue(::Float64) at /Users/Thomas/.julia/packages/StatsBase/EA8Mh/src/statmodels.jl:430

after some investigations, I found that some p-value floats are strange:

julia> p = (GLM.coeftable(mod).cols[4])[5]    # mod is a model from GLM.lmI()

julia> typeof(p)

julia> p == 0

julia> p >= 0

julia> 1 >= p >= 0

julia> 0 <= p <= 1    # strange !!!

julia> p == 0.0    # strange !!!

in particular, that strange false from 0 <= p <= 1 is the cause of the error messages inside StatsBase.PValue().

I don’t know if it’s a bug from GLM or a more general and subtle bug in Julia? I’m using v1.5.3, thanks.

Without a MWE, I think it will be very difficult to debug your code. Can you please post the result of dump(p)?

julia> dump(p)
Float64 0.0

… there’s a long process to produce the data needed for the linear model… I don’t need how to give a MWE …

I can’t reproduce

julia> using StatsBase

julia> StatsBase.PValue(1e-100)

Would you mind reinstalling Julia using a newer version?

This is so far from reasonable that you don’t need to worry about the M. Any complete and working example which can reliably reproduce the output from your first message at the end would be good enough.

Pretty much the only ways to explain those results are:

  1. It’s not actually the same p everywhere.
  2. Someone has committed an absolutely heinous act of type piracy and redefined floating point comparisons.
1 Like

just installed v1.6.1 and the problem persists.

is it possible for me to save the data as a file and attach here?

it IS the same p everywhere. That’s why it’s “strange”.

I saved the design matrix X and response vector y into a .jld (by using JLD).
Then in a fresh session I loaded them and called mod = GLM.lm(X, y, true); and now the problem has gone?! that means I could not reproduce the bug by uploading the data…

Can you check the following?

julia> using StatsBase

julia> StatsBase.PValue(0.0)

first on a fresh session and then after running your code.

both on a fresh session and after running my code:

julia> StatsBase.PValue(0.0)

my work flow is:

julia> mod = GLM.lm(X, y, true);
julia> pvalues = GLM.coeftable(mod).cols[4];
julia> [println(i, " ", pvalues[i], " ", StatsBase.PValue(pvalues[i]) ) for i in 1:length(pvalues)]
1 0.0 <1e-99
2 2.83800947839654e-159 <1e-99
3 0.17914739722604017 0.1791
4 0.00728418728531684 0.0073
ERROR: p-values must be in [0; 1]
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] StatsBase.PValue(v::Float64)
   @ StatsBase ~/.julia/packages/StatsBase/DU1bT/src/statmodels.jl:485

julia> p = pvalues[5]  # 1.6.1 shows NaN, 1.5.3 shows 0.0

julia> p == 0

julia> p == 0.0

julia> isnan(p)

julia> isfinite(p)

julia> 0 <= p <= 1

julia> 1 >= p >= 0

I have no idea, I could not reproduce it here… Thus, I will start to say random things :smiley:

Is it possible to execute changing the name of mod to something else? mod is a function in Base. It is not supposed to give you any problem, but I am lacking ideas…

EDIT: Wait, I am not understanding, p is NaN in 1.6 and still isnan(p) is false?

Yeah, unfortunately seems calculations of p-values are not quite right in a few key packages, we have the same error with:

a = [12,10,7,6,3,1]
b = [11,9,8,5,4,2]

Error showing value of type ExactMannWhitneyUTest{Float64}:
ERROR: p-values must be in [0; 1]

And this is an old problem: p val > 1 in ExactMannWhitneyUTest · Issue #126 · JuliaStats/HypothesisTests.jl · GitHub

Can you post the output of bitstring(p) for this weird p?

julia> bitstring(p)

now I know the cause of the problem: --math-mode=fast

I understand that any operation on NaN is unpredictable in fastmath mode, we need an important exception: isnan(NaN).

now in fastmath mode:

julia> isnan(NaN)

and this failure to detect NaN is the cause of all confusions. In this case, the following isnan(v) fails to catch the NaN and throws an error:

struct PValue <: Real
    function PValue(v::Real)
        0 <= v <= 1 || isnan(v) || error("p-values must be in [0; 1]")

we often have no control whether a package would produce NaN, and on the other hand, the package has no control if the user is doing fastmath or not. I strongly recommend that isnan(NaN) to return true even in fastmath mode!

Exactly, which is why you should never use global "fast"math in real applications.


but why don’t we allow isnan(NaN) == true in fastmath mode?

" Asking isnan correctly is computationally expensive and slow "


Next thing you will get some other error somewhere else because of another assumption of IEEE math in another package. It’s just completely unsafe to use this globally for running code you do not control 100% yourself.


…it is just sad to hear that … :cold_sweat:

well… seems like I could only override my own isnan() to cope with the issue…
a question to ask: how could I detect if the current session is --math-mode=fast or not? thanks.