# Is this a bug in the Julia function `unique`?

I noticed that any NaN values in the data break Julia’s `unique` function.

For example, I have a matrix A with 3 duplicate rows:

``````
julia> A = [1 NaN 3; 1 NaN 3; 1 NaN 3]
3×3 Matrix{Float64}:
1.0  NaN  3.0
1.0  NaN  3.0
1.0  NaN  3.0
``````

If we apply `unique` to it as below:

``````
B = unique(A, dims=1)
``````

We would expect B to look like this:

``````
1×3 Matrix{Float64}:
1.0  NaN  3.0
``````

But in reality, the B that Julia returns is:

``````
4×3 Matrix{Float64}:
1.0  NaN  3.0
1.0  NaN  3.0
1.0  NaN  3.0
1.0  NaN  3.0
``````

I suspect this is because

``````
julia> NaN == NaN
false
``````

That is just regular floating-point behavior. What I find more disturbing is this:

``````
julia> unique(A, dims=2)
3×4 Matrix{Float64}:
1.0  NaN  NaN  3.0
1.0  NaN  NaN  3.0
1.0  NaN  NaN  3.0
``````

That should definitely not happen. Do you mind opening an issue for this on the issue tracker?


Without words;)

``````
julia> NaN == NaN
false
``````

``````
julia> A = [1 NaN 3; 1 NaN 3; 1 NaN 3]
3×3 Matrix{Float64}:
1.0  NaN  3.0
1.0  NaN  3.0
1.0  NaN  3.0

julia> unique(A)
3-element Vector{Float64}:
1.0
NaN
3.0
``````

`:rofl:`


The single-argument version probably just iterates and uses `isequal`, while the `dims` version is more complicated and creates a hash per dimension. It’s unclear to me whether that is intended behavior for the multidimensional version, though it does make sense when you take `NaN != NaN` into account.
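To make the `==` versus `isequal` distinction concrete, here is a minimal sketch. The variable names are illustrative only, and it says nothing about how the `dims` method is implemented internally; it just shows the two notions of equality that the discussion hinges on.

```julia
# `==` follows IEEE 754: NaN never compares equal, not even to itself.
nan_eq = (NaN == NaN)

# `isequal` treats NaN values as equal to each other.
nan_isequal = isequal(NaN, NaN)

# The same distinction carries over to whole arrays.
row = [1.0, NaN, 3.0]
rows_eq = (row == copy(row))
rows_isequal = isequal(row, copy(row))

# Hashing is consistent with `isequal`, so equal-content rows hash alike.
same_hash = (hash(row) == hash(copy(row)))

println((nan_eq, nan_isequal, rows_eq, rows_isequal, same_hash))
```

Any deduplication that falls back on `==` will therefore never see two NaN-containing slices as duplicates, while one based on `isequal` and `hash` will.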


Will do.

Thanks for pointing me in that direction.

That’s what the IEEE 754 standard mandates; see the Stack Overflow question “What is the rationale for all comparisons returning false for IEEE754 NaN values?”


Sorry for being sarcastic; I knew that. It’s a similar problem to SQL NULL values.


I don’t think it is sarcastic. We’re doing Julia a favor after all. Anyone who truly loves Julia would want to help get these issues fixed.

This (`NaN != NaN`) is NOT a bug; the IEEE standard requires it.

Don’t use `NaN` for this; maybe use `nothing` instead.


No, this is expected behavior for `NaN`. That `unique(A, dims=1)` does not use `isequal` for this may be an issue, but `NaN != NaN` is very much intended. It works the same in other languages that use IEEE floating point, though their `unique` may have a different interpretation.


I can confirm that MATLAB does not have the same `unique` issue, even though it also considers NaN different from NaN:

``````
>> NaN == NaN
ans =
logical
0
``````

Right. The question is whether Julia should change its behavior here, and whether that change would be breaking (meaning it could be done in 2.0 at the earliest).

Until then, may I ask what you were using that `NaN` for and how you encountered this? Julia has separate `missing` and `nothing` values: `missing` models the absence of a value that should exist but is unknown, and `nothing` models the knowledge of absence (i.e. there is no value to represent the result). There is no need to use `NaN` for a purpose it was never meant for. See the docs for more information:

https://docs.julialang.org/en/v1/manual/missing/
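As a rough illustration of that suggestion, the sketch below re-encodes NaN placeholders as `missing`. The vector `raw` and its values are made up for the example; they are not taken from anyone's actual data.

```julia
# Hypothetical measurements where unknown entries were recorded as NaN.
raw = [1.0, NaN, 2.0, NaN]

# Re-encode the placeholders as `missing` so the intent is explicit.
cleaned = [isnan(x) ? missing : x for x in raw]

# `==` propagates `missing`, while `isequal` gives a definite Bool.
propagated = (missing == missing)
definite = isequal(missing, missing)

# `unique` on a vector collapses the repeated `missing` entries.
distinct = unique(cleaned)

# `skipmissing` drops the unknown entries before a reduction.
total = sum(skipmissing(cleaned))

println(distinct)
println(total)
```

The element type of `cleaned` becomes `Union{Missing, Float64}`, so downstream code can tell "unknown" apart from a genuine floating-point NaN produced by arithmetic.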


That `NaN == NaN` is false is maybe logical?
That is, two things both failing to be a number does not imply they are the same thing. Example:

``````
NaN == NaN      # false
x = 0/0         # NaN
y = Inf/Inf     # NaN
x == y          # false --> nice
``````

I’m processing some oceanographic data right now. When people do a CTD cast at a particular sampling station, there is a cast number associated with it. Sometimes people leave it blank when there is only one cast.

So you’re saying that you’re using `NaN` to represent “blank” values?

It is a bug because `unique` is documented to use `isequal` and

``````
julia> isequal([1, NaN, 3], [1, NaN, 3])
true
``````
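Until the `dims` method behaves as documented, one possible workaround is to deduplicate an iterator of rows, which goes through the `isequal`-based single-argument path. This is only a sketch, assuming Julia 1.1 or later for `eachrow`; the names `rows` and `B` are illustrative.

```julia
A = [1 NaN 3; 1 NaN 3; 1 NaN 3]

# `unique` over an iterator of rows compares whole rows with `isequal`,
# so rows with NaN in the same positions count as duplicates.
rows = unique(eachrow(A))

# Reassemble the surviving rows into a matrix, one row per unique slice.
B = permutedims(reduce(hcat, rows))

println(size(B))
```

The same idea with `eachcol(A)` and a plain `reduce(hcat, ...)` would deduplicate columns instead.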

Great! Then I’ll also open an issue about the `unique(A, dims=2)` oddity.

Isn’t that also fixed by the PR?

I haven’t checked, but the array grows for that case:

``````
julia> A = [1 NaN 3; 1 NaN 3; 1 NaN 3]
3×3 Matrix{Float64}:
1.0  NaN  3.0
1.0  NaN  3.0
1.0  NaN  3.0

julia> unique(A, dims=2)
3×4 Matrix{Float64}:
1.0  NaN  NaN  3.0
1.0  NaN  NaN  3.0
1.0  NaN  NaN  3.0
``````

and the unit test doesn’t cover that, I guess. I don’t see why `==` vs `isequal` should produce more values per row than existed previously.
