```julia
julia> A = [1 NaN 3; 1 NaN 3; 1 NaN 3]
3×3 Matrix{Float64}:
 1.0  NaN  3.0
 1.0  NaN  3.0
 1.0  NaN  3.0

julia> unique(A)
3-element Vector{Float64}:
   1.0
 NaN
   3.0
```
The single-argument version probably just iterates and uses isequal, while the dims version is more complicated and hashes each slice along the given dimension. It's unclear to me whether that is intended behavior for the multidimensional version, though it does make sense when you take NaN != NaN into account.
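A quick way to see the two comparison semantics being discussed: `==` follows IEEE 754, where NaN is unequal to itself, while `isequal` (which hashing-based collections and single-argument `unique` rely on) treats NaNs as equal. This only demonstrates the comparison rules; it makes no claim about how the `dims` method is implemented internally.

```julia
# IEEE 754 comparison: NaN is not equal to itself
println(NaN == NaN)                        # false
# isequal treats NaNs as equal; Set, Dict and single-argument
# `unique` are built on isequal together with hash
println(isequal(NaN, NaN))                 # true

# The same distinction carries over to elementwise array comparison
println([1.0, NaN] == [1.0, NaN])          # false
println(isequal([1.0, NaN], [1.0, NaN]))   # true

A = [1 NaN 3; 1 NaN 3; 1 NaN 3]
u = unique(A)                              # all NaN entries collapse into one
println(length(u))                         # 3
```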
No, this is expected behavior for NaN. That unique(A, dims=1) does not use isequal here may be a problem, but NaN != NaN is very much intended. It works the same way in other languages that use IEEE 754 floating point, though their unique functions may treat NaN differently.
Right, the question is whether Julia should change its behavior here, and whether that change would be breaking (in which case it could be made in 2.0 at the earliest).
Until then, may I ask what you were using that NaN for, and how you ran into this? Julia has separate missing and nothing values: missing models a value that should exist but is unknown, while nothing represents the known absence of a value (i.e. there is no value to represent the result). It does not rely on NaN for a purpose it was never meant to serve. See the docs for more information:
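A small illustration of how the two sentinels behave differently from NaN: `missing` propagates through `==` and arithmetic, yet `isequal` still treats two missings as equal, and `nothing` is what Base functions such as `findfirst` return when there is no result.

```julia
# `missing`: the value should exist but is unknown, so it propagates
println(missing == missing)          # missing
println(isequal(missing, missing))   # true
println(ismissing(missing + 1))      # true

# `nothing`: there is no value, e.g. a search with no match
idx = findfirst(==(5), [1, 2, 3])
println(idx === nothing)             # true
```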
I’m processing some oceanographic data right now. When people do a CTD cast at a particular sampling station, there is a cast number associated with it. Sometimes people leave it blank when there is only one cast.
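For data like this, one option is to read blank cast numbers as `missing` instead of NaN, which also keeps the column integer-typed. The sketch below assumes the field arrives as text and that a blank can safely be read as "the only cast"; `parse_cast` is a hypothetical helper, not part of any library.

```julia
# Hypothetical helper: parse a cast-number field, mapping blanks to `missing`
parse_cast(s::AbstractString) = isempty(strip(s)) ? missing : parse(Int, s)

casts = parse_cast.(["1", "", "2", ""])   # Vector{Union{Missing, Int64}}
println(unique(casts))                     # blanks collapse to one `missing`

# If a blank is known to mean "the only cast", fill it explicitly
filled = coalesce.(casts, 1)
println(filled)                            # [1, 1, 2, 1]
```

Filling with `coalesce` keeps the assumption visible in the code rather than hiding it behind a NaN placeholder.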
I haven’t checked, but the array grows for that case:
```julia
julia> A = [1 NaN 3; 1 NaN 3; 1 NaN 3]
3×3 Matrix{Float64}:
 1.0  NaN  3.0
 1.0  NaN  3.0
 1.0  NaN  3.0

julia> unique(A, dims=2)
3×4 Matrix{Float64}:
 1.0  NaN  NaN  3.0
 1.0  NaN  NaN  3.0
 1.0  NaN  NaN  3.0
```
and I guess the unit tests don’t cover that. I don’t see why == vs isequal should produce more values per row than existed originally.
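Until this is resolved, a user-side workaround is to deduplicate columns yourself with `isequal` semantics. This is a minimal sketch, not a fix for Base: `unique_cols` is a hypothetical helper, and it copies every column, so it is meant for modest matrix sizes.

```julia
# Hypothetical helper (not part of Base): drop duplicate columns using
# isequal semantics, so columns containing NaN compare equal
function unique_cols(M::AbstractMatrix)
    # Copy each column into a Vector; `unique` on a vector of vectors
    # hashes each column and compares with isequal (elementwise isequal)
    cols = unique([collect(c) for c in eachcol(M)])
    # Reassemble the distinct columns in order of first appearance
    return reduce(hcat, cols)
end

A = [1 NaN 3; 1 NaN 3; 1 NaN 3]
println(size(unique_cols(A)))   # (3, 3)

M = [1 NaN 1 NaN; 2 NaN 2 NaN]  # columns 3 and 4 repeat columns 1 and 2
println(unique_cols(M))
```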