Is this a bug in the Julia function "unique"?

Doesn’t grow with the PR so it’s probably a manifestation of the same issue. Note that B = unique(A, dims=1) also grows on master as shown by the OP.

Nice! I missed the 4 vs 3 rows in the OP and just thought it didn’t reduce dimensionality at all.

You mean they made a cast, but took no measurements? In that case, you’d use missing, since a value would exist but you don’t have it. The resulting container would then have a mixed element type, for example Union{Float64, Missing} if all the other data is Float64.
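
As a small sketch of what that looks like (the values are made up for illustration): a vector holding Float64 measurements plus one missing entry gets a Union element type, and skipmissing lets you reduce over the values that are actually present.

```julia
# Illustrative values only; the second measurement was never recorded.
depths = [12.5, missing, 7.0]

eltype(depths)                       # Union{Missing, Float64}
sum(depths)                          # missing, because missing propagates
total = sum(skipmissing(depths))     # 19.5
```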

So Julia sometimes conforms to IEEE 754 and sometimes doesn’t? To put it mildly, I’m not very pleased.

Different functions have different semantics. == strictly follows IEEE 754 semantics; isequal doesn’t, and what it does is well documented.
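
To make the difference concrete, here is a short comparison you can paste into the REPL. It only uses Base functions; the variable names are just for illustration.

```julia
# == follows IEEE 754: NaN is not equal to itself, and the two zeros compare equal.
nan_eq  = (NaN == NaN)            # false
zero_eq = (-0.0 == 0.0)           # true

# isequal is the relation used for hashing and deduplication.
nan_iseq  = isequal(NaN, NaN)     # true
zero_iseq = isequal(-0.0, 0.0)    # false

# With missing, == propagates missing while isequal always returns a Bool.
miss_eq   = (missing == missing)          # missing
miss_iseq = isequal(missing, missing)     # true
```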

It’s all documented here: Mathematical Operations and Elementary Functions · The Julia Language

Julia provides additional functions to test numbers for special values, which can be useful in situations like hash key comparisons
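
A quick illustration of the "hash key comparisons" part, as a sketch with a made-up dictionary: Dict and Set look elements up with hash and isequal, whereas `in` on a plain array compares with ==.

```julia
d = Dict(NaN => "not a number")

in_dict  = haskey(d, NaN)        # true: Dict lookup uses hash and isequal
in_array = NaN in [NaN]          # false: `in` on an array uses ==
in_set   = NaN in Set([NaN])     # true: Set membership uses isequal

# The special-value tests mentioned in the manual:
checks = (isnan(NaN), isinf(Inf), isfinite(1.0))   # (true, true, true)
```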

@kristoffer.carlsson, is there a policy on backporting bugfixes such as that PR of yours? If not, is it too late for 1.7? We would really want bugs fixed in the latest version, provided the fix is known to be correct.

I tested your code (or I think its equivalent):

julia> A = [1 NaN 3; 1 NaN 3; 1 2 3];

julia> B = nunique(A, dims=2)
3×3 Matrix{Float64}:
 1.0  NaN    3.0
 1.0  NaN    3.0
 1.0    2.0  3.0

julia> B = nunique(A, dims=1)
2×3 Matrix{Float64}:
 1.0  NaN    3.0
 1.0    2.0  3.0
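
For anyone stuck on a version without the fix, one possible workaround is to deduplicate the rows as whole vectors, since unique on a plain collection is documented to compare with isequal. This is only a sketch and not the PR’s implementation: unique_rows is a hypothetical helper name, and eachrow needs Julia 1.1 or later.

```julia
# Hypothetical helper, not part of Base and not the code from the PR.
function unique_rows(A::AbstractMatrix)
    rows = unique(eachrow(A))                 # rows are compared with isequal
    return permutedims(reduce(hcat, rows))    # stack the kept rows back into a matrix
end

A = [1 NaN 3; 1 NaN 3; 1 2 3]
B = unique_rows(A)
size(B)    # (2, 3)
```

Note that B == [1 NaN 3; 1 2 3] is still false because of the NaN; use isequal to check the result.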

Your change also got this working (correctly?):

julia> B = nunique(C, dims=2)
3×3 Matrix{Union{Missing, Int64}}:
 1   missing  3
 1   missing  3
 1  2         3

Before:

julia> B = unique(C, dims=2)
ERROR: TypeError: non-boolean (Missing) used in boolean context
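
My guess at where that error comes from (an assumption, I haven’t traced it through the old code): comparing against missing with == returns missing rather than a Bool, and using that as an `if` condition throws this kind of TypeError. A minimal reproduction outside of unique:

```julia
r = (missing == 1)        # missing, neither true nor false

err = try
    if r                  # a non-Bool condition is not allowed
        "equal"
    else
        "different"
    end
catch e
    e
end

is_type_error = err isa TypeError    # true
safe = isequal(missing, 1)           # false, a plain Bool, so it is safe in an `if`
```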

Looking at your code (or actually the full function on GitHub, copy/pasting from there), I get annoying extra letters… They don’t show up if I quote the code, so I won’t do that:

@generated function _unique_dims(A::AbstractArray{T,N}, dim::Integer) where {T,N}
quote
[…]
k = i_d
j_d = uniquerow[k]
 else
 j_d = i_d

There’s also other strangeness in the REPL (scrolling back), as if I had copy/pasted many times, but I didn’t… no crash so far.

That part seems right (so not funny?!).

Have you noticed the backport 1.6 and backport 1.7 labels that had been already added to the PR? :wink:

They made a cast and did the measurement, but forgot to write down the cast number.

The thing is that a lot of the time such information is pulled from other sources, e.g., Matlab, which use NaN.

Can you sanitise the input and convert NaN to missing? Or are there some NaNs that aren’t missing values? Following what Matlab does because they can’t do better doesn’t seem a good argument.
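
Something along these lines would do the conversion. It is a sketch on made-up data and assumes every NaN really stands for a missing value; if some NaNs are genuine results of a computation, this would erase that distinction.

```julia
# Made-up data standing in for values imported from another tool.
raw = [1.0 NaN 3.0; 1.0 2.0 3.0]

# Typed comprehension, so the element type allows missing even if no NaN is present.
clean = Union{Float64, Missing}[isnan(x) ? missing : x for x in raw]

n_missing = count(ismissing, clean)    # 1
size(clean)                            # (2, 3), the shape is preserved
```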

That’s what I’m going to do. Many thanks!

Well, it’s a bug, and it is being fixed.