Improvement for slicing 2d arrays?

Consider the following 2d array:

X = Array{Float64, 2}(undef, 4, 2)
X[:,1] = [1,NaN,3,4]
X[:,2] = [2,3,4,5]

I want to filter out all rows that contain at least one NaN. This could be done by building a “mask” vector:

idx = .!any(isnan.(X), dims=2);
X[idx, :]

However, the second line throws an error:

ERROR: BoundsError: attempt to access 4×2 Array{Float64,2} at index [Bool[1; 0; 1; 1], 1:1]

I think what is happening is that X[idx, :] expects idx to be either one-dimensional or two-dimensional with the same size as X. Since idx is a BitArray{2}, Julia assumes the second case, and the 4×1 mask does not match the 4×2 array. To get around this I had to do:
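A short sketch of the shapes involved, using the same X as above. The reduction keeps a singleton second dimension, so idx is a 4×1 matrix, and a 2-d Bool index is matched against two dimensions of X at once. The names `full_mask` and `caught` are only for illustration.

```julia
X = Array{Float64, 2}(undef, 4, 2)
X[:, 1] = [1, NaN, 3, 4]
X[:, 2] = [2, 3, 4, 5]

# any(...; dims=2) keeps the reduced dimension: idx is a 4×1 BitMatrix, not a vector
idx = .!any(isnan.(X), dims=2)
println(typeof(idx), " ", size(idx))

# A 2-d Bool mask indexes two dimensions at once, so it must match size(X) == (4, 2)
full_mask = .!isnan.(X)
println(X[full_mask])   # all non-NaN elements, returned as a vector

# The 4×1 mask does not match the first two dimensions of X, hence the BoundsError
caught = try
    X[idx, :]
    false
catch err
    err isa BoundsError
end
println(caught)
```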

idx_1d = reshape(idx, size(idx)[1])
X[idx_1d, :]

Now idx_1d is a BitArray{1} and this works, but it is ugly. Given how common this kind of slicing is, my suggestion is that indexing should fall back to the first interpretation when the index is a BitArray{2} whose second dimension has size 1. Admittedly I haven’t considered what side effects this could cause elsewhere, but I thought I’d share this observation.
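The proposed rule can be tried out without touching Base by wrapping it in a small function. This is a sketch only: `filterrows` is a hypothetical helper, not part of Julia, and it assumes the mask is either a Bool vector or an n×1 Bool matrix.

```julia
# Hypothetical helper (not part of Base) sketching the proposed behaviour:
# accept either a Bool vector or an n×1 Bool matrix as a row mask.
function filterrows(X::AbstractMatrix, mask::AbstractVecOrMat{Bool})
    rowmask = if mask isa AbstractMatrix
        size(mask, 2) == 1 ||
            throw(DimensionMismatch("row mask must have size (n, 1), got $(size(mask))"))
        vec(mask)   # drop the singleton second dimension
    else
        mask
    end
    return X[rowmask, :]
end

X = [1.0 2.0; NaN 3.0; 3.0 4.0; 4.0 5.0]
idx = .!any(isnan.(X), dims=2)   # 4×1 BitMatrix
Y = filterrows(X, idx)           # keeps rows 1, 3 and 4
```

Keeping the rule in a separate function leaves Base's indexing semantics unchanged, which avoids the side-effect question entirely.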

I think the indexing question is tricky. But you might also consider writing idx = map(r -> !any(isnan, r), eachrow(X)) or more compactly idx = .!any.(isnan, eachrow(X)).
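Both suggested forms can be checked side by side. This sketch assumes the same data as in the question, written as a matrix literal; `idx_map`, `idx_bc` and `Y` are illustrative names.

```julia
X = [1.0 2.0; NaN 3.0; 3.0 4.0; 4.0 5.0]

# eachrow(X) iterates over the rows; any(isnan, r) tests one row at a time,
# so the full isnan.(X) matrix is never built
idx_map = map(r -> !any(isnan, r), eachrow(X))
idx_bc  = .!any.(isnan, eachrow(X))   # same mask via broadcasting; isnan is treated as a scalar

Y = X[idx_bc, :]   # one-dimensional mask, so ordinary row selection works
```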

I think this is more about reductions with the dims keyword argument not dropping the reduced dimension. There’s a somewhat long discussion about it in the relevant GitHub issue: “array reductions (sum, mean, etc.) and dropping dimensions” (JuliaLang/julia issue #16606).
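A quick illustration of that behaviour with a small integer matrix (chosen only for this example). One practical consequence of keeping the singleton dimension is that the result broadcasts straight back against the original array.

```julia
A = [1 2 3; 4 5 6]

s_all  = sum(A)            # no dims: a scalar
s_rows = sum(A, dims=2)    # reduces along each row, but returns a 2×1 matrix
s_cols = sum(A, dims=1)    # reduces along each column, returns a 1×3 matrix
a_rows = any(A .> 4, dims=2)   # any behaves the same way: a 2×1 BitMatrix

println(size(s_rows), " ", size(s_cols), " ", size(a_rows))

# The kept dimension lets the reduction broadcast against A,
# e.g. subtracting each row's mean from that row
A_centered = A .- sum(A, dims=2) ./ size(A, 2)
```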
In this specific case, I think rewriting it as @mcabbott has suggested is the nicest solution. But more generally, you can just drop the extra dimension yourself:

idx = .!any(isnan.(X), dims=2)[:,1]

or

idx = dropdims(.!any(isnan.(X), dims=2), dims=2)

A bit ugly too, but clearer than reshape, I think.
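For comparison, the two variants above produce the same one-dimensional mask. `vec` is included as one more option that is not from the thread; it flattens any array to a vector, so it is only equivalent here because the second dimension has size 1.

```julia
X = [1.0 2.0; NaN 3.0; 3.0 4.0; 4.0 5.0]
mask2d = .!any(isnan.(X), dims=2)   # 4×1 BitMatrix

idx_a = mask2d[:, 1]                # take the only column
idx_b = dropdims(mask2d, dims=2)    # drop the singleton dimension explicitly
idx_c = vec(mask2d)                 # flatten to a vector (extra option, not from the thread)

Y = X[idx_b, :]
```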