# Improvement for slicing 2d arrays?

Consider the following 2d array:

``````
X = Array{Float64, 2}(undef, 4, 2)
X[:,1] = [1,NaN,3,4]
X[:,2] = [2,3,4,5]
``````

I want to filter out all rows which include at least one `NaN`. This could be done by instantiating a “mask” vector:

``````
idx = .!any(isnan.(X), dims=2);
X[idx, :]
``````

However, the second line throws an error:

``````
ERROR: BoundsError: attempt to access 4×2 Array{Float64,2} at index [Bool[1; 0; 1; 1], 1:1]
``````

I think what is happening is that `X[idx, :]` expects `idx` to be either one-dimensional or two-dimensional with the same dimensions as `X`. Since the type is `BitArray{2}`, Julia concludes that we’re in the second scenario. To get around this I had to do:

``````
idx_1d = reshape(idx, size(idx)[1])
X[idx_1d, :]
``````

Now `idx_1d` has type `BitArray{1}` and this works, but it is ugly. Given that this is such a common use case for slicing, my suggestion is that slicing should fall back to the first scenario when the index is a `BitArray{2}` whose size is 1 in the second dimension. Admittedly I haven’t considered what side effects this could cause elsewhere, but I thought I’d share this observation.
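To make the failure easier to inspect, here is a minimal sketch that rebuilds the example and captures the error instead of letting it propagate. The names `mask_size`, `mask_ndims` and `err` are introduced only for illustration. The comments reflect my reading of Julia's logical indexing, namely that a Bool array index spans as many dimensions as it has, which is consistent with the `1:1` shown in the error message above.

``````julia
# Rebuild the example matrix from the question.
X = [1.0 2.0; NaN 3.0; 3.0 4.0; 4.0 5.0]

# Reducing with `dims=2` keeps the reduced dimension, so the mask is 4×1.
idx = .!any(isnan.(X), dims=2)
mask_size = size(idx)     # (4, 1)
mask_ndims = ndims(idx)   # 2

# A two-dimensional Bool index spans two dimensions of X, so the trailing
# `:` lands on a third dimension and the bounds check fails.
err = try
    X[idx, :]
    nothing
catch e
    e
end

println(typeof(err))
``````

Checking `ndims(idx)` before indexing is a quick way to tell whether a mask came out of a `dims` reduction with its singleton dimension still attached.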

I think the indexing question is tricky. But you might also consider writing `idx = map(r -> !any(isnan, r), eachrow(X))` or more compactly `idx = .!any.(isnan, eachrow(X))`.
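As a quick check of the suggestion above, the following sketch applies both forms to the matrix from the question. The names `idx_bc` and `X_clean` are hypothetical, and the matrix is written as a literal only for brevity.

``````julia
# Same data as in the question, written as a matrix literal.
X = [1.0 2.0; NaN 3.0; 3.0 4.0; 4.0 5.0]

# One Bool per row: true when the row contains no NaN.
idx = map(r -> !any(isnan, r), eachrow(X))

# The broadcast form should produce the same one-dimensional mask.
idx_bc = .!any.(isnan, eachrow(X))

# A one-dimensional mask can be combined with `:` directly.
X_clean = X[idx, :]
``````

Because the mask is built row by row, it is one-dimensional from the start and no reshaping is needed.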


I think this is more about reductions with the `dims` keyword argument not dropping the reduced dimension. There’s a somewhat long discussion about it in the relevant GitHub issue: “array reductions (sum, mean, etc.) and dropping dimensions” (JuliaLang/julia issue #16606).
In this specific case, I think rewriting it as @mcabbott has suggested is the nicest solution. But more generally, you can just drop the extra dimension yourself:

``````
idx = .!any(isnan.(X), dims=2)[:,1]
``````

or

``````
idx = dropdims(.!any(isnan.(X), dims=2), dims=2)
``````

This is a bit ugly too, but clearer than `reshape`, I think.
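For comparison, this sketch runs the two ways of dropping the singleton dimension side by side. It also adds `vec` from Base, which was not mentioned in the thread but should give the same one-dimensional mask here. The variable names (`mask2d`, `by_index`, `by_dropdims`, `by_vec`) are hypothetical.

``````julia
X = [1.0 2.0; NaN 3.0; 3.0 4.0; 4.0 5.0]

# 4×1 mask: the `dims=2` reduction keeps the reduced dimension.
mask2d = .!any(isnan.(X), dims=2)

by_index    = mask2d[:, 1]               # take the only column
by_dropdims = dropdims(mask2d, dims=2)   # drop the singleton dimension
by_vec      = vec(mask2d)                # flatten to one dimension (not from the thread)

# Each result is one-dimensional, so it can be used as a row mask.
X_clean = X[by_dropdims, :]
``````

`dropdims` names the dimension being removed and throws an error if that dimension is not of size 1, which makes it the most explicit of the three.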
