View and boolean indexing

This old thread brought up an important question about why view(x,boolean_indices) is slow. One reply said “So the TLDR is: Mixing views and logical indices is currently no good.”

It seems that we still have this issue. Any updates?

How could it be fast? If y = view(x, boolean_indices) did not allocate an index array (which is what it currently does), how could you get y[i] without looping sequentially through x? And the combination of allocating an index array and doing a double indirection on every getindex seems almost guaranteed to be slow…
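To make the double indirection concrete, here is a minimal sketch. The names x, mask, y and idx are made up for illustration, and the exact stored index type is my understanding of current Julia versions rather than a documented guarantee.

```julia
x = [10, 20, 30, 40, 50]
mask = [true, false, true, false, true]

# Creating the view turns the mask into explicit integer indices
# (on the Julia versions I have tried, a Vector{Int} is stored in the view).
y = view(x, mask)
idx = parentindices(y)[1]

# Every y[i] is effectively x[idx[i]]: one lookup in idx, one in x.
second = y[2]

# Writing through the view modifies the parent array.
y[2] = -1
```

After this, second is 30 and x[3] is -1, and collect(idx) matches findall(mask).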

Thanks. Probably deserves a comment in the docs. (Maybe there is one, but it has so far escaped me.)

For my type of applications (statistics, where some observations are invalid and flagged by a Bool), this means that view() isn’t very interesting. That’s OK, although it causes some issues in a threaded setting (where allocations seem to be particularly costly).

On second thought, I realise that view(x::Matrix, inds, :) is actually very useful even if inds is a BitVector. The allocations are relatively small (essentially the same as for a vector x), and it is quick.
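A small sketch of that matrix case, with made-up data (x, valid and v are illustrative names). The only index array the view needs has one entry per selected row, regardless of the number of columns.

```julia
# 4 observations (rows) by 3 variables (columns); values are arbitrary.
x = reshape(collect(1.0:12.0), 4, 3)

# Row 2 is flagged as invalid.
valid = BitVector([true, false, true, true])

# Select the valid rows and all columns without copying the data.
v = view(x, valid, :)

# Column sums over the valid rows only.
colsums = vec(sum(v, dims=1))
```

Here v behaves like the 3×3 matrix x[valid, :], and colsums comes out as [8.0, 20.0, 32.0].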

If you need consecutive array-like indices, there’s no way around allocating an index array — that’s what view() does.

If you only need iteration, and no indexing at all, Iterators.filter() can help.
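For the iteration-only case, something along these lines should work. This is just a sketch: vals and isbad are made-up names standing in for the data and its validity flags.

```julia
vals  = [1.5, 2.5, 3.5, 4.5]
isbad = [false, true, false, false]

# Lazily pair each value with its flag and keep only the good ones.
# No filtered copy of vals is created.
good = Iterators.filter(p -> !last(p), zip(vals, isbad))

# Reductions can consume the lazy iterator directly.
total = sum(first, good)
n = count(_ -> true, good)   # the filtered iterator has no known length
avg = total / n
```

Note that good supports neither indexing nor length(), which is why the number of valid entries is counted by iterating.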

If you do need indexing, and keeping the same indices as in the original array is fine, look at allocation-free skip():

julia> using Skipper

julia> data = [
    (val=1, isbad=false),
    (val=2, isbad=true),
    (val=3, isbad=false)
]

julia> data_good = skip(x -> x.isbad, data)

julia> length(data_good)
2

julia> collect(eachindex(data_good))
2-element Vector{Int64}:
 1
 3

julia> data_good[3]
(val = 3, isbad = false)