Keep array elements that are NOT in a list of indexes

Suppose I have a list of indexes mylist, is there an equivalent of

setdiff(myarray, myarray[mylist])

but using directly the fact that I know which elements are to the eliminated (through mylist) in order to be more efficient?

Edit: I can indeed assume mylist is sorted so that deleteat! can be used.

You can use

filter(!in(myarray[mylist])), myarray)
1 Like

That deletes elements of myarray whose values match those in myarray[mylist], which might include elements at other indices.

If you only want to filter by the indices (not values), I would use something like:

 myarray[filter(!in(mylist), eachindex(myarray))]

or

 myarray[filter(!in(Set(mylist)), eachindex(myarray))]

if mylist is big enough that it’s worth converting to a Set.

1 Like

You can also use InvertedIndices.jl. I think it should be one of the more efficient methods these days.

using InvertedIndices: Not
myarray[Not(mylist)]
4 Likes

I would be careful here — if that mylist is a bunch of integers and you happen to have an array that unexpectedly uses Cartesian indices, this will silently not skip anything.

You probably want to be explicit and ask for LinearIndices or CartesianIndices rather than eachindex (unless of course your mylist set is built using stuff out of eachindex).

3 Likes

Often mylist would be sorted, and then the algorithm can be a merge-like and faster. For example:

list = collect(1:10)
mylist = [2,3,5,7]

function slash(mylist, list)
    foldl(eachindex(list); init=(sizehint!(eltype(list)[], length(list)-length(mylist)), 1)) do (res,j),i
    if j > length(mylist) || i != mylist[j]
        push!(res, list[i])
        return (res, j)
    else
        return (res, j+1)
    end
    end |> first
end

@btime slash($mylist, $list)
#  29.972 ns (2 allocations: 128 bytes)

function slash2(mylist, list)
    list[filter(!in(Set(mylist)), eachindex(list))]
end

@btime slash2($mylist, $list)
#  135.140 ns (10 allocations: 688 bytes)

The example shows slash which assumes presorting is 4x faster than slash2.

(BTW slash is needlessly using foldl, a simple for would be easier)

Ideally, a mylist of indices which are sorted could be storred in a SortedVector type which would automatically dispatch to a faster method.

1 Like

InvertedIndices is better than it was before, but still tends to be several times slower than deleteat!(copy(A), I) (just Base, no dependencies) or @delete A[I] (Accessors).

4 Likes
julia> using InvertedIndices: Not

julia> using BenchmarkTools

julia> v=rand(1:1000,100) 
100-element Vector{Int64}:
 549
 342
 738
   8
 616
 697
   ⋮
  94
 986
 764

julia> ei=[3,7,13,26,44,57,65,87,98]
9-element Vector{Int64}:
  3
  7
 13
 26
 44
 57
 65
 87
 98

julia> function slicesort(v,ei)
           d=Vector{Int}(undef,length(v)-length(ei))
           doff=1
           soff=1
          for i in ei
               N=i-soff
               unsafe_copyto!(d,doff,v,soff,N)
               soff=i+1
               doff+=N
           end
           copyto!(d,doff,v,soff,length(v)-soff+1)
          d
       end
slicesort (generic function with 1 method)

julia> @btime [v[i] for i in eachindex($v) if i ∉  $ei];
  3.000 μs (57 allocations: 2.80 KiB)

julia> @btime   v[Not(ei)];
  259.941 ns (2 allocations: 832 bytes)

julia> @btime  deleteat!(copy(v), ei);
  111.693 ns (1 allocation: 896 bytes)

julia> @btime slicesort(v,ei);
  96.093 ns (1 allocation: 816 bytes)

julia> v[Not(ei)]==deleteat!(copy(v),ei)==slicesort(v,ei)
true


2 Likes