Suppose I have a list of indexes mylist
, is there an equivalent of
setdiff(myarray, myarray[mylist])
but using directly the fact that I know which elements are to the eliminated (through mylist
) in order to be more efficient?
Edit: I can indeed assume mylist is sorted so that deleteat! can be used.
You can use
filter(!in(myarray[mylist])), myarray)
1 Like
That deletes elements of myarray
whose values match those in myarray[mylist]
, which might include elements at other indices.
If you only want to filter by the indices (not values), I would use something like:
myarray[filter(!in(mylist), eachindex(myarray))]
or
myarray[filter(!in(Set(mylist)), eachindex(myarray))]
if mylist
is big enough that it’s worth converting to a Set
.
1 Like
You can also use InvertedIndices.jl. I think it should be one of the more efficient methods these days.
using InvertedIndices: Not
myarray[Not(mylist)]
4 Likes
I would be careful here — if that mylist
is a bunch of integers and you happen to have an array that unexpectedly uses Cartesian indices, this will silently not skip anything .
You probably want to be explicit and ask for LinearIndices
or CartesianIndices
rather than eachindex
(unless of course your mylist
set is built using stuff out of eachindex
).
3 Likes
Dan
February 4, 2025, 11:41pm
6
Often mylist
would be sorted, and then the algorithm can be a merge-like and faster. For example:
list = collect(1:10)
mylist = [2,3,5,7]
function slash(mylist, list)
foldl(eachindex(list); init=(sizehint!(eltype(list)[], length(list)-length(mylist)), 1)) do (res,j),i
if j > length(mylist) || i != mylist[j]
push!(res, list[i])
return (res, j)
else
return (res, j+1)
end
end |> first
end
@btime slash($mylist, $list)
# 29.972 ns (2 allocations: 128 bytes)
function slash2(mylist, list)
list[filter(!in(Set(mylist)), eachindex(list))]
end
@btime slash2($mylist, $list)
# 135.140 ns (10 allocations: 688 bytes)
The example shows slash
which assumes presorting is 4x faster than slash2
.
(BTW slash
is needlessly using foldl
, a simple for
would be easier)
Ideally, a mylist
of indices which are sorted could be storred in a SortedVector type which would automatically dispatch to a faster method.
1 Like
aplavin
February 4, 2025, 11:56pm
7
InvertedIndices is better than it was before, but still tends to be several times slower than deleteat!(copy(A), I)
(just Base, no dependencies) or @delete A[I]
(Accessors).
4 Likes
julia> using InvertedIndices: Not
julia> using BenchmarkTools
julia> v=rand(1:1000,100)
100-element Vector{Int64}:
549
342
738
8
616
697
⋮
94
986
764
julia> ei=[3,7,13,26,44,57,65,87,98]
9-element Vector{Int64}:
3
7
13
26
44
57
65
87
98
julia> function slicesort(v,ei)
d=Vector{Int}(undef,length(v)-length(ei))
doff=1
soff=1
for i in ei
N=i-soff
unsafe_copyto!(d,doff,v,soff,N)
soff=i+1
doff+=N
end
copyto!(d,doff,v,soff,length(v)-soff+1)
d
end
slicesort (generic function with 1 method)
julia> @btime [v[i] for i in eachindex($v) if i ∉ $ei];
3.000 μs (57 allocations: 2.80 KiB)
julia> @btime v[Not(ei)];
259.941 ns (2 allocations: 832 bytes)
julia> @btime deleteat!(copy(v), ei);
111.693 ns (1 allocation: 896 bytes)
julia> @btime slicesort(v,ei);
96.093 ns (1 allocation: 816 bytes)
julia> v[Not(ei)]==deleteat!(copy(v),ei)==slicesort(v,ei)
true
2 Likes