Elegant way of exclusive indexing

TheLateKronos · February 22, 2022, 9:24am

In R, you can index all elements except the second by negative indexing: myvector[-2]. This is especially useful if you want to split a vector into two complementary subsets:

set1 = myvector[inds]
set2 = myvector[-inds]

What do you find to be the best way of implementing “negative indexing”?

I’ll start with my current inelegant solution:

julia> myvector = 1:10|>vec
1:10

julia> inds = [2, 5, 6]
3-element Vector{Int64}:
 2
 5
 6

julia> set1 = myvector[inds]
3-element Vector{Int64}:
 2
 5
 6

julia> set2 = myvector[[i ∉ inds for i in eachindex(myvector)]]
7-element Vector{Int64}:
  1
  3
  4
  7
  8
  9
 10

jar1 · February 22, 2022, 9:28am

https://github.com/JuliaData/InvertedIndices.jl

fredrikekre · February 22, 2022, 9:28am

You can use the InvertedIndices.jl package:

julia> using InvertedIndices

julia> v = 1:10;

julia> inds = [2, 5, 6];

julia> v[inds]
3-element Vector{Int64}:
 2
 5
 6

julia> v[Not(inds)]
7-element Vector{Int64}:
  1
  3
  4
  7
  8
  9
 10

TheLateKronos · February 22, 2022, 9:35am

Thanks for the reference! It is nice that this problem is already solved by a package. However, I feel that adding a package dependency seems a bit much for such a simple task, especially if it is only needed a single time.

The following example uses the Base function setdiff, which I found to be an improvement on my initial attempt:

julia> set2 = myvector[setdiff(eachindex(myvector), inds)]
7-element Vector{Int64}:
  1
  3
  4
  7
  8
  9
 10
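
A note on the setdiff approach: it does not need inds to be sorted or free of duplicates, and the kept indices come back in their original order. A minimal sketch that wraps it in a helper (the name `without` is made up for illustration and is not part of Base):

```julia
# `without` is a hypothetical helper name, not a Base function.
# setdiff keeps the order of its first argument and ignores duplicates in the second.
without(v, inds) = v[setdiff(eachindex(v), inds)]

myvector = collect(1:10)
set2 = without(myvector, [6, 2, 5, 2])   # unsorted indices, with a duplicate
```

The trade-off is that setdiff builds an intermediate vector of indices, so this is more about convenience than speed.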

TheLateKronos · February 22, 2022, 9:41am

You can use elementwise ∉ if you wrap the second collection in a single-element iterable, so that broadcasting treats it as one value:

julia> set2 = myvector[eachindex(myvector) .∉ [inds]]
7-element Vector{Int64}:
  1
  3
  4
  7
  8
  9
 10

julia> set2 = myvector[eachindex(myvector) .∉ (inds,)]
7-element Vector{Int64}:
  1
  3
  4
  7
  8
  9
 10
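
The wrapping is needed because broadcasting would otherwise try to pair each index with one element of inds. Ref(inds) is another common way to mark an argument as a scalar for broadcasting; a short sketch of the same idea:

```julia
myvector = collect(1:10)
inds = [2, 5, 6]

# Ref(inds) is treated as a single value by broadcasting, like [inds] or (inds,)
mask = eachindex(myvector) .∉ Ref(inds)   # BitVector, true where the index is kept
set2 = myvector[mask]
```

Indexing with the Boolean mask then selects exactly the positions marked true.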

rocco_sprmnt21 · February 22, 2022, 12:15pm

A slight refinement, using the partially applied form ∉(inds):

v[∉(inds).(eachindex(v))]
v[∉(inds).(1:end)]
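
Each membership test above scans inds linearly. If the index list is long, converting it to a Set first is a common way to make the lookups cheaper. A sketch under that assumption (not benchmarked here; `excluded` and `keep` are illustrative names):

```julia
v = collect(1:10)
inds = [2, 5, 6]

excluded = Set(inds)                         # hash-based membership tests
keep = map(!in(excluded), eachindex(v))      # Vector{Bool}, true where the index is kept
set2 = v[keep]
```

For a handful of indices the plain vector is probably just as fast, so this only matters for larger inputs.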

lmiq · February 22, 2022, 12:27pm

If you want to iterate (quickly) on these elements, without allocating an intermediate array, you can use Iterators.filter:

julia> x = collect(1:10);

julia> inds = [2,5,6];

julia> for i in Iterators.filter(!in(inds),eachindex(x))
           @show x[i]
       end
x[i] = 1
x[i] = 3
x[i] = 4
x[i] = 7
x[i] = 8
x[i] = 9
x[i] = 10
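
The same lazy iterator can feed a reduction directly, so nothing is collected along the way. A small sketch that sums the kept elements:

```julia
x = collect(1:10)
inds = [2, 5, 6]

# Walk the kept indices lazily and accumulate; no array of indices is built
total = sum(x[i] for i in Iterators.filter(!in(inds), eachindex(x)))
```

Here total is 55 minus the excluded values 2, 5 and 6.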

rfourquet · February 22, 2022, 12:30pm

There is also: deleteat!(collect(v), inds) (or copy instead of collect if v is already a Vector).

jonniedie · February 22, 2022, 2:35pm

For what it’s worth, InvertedIndices.jl only defines and exports the InvertedIndex type (also aliased as Not) and itself has no dependencies. I generally consider small self-contained packages like this as not actually being dependencies because they don’t really slow down precompilation or importing and there is no chance that they will break compatibility with other packages.

rafael.guerra · February 22, 2022, 3:19pm

The InvertedIndices.jl solution seems to allocate a lot and to be much slower than the other solutions posted.

The deleteat!() approach is the fastest, but it requires the indices to be sorted (and unique).
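
If the indices might arrive unsorted or with repeats, one option is to normalize them before calling deleteat!. A minimal sketch, assuming the extra sort is acceptable for your use case:

```julia
v = 1:10
inds = [6, 2, 5, 2]                    # unsorted, with a duplicate

# deleteat! expects sorted, unique indices, so normalize a copy first
clean = sort!(unique(inds))            # unique returns a new vector; inds is untouched
set2 = deleteat!(collect(v), clean)    # collect makes a mutable copy of the range
```

The original inds and v are left unchanged; only the copies are modified.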

Palli · February 22, 2022, 5:08pm

Well it’s technically a (fast to load) dependency:

julia> @time using InvertedIndices
  0.019317 seconds (5.43 k allocations: 422.184 KiB, 54.67% compilation time)

While needing it may not be too common, I wonder if it or something similar should be a Julia stdlib, mostly for discoverability. The keywords and, or, and (I think) not have been suggested for Julia 2.0 (as aliases, or by me with slightly different semantics).

I’m not sure whether upper-cased Not is already claimed by some other package; it also seems like a bad variable name…

If this were to be merged into Julia, then better sooner rather than later? The API could stay the same, but clearly the speed could be improved. If merging is not considered, a link to the package could at least be added to the official docs, for example in:

https://docs.julialang.org/en/v1/manual/noteworthy-differences/#Noteworthy-differences-from-R

Just curious, how would you do the same in Python?

rocco_sprmnt21 · February 22, 2022, 5:11pm

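In the following function I tried to make the most of the fact that the index vector is sorted.
Apart from exploiting that specific information, what else can be done to improve the execution time?


function dat(v, idx)
    # idx is assumed to be sorted and non-empty
    mid = eltype(v)[]
    pre = @view v[1:idx[1]-1]            # everything before the first excluded index
    post = @view v[idx[end]+1:end]       # everything after the last excluded index
    shinds = idx[2:end-1] .- idx[1]      # interior excluded indices, shifted to match the loop counter
    @inbounds for (i, e) in enumerate(v[idx[1]+1:idx[end]-1])
        if i ∉ shinds
            push!(mid, e)
        end
    end
    return [pre; mid; post]
end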

aplavin · February 22, 2022, 5:16pm

A performant and really flexible solution is to use Accessors.jl: @delete A[123] returns a copy of A without element at index 123.
It has the same performance as manual selection:

julia> A = rand(1000);

# manual
julia> @btime $A[[1:122; 124:end]];
  1.270 μs (2 allocations: 15.88 KiB)

# Accessors.jl
julia> @btime @delete A[123];
  1.377 μs (3 allocations: 16.00 KiB)

# InvertedIndices.jl: 100x slower!
julia> @btime $A[Not(123)];
  165.270 μs (6097 allocations: 183.12 KiB)

rafael.guerra · February 22, 2022, 5:21pm

@aplavin, how do we apply @delete to an array of indices as in OP?

aplavin · February 22, 2022, 5:26pm

Sorry, I didn’t read in enough detail at first and only noticed the myvector[-2] example with a single index. Indeed, @delete only works for a single index right now, speaking as the author of its implementation (: Basically, I only needed single indices myself and didn’t even think of more general cases here. The implementation is extremely simple though: https://github.com/JuliaObjects/Accessors.jl/blob/master/src/optics.jl#L412-L415, and I think extension PRs would be welcome.

lmiq · February 22, 2022, 5:26pm

These are my takes:

Preserving the original array:

julia> function f(x,inds)
           y = Vector{eltype(x)}(undef,length(x) - length(inds))
           j = 0
           for i in Iterators.filter(!in(inds),eachindex(x))
               j += 1
               @inbounds y[j] = x[i]
           end
           return y
       end
f (generic function with 1 method)

julia> x = collect(1:10); inds = [3,5,6];

julia> @btime f($x,$inds)
  72.919 ns (1 allocation: 112 bytes)
7-element Vector{Int64}:
  1
  2
  4
  7
  8
  9
 10

Modifying the original array:

julia> function f!(x,inds)
           for i in Iterators.reverse(Iterators.filter(in(inds),eachindex(x)))
               deleteat!(x,i)
           end
           return x
       end
f! (generic function with 1 method)

julia> @btime f!(x,$inds) setup=(x=copy($x)) evals=1
  121.000 ns (0 allocations: 0 bytes)
7-element Vector{Int64}:
  1
  2
  4
  7
  8
  9
 10
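
Both functions above test every index against inds. Another way to preserve the original array is to build a Boolean mask once and index with it. A sketch under the assumption that x uses ordinary 1-based indexing (the name `exclude` is made up for illustration, and this is not benchmarked against f):

```julia
# `exclude` is a hypothetical name; assumes x is 1-based (e.g. a Vector)
function exclude(x::AbstractVector, inds)
    keep = trues(length(x))   # BitVector mask, everything kept initially
    keep[inds] .= false       # unmark excluded positions; order and duplicates do not matter
    return x[keep]            # logical indexing returns a new array
end

x = collect(1:10)
y = exclude(x, [3, 5, 6])
```

Unlike the length(x) - length(inds) preallocation in f, this does not rely on inds being free of duplicates.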

rafael.guerra · February 22, 2022, 5:29pm

@aplavin, yet @rfourquet’s deleteat!() solution still beats it for a single index:

A = rand(1000);
@btime deleteat!(copy($A), 123)  # 553 ns (1 allocation: 7.94 KiB)
@btime @delete $A[123];          # 970 ns (3 allocations: 16.00 KiB)

Palli · February 22, 2022, 5:37pm

While I’ve probably seen your package mentioned before, I would never have thought of looking at it for exclusive indexing. Nor would I from just a quick look at the README now. But it’s good to know of all these alternative ways to do this; hopefully we’ll arrive at a best way that handles more than one index, has the best syntax, and is viable to merge into Julia.

The criterion for inclusion in Julia is that it’s helpful for Julia itself. Well, I don’t see offhand how relevant it is… it seems they can do without (or not?). At least I would like to know the full syntax of the most important competitors, R, Python, and MATLAB, and how they map to Julia, in one place.

Please help with Julia’s (unofficial) wikibook and/or PR to Julia’s docs. I’m not sure how wanted it is to list every difference in the official docs. The former is also good to know of and maintain, easier to do than the official docs:

https://en.wikibooks.org/wiki/Introducing_Julia/Migrating_From_Other_Languages

aplavin · February 22, 2022, 5:38pm

Thanks, didn’t know that. Now I might make a PR to Accessors.jl with this performance improvement, as I often use its delete function.

rocco_sprmnt21 · February 22, 2022, 10:13pm

A = rand(1000);
@btime deleteat!(copy($A), 123) # 553 ns (1 allocation: 7.94 KiB)
@btime @delete $A[123]; # 970 ns (3 allocations: 16.00 KiB)

The following function also handles index arrays, and its timings fall between those of the two approaches above:

function delat(v, idx)
    # idx is assumed to be sorted and non-empty
    mid = eltype(v)[]
    shinds = idx .- idx[1] .+ 1    # excluded positions relative to the slice v[l:r]
    l = idx[1]
    r = idx[end]
    @inbounds for (i, e) in enumerate(v[l:r])
        if i ∉ shinds
            push!(mid, e)
        end
    end
    return [@view(v[1:l-1]); mid; @view(v[r+1:end])]
end
