Elegant way of exclusive indexing

In R, you can index all elements except for the second, by negative indexing: myvector[-2]. This is especially usefull if you want to make two complete subsets of a vector:

set1 = myvector[inds]
set2 = myvector[-inds]

What do you find to be the best way of implementing “negative indexing”?

I’ll start with my current inelegant solution:

julia> myvector = 1:10|>vec
1:10

julia> inds = [2, 5, 6]
3-element Vector{Int64}:
 2
 5
 6

julia> set1 = myvector[inds]
3-element Vector{Int64}:
 2
 5
 6

julia> set2 = myvector[[i ∉ inds for i in eachindex(myvector)]]
7-element Vector{Int64}:
  1
  3
  4
  7
  8
  9
 10

https://github.com/JuliaData/InvertedIndices.jl

6 Likes

You can use the InvertedIndices.jl package:

julia> using InvertedIndices

julia> v = 1:10;

julia> inds = [2, 5, 6];

julia> v[inds]
3-element Vector{Int64}:
 2
 5
 6

julia> v[Not(inds)]
7-element Vector{Int64}:
  1
  3
  4
  7
  8
  9
 10
11 Likes

Thanks for the reference! It is nice that this problem is already solved by a package. However, I feel that adding a package dependency seems a bit much for such a simple task, especially if it is only needed a single time.

The following example uses the Base function setdiff, which I found to be an improvement on my initial attempt:

julia> set2 = myvector[setdiff(eachindex(myvector), inds)]
7-element Vector{Int64}:
  1
  3
  4
  7
  8
  9
 10

You can use elementwise if you wrap the second collection in a single element iterable:

julia> set2 = myvector[eachindex(myvector) .∉ [inds]]
7-element Vector{Int64}:
  1
  3
  4
  7
  8
  9
 10

julia> set2 = myvector[eachindex(myvector) .∉ (inds,)]
7-element Vector{Int64}:
  1
  3
  4
  7
  8
  9
 10

a slight refinement

v[∉(inds).(eachindex(v))]
v[∉(inds).(1:end)]
4 Likes

If you want to iterate (quickly) on these elements, without allocating an intermediate array, you can use Iterators.filter:

julia> x = collect(1:10);

julia> inds = [2,5,6];

julia> for i in Iterators.filter(!in(inds),eachindex(x))
           @show x[i]
       end
x[i] = 1
x[i] = 3
x[i] = 4
x[i] = 7
x[i] = 8
x[i] = 9
x[i] = 10

1 Like

There is also: deleteat!(collect(v), inds) (or copy instead of collect if v is already a Vector).

3 Likes

For what it’s worth, InvertedIndices.jl only defines and exports the InvertedIndex type (also aliased as Not) and itself has no dependencies. I generally consider small self-contained packages like this as not actually being dependencies because they don’t really slow down precompilation or importing and there is no chance that they will break compatibility with other packages.

8 Likes

The InvertedIndices.jl solution seems to allocate a lot and to be much slower than the other solutions posted.

The deleteat!() is the fastest but requires the indices to be sorted.

Well it’s technically a (fast to load) dependency:

julia> @time using InvertedIndices
  0.019317 seconds (5.43 k allocations: 422.184 KiB, 54.67% compilation time)

While needing it may not be too common, I wander if it or something similar should be a Julia stdlib. Mostly for discovery. and, or and I think not have been suggested for Julia 2.0 (as aliases, or by me with slightly different semantics).

I’m not sure upper-cased Not is already claimed by some other package, also seems like a bad variable name…

If this where to be merged into Julia, then better sooner rather than later? The API could be the same, but clearly the speed could be improved. If not considered, put a link to it in the official docs, mention in

https://docs.julialang.org/en/v1/manual/noteworthy-differences/#Noteworthy-differences-from-R

Just curious, how would you do the same in Python?

in the following function I tried to make the most of the info that inds is sorted.
Apart from the use of specific information what else can be done to improve the execution time?


function dat(v,idx)
mid=Int[]
pre=@view v[1:inds[1]-1]
post=@view v[inds[end]+1:end]
shinds=inds[2:end-1] .-inds[1]
@inbounds for (i,e) in enumerate(v[inds[1]+1:inds[end]-1])
        if i ∉ shinds
            push!(mid,e)
        end
    end
[pre; mid; post]
end

A performant and really flexible solution is to use Accessors.jl: @delete A[123] returns a copy of A without element at index 123.
It has the same performance as manual selection:

julia> A = rand(1000);

# manual
julia> @btime $A[[1:122; 124:end]];
  1.270 μs (2 allocations: 15.88 KiB)

# Accessors.jl
julia> @btime @delete A[123];
  1.377 μs (3 allocations: 16.00 KiB)

# InvertedIndices.jl: 100x slower!
julia> @btime $A[Not(123)];
  165.270 μs (6097 allocations: 183.12 KiB)
2 Likes

@aplavin, how do we apply @delete to an array of indices as in OP?

Sorry, I didn’t read in enough details first, and only noticed the myvector[-2] example with a single index. Indeed, @delete only works for a single index now, saying as an author of its implementation (: Basically, I only needed single indices myself, and didn’t even think of more general cases here. The implementation is extremely simple though: https://github.com/JuliaObjects/Accessors.jl/blob/master/src/optics.jl#L412-L415, and I think extension PRs would be welcome.

1 Like

These are my takes:

Preserving the original array:

julia> function f(x,inds)
           y = Vector{eltype(x)}(undef,length(x) - length(inds))
           j = 0
           for i in Iterators.filter(!in(inds),eachindex(x))
               j += 1
               @inbounds y[j] = x[i]
           end
           return y
       end
f (generic function with 1 method)

julia> x = collect(1:10); inds = [3,5,6];

julia> @btime f($x,$inds)
  72.919 ns (1 allocation: 112 bytes)
7-element Vector{Int64}:
  1
  2
  4
  7
  8
  9
 10

Modifying the original array:

julia> function f!(x,inds)
           for i in Iterators.reverse(Iterators.filter(in(inds),eachindex(x)))
               deleteat!(x,i)
           end
           return x
       end
f! (generic function with 1 method)

julia> @btime f!(x,$inds) setup=(x=copy($x)) evals=1
  121.000 ns (0 allocations: 0 bytes)
7-element Vector{Int64}:
  1
  2
  4
  7
  8
  9
 10

@aplavin, yet @rfourquet’s deleteat!() solution still beats it for a single index:

A = rand(1000);
@btime deleteat!(copy($A), 123)  # 553 ns (1 allocation: 7.94 KiB)
@btime @delete $A[123];          # 970 ns (3 allocations: 16.00 KiB)

While I’ve probably seen your package mentioned before, I would have never thought of looking at it for exclusive indexing. Nor from just a quick look at the README now. But it’s good to know of all these alternative ways to do this, hopefully we’ll get at a best way, for more than one index AND at the same time best syntax and something viable to merge into Julia.

The criteria to include in Julia is that it’s helpful for Julia itself. Well, I don’t see it off-hand how relevant it is… seems they can do without (or not?). At least I would like to know the full syntax of the most important competitors, R, Python, and MATLAB, and how they map to Julia in once place.

Please help with Julia’s (unofficial) wikibook and/or PR to Julia’s docs. I’m not sure how wanted it is to list every difference in the official docs. The former is also good to know of and maintain, easier to do than the official docs:

https://en.wikibooks.org/wiki/Introducing_Julia/Migrating_From_Other_Languages

1 Like

Thanks, didn’t know that. Now I might make a PR to Accessors.jl with this performance improvement, as I often use its delete function.

1 Like

A = rand(1000);
@btime deleteat!(copy($A), 123) # 553 ns (1 allocation: 7.94 KiB)
@btime @delete $A[123]; # 970 ns (3 allocations: 16.00 KiB)

the following function also manages index arrays and has intermediate times with respect to those of the two functions above

function delat(v,idx)
mid=eltype(v)[]
shinds=idx[1:end] .-idx[1].+1
l=idx[1]
r=idx[end]
@inbounds for (i,e) in enumerate(v[l:r])
        if i ∉ shinds
            push!(mid,e)
        end
    end
[@view v[1:idx[1]-1]; mid; @view v[idx[end]+1:end]]
end