# Elegant way of exclusive indexing

Perhaps the best solution from this thread could be implemented as a non-mutating version of `deleteat`?

2 Likes

I wish so; I tried a while ago to add a non-mutating `push` function, but there was some hesitation, like

someone might try `push` and be very confused why it doesnāt mutate its argument, but Iām not sure thatās a good reason not to have this

or:

What I worry about here is people writing a = push(a::Array, x) even though it is insanely slow.

1 Like

some measures on different variants of some of the proposed functions

``````using BenchmarkTools

x = collect(1:10^6); inds = [300,500,600];

function f(x,inds)
y = Vector{eltype(x)}(undef,length(x) - length(inds))
j = 0
for i in Iterators.filter(!in(inds),eachindex(x))
j += 1
@inbounds y[j] = x[i]
end
return y
end
function f1(x,inds)
y = Vector{eltype(x)}(undef,length(x) - length(inds))
j = 0
for i in eachindex(x)
if !in(inds)(i)
j += 1
@inbounds y[j] = x[i]
end
end
return y
end
function f2(x,inds)
y = Vector{eltype(x)}(undef,length(x) - length(inds))
j = 0
for i in eachindex(x)
!in(inds)(i) || begin j+=1; y[j] = x[i] end
end
return y
end

function delat(v,idx)
mid=eltype(v)[]
shinds=idx[1:end] .-idx[1].+1
l=idx[1]
r=idx[end]
@inbounds for (i,e) in enumerate(v[l:r])
if i ā shinds
push!(mid,e)
end
end
[@view v[1:idx[1]-1]; mid; @view v[idx[end]+1:end]]
end

function delat1(v,idx)
mid=Int64[]
rng=idx[1]:idx[end]
for i in Iterators.filter(!in(inds),rng)
push!(mid, v[i])
end
[@view v[1:idx[1]-1] ; mid; @view v[idx[end]+1:end]]
end

@btime f(\$x,\$inds);                 # 9.532 ms (2 allocations: 7.63 MiB)
@btime f1(\$x,\$inds);                # 3.285 ms (2 allocations: 7.63 MiB)
@btime f2(\$x,\$inds);                # 1.773 ms (2 allocations: 7.63 MiB)
@btime delat(\$x,\$inds);             # 2.885 ms (19 allocations: 7.64 MiB)
@btime delat1(\$x,\$inds);            # 2.979 ms (407 allocations: 7.65 MiB)
@btime deleteat!(copy(\$x),\$inds);   # 3.260 ms (2 allocations: 7.63 MiB)
``````
1 Like
``````
function delat2(v,inds)
sind=sort(inds)
res=v[1:sind[1]-1]
@inbounds for i in 2:length(sind)
append!(res,v[sind[i-1]+1:sind[i]-1])
end
append!(res,v[sind[end]+1:end])
end
``````

I agree with these concern on the specific case of `push!`.

1. `push!` is very widespread, lots of first-step tutorials and code samples mention it. Many noviceās codes will use it and very often inside loops. The bang notation is not that common and depending on previous languages is very easy to forget.
2. While a `push` (non-mutating) makes sense theoretically, in practice is something you will almost never want to do, as it is incredibly wasteful. Functional languages like Haskell have a non-mutating `pushfront` over single-linked lists and because with immutability this can be reasonably performant, but a `push` like this is something hardly idiomatic anywhere. Calling `append!([element], old_vector)` or `push!(copy(old_vector), element)` is almost as easy, probably have a very similar performance to a smarter implementation, and makes intent very clear.

About `deleteat` these arguments get a little weaker, but making a function with delete in the name non-mutating is a little odd. Why not an entirely new name?

`f2` is very fast for short `inds` vectors, but slows down dramatically for longer ones, since the membership check get needlessly expensive. Sorting `inds` and iterating between them is likely to have better asymptotic efficiency (shouldnāt use repeated `append!`s, though.)

StaticArrays has a non-mutating `push` function.

1 Like

hi @DNF !

by ālonger onesā, you mean the `inds` vector, right?
In fact, if the order of `inds` is not used, the size of this makes the search inefficient.

are you referring to delat2 ()?
what would be better to use instead of the repeated `append!`?

Yes, Iām not sure what else it could refer to;)

Searching the entire `inds` vector at every step shouldnāt be necessary.

Yes. That function will repeatedly resize and reallocate the output vector. It will also allocate a vector for each slice into `v`. Perhaps you can avoid this with `sizehint!` and `view`, otherwise you can just copy the elements over one by one.

To me this is the obvious implementation, assuming `inds` are sorted and unique:

``````function deleteat(x::Vector, inds)
y = similar(x, length(x) - length(inds))
i1 = 1
j = 1
for i2 in inds
if i2 > i1
n = i2 - i1
copyto!(y, j, x, i1, n)
j += n
end
i1 = i2 + 1
end
if inds[end] < length(x)
n = length(x) - inds[end]
copyto!(y, j, x, inds[end] + 1, n)
end
return y
end
``````
2 Likes

@GunnarFarneback, the `deleteat!()` solution still seems to be the most performant?

``````x = rand(1000)
@btime deleteat!(copy(\$x), 1:2:1000)  # 1.020 Ī¼s (1 allocation: 7.94 KiB)
@btime deleteat(\$x, 1:2:1000)         # 1.930 Ī¼s (1 allocation: 4.06 KiB)
``````

Itās data dependent. Benchmark with data that is representative to your application. (Or if you want the full picture, with a large variety of sizes and index distributions.) Another anecdotal sample:

``````julia> x = collect(1:10^6);

julia> inds = 1:10:10^6;

julia> @btime deleteat!(copy(\$x), \$inds);
1.278 ms (2 allocations: 7.63 MiB)

julia> @btime deleteat(\$x, \$inds);
777.156 Ī¼s (2 allocations: 6.87 MiB)
``````
2 Likes

@GunnarFarneback, thank you for all the learnings and great codes.

It seems we can get a good speed up by using `unsafe_copyto!()` instead of `copyto!()`. Is it alright to do it in this case, or you wouldnāt recommend it?

It seems to be about as unsafe as `@inbounds`, so it basically depends on whether you trust my code to index correctly in all cases.

1 Like

I was talking in the context of plain old `Vector`, `StaticArrays` is not recommended for more than 100 elements, and the user will have done a conscious decision of choosing a non-Base type. But more important, `StaticArrays.push` will fail for `Vector`, it only works on its own types, avoiding big mistakes.

Thereās boolean negative indexing that sort of looks like your R example.

``````julia> R = rand(5, 1)
5Ć1 Matrix{Float64}:
0.2505445363095371
0.9314598357409927
0.8421641479557949
0.3357689166072829
0.8827575785297642

julia> B = BitArray(undef, 5, 1);

julia> inds = [2, 5]
2-element Vector{Int64}:
2
5

julia> B[inds] .= 1;

julia> R[B]
2-element Vector{Float64}:
0.9314598357409927
0.8827575785297642

julia> R[.!B]
3-element Vector{Float64}:
0.2505445363095371
0.8421641479557949
0.3357689166072829
``````

I think you probably want vectors instead of Nx1 matrices. The assignment operation is several times faster if you use

``````Rvec = rand(5)
Bvec = BitArray(undef, 5)
``````

``````R = rand(5, 1)
B = BitArray(undef, 5, 1)
``````
``````1.7.2> @btime (\$B[\$inds] .= 1);
36.117 ns (2 allocations: 96 bytes)

1.7.2> @btime (\$Bvec[\$inds] .= 1);
12.613 ns (0 allocations: 0 bytes)
``````

Vectors and Nx1 matrices are not the same in Julia.

(The same thing goes for other functions: `zeros`, `ones`, `fill`, etc. Use `zeros(5)`, not `zeros(5, 1)`, unless you very specifically need a matrix.)

1 Like

Negative boolean indexing is what all the expressions involving `ā` do, but in a one-liner. It seems to me like you are doing the same as @rocco_sprmnt21 suggested with `R[ā(inds).(1:end)]`. What is the advantage of your implementation?

As a side-note, the `1:end` is much shorter and easier to type than my `eachindex(my_vector)`. But `my_vector[ā(inds).(1:end)]` is hard to parse as a human. A happy marriage between the two would be

``````my_vector[1:end .ā [inds]]
``````

But this gets crushed in benchmarking by the also quite elegant `deleteat(copy(my_vector), inds)`:

``````julia> @btime v[1:end .ā [inds]]
595.506 ns (7 allocations: 368 bytes)
5-element Vector{Int64}:
2
4
6
8
10

julia> @btime deleteat!(copy(v), inds)
80.642 ns (1 allocation: 144 bytes)
5-element Vector{Int64}:
2
4
6
8
10
``````

Yes, but it is wrong in general. `begin:end` is better, though not sure if even that is completely generic (does it work with Star Wars indexing, for example?)

`eachindex`, otoh, should be completely generic.

Just looks. To me, R[-2] looks similar to R[.!B]. Given a solution to the thread had not been checked, I thought maybe the elegance being sought was stylish appearance.

1 Like

Fair point. The end command is pretty neat from your example, but there are a bit too many steps to get there, compared to alternatives.

I donāt feel like there is any single winner, so there is no single answer - there is rather a larger number of solutions that are all good enough, but none are perfect. I will personally probably use

``````deleteat!(my_vector|>copy, inds)
``````

, with piping allowing me to skip nested parenthesis (yuck).

I think that this reads well and is preformant. Additionally, it is completely general, without getting into `1:end` vs `begin:end` vs `eachindex(my_vector)`, all of which get increasingly general and long at the same time.

EDIT: Not completely general - the indices need to be sorted and unique for this. Luckily, `deleteat!` simply errors if you violate this, and `sort` and `unique` let you handle it without problem.