Elegant way of exclusive indexing

Perhaps the best solution from this thread could be implemented as a non-mutating version of deleteat?

2 Likes

I wish so; I tried a while ago to add a non-mutating push function, but there was some hesitation, like

someone might try push and be very confused why it doesnā€™t mutate its argument, but Iā€™m not sure thatā€™s a good reason not to have this

or:

What I worry about here is people writing a = push(a::Array, x) even though it is insanely slow.

1 Like

some measures on different variants of some of the proposed functions

using BenchmarkTools

x = collect(1:10^6); inds = [300,500,600];

function f(x,inds)
    y = Vector{eltype(x)}(undef,length(x) - length(inds))
    j = 0
    for i in Iterators.filter(!in(inds),eachindex(x))
        j += 1
        @inbounds y[j] = x[i]
    end
    return y
end
function f1(x,inds)
    y = Vector{eltype(x)}(undef,length(x) - length(inds))
    j = 0
    for i in eachindex(x)
        if !in(inds)(i)
        j += 1
        @inbounds y[j] = x[i]
        end
    end
    return y
end
function f2(x,inds)
    y = Vector{eltype(x)}(undef,length(x) - length(inds))
    j = 0
    for i in eachindex(x)
        !in(inds)(i) || begin j+=1; y[j] = x[i] end
    end
    return y
end

function delat(v,idx)
    mid=eltype(v)[]
    shinds=idx[1:end] .-idx[1].+1
    l=idx[1]
    r=idx[end]
    @inbounds for (i,e) in enumerate(v[l:r])
            if i āˆ‰ shinds
                push!(mid,e)
            end
        end
    [@view v[1:idx[1]-1]; mid; @view v[idx[end]+1:end]]
end
    

function delat1(v,idx)
    mid=Int64[]
    rng=idx[1]:idx[end]
    for i in Iterators.filter(!in(inds),rng)
        push!(mid, v[i])
    end
    [@view v[1:idx[1]-1] ; mid; @view v[idx[end]+1:end]]
end
    


@btime f($x,$inds);                 # 9.532 ms (2 allocations: 7.63 MiB)
@btime f1($x,$inds);                # 3.285 ms (2 allocations: 7.63 MiB)
@btime f2($x,$inds);                # 1.773 ms (2 allocations: 7.63 MiB)
@btime delat($x,$inds);             # 2.885 ms (19 allocations: 7.64 MiB)
@btime delat1($x,$inds);            # 2.979 ms (407 allocations: 7.65 MiB)
@btime deleteat!(copy($x),$inds);   # 3.260 ms (2 allocations: 7.63 MiB)
1 Like

function delat2(v,inds)
sind=sort(inds)
res=v[1:sind[1]-1]
@inbounds for i in 2:length(sind)
    append!(res,v[sind[i-1]+1:sind[i]-1])
end
append!(res,v[sind[end]+1:end])
end

I agree with these concern on the specific case of push!.

  1. push! is very widespread, lots of first-step tutorials and code samples mention it. Many noviceā€™s codes will use it and very often inside loops. The bang notation is not that common and depending on previous languages is very easy to forget.
  2. While a push (non-mutating) makes sense theoretically, in practice is something you will almost never want to do, as it is incredibly wasteful. Functional languages like Haskell have a non-mutating pushfront over single-linked lists and because with immutability this can be reasonably performant, but a push like this is something hardly idiomatic anywhere. Calling append!([element], old_vector) or push!(copy(old_vector), element) is almost as easy, probably have a very similar performance to a smarter implementation, and makes intent very clear.

About deleteat these arguments get a little weaker, but making a function with delete in the name non-mutating is a little odd. Why not an entirely new name?

f2 is very fast for short inds vectors, but slows down dramatically for longer ones, since the membership check get needlessly expensive. Sorting inds and iterating between them is likely to have better asymptotic efficiency (shouldnā€™t use repeated append!s, though.)

StaticArrays has a non-mutating push function.

1 Like

hi @DNF !

by ā€œlonger onesā€, you mean the inds vector, right?
In fact, if the order of inds is not used, the size of this makes the search inefficient.

are you referring to delat2 ()?
what would be better to use instead of the repeated append!?

Yes, Iā€™m not sure what else it could refer to;)

Searching the entire inds vector at every step shouldnā€™t be necessary.

Yes. That function will repeatedly resize and reallocate the output vector. It will also allocate a vector for each slice into v. Perhaps you can avoid this with sizehint! and view, otherwise you can just copy the elements over one by one.

To me this is the obvious implementation, assuming inds are sorted and unique:

function deleteat(x::Vector, inds)
    y = similar(x, length(x) - length(inds))
    i1 = 1
    j = 1
    for i2 in inds
        if i2 > i1
            n = i2 - i1
            copyto!(y, j, x, i1, n)
            j += n
        end
        i1 = i2 + 1
    end
    if inds[end] < length(x)
        n = length(x) - inds[end]
        copyto!(y, j, x, inds[end] + 1, n)
    end
    return y
end
2 Likes

@GunnarFarneback, the deleteat!() solution still seems to be the most performant?

x = rand(1000)
@btime deleteat!(copy($x), 1:2:1000)  # 1.020 Ī¼s (1 allocation: 7.94 KiB)
@btime deleteat($x, 1:2:1000)         # 1.930 Ī¼s (1 allocation: 4.06 KiB)

Itā€™s data dependent. Benchmark with data that is representative to your application. (Or if you want the full picture, with a large variety of sizes and index distributions.) Another anecdotal sample:

julia> x = collect(1:10^6);

julia> inds = 1:10:10^6;

julia> @btime deleteat!(copy($x), $inds);
  1.278 ms (2 allocations: 7.63 MiB)

julia> @btime deleteat($x, $inds);
  777.156 Ī¼s (2 allocations: 6.87 MiB)
2 Likes

@GunnarFarneback, thank you for all the learnings and great codes.

It seems we can get a good speed up by using unsafe_copyto!() instead of copyto!(). Is it alright to do it in this case, or you wouldnā€™t recommend it?

It seems to be about as unsafe as @inbounds, so it basically depends on whether you trust my code to index correctly in all cases.

1 Like

I was talking in the context of plain old Vector, StaticArrays is not recommended for more than 100 elements, and the user will have done a conscious decision of choosing a non-Base type. But more important, StaticArrays.push will fail for Vector, it only works on its own types, avoiding big mistakes.

Thereā€™s boolean negative indexing that sort of looks like your R example.

julia> R = rand(5, 1)
5Ɨ1 Matrix{Float64}:
 0.2505445363095371
 0.9314598357409927
 0.8421641479557949
 0.3357689166072829
 0.8827575785297642

julia> B = BitArray(undef, 5, 1);

julia> inds = [2, 5]
2-element Vector{Int64}:
 2
 5

julia> B[inds] .= 1;

julia> R[B]
2-element Vector{Float64}:
 0.9314598357409927
 0.8827575785297642

julia> R[.!B]
3-element Vector{Float64}:
 0.2505445363095371
 0.8421641479557949
 0.3357689166072829

I think you probably want vectors instead of Nx1 matrices. The assignment operation is several times faster if you use

Rvec = rand(5)
Bvec = BitArray(undef, 5)

instead of

R = rand(5, 1)
B = BitArray(undef, 5, 1)
1.7.2> @btime ($B[$inds] .= 1);
  36.117 ns (2 allocations: 96 bytes)

1.7.2> @btime ($Bvec[$inds] .= 1);
  12.613 ns (0 allocations: 0 bytes)

Vectors and Nx1 matrices are not the same in Julia.

(The same thing goes for other functions: zeros, ones, fill, etc. Use zeros(5), not zeros(5, 1), unless you very specifically need a matrix.)

1 Like

Negative boolean indexing is what all the expressions involving āˆ‰ do, but in a one-liner. It seems to me like you are doing the same as @rocco_sprmnt21 suggested with R[āˆ‰(inds).(1:end)]. What is the advantage of your implementation?

As a side-note, the 1:end is much shorter and easier to type than my eachindex(my_vector). But my_vector[āˆ‰(inds).(1:end)] is hard to parse as a human. A happy marriage between the two would be

my_vector[1:end .āˆ‰ [inds]]

But this gets crushed in benchmarking by the also quite elegant deleteat(copy(my_vector), inds):

julia> @btime v[1:end .āˆ‰ [inds]]
  595.506 ns (7 allocations: 368 bytes)
5-element Vector{Int64}:
  2
  4
  6
  8
 10

julia> @btime deleteat!(copy(v), inds)
  80.642 ns (1 allocation: 144 bytes)
5-element Vector{Int64}:
  2
  4
  6
  8
 10

Yes, but it is wrong in general. begin:end is better, though not sure if even that is completely generic (does it work with Star Wars indexing, for example?)

eachindex, otoh, should be completely generic.

Just looks. To me, R[-2] looks similar to R[.!B]. Given a solution to the thread had not been checked, I thought maybe the elegance being sought was stylish appearance.

1 Like

Fair point. The end command is pretty neat from your example, but there are a bit too many steps to get there, compared to alternatives.

I donā€™t feel like there is any single winner, so there is no single answer - there is rather a larger number of solutions that are all good enough, but none are perfect. I will personally probably use

deleteat!(my_vector|>copy, inds)

, with piping allowing me to skip nested parenthesis (yuck).

I think that this reads well and is preformant. Additionally, it is completely general, without getting into 1:end vs begin:end vs eachindex(my_vector), all of which get increasingly general and long at the same time.

EDIT: Not completely general - the indices need to be sorted and unique for this. Luckily, deleteat! simply errors if you violate this, and sort and unique let you handle it without problem.