Elegant way of exclusive indexing

TheLateKronos · February 24, 2022, 9:56am

Perhaps the best solution from this thread could be implemented as a non-mutating version of deleteat?

rfourquet · February 24, 2022, 11:22am

I wish so; I tried a while ago to add a non-mutating push function, but there was some hesitation, like

someone might try push and be very confused why it doesn’t mutate its argument, but I’m not sure that’s a good reason not to have this

or:

What I worry about here is people writing a = push(a::Array, x) even though it is insanely slow.

rocco_sprmnt21 · February 24, 2022, 3:39pm

some measures on different variants of some of the proposed functions

using BenchmarkTools

x = collect(1:10^6); inds = [300,500,600];

function f(x,inds)
    y = Vector{eltype(x)}(undef,length(x) - length(inds))
    j = 0
    for i in Iterators.filter(!in(inds),eachindex(x))
        j += 1
        @inbounds y[j] = x[i]
    end
    return y
end
function f1(x,inds)
    y = Vector{eltype(x)}(undef,length(x) - length(inds))
    j = 0
    for i in eachindex(x)
        if !in(inds)(i)
        j += 1
        @inbounds y[j] = x[i]
        end
    end
    return y
end
function f2(x,inds)
    y = Vector{eltype(x)}(undef,length(x) - length(inds))
    j = 0
    for i in eachindex(x)
        !in(inds)(i) || begin j+=1; y[j] = x[i] end
    end
    return y
end

function delat(v,idx)
    mid=eltype(v)[]
    shinds=idx[1:end] .-idx[1].+1
    l=idx[1]
    r=idx[end]
    @inbounds for (i,e) in enumerate(v[l:r])
            if i ∉ shinds
                push!(mid,e)
            end
        end
    [@view v[1:idx[1]-1]; mid; @view v[idx[end]+1:end]]
end
    

function delat1(v,idx)
    mid=Int64[]
    rng=idx[1]:idx[end]
    for i in Iterators.filter(!in(inds),rng)
        push!(mid, v[i])
    end
    [@view v[1:idx[1]-1] ; mid; @view v[idx[end]+1:end]]
end
    


@btime f($x,$inds);                 # 9.532 ms (2 allocations: 7.63 MiB)
@btime f1($x,$inds);                # 3.285 ms (2 allocations: 7.63 MiB)
@btime f2($x,$inds);                # 1.773 ms (2 allocations: 7.63 MiB)
@btime delat($x,$inds);             # 2.885 ms (19 allocations: 7.64 MiB)
@btime delat1($x,$inds);            # 2.979 ms (407 allocations: 7.65 MiB)
@btime deleteat!(copy($x),$inds);   # 3.260 ms (2 allocations: 7.63 MiB)

rocco_sprmnt21 · February 24, 2022, 10:52pm


function delat2(v,inds)
sind=sort(inds)
res=v[1:sind[1]-1]
@inbounds for i in 2:length(sind)
    append!(res,v[sind[i-1]+1:sind[i]-1])
end
append!(res,v[sind[end]+1:end])
end

Henrique_Becker · February 25, 2022, 1:46am

I agree with these concern on the specific case of push!.

push! is very widespread, lots of first-step tutorials and code samples mention it. Many novice’s codes will use it and very often inside loops. The bang notation is not that common and depending on previous languages is very easy to forget.
While a push (non-mutating) makes sense theoretically, in practice is something you will almost never want to do, as it is incredibly wasteful. Functional languages like Haskell have a non-mutating pushfront over single-linked lists and because with immutability this can be reasonably performant, but a push like this is something hardly idiomatic anywhere. Calling append!([element], old_vector) or push!(copy(old_vector), element) is almost as easy, probably have a very similar performance to a smarter implementation, and makes intent very clear.

About deleteat these arguments get a little weaker, but making a function with delete in the name non-mutating is a little odd. Why not an entirely new name?

DNF · February 25, 2022, 7:26am

f2 is very fast for short inds vectors, but slows down dramatically for longer ones, since the membership check get needlessly expensive. Sorting inds and iterating between them is likely to have better asymptotic efficiency (shouldn’t use repeated append!s, though.)

StaticArrays has a non-mutating push function.

rocco_sprmnt21 · February 25, 2022, 9:35am

hi @DNF !

by “longer ones”, you mean the inds vector, right?
In fact, if the order of inds is not used, the size of this makes the search inefficient.

are you referring to delat2 ()?
what would be better to use instead of the repeated append!?

DNF · February 25, 2022, 10:03am

Yes, I’m not sure what else it could refer to;)

Searching the entire inds vector at every step shouldn’t be necessary.

Yes. That function will repeatedly resize and reallocate the output vector. It will also allocate a vector for each slice into v. Perhaps you can avoid this with sizehint! and view, otherwise you can just copy the elements over one by one.

GunnarFarneback · February 25, 2022, 10:52am

To me this is the obvious implementation, assuming inds are sorted and unique:

function deleteat(x::Vector, inds)
    y = similar(x, length(x) - length(inds))
    i1 = 1
    j = 1
    for i2 in inds
        if i2 > i1
            n = i2 - i1
            copyto!(y, j, x, i1, n)
            j += n
        end
        i1 = i2 + 1
    end
    if inds[end] < length(x)
        n = length(x) - inds[end]
        copyto!(y, j, x, inds[end] + 1, n)
    end
    return y
end

rafael.guerra · February 25, 2022, 11:18am

@GunnarFarneback, the deleteat!() solution still seems to be the most performant?

x = rand(1000)
@btime deleteat!(copy($x), 1:2:1000)  # 1.020 μs (1 allocation: 7.94 KiB)
@btime deleteat($x, 1:2:1000)         # 1.930 μs (1 allocation: 4.06 KiB)

GunnarFarneback · February 25, 2022, 12:10pm

It’s data dependent. Benchmark with data that is representative to your application. (Or if you want the full picture, with a large variety of sizes and index distributions.) Another anecdotal sample:

julia> x = collect(1:10^6);

julia> inds = 1:10:10^6;

julia> @btime deleteat!(copy($x), $inds);
  1.278 ms (2 allocations: 7.63 MiB)

julia> @btime deleteat($x, $inds);
  777.156 μs (2 allocations: 6.87 MiB)

rafael.guerra · February 25, 2022, 4:21pm

@GunnarFarneback, thank you for all the learnings and great codes.

It seems we can get a good speed up by using unsafe_copyto!() instead of copyto!(). Is it alright to do it in this case, or you wouldn’t recommend it?

GunnarFarneback · February 25, 2022, 4:27pm

It seems to be about as unsafe as @inbounds, so it basically depends on whether you trust my code to index correctly in all cases.

Henrique_Becker · February 25, 2022, 5:07pm

I was talking in the context of plain old Vector, StaticArrays is not recommended for more than 100 elements, and the user will have done a conscious decision of choosing a non-Base type. But more important, StaticArrays.push will fail for Vector, it only works on its own types, avoiding big mistakes.

g2g · February 25, 2022, 8:49pm

There’s boolean negative indexing that sort of looks like your R example.

julia> R = rand(5, 1)
5×1 Matrix{Float64}:
 0.2505445363095371
 0.9314598357409927
 0.8421641479557949
 0.3357689166072829
 0.8827575785297642

julia> B = BitArray(undef, 5, 1);

julia> inds = [2, 5]
2-element Vector{Int64}:
 2
 5

julia> B[inds] .= 1;

julia> R[B]
2-element Vector{Float64}:
 0.9314598357409927
 0.8827575785297642

julia> R[.!B]
3-element Vector{Float64}:
 0.2505445363095371
 0.8421641479557949
 0.3357689166072829

DNF · February 25, 2022, 9:28pm

I think you probably want vectors instead of Nx1 matrices. The assignment operation is several times faster if you use

Rvec = rand(5)
Bvec = BitArray(undef, 5)

instead of

R = rand(5, 1)
B = BitArray(undef, 5, 1)

1.7.2> @btime ($B[$inds] .= 1);
  36.117 ns (2 allocations: 96 bytes)

1.7.2> @btime ($Bvec[$inds] .= 1);
  12.613 ns (0 allocations: 0 bytes)

Vectors and Nx1 matrices are not the same in Julia.

(The same thing goes for other functions: zeros, ones, fill, etc. Use zeros(5), not zeros(5, 1), unless you very specifically need a matrix.)

TheLateKronos · February 26, 2022, 3:11pm

Negative boolean indexing is what all the expressions involving ∉ do, but in a one-liner. It seems to me like you are doing the same as @rocco_sprmnt21 suggested with R[∉(inds).(1:end)]. What is the advantage of your implementation?

As a side-note, the 1:end is much shorter and easier to type than my eachindex(my_vector). But my_vector[∉(inds).(1:end)] is hard to parse as a human. A happy marriage between the two would be

my_vector[1:end .∉ [inds]]

But this gets crushed in benchmarking by the also quite elegant deleteat(copy(my_vector), inds):

julia> @btime v[1:end .∉ [inds]]
  595.506 ns (7 allocations: 368 bytes)
5-element Vector{Int64}:
  2
  4
  6
  8
 10

julia> @btime deleteat!(copy(v), inds)
  80.642 ns (1 allocation: 144 bytes)
5-element Vector{Int64}:
  2
  4
  6
  8
 10

DNF · February 26, 2022, 3:30pm

Yes, but it is wrong in general. begin:end is better, though not sure if even that is completely generic (does it work with Star Wars indexing, for example?)

eachindex, otoh, should be completely generic.

g2g · February 26, 2022, 3:58pm

Just looks. To me, R[-2] looks similar to R[.!B]. Given a solution to the thread had not been checked, I thought maybe the elegance being sought was stylish appearance.

TheLateKronos · February 26, 2022, 4:06pm

Fair point. The end command is pretty neat from your example, but there are a bit too many steps to get there, compared to alternatives.

I don’t feel like there is any single winner, so there is no single answer - there is rather a larger number of solutions that are all good enough, but none are perfect. I will personally probably use

deleteat!(my_vector|>copy, inds)

, with piping allowing me to skip nested parenthesis (yuck).

I think that this reads well and is preformant. Additionally, it is completely general, without getting into 1:end vs begin:end vs eachindex(my_vector), all of which get increasingly general and long at the same time.

EDIT: Not completely general - the indices need to be sorted and unique for this. Luckily, deleteat! simply errors if you violate this, and sort and unique let you handle it without problem.

Topic		Replies	Views
Minor matrix New to Julia	4	3052	December 6, 2018
Find indices or corresponding values except given indices New to Julia indexing	2	329	December 6, 2021
In julia, how to realize a=b[-1:2] as in R without modifying b New to Julia	3	1063	January 12, 2017
Vector with negative indices extending getindex General Usage question , base , methods	4	539	September 27, 2023
Easiest way to delete indices from Vector{Vector{}} New to Julia arrays , vector	5	1754	January 17, 2022

Elegant way of exclusive indexing

Related topics