Perhaps the best solution from this thread could be implemented as a non-mutating version of deleteat
?
I wish so; I tried a while ago to add a non-mutating push
function, but there was some hesitation, like
someone might try
push
and be very confused why it doesnāt mutate its argument, but Iām not sure thatās a good reason not to have this
or:
What I worry about here is people writing a = push(a::Array, x) even though it is insanely slow.
some measures on different variants of some of the proposed functions
using BenchmarkTools
x = collect(1:10^6); inds = [300,500,600];
function f(x,inds)
y = Vector{eltype(x)}(undef,length(x) - length(inds))
j = 0
for i in Iterators.filter(!in(inds),eachindex(x))
j += 1
@inbounds y[j] = x[i]
end
return y
end
function f1(x,inds)
y = Vector{eltype(x)}(undef,length(x) - length(inds))
j = 0
for i in eachindex(x)
if !in(inds)(i)
j += 1
@inbounds y[j] = x[i]
end
end
return y
end
function f2(x,inds)
y = Vector{eltype(x)}(undef,length(x) - length(inds))
j = 0
for i in eachindex(x)
!in(inds)(i) || begin j+=1; y[j] = x[i] end
end
return y
end
function delat(v,idx)
mid=eltype(v)[]
shinds=idx[1:end] .-idx[1].+1
l=idx[1]
r=idx[end]
@inbounds for (i,e) in enumerate(v[l:r])
if i ā shinds
push!(mid,e)
end
end
[@view v[1:idx[1]-1]; mid; @view v[idx[end]+1:end]]
end
function delat1(v,idx)
mid=Int64[]
rng=idx[1]:idx[end]
for i in Iterators.filter(!in(inds),rng)
push!(mid, v[i])
end
[@view v[1:idx[1]-1] ; mid; @view v[idx[end]+1:end]]
end
@btime f($x,$inds); # 9.532 ms (2 allocations: 7.63 MiB)
@btime f1($x,$inds); # 3.285 ms (2 allocations: 7.63 MiB)
@btime f2($x,$inds); # 1.773 ms (2 allocations: 7.63 MiB)
@btime delat($x,$inds); # 2.885 ms (19 allocations: 7.64 MiB)
@btime delat1($x,$inds); # 2.979 ms (407 allocations: 7.65 MiB)
@btime deleteat!(copy($x),$inds); # 3.260 ms (2 allocations: 7.63 MiB)
function delat2(v,inds)
sind=sort(inds)
res=v[1:sind[1]-1]
@inbounds for i in 2:length(sind)
append!(res,v[sind[i-1]+1:sind[i]-1])
end
append!(res,v[sind[end]+1:end])
end
I agree with these concern on the specific case of push!
.
-
push!
is very widespread, lots of first-step tutorials and code samples mention it. Many noviceās codes will use it and very often inside loops. The bang notation is not that common and depending on previous languages is very easy to forget. - While a
push
(non-mutating) makes sense theoretically, in practice is something you will almost never want to do, as it is incredibly wasteful. Functional languages like Haskell have a non-mutatingpushfront
over single-linked lists and because with immutability this can be reasonably performant, but apush
like this is something hardly idiomatic anywhere. Callingappend!([element], old_vector)
orpush!(copy(old_vector), element)
is almost as easy, probably have a very similar performance to a smarter implementation, and makes intent very clear.
About deleteat
these arguments get a little weaker, but making a function with delete in the name non-mutating is a little odd. Why not an entirely new name?
f2
is very fast for short inds
vectors, but slows down dramatically for longer ones, since the membership check get needlessly expensive. Sorting inds
and iterating between them is likely to have better asymptotic efficiency (shouldnāt use repeated append!
s, though.)
StaticArrays has a non-mutating push
function.
hi @DNF !
by ālonger onesā, you mean the inds
vector, right?
In fact, if the order of inds
is not used, the size of this makes the search inefficient.
are you referring to delat2 ()?
what would be better to use instead of the repeated append!
?
Yes, Iām not sure what else it could refer to;)
Searching the entire inds
vector at every step shouldnāt be necessary.
Yes. That function will repeatedly resize and reallocate the output vector. It will also allocate a vector for each slice into v
. Perhaps you can avoid this with sizehint!
and view
, otherwise you can just copy the elements over one by one.
To me this is the obvious implementation, assuming inds
are sorted and unique:
function deleteat(x::Vector, inds)
y = similar(x, length(x) - length(inds))
i1 = 1
j = 1
for i2 in inds
if i2 > i1
n = i2 - i1
copyto!(y, j, x, i1, n)
j += n
end
i1 = i2 + 1
end
if inds[end] < length(x)
n = length(x) - inds[end]
copyto!(y, j, x, inds[end] + 1, n)
end
return y
end
@GunnarFarneback, the deleteat!()
solution still seems to be the most performant?
x = rand(1000)
@btime deleteat!(copy($x), 1:2:1000) # 1.020 Ī¼s (1 allocation: 7.94 KiB)
@btime deleteat($x, 1:2:1000) # 1.930 Ī¼s (1 allocation: 4.06 KiB)
Itās data dependent. Benchmark with data that is representative to your application. (Or if you want the full picture, with a large variety of sizes and index distributions.) Another anecdotal sample:
julia> x = collect(1:10^6);
julia> inds = 1:10:10^6;
julia> @btime deleteat!(copy($x), $inds);
1.278 ms (2 allocations: 7.63 MiB)
julia> @btime deleteat($x, $inds);
777.156 Ī¼s (2 allocations: 6.87 MiB)
@GunnarFarneback, thank you for all the learnings and great codes.
It seems we can get a good speed up by using unsafe_copyto!()
instead of copyto!()
. Is it alright to do it in this case, or you wouldnāt recommend it?
It seems to be about as unsafe as @inbounds
, so it basically depends on whether you trust my code to index correctly in all cases.
I was talking in the context of plain old Vector
, StaticArrays
is not recommended for more than 100 elements, and the user will have done a conscious decision of choosing a non-Base type. But more important, StaticArrays.push
will fail for Vector
, it only works on its own types, avoiding big mistakes.
Thereās boolean negative indexing that sort of looks like your R example.
julia> R = rand(5, 1)
5Ć1 Matrix{Float64}:
0.2505445363095371
0.9314598357409927
0.8421641479557949
0.3357689166072829
0.8827575785297642
julia> B = BitArray(undef, 5, 1);
julia> inds = [2, 5]
2-element Vector{Int64}:
2
5
julia> B[inds] .= 1;
julia> R[B]
2-element Vector{Float64}:
0.9314598357409927
0.8827575785297642
julia> R[.!B]
3-element Vector{Float64}:
0.2505445363095371
0.8421641479557949
0.3357689166072829
I think you probably want vectors instead of Nx1 matrices. The assignment operation is several times faster if you use
Rvec = rand(5)
Bvec = BitArray(undef, 5)
instead of
R = rand(5, 1)
B = BitArray(undef, 5, 1)
1.7.2> @btime ($B[$inds] .= 1);
36.117 ns (2 allocations: 96 bytes)
1.7.2> @btime ($Bvec[$inds] .= 1);
12.613 ns (0 allocations: 0 bytes)
Vectors and Nx1 matrices are not the same in Julia.
(The same thing goes for other functions: zeros
, ones
, fill
, etc. Use zeros(5)
, not zeros(5, 1)
, unless you very specifically need a matrix.)
Negative boolean indexing is what all the expressions involving ā
do, but in a one-liner. It seems to me like you are doing the same as @rocco_sprmnt21 suggested with R[ā(inds).(1:end)]
. What is the advantage of your implementation?
As a side-note, the 1:end
is much shorter and easier to type than my eachindex(my_vector)
. But my_vector[ā(inds).(1:end)]
is hard to parse as a human. A happy marriage between the two would be
my_vector[1:end .ā [inds]]
But this gets crushed in benchmarking by the also quite elegant deleteat(copy(my_vector), inds)
:
julia> @btime v[1:end .ā [inds]]
595.506 ns (7 allocations: 368 bytes)
5-element Vector{Int64}:
2
4
6
8
10
julia> @btime deleteat!(copy(v), inds)
80.642 ns (1 allocation: 144 bytes)
5-element Vector{Int64}:
2
4
6
8
10
Yes, but it is wrong in general. begin:end
is better, though not sure if even that is completely generic (does it work with Star Wars indexing, for example?)
eachindex
, otoh, should be completely generic.
Just looks. To me, R[-2] looks similar to R[.!B]. Given a solution to the thread had not been checked, I thought maybe the elegance being sought was stylish appearance.
Fair point. The end command is pretty neat from your example, but there are a bit too many steps to get there, compared to alternatives.
I donāt feel like there is any single winner, so there is no single answer - there is rather a larger number of solutions that are all good enough, but none are perfect. I will personally probably use
deleteat!(my_vector|>copy, inds)
, with piping allowing me to skip nested parenthesis (yuck).
I think that this reads well and is preformant. Additionally, it is completely general, without getting into 1:end
vs begin:end
vs eachindex(my_vector)
, all of which get increasingly general and long at the same time.
EDIT: Not completely general - the indices need to be sorted and unique for this. Luckily, deleteat!
simply errors if you violate this, and sort
and unique
let you handle it without problem.