deleteat!(trues(3), [true, false, false]) gives "ArgumentError: indices must be unique and sorted"

I tested it in Julia 1.4.1

julia> deleteat!(trues(3), [true, false, false])
ERROR: ArgumentError: indices must be unique and sorted
Stacktrace:
 [1] deleteat!(::BitArray{1}, ::Array{Bool,1}) at ./bitarray.jl:976
 [2] top-level scope at REPL[18]:1

But the calls below are allowed:

julia> deleteat!(trues(3), 1)
2-element BitArray{1}:
 1
 1

julia> deleteat!(collect(trues(3)), [true, false, false])
2-element Array{Bool,1}:
 1
 1

This method simply has not been implemented yet, so deleteat!(trues(3), [true, false, false]) is interpreted as deleteat!(trues(3), [1, 0, 0]), with [1, 0, 0] being the indices to be deleted: these are neither unique nor sorted, hence the error.
You could open an issue for a feature request.
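To make the misinterpretation concrete, here is a small sketch (the conversion shown is just the standard Bool-to-Int promotion; it illustrates why the index list is rejected):

```julia
# The Bool mask is effectively read as a list of integer indices:
mask = [true, false, false]
println(Int.(mask))          # [1, 0, 0] — neither unique nor sorted

# Integer indices, by contrast, are accepted by the BitArray method:
v = deleteat!(trues(3), [1])
println(length(v))           # 2
```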


Logically, deleteat!() should work for any AbstractVector. For example, all of the cases below work:

julia> deleteat!([1,2,3], [true, false, false])
2-element Array{Int64,1}:
 2
 3

julia> deleteat!(["a", "b", "c"], [true, false, false])
2-element Array{String,1}:
 "b"
 "c"

julia> deleteat!([true, true, true], [true, false, false])
2-element Array{Bool,1}:
 1
 1

julia> deleteat!([true, true, true], BitArray([true, false, false]))
2-element Array{Bool,1}:
 1
 1

julia> deleteat!(BitArray([true, true, true]), [1, 2])
1-element BitArray{1}:
 1

But this case doesn’t work:

julia> deleteat!(BitArray([true, true, true]), [true, false, false])
ERROR: ArgumentError: indices must be unique and sorted
Stacktrace:
 [1] deleteat!(::BitArray{1}, ::Array{Bool,1}) at ./bitarray.jl:976
 [2] top-level scope at REPL[4]:1

That does not seem very reasonable, in my opinion.

It is just that the method for deleteat!(::BitArray{1}, ::Vector{Bool}) (or even deleteat!(::BitArray{1}, ::BitArray{1})) is not defined. However, a lot of other combinations work:

julia> a = collect(1:10);
julia> bools = Bool[true, false, true, false, true, false, true, false, true, false];
julia> bits = isodd.(a);
julia> typeof(bools)
Array{Bool,1}

julia> typeof(bits)
BitArray{1}

julia> deleteat!(a, bools)
5-element Array{Int64,1}:
  2
  4
  6
  8
 10

julia> a = collect(1:10);
julia> deleteat!(a, bits)
5-element Array{Int64,1}:
  2
  4
  6
  8
 10

julia> a = collect(1:10);
julia> deleteat!(bits, bools)
ERROR: ArgumentError: indices must be unique and sorted
Stacktrace:
 [1] deleteat!(::BitArray{1}, ::Array{Bool,1}) at ./bitarray.jl:976
 [2] top-level scope at REPL[33]:1

julia> deleteat!(bits, bits)
ERROR: ArgumentError: indices must be unique and sorted
Stacktrace:
 [1] deleteat!(::BitArray{1}, ::BitArray{1}) at ./bitarray.jl:976
 [2] top-level scope at REPL[34]:1

julia> deleteat!(bools, bits)
5-element Array{Bool,1}:
 0
 0
 0
 0
 0

julia> bools = Bool[true, false, true, false, true, false, true, false, true, false];

julia> deleteat!(bools, bools)
5-element Array{Bool,1}:
 0
 0
 0
 0
 0

Ah, you bested me, XD.

Reasonable or not, it’s a fact that the method doesn’t exist, so the only way forward is to implement it, or to open a feature request for it as I suggested.
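Until such a method exists, one workaround (a sketch with a hypothetical helper name, using only functions that do exist in Base) is to translate the logical mask into sorted integer indices yourself:

```julia
# Hypothetical helper, not part of Base: findall turns the Bool mask
# into a sorted, unique Vector{Int} of indices, which the existing
# BitArray deleteat! method accepts.
delete_where!(v::BitVector, mask::AbstractVector{Bool}) =
    deleteat!(v, findall(mask))

v = trues(3)
delete_where!(v, [true, false, false])
println(length(v))  # 2
```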


No, it doesn’t work for immutable arrays, such as ranges and SVector, nor for MVector, even though that one is mutable.

I think if you try to implement deleteat! for bitarrays, you will see that it’s not so easy, since multiple values are packed into each byte.


What definition of immutable are you using here?

julia> bits = isodd.(1:10);
julia> bits[2] = true
true

julia> bits
10-element BitArray{1}:
 1
 1
 1
 0
 1
...

julia> sort!(bits)
10-element BitArray{1}:
 0
 0
 0
 0
 1
 1
...

julia> isbits(bits)
false

julia> isimmutable(bits)
false

BitArrays are not immutable. I was responding to this:

There are also mutable vectors without the deleteat! method, such as MVector and bitarrays.

Ah, ok, sorry I misunderstood.

About the difficulty of implementation: I may be wrong, but it seems to me that you could simply reuse the code from Base, but instead of calling _deleteend! (which I am not sure is defined for BitArray), use resize! (which works for BitArray). You can do this in your own code (without having to change Julia itself) by prefixing the method headers with Base (i.e., function Base.deleteat!), and then you have deleteat! defined for BitArrays.
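A naive sketch along those lines (hypothetical name; prefixing the header with Base., as described, would turn it into an actual deleteat! method; correctness, not speed, is the goal here):

```julia
# Keep-and-compact loop, as in the generic algorithm, finishing with
# resize! (which BitArray does support) instead of _deleteend!.
function deleteat_naive!(v::BitVector, mask::AbstractVector{Bool})
    length(mask) == length(v) || throw(BoundsError(v, mask))
    j = 0                          # write position of the next kept bit
    for i in eachindex(mask)
        if !mask[i]
            j += 1
            v[j] = v[i]            # copy the kept bit forward
        end
    end
    resize!(v, j)                  # shrink to just the kept elements
    return v
end

bits = BitVector([true, true, true])
deleteat_naive!(bits, [true, false, false])
println(length(bits))  # 2
```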

Ok, a naive implementation is simple, but it is going to be very slow for bitarrays, which are designed for batched operations. Iterating over and assigning individual elements of a bitarray is slow.

So, ok, you can write a simple version, but it would sort of go against the point of bitarrays.

I do not think a fast/batched version will do much better. If an element is deleted inside a block/batch, then you have two options: (1) run the naive algorithm above for the rest of the block; or (2) mask everything that comes after it, shift the mask, and then OR it back. The second option only works as-is if this is the last block; otherwise the next block needs to be considered too, and in fact, if enough elements are deleted, an arbitrary number of following blocks may need to be considered by those shifts. I would really like to see the performance difference between the two approaches.

Yes, exactly. This is why deleteat! is not a good fit for bitarrays. It’s hard to write a good implementation.


I am not sure the lack of a good implementation is a good argument for having no implementation at all. It does keep the user from inadvertently using a data structure with poor performance, but that is itself a kind of premature optimization. I agree with the post author that, semantically, there is no reason for an implementation (even a naive one) to be unavailable.

Thanks for the explanation; I can understand why deleteat!(bitarray, boolvector) is not supported. However, I wonder whether deleteat!(isodd.(1:4), [1,3]) has a similar performance problem. As a user, I find it a little counterintuitive that deleteat!(isodd.(1:4), [1,3]) is supported but deleteat!(isodd.(1:4), [true, false, true, false]) is not. If deleteat!(isodd.(1:4), [1,3]) indeed has a good reason to be supported, maybe this exception in behavior could be mentioned in the documentation.


It is a matter of taste whether the API should enforce usage that conforms to the performance model of the data structure, or just fall back to something terribly slow. There are examples of both approaches in Base and in commonly used packages.

I would argue that when using another approach gives you at least an O(n) relative speedup, omitting/not implementing the slow method is fine. As a user, I would rather learn about my code being suboptimal and refactor, than have a generic version which takes an hour instead of one second for medium-sized data. As @DNF said, deleting elements from bitarrays is just going to be slow.

O(n) relative speedup

You mean when the slow implementation is O(n^2) and the fast one is O(n) (or slow O(n^3) and fast O(n^2))? I am almost sure the worst-case complexity stays the same. The greatest difference you will get between the fast and the slow implementations is a constant factor in the size of the block, and that only happens if the BitArray is a single block with a single deletion. In other cases (for example, deleting every odd index of a BitArray in which each block is a very small fraction of the whole), the performance will be comparable (if not worse for the “fast” version), as both degrade to the same behavior (that of the “slow” algorithm), with get and set operations needing to apply shifts and bitmask-and (or -or) operations.

As a user, I would rather learn about my code being suboptimal and refactor, than have a generic version which takes an hour instead of one second for medium-sized data

I would say this goes against the spirit of Julia, and of modern languages in general: avoid premature optimization and "profile, then optimize" (advice repeated many times on this forum). I have a considerably large codebase written for my PhD, and some parts of it execute for a few milliseconds within runs of minutes or hours. I do not want to discover that I need to copy code from Base that already works for my case but was artificially restricted to other types, when it could have been a generic fallback, just because a constant-factor (or even O(n)-factor) optimization could have been implemented but was not. If I am doing my job right, I will profile, discover that most of the time is spent there, and implement an optimized version. However, there is a great probability that this part of the code will never account for more than 1% of total time, and then my developer time was wasted because some overzealous premature optimizer decided to enforce this through types instead of just putting a note in the documentation.


My understanding is that the complexity for BitArray is something like O(mn), with m and n being the number of deletions and elements, respectively.

While I understand your POV about the implementation making choices for the user as premature optimization, consider that using BitArray is already an optimization which can be equally premature. If the number of elements is small, why not use Vector{Bool} etc?

No? If you check the generic/slow algorithm, it already has O(n) complexity. The code just does a constant number of operations per element of the vector: if the element is kept, it is copied to the right position; if it is deleted, the offset used to compute the new position from the old position for subsequent elements is increased.

I never chose to use BitArray in my life; it is just the automatic return type of broadcasting a function that returns Bool. In that sense, it is again a premature optimizer choosing it for me. I could, obviously, convert it to a Vector{Bool}, but this overhead alone could probably throw away the benefit of a “fast” algorithm for deleteat! (because, if not, then BitArray's deleteat! could be implemented more efficiently by just doing this conversion), and it would riddle my code with unnecessary conversions.
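For reference, the conversion round-trip mentioned here would look something like this (a sketch with a hypothetical name; both conversions copy the data, which is exactly the overhead being discussed):

```julia
# Convert to Vector{Bool}, use the existing deleteat! method for it,
# then pack the result back into a BitVector. Two full copies.
function deleteat_via_copy(bits::BitVector, mask::AbstractVector{Bool})
    return BitVector(deleteat!(Vector{Bool}(bits), mask))
end

bits = isodd.(1:10)                   # a BitVector
kept = deleteat_via_copy(bits, bits)  # delete where the mask is true
println(length(kept))  # 5 — only the even positions remain
```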
