Fastest way to negate some booleans in an array

Hi, I am trying to optimize a code for a number theory side project. Currently, I have a large vector of booleans, say

A=rand(Bool,10^8)

I want to negate all the even-index terms. One possible way that I have found is using map, here’s what btime returns

julia> @btime A[2:2:end]=map(!,A[2:2:end]);
  103.509 ms (9 allocations: 95.37 MiB)

I am wondering if there is anything that could be done to speed up this process more. I have tried the following as well

julia> @btime map!(!,A[2:2:end],A[2:2:end]);
  129.196 ms (8 allocations: 95.37 MiB)

which runs slightly slower than the first one.

Problem background: I am working on a fast method to calculate some multiplicative functions and one operation that I use all the time is A[p:p:end]= map(!,A[p:p:end]), this seems to be my most expensive operation.

If you have no problem in writing some extra lines:

julia> function negateAllEvenIndexes(A)
           for i in 2:2:length(A)
               A[i] = !A[i]
           end
           A
       end
julia> @btime negateAllEvenIndexes($A)
  31.877 ms (0 allocations: 0 bytes)
1 Like

Don’t be afraid to write a loop! There’s nothing magic about map or other constructs.

function negateeven1!(A)
    for i in firstindex(A)+isodd(firstindex(A)) : 2 : lastindex(A) # even-valued indices
        @inbounds A[i] = !A[i] # faster with @inbounds, but don't mess up the indexing!
    end
    return A
end

Your map! version will not update A because of array slicing. You need to use views, e.g.,

function negateeven2!(A)
    inds = firstindex(A)+isodd(firstindex(A)) : 2 : lastindex(A) # even-valued indices
    @views map!(!,A[inds],A[inds])
    return A
end

although I think the loop is more clear.

But the best I could do is

function negateeven3!(A)
    for i in eachindex(A)
        A[i] = xor(A[i],iseven(i))
    end
    return A
end

Having not looked, this probably uses vectorized instructions and that’s where the extra speed comes from.

1 Like

Just use views and this becomes much more efficient:

@views map!(!,A[2:2:end],A[2:2:end])

Or use broadcasting, even faster:

A[2:2:end] .= .!(@view A[2:2:end])
3 Likes

Loops are great. You could also do

@views A[2:2:end] .= .!A[2:2:end] 

Or 0:2:end if you’re into zero-based things…

Edit: typing too slow on phone…

If you’re willing to get your hands dirty with BitVector internals, you could do something like

julia> function neg_bitvector_evens(v)
           evens = 0xaaaaaaaaaaaaaaaa
           for i in eachindex(v.chunks)
               v.chunks[i] = xor(v.chunks[i], evens)
           end
           v
       end
julia> @btime neg_bv_evens(v) setup=(v = falses(10^8))
  277.700 μs (3 allocations: 4.28 KiB)
100000000-element BitVector:
 0
 1
 0
 1
 ⋮
 0
 1
4 Likes

Another option I think is attractive (generalizes well for other p):

julia> @btime foreach(2:2:length($A)) do i
       $A[i] = !$A[i]
       end
  21.342 ms (0 allocations: 0 bytes)

Remember to add $ to variables when doing @btime. Also, the last option suggested in original post doesn’t actually flip the bits because slicing without views creates a new vector:

map!(!,A[2:2:end],A[2:2:end]);

Also note, above method (and others) work with BitVectors as well. Because memory access is probably the constraining factor on the speed of operation, and the recent word of bits already in cache, the extra indexing operation for each negation will not cost much (for p=2 the low-level bit method is still much faster). If p is larger than word size then BitVector vs. Bool vector makes less difference.

are you all missing something?

julia> @btime foreach(2:2:length(A)) do i
              A[i] = !A[i]
              end setup=(A=rand(Bool,10^8))
  29.752 ms (0 allocations: 0 bytes)

julia> @btime let
              @view(A[2:2:end]) .⊻= true
              end setup=(A=rand(Bool,10^8));
  23.367 ms (0 allocations: 0 bytes)

You are right. I’ve used xor when testing as well.
In any case, taking optimization to extreme, and in a slightly humoristic suggestion:

julia> using MappedArrays

julia> @btime mappedarray((i,val)->ifelse(iseven(i),val,!val),1:10^8,$v) setup=(v=fill(true,10^8))
  4.009 ns (0 allocations: 0 bytes)
100000000-element mappedarray(var"#17#18"(), ::UnitRange{Int64}, ::BitVector) with eltype Bool:
 1
 0
 1
 0

also works :wink:

mapped array is fast in creation (O(1)) but subsequent access is much slower

Yep, access is slow. But we don’t know the usage pattern OP intends. It seems the goal is to flip prime sequences of entries.
Perhaps, the following ‘hybrid’ method will be optimized (and also sparkingly demonstrates Julia composability):

julia> using CircularArrays
julia> using MappedArrays
julia> using Primes

julia> P = primes(10);
julia> Pprod = prod(P)
210

julia> m = CircularArray(falses(Pprod))
210-element CircularVector(::BitVector):
 0
 0
 0

julia> v = falses(10^8);

julia> V = mappedarray((v1, v2)->v1⊻v2, view(m, 1:length(v)), v);

julia> V[1:10]
10-element Vector{Bool}:
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0

julia> m[2:2:Pprod] .⊻= true;

julia> V[1:10]
10-element Vector{Bool}:
 0
 1
 0
 1
 0
 1
 0
 1
 0
 1

So essentially, for low primes, flipping the auxiliary vector m will flip all the virtual big vector V, and for higher primes, the big vector v should be flipped. Access time is longer but constant.