Hi, I am trying to optimize a code for a number theory side project. Currently, I have a large vector of booleans, say
A=rand(Bool,10^8)
I want to negate all the even-index terms. One possible way that I have found is using map, here’s what btime returns
julia> @btime A[2:2:end]=map(!,A[2:2:end]);
103.509 ms (9 allocations: 95.37 MiB)
I am wondering if there is anything that could be done to speed up this process more. I have tried the following as well
julia> @btime map!(!,A[2:2:end],A[2:2:end]);
129.196 ms (8 allocations: 95.37 MiB)
which runs slightly slower than the first one.
Problem background: I am working on a fast method to calculate some multiplicative functions and one operation that I use all the time is A[p:p:end]= map(!,A[p:p:end]), this seems to be my most expensive operation.
If you have no problem in writing some extra lines:
julia> function negateAllEvenIndexes(A)
for i in 2:2:length(A)
A[i] = !A[i]
end
A
end
julia> @btime negateAllEvenIndexes($A)
31.877 ms (0 allocations: 0 bytes)
Don’t be afraid to write a loop! There’s nothing magic about map or other constructs.
function negateeven1!(A)
for i in firstindex(A)+isodd(firstindex(A)) : 2 : lastindex(A) # even-valued indices
@inbounds A[i] = !A[i] # faster with @inbounds, but don't mess up the indexing!
end
return A
end
Your map! version will not update A because of array slicing. You need to use views, e.g.,
function negateeven2!(A)
inds = firstindex(A)+isodd(firstindex(A)) : 2 : lastindex(A) # even-valued indices
@views map!(!,A[inds],A[inds])
return A
end
although I think the loop is more clear.
But the best I could do is
function negateeven3!(A)
for i in eachindex(A)
A[i] = xor(A[i],iseven(i))
end
return A
end
Having not looked, this probably uses vectorized instructions and that’s where the extra speed comes from.
Another option I think is attractive (generalizes well for other p):
julia> @btime foreach(2:2:length($A)) do i
$A[i] = !$A[i]
end
21.342 ms (0 allocations: 0 bytes)
Remember to add $ to variables when doing @btime. Also, the last option suggested in original post doesn’t actually flip the bits because slicing without views creates a new vector:
map!(!,A[2:2:end],A[2:2:end]);
Also note, above method (and others) work with BitVectors as well. Because memory access is probably the constraining factor on the speed of operation, and the recent word of bits already in cache, the extra indexing operation for each negation will not cost much (for p=2 the low-level bit method is still much faster). If p is larger than word size then BitVector vs. Bool vector makes less difference.
julia> @btime foreach(2:2:length(A)) do i
A[i] = !A[i]
end setup=(A=rand(Bool,10^8))
29.752 ms (0 allocations: 0 bytes)
julia> @btime let
@view(A[2:2:end]) .⊻= true
end setup=(A=rand(Bool,10^8));
23.367 ms (0 allocations: 0 bytes)
Yep, access is slow. But we don’t know the usage pattern OP intends. It seems the goal is to flip prime sequences of entries.
Perhaps, the following ‘hybrid’ method will be optimized (and also sparkingly demonstrates Julia composability):
So essentially, for low primes, flipping the auxiliary vector m will flip all the virtual big vector V, and for higher primes, the big vector v should be flipped. Access time is longer but constant.