# Parallelization gives wrong result

I have a simple function which, given a vector of floats `x`, returns a `BitVector` with `true` in the places where the entry is nonzero. In other words, `f(x) = (x .!= 0.0)`. When I parallelize it, I get incorrect results:

```julia
using Random
x = rand([1.0, 2.0, 3.0], 10)

function f(x, prl)
    n = length(x)
    y = falses(n)
    if prl == 0
        for i = 1:n
            y[i] = (x[i] != 0.0)
        end
    else
        Threads.@threads for i = 1:n
            y[i] = (x[i] != 0.0)
        end
    end
    return y
end

y0 = trues(length(x))
for k = 1:100  if y0 != f(x, 0) print("!") end end
#
for k = 1:100  if y0 != f(x, 1) print("!") end end
# !!!!!!!!
```

Why does this happen? What am I doing wrong? Each thread writes in a different place in y.

Maybe this:

```
help?> BitArray
search: BitArray

  BitArray{N} <: AbstractArray{Bool, N}

  Space-efficient N-dimensional boolean array, using just one bit for each
  boolean value.

  BitArrays pack up to 64 values into every 8 bytes, resulting in an 8x space
  efficiency over Array{Bool, N} and allowing some operations to work on 64
  values at once.

  By default, Julia returns BitArrays from broadcasting operations that
  generate boolean elements (including dotted-comparisons like .==) as well
  as from the functions trues and falses.

  │ Note
  │
  │  Due to its packed storage format, concurrent access to the elements of a
  │  BitArray where at least one of them is a write is not thread-safe.
```

Ouch, I was not aware of that, thank you. So what actually happens in this case? When one thread edits `y[i]`, does it also lock some nearby `y[j]`?

The issue is that a `BitVector` is storing multiple values per byte. More specifically, it stores 64 values per `UInt64` (8 bytes). In order to write a single entry, it reads the entire `UInt64`, sets one of the bits to the value it wants, and then writes the entire `UInt64` back to memory.

This opens up the potential for a data race. The reason this is not thread-safe is that multiple threads can read the same `UInt64`, modify it by changing a single bit, and then write it back. This process takes time. If more than one thread reads it before the changes are written back, then whichever thread writes last will overwrite the work of everyone else. A lock would avoid this, but would also cause significant slowdowns.
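To make that read-modify-write sequence concrete, here is roughly what a single `BitVector` write does under the hood (a sketch of my own; `set_bit!` is not a Base function):

```julia
# Sketch of a BitVector write: three non-atomic steps on a shared UInt64.
function set_bit!(chunks::Vector{UInt64}, i::Int, v::Bool)
    c = ((i - 1) >>> 6) + 1              # which 64-bit chunk holds bit i
    b = (i - 1) & 63                     # bit position inside that chunk
    w = chunks[c]                        # step 1: read the whole chunk
    mask = UInt64(1) << b
    w = v ? (w | mask) : (w & ~mask)     # step 2: change a single bit
    chunks[c] = w                        # step 3: write the chunk back
end
```

If a second thread runs these three steps on the same chunk in between this thread's steps 1 and 3, one of the two bit updates is silently lost.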

A `Vector{Bool}` uses 1 byte per entry (rather than 1/8 byte like `BitVector`), and the bytes are addressed individually (rather than in blocks of 8 via `UInt64`), so it is safe from data races in your example. I'd recommend you use that instead.
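A minimal sketch of that fix (the name `f_bool` is mine): the loop is unchanged, but the output is a `Vector{Bool}`, so each thread writes its own byte and no word is shared:

```julia
using Base.Threads

# Same loop as f, but with a Vector{Bool} output: one byte per element,
# so concurrent writes to distinct indices never touch the same memory word.
function f_bool(x)
    n = length(x)
    y = Vector{Bool}(undef, n)
    @threads for i in 1:n
        y[i] = (x[i] != 0.0)
    end
    return y
end
```

With this version, `f_bool(x) == (x .!= 0.0)` holds regardless of the number of threads.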


If you want to make this faster, you unfortunately have to pack the bools into `UInt64`s yourself.
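One way to do that packing (a sketch of my own; note it relies on the undocumented internal `chunks` field of `BitArray`, which may change between Julia versions): give each thread whole 64-bit words to fill, so no `UInt64` is ever shared between threads.

```julia
using Base.Threads

# Parallel x .!= 0.0 into a BitVector without data races: each thread
# builds and writes whole 64-bit chunks, so no UInt64 is shared.
function f_chunked(x)
    n = length(x)
    y = falses(n)
    chunks = y.chunks                    # underlying Vector{UInt64} (internal!)
    @threads for c in 1:length(chunks)
        lo = (c - 1) * 64 + 1            # first element covered by chunk c
        hi = min(c * 64, n)              # last element covered by chunk c
        w = UInt64(0)
        for i in lo:hi
            w |= UInt64(x[i] != 0.0) << (i - lo)
        end
        chunks[c] = w                    # exactly one thread writes chunk c
    end
    return y
end
```

This keeps the 8x space savings of `BitVector` while making each chunk the exclusive property of one loop iteration.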

Thank you @GunnarFarneback for the solution and @Oscar_Smith for a clear explanation.

One last thing: are there any other basic data structures (from Base) that are not thread-safe? `Dict`? `Set`?

I wouldn't expect any data structures to be thread-safe apart from `Array{T, N}` and specifically task-oriented ones like `Channel`.

`DataFrame`s probably are? I presume a column of a `DataFrame` cannot be a `BitVector`?

`Vector{T}` also isn't thread-safe in the sense that you can have races on resizing.

Right, that was much too broad. I should have constrained that to `getindex` and `setindex!` without data races for the same index.
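For instance, concurrent `push!` into a shared `Vector` races on the resize. A simple, if slow, safe pattern (a sketch of my own, not from this thread) is to guard the push with a `ReentrantLock`:

```julia
using Base.Threads

acc = Int[]
lk = ReentrantLock()
@threads for i in 1:1000
    lock(lk) do
        push!(acc, i)        # resize + append both happen under the lock
    end
end
# Elements arrive in a nondeterministic order, but none are lost.
```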

```
julia> df = DataFrame(x = trues(5))
5×1 DataFrame
 Row │ x
     │ Bool
─────┼──────
   1 │ true
   2 │ true
   3 │ true
   4 │ true
   5 │ true

julia> typeof(df.x)
BitVector (alias for BitArray{1})
```

Ouch. OK, I'll have to be careful about which data structures I parallelize on. Thanks!

In general, locking data structures have mostly fallen out of favor in CS, since they impose a substantial overhead on users who aren't using multithreading, and even then they often aren't great for performance compared to custom locking solutions.
