Parallelization gives wrong result

I have a simple function which, given a vector of floats x, returns a BitVector that has 1’s in places where the entry is nonzero. In other words, f(x) = (x .!=0.0). When I parallelize it, I get incorrect results:

using Random
x = rand([1.0,2.0,3.0],10)

function f(x, prl)
    n = length(x)
    y = falses(n)
    if prl==0                    for i=1:n   y[i] = (x[i] !=0.0) end end
    if prl==1   Threads.@threads for i=1:n   y[i] = (x[i] !=0.0) end end
    return y end

y0 = trues(length(x))
for k=1:100  if y0!=f(x,0) print("!") end end
# 
for k=1:100  if y0!=f(x,1) print("!") end end
# !!!!!!!!

Why does this happen? What am I doing wrong? Each thread writes in a different place in y.

Maybe this:

help?> ?BitArray
search: BitArray

  BitArray{N} <: AbstractArray{Bool, N}

  Space-efficient N-dimensional boolean array, using just one bit for each
  boolean value.

  BitArrays pack up to 64 values into every 8 bytes, resulting in an 8x space
  efficiency over Array{Bool, N} and allowing some operations to work on 64
  values at once.

  By default, Julia returns BitArrays from broadcasting operations that
  generate boolean elements (including dotted-comparisons like .==) as well
  as from the functions trues and falses.

  β”‚ Note
  β”‚
  β”‚  Due to its packed storage format, concurrent access to the
  β”‚  elements of a BitArray where at least one of them is a write is
  β”‚  not thread-safe.
1 Like

Ouch, was not aware of that, thank you. So in this case, what does actually happen? One thread, when editing y[i], also locks some nearby y[j]?

The issue is that a BitVector is storing multiple values per byte. More specifically, it stores 64 values per UInt64 (8 bytes). In order to write a single entry, it reads the entire UInt64, sets one of the bits to the value it wants, and then writes the entire UInt64 back to memory.

This opens up the potential for a data race. The reason this is not thread-safe is that multiple threads can read the same UInt64, modify it by changing a single bit, and then write it back. This process takes time. If more than one reads it before the changes can get written back, then whichever writes it last will overwrite the work of everyone else. A lock would avoid this, but also would cause significant slowdowns.

A Vector{Bool} uses 1 byte per entry (rather than 1/8-byte like BitVector) and the bytes are addressed individually (rather than in blocks of 8 via UInt64), so it should be safe from data races in your example. I will recommend you use that instead.

1 Like

if you want to make this faster, you have to pack the bools into Ints yourself unfortunately.

Thank you @GunnarFarneback for the solution and @Oscar_Smith for a clear explanation.

One last thing: are there any other basic datastructures (from base) that are not thread-safe? Dict? Set?

I wouldn’t expect any data structures to be thread-safe apart from Array{T, N} and specifically task-oriented ones like Channel.

DataFrames probably are? I presume a column of a df cannot be a BitVector?

Vector{T} also isn’t threadsafe in the sense that you can have races on resizing.

Right, that was much too broad. I should have constrained that to getindex and setindex! without data races for the same index.

julia> df = DataFrame(x = trues(5))
5Γ—1 DataFrame
 Row β”‚ x
     β”‚ Bool
─────┼──────
   1 β”‚ true
   2 β”‚ true
   3 β”‚ true
   4 β”‚ true
   5 β”‚ true

julia> typeof(df.x)
BitVector (alias for BitArray{1})

Ouch. Ok, I’ll have to be careful what data structure I parallelize on. Tnx!

In general, locking data structures have mostly fallen out of favor in CS since they impose a substantial overhead of users that aren’t using multithreading, and they still they often aren’t great for performance compared to custom locking solutions.

1 Like