I have a simple function which, given a vector of floats x, returns a BitVector with 1s in the places where the entry is nonzero; in other words, f(x) = (x .!= 0.0). When I parallelize it, I get incorrect results:
using Random
x = rand([1.0,2.0,3.0],10)
function f(x, prl)
    n = length(x)
    y = falses(n)
    if prl == 0
        for i = 1:n  y[i] = (x[i] != 0.0)  end
    elseif prl == 1
        Threads.@threads for i = 1:n  y[i] = (x[i] != 0.0)  end
    end
    return y
end
y0 = trues(length(x))
for k=1:100 if y0!=f(x,0) print("!") end end
#
for k=1:100 if y0!=f(x,1) print("!") end end
# !!!!!!!!
Why does this happen? What am I doing wrong? Each thread writes in a different place in y.
help?> BitArray
search: BitArray
BitArray{N} <: AbstractArray{Bool, N}
Space-efficient N-dimensional boolean array, using just one bit for each
boolean value.
BitArrays pack up to 64 values into every 8 bytes, resulting in an 8x space
efficiency over Array{Bool, N} and allowing some operations to work on 64
values at once.
By default, Julia returns BitArrays from broadcasting operations that
generate boolean elements (including dotted-comparisons like .==) as well
as from the functions trues and falses.
│ Note
│
│  Due to its packed storage format, concurrent access to the
│  elements of a BitArray where at least one of them is a write is
│  not thread-safe.
The issue is that a BitVector stores multiple values per byte: specifically, 64 values per UInt64 (8 bytes). To write a single entry, it reads the entire UInt64, sets one of its bits to the desired value, and then writes the entire UInt64 back to memory.
This opens up the potential for a data race. The operation is not thread-safe because multiple threads can read the same UInt64, each modify it by changing a single bit, and then write it back. This read-modify-write cycle takes time; if more than one thread reads the chunk before the changes are written back, whichever thread writes last overwrites the others' work. A lock would avoid this, but would also cause significant slowdowns.
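You can see the packed layout directly through the BitVector's internal `chunks` field (an implementation detail, not public API) — a minimal sketch:

```julia
# A 128-element BitVector is backed by just two UInt64 chunks.
b = falses(128)
println(length(b.chunks))   # 2

# Writing one entry rewrites an entire 64-bit chunk:
b[1] = true
println(b.chunks[1])        # 1 (only the lowest bit of chunk 1 is set)

# b[1] and b[64] live in the same chunk, so two threads writing
# them "at different indices" still race on the same UInt64.
```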
A Vector{Bool} uses 1 byte per entry (rather than 1/8 byte like BitVector), and the bytes are addressed individually (rather than in blocks of 8 via UInt64), so it is safe from data races in your example. I recommend using that instead.
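A sketch of your threaded branch rewritten with Vector{Bool} (the name f_bool is mine, not standard):

```julia
# Thread-safe version: each entry of a Vector{Bool} is its own byte,
# so threads writing to distinct indices never touch the same memory.
function f_bool(x)
    n = length(x)
    y = Vector{Bool}(undef, n)      # one byte per entry, unlike BitVector
    Threads.@threads for i = 1:n
        y[i] = (x[i] != 0.0)        # each thread writes its own byte
    end
    return y
end

x = rand([1.0, 2.0, 3.0], 10)
f_bool(x)   # all entries true here, since x is sampled from nonzeros
```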
In general, locking data structures have mostly fallen out of favor in CS, since they impose a substantial overhead on users that aren't using multithreading, and they still often aren't great for performance compared to custom locking solutions.
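For completeness, here is a sketch of what the lock-based fix would look like, using Base's ReentrantLock (the name f_locked is mine). It is correct, but it serializes every write, which usually defeats the point of parallelizing:

```julia
# Correct but slow: a lock guards every write to the shared BitVector,
# so only one thread can perform the read-modify-write at a time.
function f_locked(x)
    n = length(x)
    y = falses(n)
    lk = ReentrantLock()
    Threads.@threads for i = 1:n
        lock(lk) do
            y[i] = (x[i] != 0.0)   # the chunk update is now atomic w.r.t. other threads
        end
    end
    return y
end
```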