I might have misunderstood something about the expected behavior of the bitwise AND or the @threads macro, but I have observed something I think is odd. Suppose we have a function that uses a for loop with @threads to perform the bitwise AND of two BitArrays element-wise. That should agree with the result of using a & b, right? The toy example below disagrees with this when using @threads.
function bitwise_and2(a::BitArray, b::BitArray)
    n = length(a)
    res = falses(n)
    Threads.@threads for i = 1:n
        res[i] = a[i] & b[i]
    end
    res
end
function trials(n, n_iter)
    for i = 1:n_iter
        a = bitrand(n)
        b = bitrand(n)
        y1 = bitwise_and2(a, b)
        y2 = a & b
        println(i)
        @assert y2 == y1
    end
end
trials(100, 1000) # throws AssertionError
Removing the @threads macro from the bitwise_and2() function seems to consistently give results that agree with a & b. Have I missed something obvious concerning @threads or the bitwise AND? I would appreciate any guidance on what is causing this behavior.
What is versioninfo()? I can’t reproduce this on macOS in either 0.5 or 0.6-dev.
Here is the versioninfo():
Julia Version 0.5.1
Commit 6445c82 (2017-03-05 13:25 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin13.4.0)
CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, haswell)
I also tried this on a machine running Ubuntu 16.04 with Julia 0.5.0, with the same result.
Like @paulstey, I get AssertionErrors in this example. But I need to set JULIA_NUM_THREADS > 1 before starting Julia, otherwise Threads.nthreads() will be 1 and no errors happen. Interestingly, the errors happen more easily (i.e. in fewer iterations) if n is small. Happens on both 0.5.1 and the latest master (Ubuntu 16.10 here).
My guess is that this is because in a BitArray, different indices share the same storage location:
julia> sizeof(BitArray(256))
32
and maybe Threads doesn’t take this into account. I guess a bug should be filed.
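For illustration (this peeks at BitArray's internal chunks field, which is not a public API), 64 consecutive bits are packed into a single UInt64, so e.g. indices 1 through 64 all map to the same chunk:
julia> a = falses(256);
julia> length(a.chunks)   # 256 bits packed into four UInt64 chunks
4
julia> a[1] = true; a.chunks[1]   # indices 1..64 all live in chunks[1]
0x0000000000000001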
Short answer: this is not a bug. It’s undefined behavior (UB) caused by a race condition, since different threads are storing to the same memory location. Like many other race conditions, it may or may not appear to work depending on the setup.
Long answer, from the semantics POV: this is racy because elements of BitArrays, like any bit objects, are not memory locations. We use a memory model very similar to C11/C++11, so you can find the precise definitions of memory location and race condition here. Note that by that definition, a bit is not a memory location; scalar objects are. So when you are storing to the res BitArray, you are storing to the same memory location from multiple threads, causing a data race and therefore undefined behavior. OTOH, if you are operating on bytes or bigger objects, accessing different elements of such objects in an array is never racy and the result is always well defined (well, assuming you don’t have UB elsewhere). This is true even when the objects accessed by two threads are in the same cache line; that only affects performance, not behavior.
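As a concrete illustration of the byte-or-bigger case, here is a minimal race-free sketch of the same loop (the function name is just for illustration): every Bool in a Vector{Bool} occupies a full byte, so each index is its own memory location and the threaded writes are well defined.
function bitwise_and_bool(a::BitArray, b::BitArray)
    n = length(a)
    res = Vector{Bool}(n)        # Vector{Bool}(undef, n) on Julia 1.0+; each Bool is a whole byte
    Threads.@threads for i = 1:n
        res[i] = a[i] & b[i]     # a and b are only read; writes to res hit distinct bytes
    end
    convert(BitArray, res)       # pack back into a BitArray after the loop
end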
From the hardware POV, this cannot be implemented for data since such hardware instructions don’t exist on any of the hardware we support (or any general-purpose processors I’m aware of, FWIW…). A bit access on all current CPUs is always load+mask or load+mask+store, where the load and store are at least byte-sized. Do note that this doesn’t mean it cannot be implemented for synchronization objects, but such an implementation would be crazily expensive and isn’t suitable for data access.
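Spelled out in Julia over the underlying chunk (the helper name and code are illustrative, not Julia's actual internals), setting a single bit looks roughly like this:
# Roughly what res[i] = true lowers to: a read-modify-write of the whole chunk.
function set_bit!(chunks::Vector{UInt64}, i::Int)
    c, b = divrem(i - 1, 64)                 # which chunk, which bit within it
    old = chunks[c + 1]                      # load the full 64-bit chunk
    chunks[c + 1] = old | (UInt64(1) << b)   # mask and store it back
end
# If two threads run this on different bits of the same chunk, one store can
# overwrite the other's update; that is the data race.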
Also note that multithreaded element-wise operations can be implemented for BitArrays, but they have to operate on the underlying byte storage directly instead of on bits. Not sure if we expose a public interface for that.
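For what it’s worth, here is a sketch of what that could look like (it reaches into the internal chunks field, which, as said, is not a public interface): each thread writes whole UInt64 chunks at distinct indices, so there is no race.
function bitwise_and_chunks(a::BitArray, b::BitArray)
    res = falses(length(a))
    ca, cb, cr = a.chunks, b.chunks, res.chunks   # internal Vector{UInt64} storage
    Threads.@threads for i = 1:length(cr)
        cr[i] = ca[i] & cb[i]   # whole-chunk writes, one chunk per index
    end
    res
end
The unused trailing bits in the last chunk are zero in both inputs, so the AND leaves them zero, as a BitArray requires.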
@traktofon, thanks for confirming this, I appreciate it. And I should have mentioned that this requires the environment variable JULIA_NUM_THREADS to be set greater than 1.
@yuyichao, thank you very much for the detailed explanation. It did not occur to me that a race condition might be the cause of this; but your explanation makes perfect sense. Thanks also for the cppreference link, it’s very illuminating!