Odd behavior from @threads with bitwise AND

I might have misunderstood something about the expected behavior of the bitwise AND or the @threads macro, but I have observed something I think is odd.

Suppose we have a function that uses a for loop with @threads to perform the bitwise AND of two bitarrays in an element-wise fashion. That should agree with the result of using a & b, right? The toy example below disagrees with this when using @threads.


function bitwise_and2(a::BitArray, b::BitArray)
    n = length(a)
    res = falses(n)
    Threads.@threads for i = 1:n
        res[i] = a[i] & b[i]
    end
    res
end


function trials(n, n_iter)
    for i = 1:n_iter
        a = bitrand(n)
        b = bitrand(n)
        y1 = bitwise_and2(a, b)
        y2 = a & b

        println(i)
        @assert y2 == y1
    end
end 

trials(100, 1000)               # throws AssertionError 

Removing the @threads macro from the bitwise_and2() function seems to consistently give results that agree with a & b.

Have I missed something obvious concerning @threads or the bitwise AND? I would appreciate any guidance on what is causing this behavior.

What is versioninfo()? I can’t reproduce this on macOS in either 0.5 or 0.6-dev.

Here is the versioninfo():

Julia Version 0.5.1
Commit 6445c82 (2017-03-05 13:25 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

I also tried this on a machine running Ubuntu 16.04 with Julia 0.5.0, with the same result.

Like @paulstey, I get AssertionErrors in this example. But I need to set JULIA_NUM_THREADS>1 before starting julia, otherwise Thread.nthreads() will be 1 and no errors happen. Interestingly, the errors happen more easily (i.e. in fewer iterations) if n is small. Happens on both 0.5.1 and latest master (Ubuntu 16.10 here).

My guess is that this is because in a BitArray, different indices share the same storage location:

julia> sizeof(BitArray(256))
32

and maybe Threads doesn’t take this into account. I guess a bug should be filed.

Short answer, this is not a bug. It’s an undefined behavior(UB) caused by race condition since different threads are storing to the same memory location. Like many other race condition, it may or may not work depending on the set up.

Long answer, from the semantics POV. This is racy because elements of bitarrays, like any bit objects, are not memory locations. We use a very similar memory model as C11/C++11 so you can find the precise definition of memory location and race condition here. Note that by the definition, a bit is not a memory location, scalar objects are. So when you are storing to the res BitArray, you are storing to the same memory location from multiple threads, causing a data race and therefore undefined behavior. OTOH, if you are operating on bytes or bigger objects, accessing different element of such object in an array is never a racy and the result is always well defined (well, assuming you don’t have UB else where). This is even true when the objects accessed by two threads are in the same cache line. That will only affect performance, not behavior.

From the hardware POV, this cannot be implemented for data since such hardware instructions doesn’t exist on any of the hardware we support (or any general purpose processors I’m aware of FWIW…). An access of a bit on all current CPUs are always load+mask or load+mask+store where the load and store are of at least byte size. Do note that this doesn’t mean this cannot be implemented for synchronization objects but such implemenetation will be crazily expensive and won’t be suitable for data access.

6 Likes

Also note that multi threaded element wise operation can be implemented for BitArrays. But they have to operate on the underlying byte storate directly instead of on bits. Not sure if we exposes public interface for that.

@traktofon, thanks for confirming this, I appreciate it. And I should have mentioned that this requires the environment variable JULIA_NUM_THREADS to be set greater than 1.

@yuyichao, thank you very much for the detailed explanation. It did not occur to me that a race condition might be the cause of this; but your explanation makes perfect sense. Thanks also for the cppreference link, it’s very illuminating!