Thread-safe array building

I want to use threading to build a result. But it seems push! isn’t thread-safe:

    num_monte = Int(1e6)
    solution_data = Vector{Float64}()
    Threads.@threads for i in 1:num_monte
      push!(solution_data,i)
    end

With @parallel there is a reduction, which I can use (vcat). Is there some way to do something like that with threading?

Create nthreads() number of arrays and push to them based on threadid().

num_monte = Int(1e6)
solution_data = Vector{Vector{Float64}}()
for i in 1:Threads.nthreads()
  push!(solution_data,Float64[])
end
Threads.@threads for i in 1:num_monte
  push!(solution_data[Threads.threadid()],i)
end
vcat(solution_data...)
julia> vcat(solution_data...)
1000000-element Array{Float64,1}:
      1.0
      2.0
      3.0
      4.0
      5.0
      6.0
      7.0
      8.0
      9.0
     10.0
      ⋮
 999991.0
 999992.0
 999993.0
 999994.0
 999995.0
 999996.0
 999997.0
 999998.0
 999999.0
      1.0e6

Thanks!

If you know the total size in advance, it would probably be faster to allocate the entire vector at the beginning,
and then precalculate offsets into the vector for each thread, with each thread setting a separate section, so that no locking is needed, and no concatenation is needed at the end.
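
For the example above, a minimal sketch of that approach might look something like this (assuming Julia 1.x syntax; how the per-thread ranges are computed here is just one possible choice):

    num_monte = 1_000_000
    solution_data = Vector{Float64}(undef, num_monte)  # allocate the full result once

    nt = Threads.nthreads()
    # precompute one disjoint index range per chunk; together they cover 1:num_monte
    ranges = [round(Int, num_monte*(t-1)/nt)+1 : round(Int, num_monte*t/nt) for t in 1:nt]

    Threads.@threads for t in 1:nt
        for i in ranges[t]
            solution_data[i] = i   # each chunk touches only its own section, so no locking
        end
    end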

Is accessing an array like that thread safe?

I do know the total size in advance.

The number of threads isn’t necessarily a divisor of the number of steps in the loop. How does @threads make the decision for how much work goes to each thread? If I do something fancy, won’t I need to match that exactly?

Yes, the code would be a bit more complex in order to deal with one of the threads handling more steps (the remainder after dividing the work up). It would probably only be worth the effort if the array were pretty large. You wouldn’t be able to use the @threads macro, I think.
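
A rough sketch of doing the partitioning by hand, including the remainder, could look like this (fill_chunk! is just a placeholder name, and Threads.@spawn requires a reasonably recent Julia):

    function fill_chunk!(out, range)
        for i in range
            out[i] = i   # placeholder work; each task writes only its own range
        end
    end

    n = 1_000_000
    out = Vector{Float64}(undef, n)
    nt = Threads.nthreads()
    len, rem = divrem(n, nt)   # base chunk length plus the leftover iterations

    tasks = map(1:nt) do t
        # the first `rem` chunks each take one extra iteration to absorb the remainder
        lo = (t - 1) * len + min(t - 1, rem) + 1
        hi = t * len + min(t, rem)
        Threads.@spawn fill_chunk!(out, lo:hi)
    end
    foreach(wait, tasks)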

Note that there’ll likely be a small improvement if you do the setup like this instead:

solution_data = Vector{Vector{Float64}}(undef, Threads.nthreads())
Threads.@threads for i in 1:Threads.nthreads()
  solution_data[i] = Float64[]
end

This is due to decreased memory contention.

In general, you shouldn’t be using push! in the first place if you know the size statically, whether or not you want to use threads.

I’m not sure what you mean by fancy, but you shouldn’t write code that relies on the iterations running on a fixed number of threads (at runtime). They currently do, but that’s not the model we have. The runtime is free to run your code on any number of threads if it thinks it should do so for various reasons.

Note: I don’t think that first part will actually improve performance, since the threads will still contend with each other when you are just setting up a small number of entries in solution_data (they will probably all hit the same cache line).

Sorry for reviving this old thread; I wanted to ask what exactly the intended behaviour of resizing/reallocating ops on arrays is in multithreaded code. As far as I understood the source (please confirm or correct), we have the following race behaviour:

Bitstype elements:

  1. One thread causes a realloc (via resize! or push!) while another thread reads.
    Effect: In bitstype arrays, the reading thread may see incorrect data (uninitialized memory).
    If another thread writes (setindex!) at the same time, the write may or may not be performed, or it may end up in the wrong place (we run into a memmove).
    Safety goals (as I guessed from array.c): No memory corruption should occur. Julia should not crash.

  2. One thread causes a realloc/move while another thread pushes or deletes.
    Effect (guess from array.c): Julia may crash on the assert at array.c line 669:

        assert(oldlen == a->nrows &&
               "Race condition detected: recursive resizing on the same array.");

Non-bitstype (pointer arrays):

Cases 1 and 2: Julia may corrupt memory, because the data pointer in the array is updated before the memcpy/memset. (This is guessed from the source code; I don’t have a POC.)
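
For concreteness, the kind of pattern I have in mind is roughly the following (a deliberately racy sketch on a recent Julia, shown only to make the question concrete; it is a data race and must not be relied on):

    data = Float64[]
    writer = Threads.@spawn for i in 1:10^6
        push!(data, i)                      # may realloc / move the buffer
    end
    reader = Threads.@spawn begin
        while !istaskdone(writer)
            isempty(data) || data[end]      # concurrently reads possibly-moving memory
        end
    end
    wait(writer); wait(reader)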

Is this correct and intended behavior (“resizing a pointer-array needs locks, you cannot check for mistakes from within julia, and missing locks almost always lead to data corruption; in case of pointer arrays, they lead to memory-corruption, which may well be exploitable, depending on your surrounding code”)?

Or would memory corruption in such cases be considered a bug (even though resizing array operations are explicitly not threadsafe)?

To be honest, it is the assert about race-conditions that made me question whether julia is trying to crash rather than memory-corrupt here.

Mind you, I’m not asking you to completely verify my guesses about the actual array behavior; rather, I’m asking where the line in the sand between feature and bug is drawn.

I think that’s the wrong way to look at it. Instead, this is just how the computer actually works, and the extra safety features that would abstract it away just don’t exist yet.

I’m not sure I understand you here. Thread safety often comes with a big performance cost, so these are not necessarily “not yet” safety features, but might instead be a case of “it’s a feature, not a bug”.

If I want to figure out how thread-unsafe array operations are in practice, then I should read array.c. And if I try to figure out how thread-unsafe array ops are in theory, then I should read the docs (which are afaik not entirely clear on this), or ask people. I’m doing the latter now.

I am trying to figure out whether multithreaded pushes are allowed to corrupt data, crash julia, or corrupt memory (and hence potentially be exploitable). I had an algorithm that would have profited from semi-memory-safe lock-free push/read, i.e. one thread modifies/pushes while other threads are read-only.

One plausible implementation of arrays would have made this safe, in the sense that the reading threads always see valid data (either before or after the push). This would have been achieved by always doing the following on realloc (roughly sketched below):

  1. never memmove, always memcpy (no aliasing of old and new data); if the offset is too large, realloc instead of moving the data to the front,
  2. update the data,
  3. only at the end, update the length. Possibly, one could also have made data-moving resizes take a lock (data-moving resizes are slow and amortized-rare anyway, hence the lock is not necessarily too bad, performance-wise).
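
A rough user-level sketch of what I mean (a hypothetical PublishedVector type, only illustrating the ordering of the three steps; it ignores the fences/atomics a real implementation would need, and is of course not how Base arrays actually work):

    mutable struct PublishedVector{T}
        data::Vector{T}   # backing buffer, possibly over-allocated
        len::Int          # published length; readers never look past this
    end
    PublishedVector{T}() where {T} = PublishedVector{T}(Vector{T}(undef, 4), 0)

    function publish_push!(v::PublishedVector{T}, x) where {T}
        if v.len == length(v.data)
            newbuf = Vector{T}(undef, 2 * length(v.data))
            copyto!(newbuf, 1, v.data, 1, v.len)  # 1. copy into fresh memory, never alias old and new
            v.data = newbuf                       #    then swap the buffer reference
        end
        v.data[v.len + 1] = x                     # 2. write the new element
        v.len += 1                                # 3. publish the new length last
        return v
    end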

Julia has chosen not to implement it this way, but instead allows old and new data to alias in nontrivial ways, and overwrites the data pointer before initializing the memory it points to (unless I failed at reading comprehension).

This is totally fine. On the other hand, there were assertions that superficially appear to detect some races; so I wondered whether they are supposed to detect all memory-corrupting races – in this case, I would spend the time to possibly search and report some (maybe present) security bugs in julia-base.

If instead this is known intended behavior, then there is nothing to see here, except for possibly documenting (in this thread or via PR to documentation) that races on array-push! are a security-bug (in your julia code, not in julia-base).

Realistically, a racy array-push! might be too dangerous with the current codebase, either way. However, I’m not sure whether julia-base tries (or should try) to mitigate here (crash instead of pwn).

Undefined.

It’s asserting against recursion caused by a finalizer; it has nothing to do with multithreading, though obviously a race condition can trigger it, especially since a race condition can in principle trigger anything.

I think ThreadSafeArray would be a cool AbstractArray type to make.

Thanks! That explains the code in array.c.

So memory-corruption on multithreaded pointer-array operations (one thread resizes, others read-only) would currently be expected behavior and not a bug? (undefined as in “memory corruption” is kinda different from undefined as in “may get unexpected data”)

Undefined as in it’s undefined behavior in both C and LLVM, so it can do anything it feels like, including making “demons fly out of your nose” if physically possible.

Note that as far as thread safety is concerned, atomic operations, thread-safe data structures, and locks are generally used only for synchronization and not to hold data, so such a data type is not something you want to use in general. That said, there are of course use cases where you can use an array for synchronization, but it strongly depends on what you need, and a ThreadSafeArray is unlikely to be what you need, since each concurrent operation you want to allow will almost always add to the cost of using such a type. It’s in general more useful to create thread-safe containers for different usage patterns. Since julia has a GC, there are indeed some tricks that can be done more easily than in a non-GC language (e.g. RCU).

Thanks again.

Re undefined behavior: e.g. updating an immutable bitstype in an array is semi-defined: sometimes julia/llvm updates only the changed fields, sometimes it updates all fields. However, it should not corrupt memory.

RCU is what I will then do (and hope that it is not too hard on the gc, as I’d guess that this will not be nice to the “generational” part of generational gc).

As far as I have understood, it is thread-safe to read-copy-update on a Vector{mutable} (I have a Vector{Vector{bitstype}} and would rewrite the inner Vector{bitstype}). This is only a single pointer-sized store, and hence should not corrupt or partially update on sane architectures; I just need to make sure myself that I don’t lose concurrent writes.
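
Something like the following is what I have in mind (rcu_replace! is just an illustrative helper; whether the plain setindex! store is really enough here is exactly what I’m asking):

    X = [collect(Float64, i:i+3) for i in 1:8]   # Vector{Vector{Float64}}, shared with readers

    # single writer: build the new version privately, then publish it with one reference store
    function rcu_replace!(X, i, f)
        old = X[i]
        X[i] = f(copy(old))   # readers see either the old or the new inner vector, never a mix
        return old            # the old snapshot stays valid for readers until the GC collects it
    end

    rcu_replace!(X, 3, v -> (push!(v, 99.0); v))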

Unfortunately, atomics do not accept object references, as far as I gathered? Would you recommend taking your code from atomics.jl and using bitcasts to update object refs as if they were ints? (I am very much not fluent in LLVM IR and hence not sure whether this would run into a brick wall; naively, the code_native should be the same?)

No, it’s not semi-defined; it is allowed to corrupt memory, as well as crash in any way.

If different threads are updating different elements then it’s well defined, no matter how big the element is. If not, then it’s not well defined. It may crash, and it doesn’t matter what the size is.

Atomic update on an object reference is not yet supported, and is also not possible to implement with llvmcall.

Thanks for the warning. In my boundless naiveté I would have used something like the following; what are the failure modes? GC visibility? (Fixable with some ccall to root the inserted object if the write was successful?)
Obviously, the code below will fail on 32-bit archs, but otherwise?

Unholy hybrid of atomics.jl, the code_llvm resulting from atomics.jl, and some hand editing and type-forgetting:

@inline _foo_cas!(addr::Ptr{Int64}, cmp::Int64, newval::Int64) =
    Base.llvmcall("%tmp1 = cmpxchg i64* %0, i64 %1, i64 %2 acq_rel acquire
                   %tmp2 = extractvalue { i64, i1 } %tmp1, 0
                   ret i64 %tmp2",
                  Int64, Tuple{Ptr{Int64},Int64,Int64},
                  addr, cmp, newval)

# Returns Ptr{Void} because I don't know how to do a fast unsafe_load(pointer_from_objref(x));
# hence, must compare pointers. Equivalent to comparing object identities, so meh.
@inline function foo_cas!(addr::Ptr{T}, cmp::T, newval::T) where {T}
    assert(isbits(T) == false)
    cmp_ptr = convert(Int64, pointer_from_objref(cmp))
    newval_ptr = convert(Int64, pointer_from_objref(newval))
    convert(Ptr{Void}, _foo_cas!(convert(Ptr{Int64}, addr), cmp_ptr, newval_ptr))
end

X = Vector{Vector{Int}}(100)
for i in 1:length(X)
    X[i] = collect(i:i+3)
end
nv = [37]
cmpval = X[13]
ov = foo_cas!(pointer(X, 13), cmpval, nv)

assert(ov == pointer_from_objref(cmpval))  # we are single-threaded!
X[13]

Yes

No