Consider a situation where each thread is performing push!
on a unique vector.
i.e. no 2 threads can edit the same vector.
This can cause a race condition because push!
can invoke memory reallocation.
If 2 different threads invoke memory reallocation at the same time, then there is a race-condition.
Minimum verifiable example:
n = nthreads()
MAX_SIZE = 10000
queue_list = [Vector{Int64}() for i in 1:n]
A = [rand(1:1000*n) for _ in 1:MAX_SIZE]
@threads for thread_id in 1:n
for i in 1:MAX_SIZE
if thread_id == A[i]
push!(queue_list[thread_id], i)
end
end
end
One approach is to use allocate the maximum memory initially n*MAX_SIZE
.
n = nthreads()
MAX_SIZE = 10000
queue_list = [Vector{Int64}() for i in 1:n]
size_list = [0 for i in 1:n]
A = [rand(1:1000*n) for _ in 1:MAX_SIZE]
@threads for thread_id in 1:n
for i in 1:MAX_SIZE
if thread_id == A[i]
queue_list[thread_id][ size_list[thread_id]+=1 ] = i
end
end
end
Also, I know that the sum( [length(queue_list[i]) for i in 1:n] )
<= MAX_SIZE
Hence, using n*MAX_SIZE
memory is wasteful.
Is there a better way to go about this problem? (Memory efficient)
I do not want to use a lock or atomic instruction everytime push!
is called (It is expensive).
But I do not mind using a lock or atomic instruction everytime memory reallocation is called.
Your MWE should include a demonstration that it is failing.
I prefer thus the term “Minimal Verifiable Example”, since obviously you wouldn’t post it if it was working.
With that said,
I can not find the failure.
push!
ing to separate arrays should threadsafe, and AFAICT is threadsafe.
Memory reallocation should not have any race-condition.
I can not find any failure.
I ran the following code with nthreads()===16
.
using Base.Threads
using Random
# The code to be tested
function threaded_test(seed)
Random.seed!(seed) # Set the seed so that it will run the same
n = nthreads()
MAX_SIZE = 10000
queue_list = [Vector{Int64}() for i in 1:n]
A = [rand(1:1000*n) for _ in 1:MAX_SIZE]
@threads for thread_id in 1:n
for i in 1:MAX_SIZE
if thread_id == A[i]
push!(queue_list[thread_id], i)
end
end
end
queue_list
end
# Same code as above but without any thread level parallism
function serial_test(seed)
Random.seed!(seed) # Set the seed so that it will run the same
n = nthreads()
MAX_SIZE = 10000
queue_list = [Vector{Int64}() for i in 1:n]
A = [rand(1:1000*n) for _ in 1:MAX_SIZE]
for thread_id in 1:n
for i in 1:MAX_SIZE
if thread_id == A[i]
push!(queue_list[thread_id], i)
end
end
end
queue_list
end
# Now lets check that they are always the same.
for seed in 1:1000
@assert serial_test(seed) == threaded_test(seed)
end
I did not get any error
I’m not sure how to do that since race-conditions are not deterministic and memory allocation depends on the the reallocation strategy used by the vector
My concern is issues like if 2 vectors are reallocated together then they may face race conditions like being allocated the same memory.
What makes you think it is happening at all?
When posting a MVE, do include something on that.
Even if you can’t reliably reproduce it there must be something you output some where at some point that evidences it.
I’ve left the above code running on a loop of 100_000, with MAX_SIZE also set to 100,000 to try and get an error. I will check on it in 12 hours. It had been running fine for several minutes when I left.
UPDATE: Seems to have run fine.
1 Like
I’m not 100% sure its happening but it feels like it should be possible. (maybe the probability is less)
Race-conditions can happen when common memory is being edited.
During memory allocation, there must be some read/write operations happening to keep track of which memory is free or allocated.
Thanks for going through the trouble of running a 12 hr experiment but I suggest against it.
If an error is found then we know race-conditions are happening.
If an error is not found then we can’t be sure if a race-conditions can occur.
My intuition tells me the probability of such race-conditions happening are very low so I doubt if any error will show up.
Absolutely no.
More precisely, data race is defined as concurrent operation on data where at least one of them is a writer. Note that only race on data is an issue.
Sure, but irrelevant. And they are actually (almost) not shared.
As you pointed out, the only place where a data race could happen is during allocation, which means that whether you have a problem or not has absolutely nothing to do with whether you are using arrays or not, but only has something to do with if your code can potentially allocate or not, or in another word, whether you are running any code or not. No features, experimental or not, can be added to a language that doesn’t allow you to run any code using it so allocates are not going to be an issue for threading*.
In general, you should stop talking about semantics (i.e. what you can or can’t do as user) as soon as you reach implementation detail of a language. Allocates (or the absense of it) are not part of the semantics of the language so any issues caused by it are bugs in the implementation.
[*] @threadcall
is the only exception here.
1 Like
Oh, and to clear yet another common misconception, don’t think about multithread programming harder than they really/already are.
Thread synchronization in general is already a hard enough problem so you can rest assured that you only need to care about things you can see and easily control. This means that none of libraries, languages, compilers, hardwares (cpus, caches) are going to introduce races that are not obvious, well documented and easy to control. There are grey areas (data structures that are thread safe for some operations and not for others) that you have to refer to the doc but anything you can’t control can be assumed to be safe (short of any bugs of course.)
This is pretty much the only way anything can work without having every software require their own very specific compilers and hardwares…
3 Likes
If this were possible, it would be a pretty serious bug.
Now, that alone doesn’t mean it isn’t possible (we’ve found and solved similar and worse bugs before.)
But it does mean that it warrants evidence.
Assuming there is a major bug, in a fairly common operation
without either looking at the source code, or seeing demonstration of it in the wild,
is a waste of your own time.
1 Like