Multithreading inconsistency issue

The following is a simplified version of a problem I’m working on. I get different answers depending on whether or not I use threads. This has got to be some sort of race condition, but I’m not sure how to straighten it out.

x0_vals = [[-1.],[1.],[2.],[5.]];
n_runs = 10;

function f!(x)
    for k in 1:10
        @. x += -0.5 * x
    end
    x
end

function class(x)
    return Int(x[1] > 0)
end

nx = length(x0_vals);
svals = zeros(Int,nx);
x = similar(x0_vals[1]);

without threads:

for l in 1:nx
    for k in 1:n_runs
        x .= deepcopy(x0_vals[l]);
        @. x += k;
        f!(x);
        svals[l] += class(x);
    end
end

svals

4-element Vector{Int64}:
  9
 10
 10
 10

with threads:

@. svals = 0;
for l in 1:nx
    Threads.@threads for k in 1:n_runs
        x .= deepcopy(x0_vals[l]);
        @. x += k;
        f!(x);
        svals[l] += class(x);
    end
end

svals 

4-element Vector{Int64}:
  6
  8
 10
  8

As should be clear, this is an entirely deterministic problem.

Aren’t all of your threads writing to the same x simultaneously?

OK, I can see how that would be an issue. But if I get rid of the preallocation of x and just have:

x = deepcopy(x0_vals[l]);

in each loop, I still have inconsistent results. I’m open to other suggestions on how to fix this, too.

All your threads are reading and writing to the same address in svals:

svals[l] += class(x)

Maybe you could parallelize the outer loop instead of the inner one? (For this toy example you may not see any improvement, possibly because of the overhead of launching the threads.)
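
For instance, here is a rough sketch of threading over l instead, allocating a fresh x inside the loop so each task works on its own buffer (the function name run_all is just for illustration):

function run_all(x0_vals, n_runs)
    nx = length(x0_vals)
    svals = zeros(Int, nx)
    Threads.@threads for l in 1:nx      # each l is handled by exactly one task
        for k in 1:n_runs
            x = deepcopy(x0_vals[l])    # fresh buffer per iteration, nothing shared
            @. x += k
            f!(x)
            svals[l] += class(x)        # only this task ever touches svals[l]
        end
    end
    return svals
end

svals = run_all(x0_vals, n_runs)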

Something like

@. svals = 0;
temp = zeros(Int, n_runs)
for l in 1:nx
    Threads.@threads for k in 1:n_runs
        local x = deepcopy(x0_vals[l]);  # give each iteration its own x so tasks don't share a buffer
        @. x += k;
        f!(x);
        temp[k] = class(x);              # each k writes only to its own slot
    end
    svals[l] = reduce(+, temp)
end

Could work (it takes the reduction outside of the threaded loop), although I am sure more optimizations are possible with the real code.
Also, don’t forget to put this part in a function.
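
As a sketch of what that could look like once wrapped in a function (the name compute_svals is just illustrative):

function compute_svals(x0_vals, n_runs)
    nx = length(x0_vals)
    svals = zeros(Int, nx)
    temp = zeros(Int, n_runs)
    for l in 1:nx
        Threads.@threads for k in 1:n_runs
            x = deepcopy(x0_vals[l])   # local to each iteration inside the function
            @. x += k
            f!(x)
            temp[k] = class(x)
        end
        svals[l] = reduce(+, temp)
    end
    return svals
end

svals = compute_svals(x0_vals, n_runs)

Besides keeping the threading correct, working inside a function avoids the untyped globals, which usually matters for performance once the real problem is bigger.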

Another option is putting the body of the k-loop inside a function and then using mapreduce from the ThreadsX package.
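
A rough sketch of that idea, assuming the ThreadsX package is installed (the helper name run_one is just for illustration):

using ThreadsX

function run_one(x0, k)
    x = deepcopy(x0)   # every call gets its own x
    @. x += k
    f!(x)
    return class(x)
end

svals = [ThreadsX.mapreduce(k -> run_one(x0_vals[l], k), +, 1:n_runs) for l in 1:nx]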
