The following is a simplified version of a problem I’m working on. I get different answers depending on whether or not I use threads. This has to be some sort of race condition, but I’m not sure how to straighten it out.

```julia
x0_vals = [[-1.], [1.], [2.], [5.]];
n_runs = 10;

function f!(x)
    for k in 1:10
        @. x += -0.5 * x
    end
    x
end

function class(x)
    return Int(x[1] > 0)
end

nx = length(x0_vals);
svals = zeros(Int, nx);
x = similar(x0_vals[1]);
```

```julia
for l in 1:nx
    for k in 1:n_runs
        x .= deepcopy(x0_vals[l]);
        @. x += k;
        f!(x);
        svals[l] += class(x);
    end
end

svals

4-element Vector{Int64}:
  9
 10
 10
 10
```

```julia
@. svals = 0;
for l in 1:nx
    Threads.@threads for k in 1:n_runs
        x .= deepcopy(x0_vals[l]);
        @. x += k;
        f!(x);
        svals[l] += class(x);
    end
end

svals

4-element Vector{Int64}:
  6
  8
 10
  8
```

As should be clear, this is an entirely deterministic problem.

Aren’t all of your threads writing to the same `x` simultaneously?


Ok, I can see how that would be an issue. But if I get rid of the preallocation of `x` and just have:

```julia
x = deepcopy(x0_vals[l]);
```

in each loop, I still have inconsistent results. I’m open to other suggestions on how to fix this, too.

```julia
svals[l] += class(x)
```
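Note that this line is itself a data race once the `k` loop is threaded: several threads read and write `svals[l]` at the same time, so increments can be lost even if each thread has its own `x`. A minimal sketch of one race-free alternative using `Threads.Atomic` (the definitions mirror those from the first post, restated here so the snippet runs on its own):

```julia
using Base.Threads

# Restated for a self-contained sketch; same semantics as in the first post.
f!(x) = (for _ in 1:10; @. x += -0.5 * x; end; x)
class(x) = Int(x[1] > 0)

x0 = [-1.0]
n_runs = 10

acc = Atomic{Int}(0)            # shared counter, updated atomically
@threads for k in 1:n_runs
    x = copy(x0)                # thread-local copy, no shared buffer
    @. x += k
    f!(x)
    atomic_add!(acc, class(x))  # race-free accumulation
end
acc[]   # == 9 for x0 = [-1.0]: k = 1 lands exactly at zero
```

Atomics serialize the updates, so this is best when the per-iteration work dominates the accumulation cost.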

Maybe you could parallelize the outer loop instead of the inner one? (For this toy example you will possibly not see any improvement, because of the cost of launching the threads.)


Something like

```julia
@. svals = 0;
temp = zeros(Int, n_runs)
for l in 1:nx
    Threads.@threads for k in 1:n_runs
        x = deepcopy(x0_vals[l]);   # thread-local copy
        @. x += k;
        f!(x);
        temp[k] = class(x);
    end
    svals[l] = reduce(+, temp)
end
```

Could work (taking the reduction outside of the threaded loop), although I am sure more optimizations are possible with the real code.
Also, don’t forget to put this part in a function as well.
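For instance, something along these lines (a sketch; the function name and the idea of passing `f!` and `class` as arguments are illustrative choices, not from the original code):

```julia
# Sketch: the threaded sweep wrapped in a function, so the hot loop
# avoids non-constant globals and compiles to fast code.
function count_positive!(svals, x0_vals, n_runs, f!, class)
    temp = zeros(Int, n_runs)
    for (l, x0) in pairs(x0_vals)
        Threads.@threads for k in 1:n_runs
            x = copy(x0)          # thread-local copy
            @. x += k
            f!(x)
            temp[k] = class(x)    # each k writes its own slot: no race
        end
        svals[l] = sum(temp)      # reduction outside the threaded loop
    end
    return svals
end
```

Called as `count_positive!(zeros(Int, length(x0_vals)), x0_vals, n_runs, f!, class)`, this should reproduce the serial results.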

Another option is putting the inside of the k-loop into a function and then using `mapreduce` from the `ThreadsX` package.
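That might look like the following (a sketch, assuming `ThreadsX` is installed; `trial` is an illustrative name, and `f!`, `class`, `x0_vals`, `n_runs` are the definitions from the first post):

```julia
using ThreadsX   # assumes the package is installed

# One trial for a given initial condition and offset k.
function trial(x0, k)
    x = copy(x0)   # fresh, thread-local state for every call
    @. x += k
    f!(x)
    return class(x)
end

# ThreadsX.mapreduce handles the threading and the reduction together,
# so there is no shared accumulator to race on.
svals = [ThreadsX.mapreduce(k -> trial(x0, k), +, 1:n_runs) for x0 in x0_vals]
```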
