Data race with @threads in a for loop

I have figured out that my code has a data-race problem with @threads. However, I cannot see what is causing the problem or how to fix it.


### Generate deterministic sequence

    trial = seq_generator(sobol_n);

### Prepare one empty vector per thread to collect the vectors produced in each thread

    vec_model = Any[]

    for i in 1:Threads.nthreads()
        push!(vec_model,Any[])
    end

### In each thread, simulate and produce objects

    @threads for col in collect(eachcol(trial))

        try 
            moments = simulated_moment(col)     
            d = moments .- collect(values(dictEmpiricalMoments))
            norm = transpose(d) * W * d         
            input = [col,moments,norm]
            push!(vec_model[threadid()],input)   
   
        catch  # in case of error, assign huge value  
            moments = ones(length(dictEmpiricalMoments)) * 10^6      
            d = moments .- collect(values(dictEmpiricalMoments)) # Warning: the order should be in line with the simulated moments
            norm = transpose(d) * W * d         
            input = [col,moments,norm]
            push!(vec_model[threadid()],input)   
        end

    end 

### Merge the vectors 

    sim = vcat(vec_model...)

### Find the row with the minimum norm 

    distance, ind = findmin(last,sim)

I understand that `ind` can change between runs, but I am also getting different `distance` values.

Once I remove @threads, I get consistent results.

Just see this recent thread: Behavior of threads - #29 by lmiq
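
For context, one common race-free way to collect results is to give every iteration its own slot in a preallocated output vector instead of pushing into per-thread buffers keyed by `threadid()`. A minimal sketch, assuming the same `trial`, `simulated_moment`, `dictEmpiricalMoments`, and `W` as above (this is a generic pattern, not necessarily what the linked thread recommends):

    using Base.Threads

    cols = collect(eachcol(trial))
    sim  = Vector{Any}(undef, length(cols))    # one slot per column, written exactly once

    @threads for i in eachindex(cols)
        col = cols[i]
        moments = try
            simulated_moment(col)
        catch                                  # in case of error, assign huge value
            ones(length(dictEmpiricalMoments)) * 10^6
        end
        d = moments .- collect(values(dictEmpiricalMoments))
        sim[i] = [col, moments, transpose(d) * W * d]   # no element is shared between iterations
    end

Note that if `simulated_moment` itself reads and mutates shared state (globals, caches, preallocated buffers), it will still race no matter how the results are collected.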


But first, optimize the serial version. With all those `Any` containers, calculations done in global scope, and a lot of intermediate allocations (such as unnecessary `collect` calls), you can probably get huge speedups without going multi-threaded at all.
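
For what it's worth, a rough sketch of that advice, with the work wrapped in a function and concretely typed containers instead of `Any` (the names `evaluate_trials` and `emp_moments` are just illustrative, and it assumes `simulated_moment` returns a `Vector{Float64}` and `trial` holds `Float64`s):

    function evaluate_trials(trial, emp_moments, W)
        sim = Vector{Tuple{Vector{Float64},Vector{Float64},Float64}}()
        for col in eachcol(trial)
            moments = try
                simulated_moment(col)
            catch                               # in case of error, assign huge value
                fill(1.0e6, length(emp_moments))
            end
            d = moments .- emp_moments
            push!(sim, (Vector(col), moments, transpose(d) * W * d))
        end
        return sim
    end

    emp_moments = collect(values(dictEmpiricalMoments))   # collect once, outside the loop
    sim = evaluate_trials(seq_generator(sobol_n), emp_moments, W)

Once the single-threaded version is fast and type-stable, parallelizing it is much easier to reason about.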


You could also just do

using Folds
sim = Folds.map(eachcol(trial)) do col
    # your code from the loop body, without the pushes
    return [col, moments, norm]
end
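
`Folds.map` does the collection for you and returns the results in iteration order, so no per-thread bookkeeping is needed; afterwards you can still take `distance, ind = findmin(last, sim)` as in the original code.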