Getting different result from serial and multithreaded code in Julia

This was a real surprise. Is this something new? Because I’m quite sure I remember seeing ‘official-seeming’ tutorials (I don’t remember where) showing threadid used for this. I thought it was the canonical solution.

Is there any drop-in replacement?

See these discussions

One solution appears to be using FLoops.jl; see FAQ · FLoops.
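To make that concrete, here is a minimal sketch along the lines of the reduction pattern from the FLoops FAQ (a plain sum stands in for the tensor code from this thread; the threaded_sum name is made up for illustration):

using FLoops

# @reduce gives every task its own private `acc` and merges them when
# the loop finishes, so no buffer is ever indexed by threadid().
function threaded_sum(xs)
    @floop ThreadedEx() for x in xs
        @reduce(acc += x)
    end
    return acc
end

xs = rand(1000)
threaded_sum(xs) ≈ sum(xs)  # true (up to floating-point reordering)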

Basically, when I implement the function build_temp in the full code that I provided, it gives me the wrong result. In my MWE the input matrices (t1, Lov_df, Lvv_df, tau) are just random numbers, but in the original program these matrices are different and give significantly different results compared with the serial code.

The suggestion to avoid that pattern is old, as pointed out in the FLoops docs, but what’s new is that, AFAIU, since 1.7 tasks can effectively migrate: Julia 1.7 says it can switch the thread your task is on. How often does that happen, and how can it be disabled?
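For reference, the schedule argument of Threads.@threads is the documented way to opt out of migration for a particular loop: with :static, the iterations are split up front and each task is pinned to its thread, so threadid() stays constant inside an iteration. A toy sketch (a simple sum, not the tensor code):

using Base.Threads

acc = zeros(nthreads())
@threads :static for i in 1:1000
    # Under :static scheduling this task cannot migrate, so threadid()
    # is a stable index into `acc` for the whole iteration.
    acc[threadid()] += i
end
sum(acc)  # == sum(1:1000)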

I don’t know of any drop-in replacement other than executing the threaded loop at a higher level, defining a chunk index independent of the thread (which is what I do in the ChunkSplitters.jl package I mentioned above, and which I’m using to avoid this problem in simple parallel loops like this one).
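To make the chunk-index idea concrete, here is a hand-rolled sketch (ChunkSplitters.jl packages this up more conveniently; the chunked_sum name is made up for illustration). The parallel loop runs over chunk indices, so each buffer belongs to a chunk rather than to a thread, and task migration becomes harmless:

using Base.Threads

function chunked_sum(xs; nchunks = nthreads())
    bufs = zeros(nchunks)  # one buffer per chunk, not per thread
    @threads for ichunk in 1:nchunks
        for i in ichunk:nchunks:length(xs)  # this chunk's strided share of the indices
            bufs[ichunk] += xs[i]  # only the task that owns ichunk ever touches this slot
        end
    end
    return sum(bufs)
end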

I am not sure how we can proceed given that your MWE doesn’t have the issues you have described. Have you tried running your real input matrices through the functions as given in the MWE and then comparing the outputs?


Good point.

The missing + seems important.

Thank you for the suggestion. Now my temp matches the serial code. I thought @tensor and @tensoropt didn’t make any difference.

Thank you for your responses. Now suppose I want to update the array t2new with t2new[i,j,:,:] += temp. But again I am getting a different t2new from the serial and multithreaded code.
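For what it’s worth, the slice update itself is not the racy part: since the threaded loop runs over i, every (i,j) slice of t2new is written by exactly one task. A minimal sketch of a safe shape for the update, assuming the contraction fills temp inside the loop (the accumulate_t2new! name is made up for illustration):

using Base.Threads

function accumulate_t2new!(t2new, nocc, nvir)
    @threads for i in 1:nocc
        temp = zeros(nvir, nvir)  # task-local scratch, never shared via threadid()
        for j in 1:nocc
            # ... fill `temp` for this (i, j) here ...
            @views t2new[i, j, :, :] .+= temp  # slices are disjoint in i: no race
        end
    end
    return t2new
end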

Here is the GitHub link to my code (MWE): https://github.com/Sudipta-code/ccsd_julia.git

Fork this repository and run: python3 pycall_multhread.py. I have stored all the arrays in HDF5 files (e.g. Loo_df.h5, Lov_df.h5, Lvv_df.h5, t1.h5, t2new.h5, tau.h5).

Following is the Julia code multithreading_tensor.jl, which I am calling from a Python script named pycall_multhread.py.

using Distributed
using TensorOperations, SharedArrays
using Base.Threads
using PyCall
using HDF5

println("number of threads ", nthreads())

Lovdf = h5open("Lov_df.h5", "r") do file_Lov
    read(file_Lov, "/Lov")
end

Lvvdf = h5open("Lvv_df.h5", "r") do file_Lvv
    read(file_Lvv, "/Lvv")
end

t1 = h5open("t1.h5", "r") do file_t1
    read(file_t1, "/t1")
end

tau = h5open("tau.h5", "r") do file_tau
    read(file_tau, "/tau")
end

t2new = h5open("t2new.h5", "r") do file_t2new
    read(file_t2new, "/t2new")
end


nocc,nvir = size(t1)
#t2new = zeros(nocc,nocc,nvir,nvir)
#temp = zeros(nvir,nvir)


function build_temp(t1,nocc,nvir,Lovdf,Lvvdf,tau)
    tempf = zeros(nvir,nvir)
    Lov_T120 = permutedims(Lovdf,(2,3,1))
    #t2new_temp = Array{Float64,4}(undef,(nocc,nocc,nvir,nvir))
    temp = Array{Float64,2}(undef,(nvir,nvir))
    temps = [zeros(nvir,nvir) for i = 1:Threads.nthreads()]                 # one scratch buffer per thread
    t2new_temps = [zeros(nocc,nocc,nvir,nvir) for i = 1:Threads.nthreads()] # one accumulator per thread
    Threads.@threads for i in 1:nocc
        @inbounds begin
            id = Threads.threadid()  # NB: the task may migrate between threads, so this is not stable
            temp = temps[id]
            t2new = t2new_temps[id]  # local t2new, shadows the global array inside this function
            for j in 1:nocc
                @views tau_ij_ = tau[i,j,:,:]
                @tensoropt (k=>x, t=>100x, d=>100x, c=>100x, a=>100x, b=>100x) begin
                    temp[a,b] = (-t1[k,b]*Lov_T120[k,d,t]*tau_ij_[c,d]*Lvvdf[t,a,c])
                    temp[a,b] -= t1[k,a]*Lov_T120[k,c,t]*tau_ij_[c,d]*Lvvdf[t,b,d]
                    temp[a,b] += Lvvdf[t,a,c]*tau_ij_[c,d]*Lvvdf[t,b,d]
                end
                t2new[i,j,:,:] += temp
            end
        end
    end # end of Threads.@threads loop
    temp_final = sum(temps)        # combine the per-thread scratch buffers
    t2new_final = sum(t2new_temps) # combine the per-thread accumulators
    println("temp from multithreaded code ")
    display(temp_final)
    println("t2new from multithreaded code ")
    display(t2new_final)
    temp_final
end



function build_temp_serial(t1,nocc,nvir,Lovdf,Lvvdf,tau)
    Lov_T120 = permutedims(Lovdf,(2,3,1))
    t2new_temp = Array{Float64,4}(undef,(nocc,nocc,nvir,nvir))
    temp = Array{Float64,2}(undef,(nvir,nvir))
    for i in 1:nocc
        @inbounds begin
            for j in 1:nocc
                @views tau_ij_ = tau[i,j,:,:]
                @tensoropt (k=>x, t=>100x, d=>100x, c=>100x, a=>100x, b=>100x) begin
                    temp[a,b] = (-t1[k,b]*Lov_T120[k,d,t]*tau_ij_[c,d]*Lvvdf[t,a,c])
                    temp[a,b] -= t1[k,a]*Lov_T120[k,c,t]*tau_ij_[c,d]*Lvvdf[t,b,d]
                    temp[a,b] += Lvvdf[t,a,c]*tau_ij_[c,d]*Lvvdf[t,b,d]
                end
                t2new[i,j,:,:] += temp  # this t2new is the global array loaded from t2new.h5
            end
        end
    end # end of loop over i
    println("temp from serial code ")
    display(temp)
    println("t2new from serial code ")
    display(t2new)
    temp
end

temp_final = build_temp(t1,nocc,nvir,Lovdf,Lvvdf,tau)
temp = build_temp_serial(t1,nocc,nvir,Lovdf,Lvvdf,tau)

Now the problem is: when I comment out the line t2new[i,j,:,:] += temp, the array temp comes out the same for both the serial and the multithreaded code. But when I uncomment it (as written in the code, line no. 54), temp and t2new both come out different. Maybe I am making some silly mistake, but I cannot figure it out. I am new to Julia and also to parallel programming.

Shouldn’t this be += in both functions?

If I do +=, then I do not get the temp that I expect.

When temp[a,b] += (-t1[k,b]*Lov_T120[k,d,t]*tau_ij_[c,d]*Lvvdf[t,a,c]), then temp is:

2×2 Matrix{Float64}:
 -0.102307   -0.0585356
 -0.0585356  -0.0793371

When temp[a,b] = (-t1[k,b]*Lov_T120[k,d,t]*tau_ij_[c,d]*Lvvdf[t,a,c]), then temp is:

2×2 Matrix{Float64}:
 -0.0125728     1.8677e-17
  1.82111e-17  -0.00865733

And this is the temp I expect.
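That matches how assignment works inside @tensor/@tensoropt: = overwrites the destination, while += adds to whatever it already holds. With a buffer that is reused across the (i,j) iterations, starting the block with += therefore folds every previous iteration into temp. A tiny illustration:

using TensorOperations

A = ones(2, 2)
temp = fill(10.0, 2, 2)
@tensor temp[a, b] = A[a, b]   # old contents discarded: temp is now all 1s
@tensor temp[a, b] += A[a, b]  # added on top: temp is now all 2s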

Sorry, I forgot to add the t2new.h5 file to the repository. It’s updated now. Clone the ccsd_julia repository, start your terminal with export JULIA_NUM_THREADS=12 (or another value, depending on your system specifications), and then run: python3 pycall_multhread.py
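Putting the pieces of this thread together, here is a sketch of how build_temp could avoid the threadid() pattern entirely: the same contractions, but with the scratch buffer allocated inside the threaded loop and each task writing only its own i slices of t2new. Treat it as an untested sketch (the build_temp_fixed! name is made up), not a verified drop-in replacement:

using Base.Threads
using TensorOperations

function build_temp_fixed!(t2new, t1, nocc, nvir, Lovdf, Lvvdf, tau)
    Lov_T120 = permutedims(Lovdf, (2,3,1))
    @threads for i in 1:nocc
        temp = zeros(nvir, nvir)  # allocated per task iteration, no threadid() lookup
        for j in 1:nocc
            @views tau_ij_ = tau[i,j,:,:]
            @tensoropt (k=>x, t=>100x, d=>100x, c=>100x, a=>100x, b=>100x) begin
                temp[a,b] = (-t1[k,b]*Lov_T120[k,d,t]*tau_ij_[c,d]*Lvvdf[t,a,c])
                temp[a,b] -= t1[k,a]*Lov_T120[k,c,t]*tau_ij_[c,d]*Lvvdf[t,b,d]
                temp[a,b] += Lvvdf[t,a,c]*tau_ij_[c,d]*Lvvdf[t,b,d]
            end
            t2new[i,j,:,:] .+= temp  # each (i,j) slice is owned by exactly one task
        end
    end
    return t2new
end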