Getting different result from serial and multithreaded code in Julia

DNF · January 23, 2023, 12:13pm

This was a real surprise. Is this something new? Because I’m quite sure I remember seeing ‘official-seeming’ tutorials (don’t remember where) showing threadid used for this. I thought it was the canonical solution.

Is there any drop-in replacement?

skleinbo · January 23, 2023, 12:47pm

See these discussion

One solution appears to be using FLoops.jl FAQ · FLoops

Sudipta_Chakraborty · January 23, 2023, 1:50pm

Basically, when I am trying to implement the function build_temp in the full code, that I provided, it gives me the wrong result. In my MWE the input matrices (t1,Lov_df,Lvv_df,tau) are just random numbers, but in the original program, these matrices are different and give significantly different results in comparison with the serial code.

lmiq · January 23, 2023, 2:03pm

The suggestion to avoid that patterns is old as pointed in FLoops docs, but what’s new is that, AFAIU, since 1.7 the tasks can effectively migrate: Julia 1.7 says it can switch the thread your task is on. How often does that happen, and how can it be disabled?

I don’t know of any drop in replacement other than executing the threaded loop at a higher level, defining a chunk index independent of the thread (which is what I do in the ChunkSplitters.jl package I mentioned above, and which I’m using to avoid this in simple parallel loops like this one).

jmair · January 23, 2023, 2:25pm

I am not sure how we can proceed given that your MWE doesn’t have the issues you have described. Have you tried running your real input matrices through the functions as given in the MWE and then comparing the outputs?

mkitti · January 23, 2023, 6:20pm

Good point.

The missing + seems important.

Sudipta_Chakraborty · January 24, 2023, 6:13am

Thank you for the suggestion. Now my temp is matching with the serial code. I thought @tensor and @tensoropt doesn’t make any difference.

Sudipta_Chakraborty · January 24, 2023, 6:28am

Thank you for your responses. Now suppose I want to update array t2new by t2new[i,j,:,:] += temp. But again I am getting different t2new from serial and multithreaded code.

Here is the GitHub link to my code(MWE): https://github.com/Sudipta-code/ccsd_julia.git

Fork this repository and run: python3 pycall_multhread.py . I have stored all the arrays in hdf5 files ( eg.Loo_df.h5 , Lov_df.h5, Lvv_df.h5, t1.h5, t2new.h5, tau.h5)

Following is the julia code multithreading_tensor.jl which I am calling from a python code named pycall_multhread.py.

using Distributed
using TensorOperations,SharedArrays
using Base.Threads
using PyCall
using HDF5

println("number of threads ", nthreads())

Lovdf = h5open("Lov_df.h5", "r") do file_Lov
    read(file_Lov, "/Lov")
end

Lvvdf = h5open("Lvv_df.h5", "r") do file_Lvv
    read(file_Lvv, "/Lvv")
end

t1 = h5open("t1.h5", "r") do file_t1
    read(file_t1, "/t1")
end

tau = h5open("tau.h5", "r") do file_tau
    read(file_tau, "/tau")
end

t2new = h5open("t2new.h5", "r") do file_t2new
    read(file_t2new, "/t2new")
end


nocc,nvir = size(t1)
#t2new = zeros(nocc,nocc,nvir,nvir)
#temp = zeros(nvir,nvir)


function build_temp(t1,nocc,nvir,Lovdf,Lvvdf,tau)
    tempf = zeros(nvir,nvir)
    Lov_T120 = permutedims(Lovdf,(2,3,1))
    #t2new_temp = Array{Float64,4}(undef,(nocc,nocc,nvir,nvir))
    temp = Array{Float64,2}(undef,(nvir,nvir))
    temps = [zeros(nvir,nvir) for i = 1:Threads.nthreads()]
    t2new_temps = [zeros(nocc,nocc,nvir,nvir) for i = 1:Threads.nthreads()]
    Threads.@threads for i in 1:nocc
        @inbounds begin
            id = Threads.threadid()
            temp = temps[id]
            t2new =  t2new_temps[id]
            for j in 1:nocc
                @views tau_ij_ = tau[i,j,:,:]
                @tensoropt (k=>x, t=>100x, d=>100x, c=>100x, a=>100x, b=>100x) begin
                    temp[a,b] = (-t1[k,b]*Lov_T120[k,d,t]*tau_ij_[c,d]*Lvvdf[t,a,c])
                    temp[a,b] -= t1[k,a]*Lov_T120[k,c,t]*tau_ij_[c,d]*Lvvdf[t,b,d]
                    temp[a,b] += Lvvdf[t,a,c]*tau_ij_[c,d]*Lvvdf[t,b,d]
                end
                t2new[i,j,:,:] += temp
            end
        end
    end #sync
    temp_final= sum(temps)
    t2new_final = sum(t2new_temps)
    println("temp from multithreaded code ")
    display(temp_final)
    println("t2new from multithreaded code ")
    display(t2new_final)
    temp_final
end



function build_temp_serial(t1,nocc,nvir,Lovdf,Lvvdf,tau)
    Lov_T120 = permutedims(Lovdf,(2,3,1))
    t2new_temp = Array{Float64,4}(undef,(nocc,nocc,nvir,nvir))
    temp = Array{Float64,2}(undef,(nvir,nvir))
    for i in 1:nocc
        @inbounds begin
            for j in 1:nocc
                @views tau_ij_ = tau[i,j,:,:]
                @tensoropt (k=>x, t=>100x, d=>100x, c=>100x, a=>100x, b=>100x) begin
                    temp[a,b] = (-t1[k,b]*Lov_T120[k,d,t]*tau_ij_[c,d]*Lvvdf[t,a,c])
                    temp[a,b] -= t1[k,a]*Lov_T120[k,c,t]*tau_ij_[c,d]*Lvvdf[t,b,d]
                    temp[a,b] += Lvvdf[t,a,c]*tau_ij_[c,d]*Lvvdf[t,b,d]
                end
                t2new[i,j,:,:] += temp
            end
        end
    end #sync
    println("temp from serial code ")
    display(temp)
    println("t2new from serial code ")
    display(t2new)
    temp
end

temp_final = build_temp(t1,nocc,nvir,Lovdf,Lvvdf,tau)
temp = build_temp_serial(t1,nocc,nvir,Lovdf,Lvvdf,tau)

Now the problem is when I comment out the line #t2new[i,j,:,:] += temp, array temp is coming same for both serial and multithreaded code. But when I remove the comment (as written in the code, Line no:54) again, temp and t2new both are coming different. Maybe I am doing some silly mistakes, but I can not figure it out. I am new to Julia and also in parallel programing.

jmair · January 24, 2023, 8:18am

Shouldn’t this be += in both functions?

Sudipta_Chakraborty · January 24, 2023, 12:23pm

If I do += then, I am not getting the correct temp that I expect.

When temp[a,b] += (-t1[k,b]*Lov_T120[k,d,t]*tau_ij_[c,d]*Lvvdf[t,a,c]), then temp is:

2×2 Matrix{Float64}:
 -0.102307   -0.0585356
 -0.0585356  -0.0793371

When temp[a,b] = (-t1[k,b]*Lov_T120[k,d,t]*tau_ij_[c,d]*Lvvdf[t,a,c]), then temp is:

2×2 Matrix{Float64}:
 -0.0125728     1.8677e-17
  1.82111e-17  -0.00865733

And this is the temp I expect.

Sudipta_Chakraborty · January 24, 2023, 12:33pm

Sorry, I forgot to add the t2new.h5 file in the repository. Now it’s updated. Clone the ccsd_julia repository and start your terminal with export JULIA_NUM_THREADS=12 or (depending on your system specifications) and then run: python3 pycall_multhread.py

Topic		Replies	Views
Julia multithreading New to Julia question	8	569	January 15, 2023
Poor performance while multithreading (Julia 1.0) Performance multithreading	28	4036	February 11, 2019
Data structures for threaded computing Performance	23	2922	October 23, 2019
Parallelizing for loop in the computation of a gradient Performance question	19	2632	February 26, 2018
Innefficient paralellization? Need some help optimizing a simple dot product Performance question , parallel	32	4887	March 28, 2018

Getting different result from serial and multithreaded code in Julia

Related topics