Using multiple cores but the speed doesn't increase

soldin · January 21, 2024, 5:32am

Hi,
I am remotely accessing CPU with 42 cores. I ran the following command

julia --threads 40 in the REPL and set julia.NumThreads": 40 in VSC.

Threads.nthreads() gives 40. But there is no improvement in the speed. Kindly let me know what I’m missing. Thanks

Oscar_Smith · January 21, 2024, 5:43am

Does your code ever do anything to launch parallel tasks? If not, you shouldn’t expect speedup. Just because you have more cores doesn’t mean that you’ve given them anything to do.

soldin · January 21, 2024, 5:51am

U mean that I should use Threads.@threads macro before every for loop?

I thought that using multi-threading gives an extra advantage. The reason why I was hesitating from using that is I gathered from a YT video that it can lead to data race error.

I thought it was possible to speed up the code without using multi-threading.

Ok, I just realized that there is something called multi-processing.

gdalle · January 21, 2024, 10:49am

If you want to speed up your code, the first thing you want to do is speed up its sequential version, for instance using this guide:

https://viralinstruction.com/posts/optimise/

If that is not enough, or if you have access to a big machine, you can also use parallel computing.

There are several flavors of parallel computing. The main distinction to understand is between shared memory (eg. multithreading) and separate memory (eg. multiprocessing).

In multiprocessing, you don’t have to worry about separate processes accessing the same memory, but you pay higher communication costs.
In multithreading, your communication costs are much lower but several threads can access the same memory and cause race conditions. An example of what that means is given here:

https://docs.julialang.org/en/v1/manual/multi-threading/#Using-@threads-without-data-races

If you have a more concrete code example, we might be able to help you ensure correctness and performance scaling.

Indeed, before every loop for which the work can be performed in parallel without error.

ufechner7 · January 21, 2024, 11:05am

Not really. Only loops where each iteration needs a curtain amount of time (more than 0.1ms, my guess) are worth running multi-threaded, because multi-threading has an overhead. And large matrix operations run multi-threaded anyways, but are not using Julia threads.

EDITED

roflmaostc · January 21, 2024, 4:52pm

Would it be possible to share your code? Maybe a minimal working example of it.

That makes it much easier to judge whether multithreading helps.

DNF · January 21, 2024, 7:50pm

The length of each iteration is irrelevant, I presume you mean the total summed time of all iterations? If you do 1,000,000 iterations of 1ms each, clearly you can benefit from multithreading.

Anyway, 5ms, sounds high, even for the total time. I’ve seen speedups for workloads in the sub milli-seconds range.

roflmaostc · January 21, 2024, 8:28pm

Yeah, the overhead seems to be small:


julia> using BenchmarkTools

julia> arr = randn((100,100));

julia> function f(arr)
           for i in eachindex(arr)
               arr[i] = sin(arr[i])
           end
           return arr
       end
f (generic function with 1 method)

julia> function f_threaded(arr)
           Threads.@threads for i in eachindex(arr)
               arr[i] = sin(arr[i])
           end
           return arr
       end
f_threaded (generic function with 1 method)

julia> @btime f($arr);
  23.679 μs (0 allocations: 0 bytes)

julia> @btime f_threaded($arr);
  8.930 μs (61 allocations: 6.14 KiB)

soldin · January 25, 2024, 9:54am

Thanks a lot everyone. Sorry, I didn’t see the replies, bcoz after seeing Julia documentation (remote call, fetch and all that) I had decided to use just one core.

soldin · January 25, 2024, 9:59am

It’s a simulation of gene regulatory network using Gillespie’s stochastic simulation algorithm.

using Statistics
using Dates
using CSV
using DataFrames


function FillArrays0(Nanog,Gata6,Recep,Fgf4,idx,t)

        for i in 1:1
                if (i != idx)
                    Nanog[i,t+1] = Nanog[i,t]
                    Gata6[i,t+1] = Gata6[i,t] 
                    Recep[i,t+1] = Recep[i,t]
                    Fgf4[i,t+1] = Fgf4[i,t]                   
                end
        end
    
        return Nanog, Gata6, Recep, Fgf4
    
end


function Run0()

        println("I'm in m0")

        #numT = 1

        #FracVec_II = Array{Float64}(undef,numT)

        T = 50

        #YMax = 100 

        Cols = 10^8 + 2*10^6   

        Val_kf = 96; Val_Kd2f = 25; Val_Km = 100

        Vec = [1.25, 0.3,   1.25, 0.3,   2.0, 0.3,   160, 5.0,   17,   13.5,   Val_kf/32, 1, Val_Kd2f,   Val_Km]        
               
        p = 2
        
        b1=p*Vec[1]
        d1=Vec[2]
        
        b2=p*Vec[3]
        d2=Vec[4]

        b3=p*Vec[5]
        d3=Vec[6]
        
        Kd21=p*Vec[7] #binding of Gata6 to Nanog's promoter
        Kd31=p*Vec[8] #binding of Activated Recep to Nanog's promoter

        Kd12=p*Vec[9] #binding of Nanog to Gata6's promoter

        Kd13=p*Vec[10] #binding of Nanog to Recep's promoter      
        
        kf=p*Vec[11]
        d4=Vec[12]
        
        Kd2f=p*Vec[13] #binding of Gata6 to Fgf4's promoter

        Km=p*Vec[14] 

        a=2 #Hill-coefficient of Nanog 
        b=2 #Hill-coefficient of Gata6   
        c=2 #Hill-coefficient of Fgf4 suppression of Nanog production
        d=2 #Hill-coefficient of Gata6 repression of Fgf4 production

        #name = 'Stoch_Gillespie_1b'

        #filename1 = strcat('D:\Research\Excel_files\stochastic_stable_states\',name,'_stochastic_stable_states.xlsx')
         
        #for num = 1:1:numT
          
                #println(num)

                t=1

                Nanog = Array{Float64}(undef,1,Cols)
                Gata6 = Array{Float64}(undef,1,Cols)
                Recep = Array{Float64}(undef,1,Cols)
                Fgf4 = Array{Float64}(undef,1,Cols)
                Time = Array{Float64}(undef,1,Cols)

                Nanog_Total = 0
                Gata6_Total = 0

                for i1 in 1:1
                        for j1 in 1:Cols
                                Nanog[i1,j1] = -1
                                Gata6[i1,j1] = -1
                                Recep[i1,j1] = -1
                                Fgf4[i1,j1] = -1
                        end
                end


                for i2 in 1:Cols
                        Time[1,i2] = -1
                end


                for i3 in 1:size(Nanog)[1]
                        Nanog[i3,1] = 0
                        Gata6[i3,1] = 0
                        Recep[i3,1] = 0
                        Fgf4[i3,1] = 0
                end

                Time[1,1] = 0
                
                True = true

                while True

                        PF = zeros(1,8)

                        #XSn is the average number of molecules perceived by the nth cell
                        
                        XS1 = 0.5*Fgf4[1,t]                                                              
                        
                        m = 0

                        #PF[,n) is the propensity function for the (quotient(PF[,),6)+1)th cell
                        
                        Ra1 = Recep[1,t]*(XS1/Km)/(1+XS1/Km)

                        PF[1,1] = (2^m)*(b1)/((1+(Gata6[1,t]/Kd21)^b)*(1+(Ra1/Kd31)^c))
                        PF[1,2] = d1*Nanog[1,t]
                
                        PF[1,3] = (2^m)*(b2)/(1+(Nanog[1,t]/Kd12)^a)                                                         
                        PF[1,4] = d2*Gata6[1,t]        
                        
                        PF[1,5] = (2^m)*(b3)/(1+(Nanog[1,t]/Kd13)^a)
                        PF[1,6] = d3*Recep[1,t]

                        PF[1,7] = kf/(1+(Gata6[1,t]/Kd2f)^d)
                        PF[1,8] = d4*Fgf4[1,t]
                        
                        
                        B = rand()
                        C = log(B)
                        Tao = (-1)*C/(sum(PF))
                                
                        Time[1,t+1] = Time[1,t] + Tao
                        diff = Time[1,t+1]-Time[1,t]  

                        #println(Time[1,t+1])
                        
                        #if (diff < small)
                        #    small=diff
                        #end
                
                        k=size(Time)
                                        
                        if ((Time[1,t]+Tao) >= T)
                                #print("Time: ")
                                #println(Time[1,t])
                                #print("Tao: ")
                                #println(Tao)
                                break
                        end
                        
                        D = rand()
                                        
                        if (D*sum(PF) < PF[1,1])    
                            Nanog[1,t+1] = Nanog[1,t]+1
                            Gata6[1,t+1] = Gata6[1,t] 
                            Recep[1,t+1] = Recep[1,t]
                            Fgf4[1,t+1] = Fgf4[1,t] 
    
                            Ret = FillArrays0(Nanog,Gata6,Recep,Fgf4,1,t)
                            Nanog = Ret[1]
                            Gata6 = Ret[2]
                            Recep = Ret[3]
                            Fgf4 = Ret[4]
    
                        elseif (PF[1,1] <= D*sum(PF) && D*sum(PF) < sum(PF[1,1:2])) 
                                Nanog[1,t+1] = Nanog[1,t]-1
                                Gata6[1,t+1] = Gata6[1,t]
                                Recep[1,t+1] = Recep[1,t]
                                Fgf4[1,t+1] = Fgf4[1,t] 

                                Ret = FillArrays0(Nanog,Gata6,Recep,Fgf4,1,t)
                                Nanog = Ret[1]
                                Gata6 = Ret[2]
                                Recep = Ret[3]
                                Fgf4 = Ret[4]

                        elseif (sum(PF[1,1:2]) <= D*sum(PF) && D*sum(PF) < sum(PF[1,1:3])) 
                                Nanog[1,t+1] = Nanog[1,t]
                                Gata6[1,t+1] = Gata6[1,t]+1
                                Recep[1,t+1] = Recep[1,t]
                                Fgf4[1,t+1] = Fgf4[1,t] 

                                Ret = FillArrays0(Nanog,Gata6,Recep,Fgf4,1,t)
                                Nanog = Ret[1]
                                Gata6 = Ret[2]
                                Recep = Ret[3]
                                Fgf4 = Ret[4]
                        
                        elseif (sum(PF[1,1:3]) <= D*sum(PF) && D*sum(PF) < sum(PF[1,1:4])) 
                                Nanog[1,t+1] = Nanog[1,t]
                                Gata6[1,t+1] = Gata6[1,t]-1
                                Recep[1,t+1] = Recep[1,t]
                                Fgf4[1,t+1] = Fgf4[1,t]  
                                
                                Ret = FillArrays0(Nanog,Gata6,Recep,Fgf4,1,t)
                                Nanog = Ret[1]
                                Gata6 = Ret[2]
                                Recep = Ret[3]
                                Fgf4 = Ret[4]
                                        
                        elseif (sum(PF[1,1:4]) <= D*sum(PF) && D*sum(PF) < sum(PF[1,1:5])) 
                                Nanog[1,t+1] = Nanog[1,t]
                                Gata6[1,t+1] = Gata6[1,t]
                                Recep[1,t+1] = Recep[1,t]+1
                                Fgf4[1,t+1] = Fgf4[1,t]

                                Ret = FillArrays0(Nanog,Gata6,Recep,Fgf4,1,t)
                                Nanog = Ret[1]
                                Gata6 = Ret[2]
                                Recep = Ret[3]
                                Fgf4 = Ret[4]
                        
                        elseif (sum(PF[1,1:5]) <= D*sum(PF) && D*sum(PF) < sum(PF[1,1:6])) 
                                Nanog[1,t+1] = Nanog[1,t]
                                Gata6[1,t+1] = Gata6[1,t]
                                Recep[1,t+1] = Recep[1,t]-1
                                Fgf4[1,t+1] = Fgf4[1,t]

                                Ret = FillArrays0(Nanog,Gata6,Recep,Fgf4,1,t)
                                Nanog = Ret[1]
                                Gata6 = Ret[2]
                                Recep = Ret[3]
                                Fgf4 = Ret[4]                                

                        elseif (sum(PF[1,1:6]) <= D*sum(PF) && D*sum(PF) < sum(PF[1,1:7]))               
                                Nanog[1,t+1] = Nanog[1,t]
                                Gata6[1,t+1] = Gata6[1,t]
                                Recep[1,t+1] = Recep[1,t]
                                Fgf4[1,t+1] = Fgf4[1,t]+1

                                Ret = FillArrays0(Nanog,Gata6,Recep,Fgf4,1,t)
                                Nanog = Ret[1]
                                Gata6 = Ret[2]
                                Recep = Ret[3]
                                Fgf4 = Ret[4]

                        else #(sum(PF[1,1:7]) <= D*sum(PF) && D*sum(PF) < sum(PF[1,1:8]))               
                                Nanog[1,t+1] = Nanog[1,t]
                                Gata6[1,t+1] = Gata6[1,t]
                                Recep[1,t+1] = Recep[1,t]
                                Fgf4[1,t+1] = Fgf4[1,t]-1

                                Ret = FillArrays0(Nanog,Gata6,Recep,Fgf4,1,t)
                                Nanog = Ret[1]
                                Gata6 = Ret[2]
                                Recep = Ret[3]
                                Fgf4 = Ret[4]

                        end      

                        t = t+1
                        #println(t)

                end



                #display(Nanog[1,1:5])

                EndIdx = 0

                for i4 in 1:size(Nanog)[2] 

                        if(Nanog[1,i4]==-1)

                                EndIdx = i4-1
                                break

                        end

                end

                #print("EndIdx: ")
                #println(EndIdx)
                
                NewNanog = zeros(1,EndIdx)
                NewGata6 = zeros(1,EndIdx)
                NewRecep = zeros(1,EndIdx)
                NewFgf4 = zeros(1,EndIdx)   
                NewTime = zeros(1,EndIdx) 
                
                for i5 in 1:1

                        for j2 in 1:EndIdx

                                NewNanog[i5,j2] = Nanog[i5,j2]
                                NewGata6[i5,j2] = Gata6[i5,j2]
                                NewRecep[i5,j2] = Recep[i5,j2]
                                NewFgf4[i5,j2] = Fgf4[i5,j2]

                        end

                end

                for j2 in 1:EndIdx

                        NewTime[1,j2] = Time[1,j2]

                end
                 
                RetNanog_m0 = zeros(1)
                RetGata6_m0 = zeros(1)
                RetRecep_m0 = zeros(1)
                RetFgf4_m0 = zeros(1) 
                     
                for i7 in 1:1

                        RetNanog_m0[i7] = NewNanog[i7,end]
                        RetGata6_m0[i7] = NewGata6[i7,end]
                        RetRecep_m0[i7] = NewRecep[i7,end]
                        RetFgf4_m0[i7] = NewFgf4[i7,end]
                
                end
                

                #=
                for i6 in 1:1

                        #println(string("Image_SS_II_",i6))
                        
                        if (NewNanog[i6,EndIdx] > NewGata6[i6,EndIdx])
                                
                                Nanog_Total = Nanog_Total + 1
                        else
                                Gata6_Total = Gata6_Total + 1
                        end
                        
                        #=
                        plot(NewNanog[i6,:], lc=:blue, lw=2)
                        plot!(NewGata6[i6,:], lc=:green, lw=2)
                        plot!(NewRecep[i6,:], lc=:red, lw=2)
                        plot!(NewFgf4[i6,:], lc=:magenta ,lw=2)
                        plot!(grid=false, legend=false)
                        
                        xlims!(0, size(NewNanog)[2])
                        ylims!(0, YMax)

                        xlabel!("Time")
                        ylabel!("Protein concentration")
                        
                        savefig(string("Documents/Fgf4/Plots/Stoch_II/Fig_Stoch_II_NPF_Plot_",i6,".png"))
                        =#
                                 
                        #=
                        plot(NewNanog[i6,:], lc=:blue, lw=2)
                        plot!(NewGata6[i6,:], lc=:green, lw=2)
                        plot!(grid=false, legend=false)
                        
                        xlims!(0, size(NewNanog)[2])
                        ylims!(0, YMax)

                        xlabel!("Time")
                        ylabel!("Protein concentration")
                        
                        savefig(string("/home/Dinkar/Documents/Fgf4/Plots/Stoch_II/Fig_Stoch_II_NPF_Plot_Sans_Recep_Fgf4_",i6,".png"))                       
                        =#                                                
                end

                Frac = Nanog_Total/(Nanog_Total + Gata6_Total)
                FracVec_II[num] = Frac

                println(Frac)
                =#

                return  RetNanog_m0, RetGata6_m0, RetRecep_m0, RetFgf4_m0 
                
        #end

end

I am simulating a gene regulatory network along with cell division, so with every cell division the number of equations doubles. This is for the first cell. I have four more such scripts (for no. of cells 2, 4, 8, 16, 32). And I access all the files from the file for 32 cell stage.

Plz let me know how this code could be sped up using mutli-processing (at this point I am too scared of something going wrong with multi-threading).

lmiq · January 25, 2024, 10:04am

Is this a typo?

soldin · January 25, 2024, 10:20am

The perfectionist in me goads me to use the full syntax for i in 1:1:N

lmiq · January 25, 2024, 10:30am

So there is a N missing there?

Palli · January 25, 2024, 10:56am

Maybe not should, in addition to other good answer here, profile you code, see what code, loops, need optimizing. It’s usually 10% of the code spending 90% of the time.

soldin · January 25, 2024, 11:10am

Oh, here I did away with that full syntax. That is correct. Actually, for other scripts, it was 1:2, 1:4, 1:8 1:16, 1:32. So I stuck with the loop (1:1).

mkitti · January 25, 2024, 12:03pm

I wanted to point out that SciML calls these type of simulations Stochastic Differential Equations and that Gillespie simulations are but one of many solvers.

soldin · January 25, 2024, 12:28pm

Thanks. I actually wanted to used exact approximation method.

johnh · January 25, 2024, 1:13pm

Surely you could feed the number in on the command line, rather than maintaining separate scripts.
By the way, showing my ignorance here, are agent based simulations of any use
Introduction · Agents.jl (juliadynamics.github.io)

I would guess not as you are dealing with an expanding population of agents/ cells

soldin · January 27, 2024, 9:13am

I call script n-1 from script n.

I have done agent based simulation too, while this is detailed ODE based simulation. Thnx for the suggestion.

soldin · February 5, 2024, 10:25am

I visited the page on parallel computing.

It says, “Remote references come in two flavors: Future and RemoteChannel.”

And there are commands like @spawnet etc

I guess this will require extensive modification of the code. Also, I just came to know that I was using only a single core of my desktop computer (not the lab’s common and much bigger computer), whereas it has 16!

I just wanna know if there is a way to engage all the cores without making the above-mentioned modifications?

Something like julia --t or julia --p?

Topic		Replies	Views
Multi-threading on a 2 CPU system New to Julia multithreading	15	1083	February 2, 2023
Huge performance fluctuations in parallel benchmark: insights? Performance parallel , multithreading , benchmarktools	52	2629	December 1, 2021
Multi-threading or multi-processing, how to know which to use and when? Performance question , parallel , multithreading , distributed	32	6272	December 1, 2021
Poor performance while multithreading (Julia 1.0) Performance multithreading	28	3975	February 11, 2019
Multithreading not giving noticeable advantage when running parallel functions Performance parallel , multithreading	19	2675	August 14, 2020

Using multiple cores but the speed doesn't increase

Related topics