Is it possible to use all cores, or parallel computing, for some project/calculation?

My program is too complicated to reproduce here.
I am interpolating (cubic) 3 integrals with many variables. I compute all the integrals, with many factors, and keep them in a table for further use.

Even a bit of pseudocode? Again, knowledge of the particulars is really important for efficient parallelism. Are you using any Julia packages for integration or interpolation?

I am using quadgk and an interpolation routine from Julia packages.
I think that before the `interpolate` command there could be some command that uses all cores, or that uses one specific core while the other cores handle the other 2 interpolations.

General problem statements can only have general answers and the general answer for how to do this kind of thing is the manual chapter on parallel computing, already linked. You may be able to use Threads.@spawn to get some parallel speedup. If you post actual code here, you’ll very likely get help from people to parallelize that code. Absent that, you’ll have to try it yourself and see how it goes.
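As a concrete sketch of what `Threads.@spawn` could look like here: three independent table builds run concurrently, with dummy stand-in functions in place of the real interpolation builders, assuming Julia was started with multiple threads (e.g. `julia -t 3`):

```julia
# Dummy stand-ins for three independent, expensive table builds.
build_a() = sum(sqrt(i) for i in 1:10^6)
build_b() = sum(log(i) for i in 1:10^6)
build_c() = sum(inv(i) for i in 1:10^6)

# Spawn each build as a task; the scheduler runs them on available threads.
ta = Threads.@spawn build_a()
tb = Threads.@spawn build_b()
tc = Threads.@spawn build_c()

# fetch blocks until each task finishes and returns its result.
Qa, Qb, Qc = fetch(ta), fetch(tb), fetch(tc)
```

With only one thread this still runs correctly, just without any speedup.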


I will write only the main commands that do what I want.
I hope it helps you see what I am doing.

function interpQa()
    ag = 0.01:0.05:3
    aσ = 0.03:0.025:0.63

    if PrimeiraVez
        mQ = [Q(g, sigma) for g in ag, sigma in aσ]
        writedlm("Qa_termo2020.dat", mQ)
    else
        mQ = readdlm("C:\\Users\\Lucas\\Desktop\\LUCAS\\Julia\\Qa_termo2020.dat")
    end

    iQ = interpolate(mQ, BSpline(Cubic(Line())), OnGrid())    # interpolate on the grid
    sQ = scale(iQ, ag, aσ)
    (x, y) -> sQ[x, y]
end


function interpQb()
    ag = 0.01:0.05:3    # test
    aσ = 0.03:0.025:0.63
    aα = 0:0.1:0.4
    aT = 0.1:0.05:0.4

    if PrimeiraVez
        mQ = [Qd(g, sigma, α, T) for g in ag, sigma in aσ, α in aα, T in aT]
        writedlm("Qb_termo2020.dat", mQ)
    else
        mQ = readdlm("C:\\Users\\Lucas\\Desktop\\LUCAS\\Julia\\Qb_termo2020.dat")
        mQ = reshape(mQ, (length(ag), length(aσ), length(aα), length(aT)))
    end

    # matrix with the Q values on the grid
    iQ = interpolate(mQ, BSpline(Cubic(Line())), OnGrid())    # interpolate on the grid
    sQ = scale(iQ, ag, aσ, aα, aT)
    (x, y, z, w) -> sQ[x, y, z, w]
end


function interpQc()
    ag = 0.01:0.05:3    # test
    aσ = 0.03:0.025:0.63

    if PrimeiraVez
        mQ = [Qi(g, sigma) for g in ag, sigma in aσ]
        writedlm("Qc_termo2020.dat", mQ)
    else
        mQ = readdlm("C:\\Users\\Lucas\\Desktop\\LUCAS\\Julia\\Qc_termo2020.dat")
        #mQ = readdlm("/media/lucas/Backup/Linux/Julia/Qc_todos.dat")
    end

    # matrix with the Q values on the grid
    iQ = interpolate(mQ, BSpline(Cubic(Line())), OnGrid())    # interpolate on the grid
    sQ = scale(iQ, ag, aσ)
    (x, y) -> sQ[x, y]
end






time0 = time()
println("Generating the function Qa(g,sigma)")
Qa = interpQa()
tempod = (time() - time0)/60
println("Elapsed time: $tempod min")

time0 = time()
println("Generating the function Qb(g,α,sigma,T)")
Qb = interpQb()
tempod = (time() - time0)/60
println("Elapsed time: $tempod min")

time0 = time()
println("Generating the function Qc(g,sigma)")
Qc = interpQc()
tempod = (time() - time0)/60
println("Elapsed time: $tempod min")

This post has instructions for how to properly quote your code:

Maybe take a look at `@threads` instead of the comprehension. For the three different computations, why not just start 3 Julia processes?
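Starting separate processes can also be done from inside Julia via the Distributed standard library; a minimal sketch, with a dummy `heavy` workload standing in for the real interpolation functions:

```julia
using Distributed
addprocs(2)    # or start Julia with `julia -p 2`

# The workload must be defined on every worker process.
@everywhere heavy(n) = sum(sqrt(i) for i in 1:n)

# Launch two computations on any available workers and collect the results.
fa = @spawnat :any heavy(10^6)
fb = @spawnat :any heavy(10^5)
ra, rb = fetch(fa), fetch(fb)
```

Each worker is a full OS process, so this avoids shared-memory issues at the cost of some startup and data-transfer overhead.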

I want to implement parallel computation to optimize my code.
6 hours is OK to wait, but 10 days is too much.

We’ll happily help speed up your code, but it’s important to boil down the code to the bare minimum and provide a completely reproducible example that can be pasted into the REPL without modification. It’s much harder to provide help if parts of the code are missing. As it is, you haven’t defined PrimeiraVez or Q, and the .dat directory is specific to your computer (it’s better to use tempdir() to create a temporary directory for testing). You’re also not using quadgk anywhere in the provided example.

To simplify, let’s focus on the 10 day code and ignore the 6 hour codes for the moment. Can you provide an example from the 10 day code that can be run in a fresh Julia session without modification, as well as a measurement of how long the example takes to run on your computer?


Usually, one starts by figuring out where the most time is spent in your code.
This is called profiling. See e.g. Profiling · The Julia Language.
Why do you think that using more cores will significantly speed up your code?
It sounds like your code is on the math heavy side. In this case it is often way better to change the algorithm to fit your particular problem better. But again: without knowledge about what you are actually doing, it is nearly impossible to help you.
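As a concrete starting point, a minimal profiling session looks like this (with a dummy workload standing in for your code):

```julia
using Profile

work() = sum(sin(i) * sqrt(i) for i in 1:10^6)   # dummy stand-in for your code

work()                       # run once first so compilation isn't profiled
Profile.clear()
@profile work()
Profile.print(maxdepth = 8)  # tree view, truncated for readability
```

The lines with the highest sample counts are where optimization (or parallelization) effort pays off.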

Even math-heavy code can benefit from multiple cores if the hotspot determined from profiling is sufficiently parallelizable. I don’t think anyone can tell whether that is the case from the OP’s description.

https://drive.google.com/drive/folders/111VCGNQIwiugwOMq94qExnBLAukplN0x?usp=sharing

The file “principal” is my program. The other files complement the principal file; I think only “brent.jl” is used. I hope you can help me.

I’m not able to access the document.

https://drive.google.com/drive/folders/111VCGNQIwiugwOMq94qExnBLAukplN0x?usp=sharing

Sorry, it is OK now.

At first glance, I see you do a lot of redundant calculations. For example:

function Φll(r,sigma)
     ((-(3*sigma.^3)/(r^3*(1+r)^2))-(6*sigma.^3)/(r^2*(1+r)^3)
                             -(9*sigma.^3)/(r*(1+r)^4)+(12*sigma.^3)/(1+r)^5
                             +(3*sigma.^9)/(10*r^3*(1+r)^8)
                             +(12*sigma.^9)/(5*r^2*(1+r)^9)
                             +(54*sigma.^9)/(5*r*(1+r)^10)
                             -(12*sigma.^9)/(1+r)^11+(9*sigma.^3)/((r-1)^4*r)
                             -(54*sigma.^9)/(5*(r-1)^10*r)
                             +(6*sigma.^3)/((r-1)^3*r^2)
                             -(12*sigma.^9)/(5*(r-1)^9*r^2)
                             +(3*sigma.^3)/((r-1)^2*r^3)
                             -(3*sigma.^9)/(10*(r-1)^8*r^3)
                             -(12*sigma.^3)/(r-1)^5+(12*sigma.^9)/(r-1)^11)
end
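For instance, `sigma^3` and `sigma^9` (and the repeated binomials `1+r` and `r-1`) can each be computed once per call; a sketch, assuming the formula above is transcribed correctly:

```julia
# Same formula as Φll, with the repeated subexpressions hoisted out.
function Φll_fast(r, sigma)
    σ3 = sigma^3
    σ9 = σ3^3
    rp = 1 + r
    rm = r - 1
    ( -3σ3/(r^3*rp^2) - 6σ3/(r^2*rp^3) - 9σ3/(r*rp^4) + 12σ3/rp^5
      + 3σ9/(10r^3*rp^8) + 12σ9/(5r^2*rp^9) + 54σ9/(5r*rp^10) - 12σ9/rp^11
      + 9σ3/(rm^4*r) - 54σ9/(5rm^10*r)
      + 6σ3/(rm^3*r^2) - 12σ9/(5rm^9*r^2)
      + 3σ3/(rm^2*r^3) - 3σ9/(10rm^8*r^3)
      - 12σ3/rm^5 + 12σ9/rm^11 )
end
```

The broadcast dots in `sigma.^3` are also unnecessary for scalar arguments; plain `^` avoids any broadcasting machinery.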

Along the lines of a better algorithm, if you actually are using quadgk, perhaps see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.7.3656&rep=rep1&type=pdf. The authors claim it is faster and amenable to parallelization.

It is a complicated problem.
I have very big equations; this is an expanded equation to simplify the calculations.
My problem is calculating the cross section Qb. It takes too long: the program has been running for 2 days and will take at least 10 days. Can you parallelize the Qb function, or make it faster?

You can easily shrink the problem for testing if you use

function interpQb()

    ag = 0.01:0.05:3        # use ag = 0.1:0.1:0.2 for faster testing
    aσ = 0.03:0.025:0.63    # use aσ = 0.03:0.01:0.04 for faster testing
    aα = 0:0.1:0.4          # use aα = 0:0.1:0.2 for faster testing
    aT = 0.1:0.05:0.4       # use aT = 0.1:0.1:0.2 for faster testing

Looking at interpQb, the bulk of the runtime is spent looping over Qd, so let’s start by optimizing that at a single representative point in the provided range:

julia> @btime Qd(.1, .1, .1, .1)
  853.414 ms (139498 allocations: 8.26 MiB)
5.978500817052537

You’re solving some nested integrals to low precision, and there’s root-finding within the integrand. The quickest speedup comes from switching from your brent.jl to Roots.jl’s built-in Brent() method. Beyond that, @fastmath helps, and there’s some minor benefit to pre-computing σ3 = σ^3 and σ9 = σ^9 in Φl and Φll. With those changes, I get

julia> @btime Qd(.1, .1, .1, .1)
  70.482 ms (139303 allocations: 8.26 MiB)
5.978500817052536
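For reference, the Roots.jl replacement mentioned above is used like this (with a toy function standing in for the actual root problem inside the integrand):

```julia
using Roots

f(x) = x^3 - 2                                    # toy function; root at 2^(1/3)
root = find_zero(f, (1.0, 2.0), Roots.Brent())    # bracketing Brent solve
```

`find_zero` needs a bracketing interval `(a, b)` with `f(a)` and `f(b)` of opposite sign, the same requirement as a hand-rolled Brent routine.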

Using Threads.@threads to distribute work,

function calcQb()
ag = 0.01:0.05:3    # test
    aσ = 0.03 : 0.025 : 0.63
    aα = 0:0.1:0.4
    aT= 0.1:0.05:0.4
    a = collect(Iterators.product(ag, aσ, aα, aT))
    Qb = zeros(size(a))
    Threads.@threads for i in eachindex(a)
        Qb[i] = Qd(a[i]...)
    end
    return Qb
end

Iterating over the full range of 52500 elements took 1056 seconds, or 20 ms/element, which is already a ~42x speedup over the original code.

There’s an additional opportunity for speedup here:

function Qd(g,σ,α,T)
    rc0, bc0, gc0 = rc0bc0gc0(σ)
    if g > gc0
         int = quadgk(b->dQ(b,g,σ), 0, Inf, atol=1e-3)[1]
    else ...

You can calculate rc0bc0gc0(σ) over your range of σ from the top-level function, and determine which items satisfy g > gc0, in which case the integral doesn’t depend on α or T. That allows you to skip a bunch of redundant calculations, and skipping is much faster than even the fastest algorithm.
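A sketch of that restructuring, using dummy stand-ins for `rc0bc0gc0` and the two branches of `Qd` (the real versions come from your code):

```julia
# Dummy stand-ins, for illustration only:
rc0bc0gc0(σ) = (σ, 2σ, 3σ)           # returns (rc0, bc0, gc0)
Qd_g_only(g, σ) = g + σ              # branch that ignores α and T
Qd_full(g, σ, α, T) = g + σ + α + T  # full calculation

function calcQb_cached(ag, aσ, aα, aT)
    # One root-find per σ instead of one per (g, σ, α, T) point.
    gc0s = Dict(σ => rc0bc0gc0(σ)[3] for σ in aσ)
    Qb = Array{Float64}(undef, length(ag), length(aσ), length(aα), length(aT))
    for (j, σ) in enumerate(aσ), (i, g) in enumerate(ag)
        if g > gc0s[σ]
            v = Qd_g_only(g, σ)      # independent of α and T: compute once
            Qb[i, j, :, :] .= v      # broadcast it over the whole α×T slice
        else
            for (k, α) in enumerate(aα), (l, T) in enumerate(aT)
                Qb[i, j, k, l] = Qd_full(g, σ, α, T)
            end
        end
    end
    return Qb
end
```

For grid points where `g > gc0`, the α×T slice (5 × 7 = 35 entries in your ranges) collapses to a single evaluation.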


MWE (note that it’s possible to trim down the code to about a third the size of what you shared): https://gist.github.com/stillyslalom/6d44489240a85e8148c4cc5c65b305db
