Its possible to use all cores or parallel cores to do some project/calculations?

https://drive.google.com/drive/folders/111VCGNQIwiugwOMq94qExnBLAukplN0x?usp=sharing

sorry, it is ok now

At first glance, I see you do a lot of redundant calculations. For example:

function Φll(r,sigma)
     ((-(3*sigma.^3)/(r^3*(1+r)^2))-(6*sigma.^3)/(r^2*(1+r)^3)
                             -(9*sigma.^3)/(r*(1+r)^4)+(12*sigma.^3)/(1+r)^5
                             +(3*sigma.^9)/(10*r^3*(1+r)^8)
                             +(12*sigma.^9)/(5*r^2*(1+r)^9)
                             +(54*sigma.^9)/(5*r*(1+r)^10)
                             -(12*sigma.^9)/(1+r)^11+(9*sigma.^3)/((r-1)^4*r)
                             -(54*sigma.^9)/(5*(r-1)^10*r)
                             +(6*sigma.^3)/((r-1)^3*r^2)
                             -(12*sigma.^9)/(5*(r-1)^9*r^2)
                             +(3*sigma.^3)/((r-1)^2*r^3)
                             -(3*sigma.^9)/(10*(r-1)^8*r^3)
                             -(12*sigma.^3)/(r-1)^5+(12*sigma.^9)/(r-1)^11)
end

Along the lines of a better algorithm, if you actually are using quadgk, perhaps see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.7.3656&rep=rep1&type=pdf. The authors claim it is faster and amenable to parallelization.

it is a complicated problem …
i have big and big equations … this in a expanded equation to simplify calculations …
my problem in to calculate the cross section Qb … it takes too long … i am running for 2 days the program … at least will take 10 days … can you parrallelize the Qb function ? or make it faster ?

you can easily simplify the problem if you use

function interpQb()

 ag = 0.01:0.05:3                       # use ag = 0.1:0.1:0.2 to go faster to test
 aσ = 0.03 : 0.025 : 0.63           #  use aσ = 0.03 : 0.01 : 0.04  to go faster to test
 aα = 0:0.1:0.4                           # use aα = 0:0.1:0.2  to go faster to test
 aT= 0.1:0.05:0.4                        # use  aT= 0.1:0.1:0.2  to go faster to test

Looking at interpQb, the bulk of the runtime is spent looping over Qd, so let’s start by optimizing that at a single representative point in the provided range:

julia> @btime Qd(.1, .1, .1, .1)
  853.414 ms (139498 allocations: 8.26 MiB)
5.978500817052537

You’re solving some nested integrals to low precision, and there’s root-finding within the integrand. The quickest speedup comes from switching from your brent.jl to Roots.jl’s built-in Brent() method. Beyond that, @fastmath helps, and there’s some minor benefit to pre-computing σ3 = σ^3 and σ9 = σ^9 in Φl and Φll. With those changes, I get

julia> @btime Qd(.1, .1, .1, .1)
  70.482 ms (139303 allocations: 8.26 MiB)
5.978500817052536

Using Threads.@threads to distribute work,

function calcQb()
    ag = 0.01:0.05:3    # teste
    aσ = 0.03 : 0.025 : 0.63
    aα = 0:0.1:0.4
    aT= 0.1:0.05:0.4
    a = collect(Iterators.product(ag, aσ, aα, aT))
    Qb = zeros(size(a))
    Threads.@threads for i in eachindex(a)
        Qb[i] = Qd(a[i]...)
    end
    return Qb
end

Iterating over the full range of 52500 elements took 1056 seconds, or 20 ms/element, which is already a ~42x speedup over the original code.

There’s an additional opportunity for speedup here:

function Qd(g,σ,α,T)
    rc0, bc0, gc0 = rc0bc0gc0(σ)
    if g > gc0
         int = quadgk(b->dQ(b,g,σ), 0, Inf, atol=1e-3)[1]
    else ...

You can calculate rc0bc0gc0(σ) over your range of σ from the top-level function, and determine which items satisfy g > gc0, in which case the integral doesn’t depend on α or T. That allows you to skip a bunch of redundant calculations, and skipping is much faster than even the fastest algorithm.

7 Likes

MWE (note that it’s possible to trim down the code to about a third the size of what you shared): https://gist.github.com/stillyslalom/6d44489240a85e8148c4cc5c65b305db

1 Like

:rofl: :+1:

Great solution!

I will replace the code … but, another question

what about my interpolate funcion ? the .dat that i have do save in my computer to use further ?
i will not use the

if first_time_using 
    mQ = [Qi(g,sigma) for g in ag, sigma in aσ]
    writedlm("Qc_termo2020.dat",mQ)
   else  
      mQ=readdlm("C:\\Users\\Lucas\\Desktop\\LUCAS\\Julia\\Qc_termo2020.dat")
iQ = interpolate(mQ, BSpline(Cubic(Line())), OnGrid())    # Interpola na grade
    sQ = scale(iQ, ag, aσ)
    (x,y)->sQ[x,y]

how can i substitute that you right in my own program to have save the file ?

The interpolation should be orders of magnitude faster than the initial calculation, so I didn’t spend any time optimizing it. calcQb() should do the same thing as [Qd(g,sigma,α,T) for g in ag, sigma in aσ, α in aα, T in aT], so you can just replace that array comprehension with a function call.

So i used interp instead of calcQb() ?

like this :slightly_smiling_face:

using  NLsolve, Plots, QuadGK, Interpolations, QuadGK, Roots, ProgressMeter


include("brent.jl")


aRe=  [0.5;1;1.5;2;2.5;3;3.5;4;4.5;5;5.5;6;6.5;7;7.5;8;8.5;9;9.5;10]
aTe=[0.1;0.2;0.3;0.4]

lenR = length(aRe)
aσ = 0.388./aRe

Qb = []     # Variável onde será atribuída a função Qb(g,α,sigma)


First_time_using = true


brent(f::F, a, b) where {F} = find_zero(f, (a, b), Roots.Brent())

@fastmath function Φ(r,σ)
    return (2/15*σ.^9*(1/(r-1)^9-1/(r+1)^9-9/8/r*(1/(r-1)^8-1/(r+1)^8))-
                   σ.^3*(1/(r-1)^3-1/(r+1)^3-3/2/r*(1/(r-1)^2-1/(r+1)^2)))
end

@fastmath function Φl(r,σ)
    σ3 = σ^3
    σ9 = σ^9
    ((3*σ3)/(2*r^2*(1+r)^2)+(3*σ3)/(r*(1+r)^3)-(3*σ3)/(1+r)^4
                             -(3*σ9)/(20*r^2*(1+r)^8)
                             -(6*σ9)/(5*r*(1+r)^9)
                             +(6*σ9)/(5*(1+r)^10)
                             -(3*σ3)/((r-1)^3*r)
                             +(6*σ9)/(5*(r-1)^9*r)
                             -(3*σ3)/(2*(r-1)^2*r^2)
                             +(3*σ9)/(20*(r-1)^8*r^2)
                             +(3*σ3)/(r-1)^4-(6*σ9)/(5*(r-1)^10))
end

@fastmath function Φll(r,σ)
    σ3 = σ^3
    σ9 = σ^9
    ((-(3*σ3)/(r^3*(1+r)^2))-(6*σ3)/(r^2*(1+r)^3)
                              -(9*σ3)/(r*(1+r)^4)+(12*σ3)/(1+r)^5
                              +(3*σ9)/(10*r^3*(1+r)^8)
                              +(12*σ9)/(5*r^2*(1+r)^9)
                              +(54*σ9)/(5*r*(1+r)^10)
                              -(12*σ9)/(1+r)^11+(9*σ3)/((r-1)^4*r)
                              -(54*σ9)/(5*(r-1)^10*r)
                              +(6*σ3)/((r-1)^3*r^2)
                              -(12*σ9)/(5*(r-1)^9*r^2)
                              +(3*σ3)/((r-1)^2*r^3)
                              -(3*σ9)/(10*(r-1)^8*r^3)
                              -(12*σ3)/(r-1)^5+(12*σ9)/(r-1)^11)
end

@fastmath Veff(r,b,g,σ) = Φ(r,σ)+b^2*g^2/r^2
@fastmath Veffl(r,b,g,σ) = Φl(r,σ)-2b^2*g^2/r^3
@fastmath Veffll(r,b,g,σ) = Φll(r,σ)+6b^2*g^2/r^4

@fastmath function rc0bc0gc0(σ)
    rc0 = brent(r->3Φl(r,σ)+r*Φll(r,σ),1.02,5)
    gc0= sqrt(Φ(rc0,σ)+rc0*Φl(rc0,σ)/2)
    bc0 = rc0*sqrt(gc0^2-Φ(rc0,σ))/gc0
    rc0, bc0, gc0
end

@fastmath function frbc(g, σ)
    rc0,bc0,gc0 = rc0bc0gc0(σ)
     if g == gc0 return rc0, bc0 end
     if g > gc0 || g == 0 error("g = $g, there is no critical b") end
     rc = rc0
     f(r,σ) = Φ(r,σ)+r*Φl(r,σ)/2-g^2
     while f(rc0,σ)*f(rc,σ)>0 rc = rc+1 end
     rc = brent(r->f(r,σ),rc0,rc)
     bc = sqrt(rc^2*(g^2-Φ(rc,σ))/g^2)
     rc, bc
end

@fastmath function Rmin(b,g,σ)
    if b > 100    # Este caso é quando Φ -> 0 e g^2r^2-b^2g^2+Φ=0
         rmin = b
         return rmin
    end
    rc0, bc0, gc0 = rc0bc0gc0(σ)
    if g == gc0 return rc0 end    # É o caso de b e g críticos
    if g > gc0        # Neste caso não tem extremos
        rinicial = nextfloat(1.0)
        rfinal = rinicial
        while (g^2-Veff(rinicial,b,g,σ))*(g^2-Veff(rfinal,b,g,σ)) > 0 rfinal = rfinal+1 end
         rmin = brent(r->g^2-Veff(r,b,g,σ),rinicial,rfinal)
         return rmin
    end
    # Neste caso Veff tem máximo e mínimo e temos b crítico
    rc, bc = frbc(g, σ)
    if b == bc return rc end
    if b < bc
        rinicial = nextfloat(1.0)
        rfinal = rinicial
        while (g^2-Veff(rinicial,b,g,σ))*(g^2-Veff(rfinal,b,g,σ)) > 0 rfinal = rfinal+1 end
         rmin = brent(r->g^2-Veff(r,b,g,σ),rinicial,rfinal)
    else
        rfinal = rc
        while (g^2-Veff(rc,b,g,σ))*(g^2-Veff(rfinal,b,g,σ)) > 0 rfinal = rfinal+1 end
         rmin = brent(r->g^2-Veff(r,b,g,σ),rc,rfinal)
    end
    rmin
end

@fastmath function dX(r,b,g,σ)
    if g == 0 return 0 end
    rmin = Rmin(b,g,σ)
   den1 = g^2 - Veff(r,b,g,σ)
    den2 = -Veffl(rmin,b,g,σ)*(r-rmin)
    # if den1<0 || den2 < 0
    #       warn("g^2-Veff = $den1 -Veffl*(r-rmin) = $den2, r = $r, b = $b, g = $g")
    #  end
    den1 = abs(den1)
    den2 = abs(den2)
    # The lines below are
    # 2b*g*/r^2(1/sqrt(den1)-1/sqrt(den2))
    t0 = 2*b*g/r^2
    t1 = 1/sqrt(den1)
   t2 = 1/sqrt(den2)
    if abs(t2/t1) < 0.5
        res = t0*t1*exp(log1p(-t2/t1))
    else
        res = t0*(t1-t2)
    end
    res
end

@fastmath function X(b,g,σ)
    if g > 1e5 return 0 end
    if g == 0 || b == 0 return π end
    rc0, bc0, gc0 = rc0bc0gc0(σ)
    if b == bc0 && g == gc0 return -Inf end
    if 0 < g <= gc0
         rc, bc = frbc(g, σ)
         if b == bc return -Inf end
    end
    rmin = Rmin(b,g,σ)
    if b > 100 return 0 end
    if g < 0.2 && b > bc+0.2
          return -4π*σ^3/g^2/b^6
    end
    int = 2b*g*pi/(2rmin^(3/2)*sqrt(abs(Veffl(rmin,b,g,σ))))
    try
         int = int + quadgk(r->dX(r,b,g,σ), rmin, Inf, atol=1e-3)[1]
    catch
         println("b=$b, g=$g")
         int = int + quadgk(r->dX(r,b,g,σ), BigFloat(rmin), Inf, atol=1e-3)[1]
    end
    pi-int
end

@fastmath function dQ(b,g,σ)
    2*(1-cos(X(b,g,σ)))*b
end

@fastmath function dQd(b,g,σ,α,T)
    2*(1+1/sqrt(1+α)*sqrt(π*T)/2g*sin(X(b,g,σ)/2))*b
end

@fastmath function Qd(g,σ,α,T)
    if g > 1e5 return 0 end
    if g == 0 return Inf end
    rc0, bc0, gc0 = rc0bc0gc0(σ)
    if g > gc0
         int = quadgk(b->dQ(b,g,σ), 0, Inf, atol=1e-3)[1]
    else
         rc, bc = frbc(g, σ)
         int = quadgk(b->dQd(b,g,σ,α,T), 0, bc-1e-3, atol=1e-3)[1]
         int = int + quadgk(b->dQ(b,g,σ), bc+1e-3, Inf, atol=1e-3)[1]
    end
    int
end

function interpQb()    

     ag = 0.01:0.05:3    
     aσ = 0.03 : 0.025 : 0.63
     aα = 0:0.1:0.4
     aT= 0.1:0.05:0.4

    if First_time_using
          mQ = [Qd(g,sigma,α,T) for g in ag, sigma in aσ, α in aα, T in aT]
          writedlm("Qb_termo2020.dat",mQ)
    else
          mQ=readdlm("C:\\Users\\Lucas\\Desktop\\LUCAS\\Julia\\Qb_termo2020.dat")
          #mQ=readdlm("/media/lucas/Backup/Linux/Julia/Qb_todos.dat")
          mQ = reshape(mQ,(length(ag),length(aσ),length(aα),length(aT)))
    end

           # Matriz com os valores de Q na grade
    iQ = interpolate(mQ, BSpline(Cubic(Line())), OnGrid())    # Interpola na grade
    sQ = scale(iQ, ag, aσ, aα,aT)
    (x,y,z,w) -> sQ[x,y,z,w]
end





time0 = time()
println("Generating  Qb(g,α,sigma,T) ")
Qb = interpQb()
tempod = (time() - time0)/60
println("Tempo decorrido: $tempod min")

calc(Qb) is not faster than interpolationQb, ok ?
but calc(Qb) does not generate a .dat file to save my calculations

I’m confused - what’s your question?

I will keep with

function interpQb()    

     ag = 0.01:0.05:3    
     aσ = 0.03 : 0.025 : 0.63
     aα = 0:0.1:0.4
     aT= 0.1:0.05:0.4

    if First_time_using
          mQ = [Qd(g,sigma,α,T) for g in ag, sigma in aσ, α in aα, T in aT]
          writedlm("Qb_termo2020.dat",mQ)
    else
          mQ=readdlm("C:\\Users\\Lucas\\Desktop\\LUCAS\\Julia\\Qb_termo2020.dat")
          #mQ=readdlm("/media/lucas/Backup/Linux/Julia/Qb_todos.dat")
          mQ = reshape(mQ,(length(ag),length(aσ),length(aα),length(aT)))
    end

           # Matriz com os valores de Q na grade
    iQ = interpolate(mQ, BSpline(Cubic(Line())), OnGrid())    # Interpola na grade
    sQ = scale(iQ, ag, aσ, aα,aT)
    (x,y,z,w) -> sQ[x,y,z,w]
end

time0 = time()
println("Generating  Qb(g,α,sigma,T) ")
Qb = interpQb()
tempod = (time() - time0)/60
println("Tempo decorrido: $tempod min")

and do not use this :

function calcQb()
    ag = 0.01:0.05:3    # teste
    aσ = 0.03 : 0.025 : 0.63
    aα = 0:0.1:0.4
    aT= 0.1:0.05:0.4
    a = collect(Iterators.product(ag, aσ, aα, aT))
    out = zeros(size(a))
    pm = Progress(length(out))

    Threads.@threads for i in eachindex(a)
        out[i] = Qd(a[i]...)
        next!(pm)
    end

    return out
end

because in the function calcQb() you use more threads and do not save any file .dat for me.

Well, you wanted to speed things up, so I’d recommend sticking to Threads.@threads. You need to recognize that calcQb() does the same thing as [Qd(g,sigma,α,T) for g in ag, sigma in aσ, α in aα, T in aT], so you can just replace the latter with a function call to the former, and keep the rest of interpQb() as is.

In the end of the program, how can i combine my text and yours, to use calcQb() to save a file .dat for me with all my calculations, to use it further?

I explained how to combine them in my previous answers. It’s analogous to going from

Q(a, b, c) = atan(a, b)/c

function f()
    a = 0:.1:1
    b = 2:.2:4
    c = 4:.1:5

    datpath = "abc.dat"
    if ispath(datpath) # check if file already exists
        abc0 = readdlm(datpath)
        abc = reshape(abc0, size(Iterators.product(a, b, c)))
    else
        abc = [Q(ai, bi, ci) for ai in a, bi in b, ci in c]
        writedlm(datpath, abc)
    end

    iabc = interpolate(abc, BSpline(Cubic(Line(OnGrid()))))
    sabc = scale(iabc, a, b, c)
    (x, y, z) -> sabc[x, y, z]
end

to

function Qthreaded(a, b, c)
    abcrange = collect(Iterators.product(a, b, c))
    out = zeros(size(abcrange))
    Threads.@threads for i in eachindex(abcrange)
        out[i] = Q(abcrange[i]...)
    end

    return out
end

function g(a, b, c, datpath)
    if ispath(datpath) # check if file already exists
        abc0 = readdlm(datpath)
        abc = reshape(abc0, size(Iterators.product(a, b, c)))
    else
        abc = Qthreaded(a, b, c)
        writedlm(datpath, abc)
    end

    iabc = interpolate(abc, BSpline(Cubic(Line(OnGrid()))))
    sabc = scale(iabc, a, b, c)
    (x, y, z) -> sabc[x, y, z]
end

I can show you how exactly to do it, but I can’t make you understand it - that’s your job. If there are parts you don’t understand, ask about them.

1 Like

Hello man … I cannot change my problem to adequate your solution.
Because i do not understand abc. Is abc = g sigma alfa ?
My problem is to evaluate a table of Qd(g,sigma,α,T) to integrate further.
But i try for hours to read your program changes and adequate to mine, using variables g, sigma, alfa and T for the values

     ag = 0.01:0.05:3    g in ag
     aσ = 0.03 : 0.025 : 0.63  σ in aσ
     aα = 0:0.1:0.4    α in aα
     aT= 0.1:0.05:0.4  T in aT

This command :slightly_smiling_face:

 if First_time_using
          mQ = [Qd(g,sigma,α,T) for g in ag, sigma in aσ, α in aα, T in aT]
          writedlm("Qb_termo2020.dat",mQ)
    else
          mQ=readdlm("C:\\Users\\Lucas\\Desktop\\LUCAS\\Julia\\Qb_termo2020.dat")
          #mQ=readdlm("/media/lucas/Backup/Linux/Julia/Qb_todos.dat")
          mQ = reshape(mQ,(length(ag),length(aσ),length(aα),length(aT)))
    end

and this :

time0 = time()
println("Generating  Qb(g,α,sigma,T) ")
Qb = interpQb()
tempod = (time() - time0)/60
println("Tempo decorrido: $tempod min")

Gives me the .dat file that i want …

but i cannot understand you g function, f function and Qthread function …

I cannot use this and place this news functions to solve my problem … Can you help me with that ?

Yes, a, b, c in my example are analogous to ag, aσ, aα, aT in your code. You’re using an array comprehension (like [func(x, y) for x in X, y in Y]) to fill in the elements of Qb, and comprehensions can’t currently be multithreaded. To allow threading, you have to think about what the loop comprehension is actually doing, and figure out how to do it with threads-compatible constructs like for loops. I’ll walk you through my thought process.

First, I recognized that an array comprehension can always be rewritten as a for loop. Let’s check:

julia> X, Y = 1:3, 10:30 # variables are just an example, nothing to do w/ your code
(1:3, 10:30)

julia> [x + y for x in X, y in Y] # using a comprehension
3×3 Array{Int64,2}:
 11  21  31
 12  22  32
 13  23  33

julia> out = zeros(Int, length(X), length(Y)); # preallocate output array

julia> for i in eachindex(X) # using for loops
           for j in eachindex(Y)
               out[i, j] = x[i] + y[j]
           end
       end

julia> out
3×3 Array{Int64,2}:
 11  21  31
 12  22  32
 13  23  33

Then the question is how to make it multithreaded. I could just put a Threads.@threads on any one of the nested loops, but that could be inefficient if the number of threads is close (but not equal) to the number of elements, which might lead to some threads sitting idle while the others work. Instead, I went with Iterators.product, which allows you to get elements one-by-one, and also has the benefit of having the same output shape as an array comprehension.

julia> prodXY = Iterators.product(X, Y)
Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},StepRange{Int64,Int64}}}((1:3, 10:10:30))

julia> size(prodXY)
(3, 3)

julia> for p in prodXY
           @show p
       end
p = (1, 10)
p = (2, 10)
p = (3, 10)
p = (1, 20)
p = (2, 20)
p = (3, 20)
p = (1, 30)
p = (2, 30)
p = (3, 30)

Unfortunately, we can’t yet get individual elements from an iterator like we can with an array, so I had to use collect to turn prodXY into an array.

julia> prodXY[1, 1]
ERROR: MethodError: no method matching getindex(::Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},StepRange{Int64,Int64}}}, ::Int64, ::Int64)

julia> cXY = collect(prodXY)
3×3 Array{Tuple{Int64,Int64},2}:
 (1, 10)  (1, 20)  (1, 30)
 (2, 10)  (2, 20)  (2, 30)
 (3, 10)  (3, 20)  (3, 30)
julia> Threads.@threads for i in eachindex(cXY)
           x, y = cXY[i]
           out[i] += x + y
       end

julia> out
3×3 Array{Int64,2}:
 22  42  62
 24  44  64
 26  46  66

Great, it’s multithreaded! The same principle applies to your problem - first, create the product iterator and preallocate an output array.

    ag = 0.01:0.05:3    # teste
    aσ = 0.03 : 0.025 : 0.63
    aα = 0:0.1:0.4
    aT= 0.1:0.05:0.4
    a = collect(Iterators.product(ag, aσ, aα, aT))
    mQ = zeros(size(a))

Now use Threads.@threads to iterate over each element of the inputs, calling Qd(g, σ, α, T) and putting its output into your preallocated output array.

   Threads.@threads for i in eachindex(a)
        g, σ, α, T = a[i]
        mQ[i] = Qd(g, σ, α, T)
    end

Now that the output array is filled, you can save it to disk with writedlm("Qb_termo2020.dat, mQ).

1 Like

Great. And in the final, how can I interpolate the terms ? to have a continuous values of Qb …
like this:

`writedlm("Qb_termo2020.dat, mQ)` .
iQ = interpolate(mQ, BSpline(Cubic(Line())), OnGrid())    
    sQ = scale(iQ, ag, aσ, aα,aT)
    (x,y,z,w) -> sQ[x,y,z,w]

end

time0 = time()
println("Generating Qb(g,α,sigma,T) ")
Qb = interpQb()
tempod = (time() - time0)/60
println("total time: $tempod min")

That part should be no different from your original function.