How can I improve the performance of the code?

I’m new to Julia, and I want to improve the performance of code that involves a large number of iterations and calculations.

I have read the manual, but I am not able to apply all of the suggested solutions…

Here is a simplified version of the code.

using DelimitedFiles
using Plots

global main=readdlm("main.txt"); 
global main=reshape(main,150,27,55) 

function Iteration()
#Temporary parameter
Pres=760; T1=500; T2=500; T3=5000;
    experiment=readdlm("experiment.txt")
    experiment=ifelse.(isnan.(experiment), 0, experiment)
    main_raw=experiment[:,1]; I_raw=experiment[:,2];
    main_start=findmin(main_raw); main_end=findmax(main_raw); 
    main_start=main_start[1]; main_end=main_end[1]
    if main_start<300; main_start=300; end
    if main_end>1200; main_end=1200; end
    main_div=trunc(Int,(100*(main_end-main_start)))

#call parameters
    para_A2C=readdlm("para_A2C.txt");
    a1=para_A2C[:,1]; Ya1=para_A2C[:,2]; Ba1=para_A2C[:,3]; Da1=para_A2C[:,4]; wa1=para_A2C[:,5]; wb1=para_A2C[:,6]; wc1=para_A2C[:,7]; Ea1=para_A2C[:,8]; JG1=para_A2C[:,9]; DAN1=para_A2C[:,10]; 
    para_A2B=readdlm("para_A2B.txt");
    a2=para_A2B[:,1]; Ya2=para_A2B[:,2]; Ba2=para_A2B[:,3]; Da2=para_A2B[:,4]; wa2=para_A2B[:,5]; wb2=para_A2B[:,6]; wc2=para_A2B[:,7]; Ea2=para_A2B[:,8]; JG2=para_A2B[:,9]; DAN2=para_A2B[:,10];
    CAR=readdlm("CAR.txt"); 
    Re=readdlm("Re.txt"); 
    MainCar=readdlm("MainCar.txt");


    main_div=1000; #temporary
#### sub-coefficient calculation ####
Rv=Array{Float64,2}(undef,5,11);
Gv=Array{Float64,2}(undef,5,11);
I_sharp=Array{Float64,2}(undef,150,27);
I_broad=Array{Float64,1}(undef,main_div);

        for A_2C=1:1:5;  
            for A_2B=1:1:11; 
                Rv[A_2C,A_2B]=(wa1[A_2C]*(a1[A_2C]+0.5)-wb1[A_2C]*(a1[A_2C]+0.5)^2+wc1[A_2C]*(a1[A_2C]+0.5)^3)-(wa2[A_2B]*(a2[A_2B]+0.5)-wb2[A_2B]*(a2[A_2B]+0.5)^2+wc2[A_2B]*(a2[A_2B]+0.5)^3); 
                Gv[A_2C,A_2B]=wa1[A_2C]*(a1[A_2C]+0.5)-wb1[A_2C]*(a1[A_2C]+0.5)^2+((a1[A_2C]+0.5)^3);
            end
        end

        main_start=390; main_end=400; main_div=1000; #temporary
        main_shift=zeros(55); #temporary
        main_cal=LinRange(main_start,main_end,main_div); k=1:1:main_div

# 'crazy bottle neck'!!!
            for A_2C=1:1:5  #A_2C 5 
                for A_2B=1:1:5 #A_2B 11
                    numb_v=(A_2B+11*(A_2C-1))
                    for branch=1:1:27
                        for J1=1:1:150 # 150
                            I_sharp[J1,branch]= Re[A_2C,A_2B]^2+*(1/(main[J1,branch,numb_v].-main_shift[numb_v])^4)*CAR[A_2C,A_2B]*MainCar[J1*exp(-1/T2)
                            I_broad[k]=I_broad[k]+I_sharp[J1,branch].*exp.(-1*(main_cal[k].-(main[J1,branch,numb_v].-main_shift[numb_v])))
                        end
                    end
                end
            end
    end

In particular, most of the time is spent in the section marked ‘crazy bottle neck’ (the four nested for loops).
Please share your experience and/or any good reading material.

Please read: Performance Tips · The Julia Language

The first thing is NOT to use global variables. Define them as const or, better, pass them as arguments to your function.

As @ufechner7 mentioned, the main issue with this code is the use of the global variable main. The easiest way to fix this is to pass it in as an argument to the function.

function Iteration(main)
    #.... Your code
end

And then call it

main=readdlm("main.txt"); 
main=reshape(main,150,27,55)
Iteration(main)

You can add an @inbounds to the outermost for loop to turn off bounds checking and speed up the code. I would only recommend this once you are sure the code works, as it won’t throw an error if you index out of bounds.
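
A minimal sketch of the @inbounds suggestion. The function name sum_inverse_fourth and the test array are made up for illustration and are not part of the original code; only the array shape matches main from the question.

```julia
# Hypothetical helper: sums 1/x^4 over a 3-D array, a stand-in for the real loop body.
function sum_inverse_fourth(A::Array{Float64,3})
    total = 0.0
    # @inbounds on the outermost loop disables bounds checks for all indexing inside it.
    # This is only safe because every index comes from axes(A, d) and is therefore valid.
    # The last iterator (i) is the innermost loop, which matches Julia's column-major layout.
    @inbounds for k in axes(A, 3), j in axes(A, 2), i in axes(A, 1)
        total += 1 / A[i, j, k]^4
    end
    return total
end

A = fill(2.0, 150, 27, 55)   # same shape as `main` in the question, dummy values
result = sum_inverse_fourth(A)
```

Whether this gives a measurable speedup depends on the loop body, so it is worth timing the function with and without the macro.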

Packages such as LoopVectorization.jl also provide macros that try to speed up loops and might be worth checking out.

There are likely other issues but the biggest problem is the global variable.

The reason global variables are bad for performance is that compilation is done on a function basis. When your Iteration function is compiled, the compiler assumes the global variable main can change its type at any time. So in your case, every time main is accessed in the inner loop, a check is done to find the type of the variable, and the right way to index it is looked up. This kills performance very efficiently in your otherwise quite clean loop.
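
To see the effect described above on a small scale, here is a rough sketch that compares a loop reading a non-const global with the same loop receiving the array as an argument. The names data, sum_global and sum_arg are invented for this example, and the exact allocation numbers will vary between Julia versions.

```julia
data = rand(1000)   # non-const global: its type may change at any time

# Reads the global directly, so the compiler cannot specialize on its type.
function sum_global()
    s = 0.0
    for i in 1:length(data)
        s += data[i]
    end
    return s
end

# Receives the array as an argument, so the loop is compiled for Vector{Float64}.
function sum_arg(x)
    s = 0.0
    for i in eachindex(x)
        s += x[i]
    end
    return s
end

# Run both once so that compilation is not included in the measurement.
sum_global(); sum_arg(data)

alloc_global = @allocated sum_global()    # usually many bytes: values are boxed in the loop
alloc_arg    = @allocated sum_arg(data)   # usually zero or close to zero
```

Both functions return the same sum; only the amount of work the compiler can do ahead of time differs.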

I had already tried passing the variable ‘main’ into the function Iteration(). Compared with the code I uploaded, that improved the speed by about 20%.
Could you give me an example?

Can you share the file main.txt? Then we could play with your code a bit and try to reproduce your issue.

Sorry for the late reply. I could not find a way to upload the text file. However, from a programming point of view it would be no different from rand(222750). Over 5 million allocations occur in my code, and 4 million of them occur in the ‘crazy bottle neck’ section.
In addition, I have another question: does ideal code show very few allocations (close to 0)?

Zero allocations are usually hard to achieve and only in rare cases worth the effort, but reducing allocations by a factor of 5 to 10 is often not too hard by pre-allocating arrays and using views, in-place operators and in-place functions.
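
As a sketch of what in-place operators and views can look like: in the posted code k is the range 1:1:main_div, so the I_broad[k] update builds temporary arrays on every pass of the inner loop. The function names and the values of amp and center below are made up for illustration; amp stands in for I_sharp[J1,branch] and center for the shifted position main[J1,branch,numb_v]-main_shift[numb_v].

```julia
# Allocating version, mirroring the update in the question: each call creates temporaries.
function accumulate_alloc!(I_broad, main_cal, amp, center)
    k = 1:length(I_broad)
    I_broad[k] = I_broad[k] + amp .* exp.(-1 * (main_cal[k] .- center))
    return I_broad
end

# In-place version: @. fuses the whole expression into one loop that writes into I_broad.
function accumulate_inplace!(I_broad, main_cal, amp, center)
    @. I_broad += amp * exp(-(main_cal - center))
    return I_broad
end

main_cal = LinRange(390, 400, 1000)   # same grid as in the question
amp, center = 2.0, 395.0              # made-up values for illustration

a = accumulate_alloc!(zeros(1000), main_cal, amp, center)
b = accumulate_inplace!(zeros(1000), main_cal, amp, center)

# Views: a slice copies its data, a view refers to the same memory without copying.
data3d   = rand(150, 27, 55)
col_copy = data3d[:, 1, 1]         # allocates a new 150-element vector
col_view = @view data3d[:, 1, 1]   # no copy of the data
```

Both accumulate functions produce the same result; the in-place one simply avoids creating new arrays on each call, which matters when it runs inside four nested loops.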

Did you try to compress it first and then upload it to a gist: https://gist.github.com ?
Documentation: Creating gists - GitHub Docs

Maybe I should pre-allocate the arrays and use views.
The link is as follows:

Does it work?