Can the compiler automatically optimize this code?

song_tebo · December 22, 2018, 3:57pm

The code below is very slow:

w=rand(300,300);x=rand(300,300);
r=vec(rand(300,1));b=rand(300,1);
j=12;N=300;

function orig()    
    sAll=0.0;k=1;
    while  k<=1000000
        sAll=sAll+sum(w[:,j].*(r + b[j]*x[:,j]));
        k=k+1;
    end
    return sAll;
end

So I wrote a helper function to speed it up:

function w_r_b_x_n(w,r,b,x,j,N)
    sAll=0.0;k=1;
    while  k<=1000000
        s=0.0;i=1;bj=b[j];
        while  i<=N
            @inbounds s+=(w[i,j]*(r[i] + bj*x[i,j])); 
            i=i+1;
        end
        k=k+1;sAll=sAll+s;
    end
    return sAll;
end

I wrote a bunch of similar magical helper functions. Can the compiler automatically optimize this code?

stillyslalom · December 22, 2018, 4:12pm

Without information about the variables you’re using, it’s impossible to know. Can you produce a reproducible example?

rdeits · December 22, 2018, 4:14pm

Please take a look at "Please read: make it easier to help you"; otherwise it will be very hard to provide meaningful help.

song_tebo · December 22, 2018, 5:03pm

Thanks a lot. This is the test code:

w=rand(300,300);x=rand(300,300);
r=vec(rand(300,1));b=rand(300,1);
j=12;N=300;

function orig()    
    sAll=0.0;k=1;
    while  k<=1000000
        sAll=sAll+sum(w[:,j].*(r + b[j]*x[:,j]));
        k=k+1;
    end
    return sAll;
end

function w_r_b_x_n(w,r,b,x,j,N)
    sAll=0.0;k=1;
    while  k<=1000000
        s=0.0;i=1;bj=b[j];
        while  i<=N
            @inbounds s+=(w[i,j]*(r[i] + bj*x[i,j])); 
            i=i+1;
        end
        k=k+1;sAll=sAll+s;
    end
    return sAll;
end

@time orig()
@time w_r_b_x_n(w,r,b,x,j,N)

Seif_Shebl · December 22, 2018, 5:21pm

Please add three backticks like this ``` before and after your code.

You don't need to write all your code as loops to get good performance. You can use the built-in sum, which is very fast and more accurate; just take care to avoid unnecessary allocations. Write the sum like this:

sum( @. @views w[:,j] * (r + b[j]*x[:,j]) )

and it will be almost as fast as the loop.

Currently, @views still allocates, but this may change in the future, and then you will not need to write loops for performance.
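
A minimal sketch of this suggestion wrapped in a function, assuming arrays shaped like those in the test code above. The name `col_sum_views` is hypothetical, and explicit `@view` slices with dotted operators are used here in place of the `@. @views` one-liner. The broadcast still allocates one temporary vector for `sum` to reduce, but the column slices are no longer copied.

```julia
# Hypothetical helper: column slices as views, elementwise work fused into one broadcast.
function col_sum_views(w, r, b, x, j)
    bj = b[j]               # hoist the scalar out of the broadcast
    wj = @view w[:, j]      # view of column j, no copy
    xj = @view x[:, j]
    # The dotted operators fuse into a single broadcast that allocates one temporary vector.
    return sum(wj .* (r .+ bj .* xj))
end

w = rand(300, 300); x = rand(300, 300)
r = rand(300); b = rand(300, 1)
s = col_sum_views(w, r, b, x, 12)
```

Passing the arrays as arguments, rather than reading them as globals inside the function, also lets the compiler specialize on their concrete types.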

nalimilan · December 22, 2018, 6:50pm

You can also do sum(w[i, j] * (r[i] + b[j] * x[i, j]) for i in 1:N) to avoid all allocations.

(BTW, instead of while and manual handling of i, it is better to write for i in 1:N.)
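
Both suggestions can be sketched as follows, assuming `r` and the columns of `w` and `x` all have the same length. The function names `col_sum_gen` and `col_sum_loop` are hypothetical, and `axes(w, 1)` stands in for `1:N` so that no separate `N` argument is needed.

```julia
# Hypothetical helper: sum over a generator, so no temporary arrays are created.
function col_sum_gen(w, r, b, x, j)
    bj = b[j]
    return sum(w[i, j] * (r[i] + bj * x[i, j]) for i in axes(w, 1))
end

# The same computation with a `for` loop instead of `while` and a manual counter.
function col_sum_loop(w, r, b, x, j)
    bj = b[j]
    s = 0.0
    for i in axes(w, 1)
        s += w[i, j] * (r[i] + bj * x[i, j])
    end
    return s
end

w = rand(300, 300); x = rand(300, 300)
r = rand(300); b = rand(300, 1)
s_gen = col_sum_gen(w, r, b, x, 12)
s_loop = col_sum_loop(w, r, b, x, 12)
```

Bounds checks are left on in this sketch; the `@inbounds` from the original helper can be added back once the sizes are known to match.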

Seif_Shebl · December 22, 2018, 8:34pm

If I understand correctly, it will not give exactly the same result as sum(array); see this for example.

nalimilan · December 22, 2018, 9:36pm

Indeed, currently it won't, but depending on the situation it may or may not matter.
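
A small illustration of the difference, on the assumption (true of the Julia versions discussed here, as far as I know) that `sum` over an `Array` uses pairwise summation while `sum` over a generator adds strictly left to right. `Float32` and a constant vector are chosen only to make the rounding difference visible; the variable names are hypothetical.

```julia
v = fill(0.1f0, 10^6)          # one million copies of the Float32 value nearest 0.1

s_array = sum(v)               # reduction over the array
s_gen   = sum(x for x in v)    # reduction over a generator
s_ref   = sum(Float64.(v))     # higher-precision reference value

# Absolute error of each result relative to the Float64 reference.
err_array = abs(s_array - s_ref)
err_gen   = abs(s_gen - s_ref)
println((err_array, err_gen))
```

For `Float64` data of a few hundred elements, as in this thread, the two results typically differ only in the last few bits.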

StefanKarpinski · December 26, 2018, 4:16pm

Seems like making all your globals const would fix performance more easily.
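
A minimal sketch of that suggestion, assuming the same data as the test code above. The uppercase names and `orig_const` are hypothetical, chosen so they do not clash with the non-constant globals already defined, and the iteration count is a parameter so the example runs quickly.

```julia
# `const` tells the compiler that the type of each global binding cannot change,
# so functions that read them can be compiled to type-stable code.
const W = rand(300, 300)
const X = rand(300, 300)
const R = rand(300)
const B = rand(300, 1)
const J = 12

function orig_const(iters)
    sAll = 0.0
    for k in 1:iters
        # Same expression as in orig(); the slices and the broadcast still allocate.
        sAll += sum(W[:, J] .* (R .+ B[J] .* X[:, J]))
    end
    return sAll
end

@time orig_const(1000)
```

This removes the overhead of untyped globals but not the per-iteration allocations; those are addressed by the views, generator, or loop versions discussed earlier in the thread.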
