Making a Function FASTER!

marianoarnaiz · November 12, 2020, 11:04am

Hi everyone. I am new to Julia. Just started a week ago.

I am trying to do a Monte Carlo kind of code and these functions are a part of it.

function GetTime(ray0, F_Sp)
    Snodes=zeros(size(ray0,1),1); #iniciate variable
    #Loop to evaluate slowness
    for ri=1:size(ray0,1)
        Snodes[ri]=F_Sp(ray0[ri,1],ray0[ri,2],ray0[ri,3]);
    end
    #mean slowness in every path
    B=(Snodes[1:end-1].+Snodes[2:end]).*0.5;
    #Distance between nodes
    C=((diff(ray0[:,1]).*diff(ray0[:,1]))+(diff(ray0[:,2]).*diff(ray0[:,2]))+(diff(ray0[:,3]).*diff(ray0[:,3]))).^(1/2);
    #traveltime
    T=sum(B.*C);
    return T
end

function RayBender(ray0,dz,dray,zmax,dr,F_Sp)
    T0=GetTime(ray0, F_Sp);
movedz=dz*dray;
NCC=0;
while NCC<dr-2
    NCC=0;
    for p=2:dr-1
        #Bend a point of the ray up and down
        #Also get the travel time of each test
        rayz1=[ray0[1:p-1,3] ; ray0[p,3]+movedz; ray0[p+1:end,3]];
        #rayz1[rayz1.>zmax] .= zmax;
        T1=GetTime([collect(ray0[:,1]) collect(ray0[:,2]) collect(rayz1)],F_Sp);
        rayz2=[ray0[1:p-1,3] ; ray0[p,3]-movedz; ray0[p+1:end,3]];
        T2=GetTime([collect(ray0[:,1]) collect(ray0[:,2]) collect(rayz2)],F_Sp);
        if T1<T0 && T1<T2 # If RAY 1 is the solution
            ray0=[collect(ray0[:,1]) collect(ray0[:,2]) collect(rayz1)];
            T0=T1;
            #println("T1")
        elseif T2<T0 && T2<T1 # If RAY 2 is the solution
            ray0=[collect(ray0[:,1]) collect(ray0[:,2]) collect(rayz2)];
            T0=T2;
            #println("T2")
        else
            NCC=NCC+1; # Keep Count of NO CHANGES
        end
    end
end
return T0,ray0

end

I have a path (ray) that crosses a box of velocity. F_Sp is the interpolated 3D slowness function at any point.
I need to call them like 2e14 times so i need it to be VERY FAST.

Can any one give me some tips?

The first one just calculates the travel time and the second one bends an initially linear ray into the real one.

marianoarnaiz · November 12, 2020, 11:07am

Also. Can anyone tell me where is it faster for Julia to open a Function? Inside the code (beginning or end) or outside (in a separate file)?

kristoffer.carlsson · November 12, 2020, 11:12am

First, please read: PSA: how to quote code with backticks

Then, have you read through Performance Tips · The Julia Language?

It has a lot of good advice, for this code, Performance Tips · The Julia Language should be especially useful.

It’s a bit unclear to me what this means. Putting code inside functions is a good idea, yes. Where the code exists (if it is in e.g. a file or the REPL) doesn’t matter for the performance of it.

marianoarnaiz · November 12, 2020, 11:28am

Thanks for the reply! I did not know about these!

Regarding the function. I have these 2 at the very top of my main script. It bugs me a bit, but if they are at the end Julia won’t read them. I wonder it the will perform faster if I saved them outside my code as .jl files.

kristoffer.carlsson · November 12, 2020, 11:37am

They have to be defined before you try to run them. But exactly where they are defined doesn’t matter for performance, no.

marianoarnaiz · November 12, 2020, 11:46am

Thanks! I will try some of this right now

stevengj · November 12, 2020, 2:47pm

Forget everything you learned in Matlab or Python. Don’t try to “vectorize” everything. Don’t allocate new arrays in innermost loops.

Start smaller: write a function that traces a single ray through a single path (if that is what you are doing), and ask for help in optimizing that. Give sample inputs so that your code is actually runnable.

marianoarnaiz · November 12, 2020, 3:07pm

Dear Kristoffer
I tried some of the @ to make the code faste… now my Julia is Slow for everything. Any ideas what i dod wrong?

Oscar_Smith · November 12, 2020, 3:12pm

It’s really hard to tell without knowing what you did.

kristoffer.carlsson · November 12, 2020, 3:43pm

I think you should go back to basic. Write down for yourself exactly what data a Ray needs to contain. Now write a struct that contains that. Now think about all operations you need to do on a Ray (check if it intersects something, compute a newly reflected ray etc etc). Implement those functions. Now you have a bunch of small stand alone functions. Now think about your full problem. You will have a vector of such Rays. For each Ray, you do something,

So the code would, at a very high level, look something like:

struct Ray
    direction:: ...
end

struct Plane
    normal:: ...
end

function reflection(ray::Ray, plane::Plane)
    ....
end

function intersect(ray::Ray, plane::Plane)
    ....
end

struct Problem
    rays::Vector{Ray}
    planes::Vector{Plane}
end

function run_problem(p::Problem)
    for ray in p.rays
        for plane in p.planes
            if intersect(ray, plane)
                   # do something
             end
        end
    end
end

prob = Problem(
   [Ray(1.0, 0.0), .... ],
   [Plane(0.0, 1.0), ...]
)

run_problem(prob)

It makes it easier if you think about what are the logical “parts” that make out the full program and try separate those out.

DNF · November 12, 2020, 6:54pm

You are basically writing Matlab code, and thereby missing out on all the performance improvements you can get from Julia. In fact, your code would be quite inefficient in Matlab as well, because it creates a staggeringly massive amount of unnecessary arrays.

You should avoid all slices (ray0[:, 1]), and in particular you should remove absolutely every use of collect. Whenever you write collect(ray0[:,2]) you create an unnecessary array, not once, but twice. A good rule of thumb is “never, never, ever use collect”. Out of 10000 times you use collect, 9999999999 times it’s wrong.

I have tried to translate your code into Julia. I cannot test it, because I lack your input data, so it may have bugs and may not even run, but you can see roughly how to speed your code up. I have removed some input arguments that seem to be redundant.

function proptime(ray, F)
    snode1 = F(ray[1, 1], ray[1, 2], ray[1, 3])
    time = zero(snode1)
    for i in 2:size(ray, 1)
        snode2 = F(ray[i, 1], ray[i, 2], ray[i, 3])
        B = (snode1 + snode2) / 2
        snode1 = snode2
        C = sqrt((ray[i, 1] - ray[i-1, 1])^2 + 
                 (ray[i, 2] - ray[i-1, 2])^2 + 
                 (ray[i, 3] - ray[i-1, 3])^2)
        time += B * C
    end
    return time
end


function bendray!(ray, F, Δz, zmax)
    t0 = proptime(F, ray)
    Nray = size(ray, 1)
    N = 0
    while N < Nray - 2
        N = 0
        for p in 2:Nray-1
            r3 = ray[p, 3]
            ray[p, 3] = r3 + Δz  # just modify the existing array instead of making a completely new one
            # ray[p, 3] = min(r3 + Δz, zmax)
            t1 = proptime(F, ray)
            ray[p, 3] = r3 - Δz
            t2 = proptime(F, ray)
            if t1 < t0 && t1 < t2
                t0 = t1
                ray[p, 3] = r3 + Δz
            elseif t2 < t0
                t0 = t2
                ray[p, 3] = r3 - Δz
            else
                ray[p, 3] = r3
                N += 1
            end
        end
    end
    return t0, ray
end

This will only speed up the implementation of your algorithm, but the algorithm itself also looks very inefficient, so you should probably re-write the whole thing from scratch and use a different algorithm

DNF · November 12, 2020, 8:07pm

So, the thing about speeding up code is, you have to be lazy. Imagine that you are manually building the arrays, as if they were Lego buildings. You want to be fast by saving work.

If you want to see the effect of changing one piece in your Lego building, the efficient thing to do is to just replace that one brick.

Instead of doing that, your code is rebuilding the whole, or parts of, the building 18-24 times in each iteration. That’s pretty inefficient.

What’s more, after each brick change, what’s the lazy way of measuring the effect of that change? Your code finds the change by re-measuring the entire building. That’s not efficient. In stead, just compare the parts of the building that were actually changed.

If you change one node in your ray, that affects two pieces of the whole ray. If the total number of pieces is small, it doesn’t matter, but if you have a lot of pieces, then just compare the changes caused by those two ray pieces. (My code does not implement this improvement, it just re-measures the whole thing each time.)

marianoarnaiz · November 12, 2020, 8:48pm

Hi everyone! Thank you so much for the help.
Coming from MATLAB is quite difficult for me not to see everything as vectors hahaha.
Anyone can direct me to a book or website where I can see some none-matlab type codes?

THANKS AGAIN!

marianoarnaiz · November 12, 2020, 8:49pm

Thanks man! I will try your suggestions.

marianoarnaiz · November 12, 2020, 8:54pm

Let me see if I can twist my problem this way. I also added a few @fastmath @simd and new Julia runs slow on everything I do… any clue?

DNF · November 12, 2020, 9:07pm

My day job is also with Matlab. But you should do the same thing in Matlab, changing a single element is just fundamentally faster than creating dozens of copies over and over.

DNF · November 12, 2020, 9:09pm

Forget @fastmath, forget @simd. Fix the fundamentals.

marianoarnaiz · November 12, 2020, 9:35pm

I get it. I don’t know what I did… I made the whole code slower somehow trying to fix it hahaha. I will report back when I have a better version! Thanks for the ideas!

Mason · November 12, 2020, 10:05pm

If it made sense to use them on everything, they’d be the default. It’s generally a bad idea to use ‘performance annotations’ that one doesn’t understand.

They’re also almost always for squeezing out the last bug of extra speed, but you’re not at that stage yet. There are large algorithmic changes you should be making first.

marianoarnaiz · November 12, 2020, 11:32pm

Yeah! I get that!
But I don’t know how… but using them made my Julia slow… I reinstalled it and now all is OK. Working on a better algorithm as we Speak!