Improving runtime

I’m a new user of Julia. The following code is a line-by-line translation from MATLAB, written to compare runtimes. However, the performance doesn’t seem to improve. I’d be happy to get feedback on why that is.

At first glance, it seems your code performs a lot of unneeded allocations. For Julia to be fast, you want to avoid allocating new memory when you can reuse it instead.
See the performance tips for more details, especially the sections about

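In many cases it will be as easy as replacing

for i in 1:n
    x = y - z      # allocates a new array on every iteration
end

with

x = similar(y)     # allocate once, before the loop
for i in 1:n
    x .= y .- z    # fused broadcast: overwrites x in place
end

whenever you deal with arrays.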
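A minimal runnable sketch of the two loops, assuming `y` and `z` are `Vector{Float64}` of the same length (the function names are made up for illustration). It uses Base's `@allocated` to compare how many bytes each version allocates: the first creates a fresh array on every iteration, the second reuses `x`.

```julia
# Allocating pattern: `y - z` builds a brand-new array on every iteration.
function subtract_alloc(y, z, n)
    x = similar(y)
    for i in 1:n
        x = y - z
    end
    return x
end

# In-place pattern: the caller allocates `x` once; `.=` overwrites it.
function subtract_inplace!(x, y, z, n)
    for i in 1:n
        x .= y .- z      # fused broadcast, no temporary array
    end
    return x
end

y, z = rand(100), rand(100)
x = similar(y)

# Call each function once first so compilation is not counted.
subtract_alloc(y, z, 1)
subtract_inplace!(x, y, z, 1)

bytes_alloc   = @allocated subtract_alloc(y, z, 1_000)
bytes_inplace = @allocated subtract_inplace!(x, y, z, 1_000)
println("allocating: ", bytes_alloc, " bytes, in-place: ", bytes_inplace, " bytes")
```

The exact byte counts depend on the Julia version, but the allocating version grows with the number of iterations while the in-place one does not.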


You can diagnose the performance of your code with the @btime macro from BenchmarkTools.jl. Ideally, the number of allocations should not scale with the number of loop iterations. In practice, that’s a lot to ask, so focus only on the most critical parts.
How do I recognize these critical parts, you ask? By profiling your code, e.g. using the @profview macro from the VS Code Julia extension.
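One way to check the "allocations should not scale with iterations" rule without extra packages is Base's `@allocated`: measure a function at n and 10n iterations and compare. The two workloads below are hypothetical stand-ins, not code from the notebook; `@btime` from BenchmarkTools.jl also reports allocations, with more reliable timings.

```julia
# Hypothetical workload that allocates a temporary array per iteration.
function work_alloc(y, z, n)
    s = 0.0
    for i in 1:n
        d = y - z                 # new array every iteration
        s += sum(abs2, d)
    end
    return s
end

# Same computation with scalar arithmetic only: no temporaries.
function work_noalloc(y, z, n)
    s = 0.0
    for i in 1:n
        for j in eachindex(y, z)
            s += (y[j] - z[j])^2
        end
    end
    return s
end

# Bytes allocated for n and 10n iterations; ideally the two numbers match.
function alloc_growth(f, y, z, n)
    f(y, z, n)                    # warm-up call so compilation isn't counted
    a1  = @allocated f(y, z, n)
    a10 = @allocated f(y, z, 10n)
    return a1, a10
end

y, z = rand(1_000), rand(1_000)
println("allocating:     ", alloc_growth(work_alloc, y, z, 100))
println("non-allocating: ", alloc_growth(work_noalloc, y, z, 100))
```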


Which function do you want to optimize?

As I understand it, the main work is done in the loop at the end of the notebook.

The loop uses non-constant global variables. If you want Julia to compile efficient code for that last part, consider putting it in a function.
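A minimal sketch of the difference, with a made-up summation loop standing in for the notebook's last cell. At top level, `total` is a non-constant global whose type could change at any time, so Julia has to handle it generically; inside a function, the argument types are known when the method is compiled.

```julia
a = rand(10_000)

# Top-level loop over non-constant globals: Julia cannot assume the types of
# `a` or `total`, so every iteration goes through dynamic dispatch and boxing.
total = 0.0
for v in a
    global total          # needed to assign to a global from a top-level loop
    total += v
end

# The same loop in a function: `a` is an argument and `s` a local, so the
# compiler specializes on their concrete types and emits a tight loop.
function total_of(a)
    s = 0.0
    for v in a
        s += v
    end
    return s
end

println(total, " ", total_of(a))
```

Timing both (for example with `@time`) typically shows the function version running much faster and allocating far less; exact numbers depend on the machine.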

I would like to improve the overall performance of the code.

Well, that’s not something we can help you with. We can help you improve the performance of a function if you point out which function should be faster.

The advice I gave here should be a good starting point for you! Feel free to come back if there are things you don’t understand 🙂


One important thing I noticed is that you allocate an enormous number of one-element vectors.

Now, this might make sense in MATLAB (where everything is an array), but in Julia, x = 1 is very different from x = [1] in terms of memory allocation: the first is a plain number, the second is an array allocated on the heap.

This small change alone, from X_s_distance2 = (X_s[1] .^ 2 .+ X_s[2] .^ 2) to X_s_distance2 = (X_s[1] ^ 2 + X_s[2] ^ 2), drops the @btime result from 18 seconds to almost 16 seconds on my machine (yes, this X_s_distance2 is one of your one-element vectors).
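To illustrate the one-element-vector point outside the notebook, here is a sketch with hypothetical data (a 2×N matrix of points) and made-up function names. Both functions compute the same sum of squared distances; only the first keeps its intermediates in one-element vectors.

```julia
pts = rand(2, 10_000)    # hypothetical points: row 1 is x, row 2 is y

# MATLAB habit: every intermediate is a one-element vector, so each iteration
# allocates new arrays on the heap.
function total_dist2_vec(pts)
    total = [0.0]
    for j in axes(pts, 2)
        d2 = [pts[1, j]^2 + pts[2, j]^2]   # one-element vector
        total = total .+ d2                # and another one
    end
    return total[1]
end

# Julian version: plain Float64 scalars, no heap allocation inside the loop.
function total_dist2(pts)
    total = 0.0
    for j in axes(pts, 2)
        d2 = pts[1, j]^2 + pts[2, j]^2
        total += d2
    end
    return total
end

total_dist2_vec(pts); total_dist2(pts)      # warm up (compile) first
bytes_vec    = @allocated total_dist2_vec(pts)
bytes_scalar = @allocated total_dist2(pts)
println("one-element vectors: ", bytes_vec, " bytes, scalars: ", bytes_scalar, " bytes")
```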

After doing this for a few more one-element vectors, I got it down to 12 seconds, and I could keep going, but I think you get the point.

So I don’t think your code is really a line-by-line translation to Julia, especially because it forces MATLAB’s “everything is an array” philosophy onto Julia.

You might also be interested in reading the “Noteworthy Differences from other Languages” section of the Julia manual.

Equally important, pay attention to the performance advice others have already given. For example, your velocityRS function allocates u = zeros(2, length(s)) and int_u = zeros(size(u)) every time it is called (and you call it from a for loop). On my machine, preallocating u and int_u cuts almost another 2 seconds from the execution time. And there is still room to fix many things (e.g., an enormous number of allocations remain).

Have fun.


Thank you for your comment. Even without preallocation, the runtime has improved. Could you please show an example of preallocating one variable from the velocityRS function? I’m a little confused about this.

Please take a look here; @gdalle already posted this earlier.

This is one of the specific performance tips you can apply to your velocityRS function: you allocate the output u every time you call the function inside your loop. Consider the alternative: preallocate u before your outer loop, then pass u to velocityRS, which fills (mutates) the array as needed.
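A minimal sketch of that idea. The real body of velocityRS lives in the notebook, so the formula below (`-sin`, `cos`) is a made-up placeholder, as are the names `velocityRS!` and `simulate`. Only the pattern matters: the caller allocates `u` once, and the `!` function fills it in place.

```julia
# Placeholder for the original function: allocates a new 2×length(s) matrix per call.
function velocityRS(s)
    u = zeros(2, length(s))
    for k in 1:length(s)
        u[1, k] = -sin(s[k])     # made-up formula, NOT the notebook's physics
        u[2, k] =  cos(s[k])
    end
    return u
end

# In-place variant: by Julia convention the trailing `!` signals that the
# function mutates its argument `u` instead of returning a new array.
function velocityRS!(u, s)
    size(u) == (2, length(s)) || throw(DimensionMismatch("u must be 2×length(s)"))
    for k in 1:length(s)
        u[1, k] = -sin(s[k])
        u[2, k] =  cos(s[k])
    end
    return u
end

# Hypothetical outer loop: `u` is preallocated once, before the loop.
function simulate(s, nsteps)
    u = zeros(2, length(s))
    acc = 0.0
    for step in 1:nsteps
        velocityRS!(u, s)        # refills the same memory on every step
        acc += sum(u)            # stand-in for whatever the loop does with u
    end
    return acc
end

s = range(0, 2π; length = 500)
println(simulate(s, 1_000))
```

The same approach would apply to int_u: allocate it next to u before the loop and pass it to the in-place function as another argument.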

I am not saying this particular step will significantly improve your performance, because I think the bulk of the allocations comes from those unneeded one-element vectors scattered all over the code.

Another important thing: if you know you will only need a very small container whose size is known at compile time, you can use tuples instead of vectors ((1, 2) vs. [1, 2]). Especially in your nested loops, that will make a big difference in memory allocation (and garbage collection) and, consequently, in execution time.
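A small sketch of the tuple advice, using a made-up nested-loop computation. The only difference between the two functions is `[a, b]` versus `(a, b)` for the temporary pair.

```julia
# Vector version: `[a, b]` is a heap-allocated array, created on every inner iteration.
function pair_sum_vec(xs, ys)
    total = 0.0
    for i in eachindex(xs), j in eachindex(ys)
        d = [xs[i] + ys[j], xs[i] * ys[j]]
        total += d[1] * d[2]
    end
    return total
end

# Tuple version: `(a, b)` is immutable with a size fixed at compile time,
# so the compiler can keep it out of the heap and the loop does not allocate.
function pair_sum_tuple(xs, ys)
    total = 0.0
    for i in eachindex(xs), j in eachindex(ys)
        d = (xs[i] + ys[j], xs[i] * ys[j])
        total += d[1] * d[2]
    end
    return total
end

xs, ys = rand(100), rand(100)
pair_sum_vec(xs, ys); pair_sum_tuple(xs, ys)    # warm up (compile) first
bytes_vector = @allocated pair_sum_vec(xs, ys)
bytes_tuple  = @allocated pair_sum_tuple(xs, ys)
println("vectors: ", bytes_vector, " bytes, tuples: ", bytes_tuple, " bytes")
```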