I have a question about improving the performance of my Julia code. I have a function that takes N seconds to finish, and I want to optimize it. To do this, I go line by line and measure the time of each line using @btime. However, at the end, the sum of all the measured times is much smaller than N. How can I solve this?
Out = f(input)
elapsed time = N
S = sum of the measured times of each line << N
- I measure the total time with tick()/tock() from the TickTock package.
- I know it’s not the best way to measure time, but I don’t want to use @btime on the whole function because the run-time of f is too long and I don’t want to wait hours for the measurement.
- So my main question is: why is the sum of the measured times of the lines (S) not even close to the total measured time of the function (N)?
Can you be a bit more specific? Post timings, code, …
@btime runs code several times to get more accurate measurements.
This is not a very productive workflow for optimizing performance. Instead, you should profile the code using Julia’s profiler, probably together with a visualization package like StatProfilerHTML. This will give you an idea of which parts of the code are worth changing for the most significant performance gains.
(If it’s a very fast-running function, you should profile it many times to make sure you have a good estimate of how long each part of it takes.)
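A minimal sketch of that profiling workflow, using the stdlib Profile module; the function `work` here is a hypothetical stand-in for your `f`:

```julia
using Profile

# Hypothetical stand-in for the slow function f
function work(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

work(10)                      # run once first so compilation is not profiled
@profile work(10^7)           # collect stack samples while the call runs
Profile.print(maxdepth = 8)   # flat text report of where time was spent
```

With StatProfilerHTML loaded, calling its HTML report function after `@profile` renders the same samples as a browsable flame graph, which makes the hot spots much easier to see than per-line timings.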
Let me be more specific. I have a while loop which can run for at most 4 million iterations. When I let it run, it goes on almost forever. To find out why, I put @btime on each line to find the bottleneck. The sum of all the elapsed times for each line is much smaller than the real run-time.
while A < B
    @btime line 1
    @btime line 2
    @btime line 3
end
I let the while loop execute and observe the elapsed time for each line. The maximum for any line is about 500 ns, which means that even if the while loop runs 4 million times, the total run time should be about 6 seconds. But the while loop goes on almost forever.
Each @btime runs the line many times (hundreds, thousands of times).
Yes, I know that. The while loop without @btime takes a long time:
while A < B
I was expecting the run time of the whole while loop to be close to the sum of the measured times of each line.
Not necessarily. Lines can be computed together (SIMD), computations can be fused with others, lines can be deleted by the compiler, etc. Also… did you put it in a function?
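The function question matters a lot here: a loop over untyped globals can be orders of magnitude slower than the same loop inside a function, and @btime on each line in isolation never sees that cost. A toy illustration (`count_up` is a made-up example, not the poster’s code):

```julia
# The same counting loop, inside a function vs. in global scope.
function count_up(b)
    a = 0
    while a < b   # a and b have concrete types here, so this compiles to a tight loop
        a += 1
    end
    return a
end

count_up(10)           # warm up so compilation is not included in the timing
@time count_up(10^7)   # fast: the loop body is fully type-stable

@time begin            # same loop on untyped globals: every iteration dispatches dynamically
    global a = 0
    while a < 10^7
        global a += 1
    end
end
```

The second `@time` is typically vastly slower, even though "each line" of the loop body looks identical, which is one way per-line benchmarks can wildly underestimate the real loop time.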
But the effects you mentioned can only decrease the whole running time, not increase it. Yes, it is part of a function.
I think without the code, there’s not much more we can do for you… maybe it’s a type-stability issue that @btime is less sensitive to because of the scope in which it makes the evaluation? Impossible to say.
If some lines are simple, the compiler can trick @btime by reducing the computation to a constant in some cases: https://github.com/JuliaCI/BenchmarkTools.jl/blob/master/doc/manual.md#understanding-compiler-optimizations
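For example, a call whose result the compiler can compute ahead of time benchmarks as nearly free (`f_const` is a toy example):

```julia
f_const() = 2 + 3

# The addition is folded at compile time: the lowered, typed code
# effectively just returns the constant 5, so benchmarking f_const()
# measures no arithmetic at all.
ci, rettype = first(code_typed(f_const, Tuple{}))
```

Inspecting `ci` shows a body that simply returns 5, so an @btime of a line like this reports sub-nanosecond "work" that never happens when the line runs with real, non-constant inputs inside the loop.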
But you should really use a profiler for this.
Is the code you are profiling pure computation, or does it access the disk or network? Either of those can cause performance to vary widely.
Additionally, are you using multiple threads? What is the computer hardware? What else is happening on the computer? If you are single-threaded but you only have one core and you are browsing the web while benchmarking, that can skew your results.
Are you sure you are not calling tick() in one of the sub-functions?
tock() calls cannot be nested, but they do appear to be thread-safe.
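If nesting is the problem, a tiny helper built on Base.time_ns() sidesteps TickTock’s shared timer state entirely; this is a sketch of an alternative, not part of TickTock’s API:

```julia
# Hypothetical nesting-safe timer: each call keeps its own start time
# in a local variable, so timed calls can nest freely.
function timed(f)
    t0 = time_ns()
    result = f()
    seconds = (time_ns() - t0) / 1e9
    return result, seconds
end

r, t = timed(() -> sum(sqrt.(1:10^6)))
```

Because nothing is stored in global state, a `timed` call inside a sub-function cannot clobber the timer of an outer one, which is exactly the failure mode a stray tick() in a sub-function would cause.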