Is there a line profiler

#1

I have rewritten some Python code in Julia. The Julia code is now about 1/3 faster than the naive Python code, but still a factor of two slower than the optimized version.

Is there a line profiler for Julia which can check where I spend the most time in the Julia code?

I suspect the following function is the one called most often, but I am not sure.

function fox_goodwin_step!(w_p1,w_0,w_m1,r)
    aa = 2*I+10*w_0
    bb = (I-w_m1)*r
    cc = I-w_p1
    r .= (aa.-bb)\cc
    w_m1 .= w_0
    w_0 .= w_p1
    r
end

The project is located here: https://github.com/feanor12/HASlib.jl

#2

Did you try Profile? That should be able to confirm your suspicion about where most of the time is going.
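For reference, a minimal sketch of how the Profile standard library is typically used. The function `work` below is a hypothetical stand-in, not code from the project; substitute the real top-level call.

```julia
using Profile, LinearAlgebra

# Hypothetical stand-in for the real workload (an assumption, not project code).
function work(n)
    a = rand(n, n) + n * I     # shift the diagonal so the solve is well conditioned
    b = rand(n, n)
    s = 0.0
    for _ in 1:200
        s += sum(a \ b)
    end
    return s
end

work(20)              # run once first so compilation is not profiled
Profile.clear()       # discard samples from any earlier run
@profile work(20)     # collect backtrace samples while the call runs
# Flat listing with one row per source line, sorted by sample count.
Profile.print(format = :flat, sortedby = :count)
```

The rows with the highest counts are the lines where samples landed most often, which is the closest thing to a line profiler in the standard library.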

What optimization tricks does the Python code use? Presumably you can do the same thing in Julia. Are the matrices large? If so, you’ll probably get similar performance.

By the way, you can link directly to the source by clicking on a line number in the GitHub source viewer.

One thing I can see here is that you’re not reusing the storage for the temporary arrays aa, bb, cc. That might be important. What size are these matrices?

#3

Also try ProfileView!

#4

I tried Profile, but the largest sample count I found in the trace was 5. Do I have to increase the number of samples in this case? The runtime of one evaluation is around 300 ms.
In the optimized Python code I rely heavily on vectorization with NumPy, and I use Cython to avoid overhead in the inner loop. The matrices can be quite small (20x20), but in some cases they are around 200x200. So not really big, I guess.
I’ll try to allocate aa, bb, cc outside the loop. Thanks for the hint.

#5

You must be on Windows, where the interval between samples is larger. (On Linux you would have gotten ~300 samples.) Yes, try running it multiple times.
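A rough sketch of "running it multiple times" under the profiler. `profile_repeated` is a hypothetical helper name, and the closure passed to it stands in for your real evaluation.

```julia
using Profile

# Hypothetical helper: profile `reps` calls of `f` so that a short
# evaluation still accumulates a useful number of samples.
function profile_repeated(f, reps)
    f()                  # warm-up call so compilation is not sampled
    Profile.clear()      # start from an empty sample buffer
    @profile for _ in 1:reps
        f()
    end
    # Length of the raw backtrace buffer; 0 means nothing was sampled.
    return length(Profile.fetch())
end

# Example call with a stand-in workload (assumption, not project code).
nraw = profile_repeated(() -> sum(rand(200, 200) * rand(200, 200)), 50)
Profile.print(format = :flat, sortedby = :count)
```

With a 300 ms evaluation, something like 20 to 50 repetitions should give counts large enough to compare lines against each other.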

#6

You can check how many allocations fox_goodwin_step! does using @time fox_goodwin_step!(...) (just keep in mind the first run will be contaminated by JIT compilation overhead, both in time and allocations).
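A small sketch of that measurement, using the function from the first post with random test matrices. The sizes and the 0.001 scaling are arbitrary assumptions chosen only so the linear solve stays well conditioned.

```julia
using LinearAlgebra

# The function as posted in #1.
function fox_goodwin_step!(w_p1, w_0, w_m1, r)
    aa = 2*I + 10*w_0
    bb = (I - w_m1)*r
    cc = I - w_p1
    r .= (aa .- bb) \ cc
    w_m1 .= w_0
    w_0 .= w_p1
    r
end

n = 20                       # assumed size, matching the small case mentioned above
w_p1 = 0.001 .* rand(n, n)   # arbitrary test data
w_0  = 0.001 .* rand(n, n)
w_m1 = 0.001 .* rand(n, n)
r    = 0.001 .* rand(n, n)

fox_goodwin_step!(w_p1, w_0, w_m1, r)         # first call includes compilation
@time fox_goodwin_step!(w_p1, w_0, w_m1, r)   # second call: time and allocations of the step itself
bytes = @allocated fox_goodwin_step!(w_p1, w_0, w_m1, r)
println("bytes allocated per call: ", bytes)
```

For more careful timing, the BenchmarkTools package is commonly used, but the two macros above need nothing beyond Base.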

If you make it allocation-free that will help, but by the looks of it that will uglify the implementation. You’ll need:

  • some named working arrays (aa,bb,cc at least)
  • more in place broadcasting with .=
  • LinearAlgebra.mul! for in place matrix multiplication
  • Probably lu! plus ldiv! to replace the \
  • Maybe some manual loops for adding to the diagonals in place; I couldn’t see how to do this with stdlib LinearAlgebra
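
Putting the points above together, one possible sketch. The `FGWork` workspace type and the extra `ws` argument are hypothetical names introduced here, not part of HASlib.jl, and the sketch assumes dense square `Matrix` arguments of equal size. Note that `lu!` still allocates a small pivot vector, so this is nearly rather than strictly allocation-free.

```julia
using LinearAlgebra

# Hypothetical workspace holding the scratch matrices (not part of the project).
struct FGWork{T}
    A::Matrix{T}     # holds aa - bb, then is overwritten by its LU factors
    tmp::Matrix{T}   # holds I - w_m1
end
FGWork(n::Integer) = FGWork(zeros(n, n), zeros(n, n))

function fox_goodwin_step!(w_p1, w_0, w_m1, r, ws::FGWork)
    A, tmp = ws.A, ws.tmp
    n = size(A, 1)

    # tmp = I - w_m1
    tmp .= .-w_m1
    for i in 1:n
        tmp[i, i] += 1
    end

    # A = (I - w_m1) * r, the old `bb`
    mul!(A, tmp, r)

    # A = 2I + 10*w_0 - bb, the old `aa .- bb`
    A .= 10 .* w_0 .- A
    for i in 1:n
        A[i, i] += 2
    end

    # r = I - w_p1, the old `cc`; safe to overwrite r now that bb is computed
    r .= .-w_p1
    for i in 1:n
        r[i, i] += 1
    end

    F = lu!(A)       # factorise in place, reusing the storage of A
    ldiv!(F, r)      # solve in place: r becomes (aa - bb) \ cc

    w_m1 .= w_0
    w_0 .= w_p1
    return r
end
```

Create the workspace once outside the loop, for example `ws = FGWork(size(r, 1))`, and pass it to every step. Comparing `@allocated` before and after shows whether the change pays off for your matrix sizes.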