Is there a line profiler

feanor12 · May 15, 2019, 6:49am

I have rewritten some Python code in Julia. The Julia code is now about a third faster than the naive Python code, but still a factor of two slower than the optimized one.

Is there a line profiler for Julia that can show where most of the time is spent in the Julia code?

I suspect the following function is called the most, but I don’t know for sure:

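using LinearAlgebra  # provides the identity operator I

# One Fox-Goodwin propagation step: updates r, w_m1 and w_0 in place.
function fox_goodwin_step!(w_p1, w_0, w_m1, r)
    aa = 2*I + 10*w_0       # allocates a new matrix
    bb = (I - w_m1)*r       # allocates a new matrix
    cc = I - w_p1           # allocates a new matrix
    r .= (aa .- bb) \ cc    # dense solve via LU; the temporaries also allocate
    w_m1 .= w_0             # shift the window: w_m1 <- w_0
    w_0 .= w_p1             #                   w_0  <- w_p1
    r
end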

The project is located here: https://github.com/feanor12/HASlib.jl

c42f · May 15, 2019, 7:46am

Did you try the Profile standard library? That should be able to confirm your suspicion about where most of the time is going.
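For example, a minimal sketch with Profile. The driver `run_steps` and the random 20×20 matrices are hypothetical placeholders, not code from HASlib.jl. Profile records each sample by file and line, so the flat report effectively serves as a line profiler:

```julia
using LinearAlgebra
using Profile

# The step function from the question, repeated so this sketch runs on its own.
function fox_goodwin_step!(w_p1, w_0, w_m1, r)
    aa = 2*I + 10*w_0
    bb = (I - w_m1)*r
    cc = I - w_p1
    r .= (aa .- bb) \ cc
    w_m1 .= w_0
    w_0 .= w_p1
    r
end

# Hypothetical driver: repeat the step on random n×n placeholder matrices.
function run_steps(n, nsteps)
    w_p1, w_0, w_m1, r = rand(n, n), rand(n, n), rand(n, n), rand(n, n)
    for _ in 1:nsteps
        fox_goodwin_step!(w_p1, w_0, w_m1, r)
    end
    return r
end

run_steps(20, 10)                 # warm up so JIT compilation is not profiled
Profile.clear()                   # discard any earlier samples
@profile run_steps(20, 10_000)
# Flat report: one row per (file, line, function) with its sample count.
Profile.print(format = :flat, sortedby = :count)
```

The default tree format (`Profile.print()`) shows the same line numbers, nested by call chain.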

What optimization tricks does the Python code use? Presumably you can do the same things in Julia. Are the matrices large? If so, you’ll probably get similar performance, because the time will be dominated by optimized BLAS/LAPACK routines in both languages.

By the way, you can link directly to the source by clicking on a line number in the GitHub source viewer:

https://github.com/feanor12/HASlib.jl/blob/8fc100166de2dc5f77e29f1f2d7de11b3b919608/src/close_coupling.jl#L24

One thing I can see here is that you’re not reusing the storage for the temporary arrays aa, bb, cc, which might be important. What size are these matrices?

cstjean · May 15, 2019, 9:49am

Also try ProfileView!

feanor12 · May 15, 2019, 10:40am

I tried Profile, but the largest sample count I found in the trace was 5. Do I need to collect more samples in this case? One evaluation takes around 300 ms.
In the optimized Python code I rely heavily on NumPy vectorization, plus Cython to avoid overhead in the inner loop. The matrices can be quite small (20×20), but in some cases around 200×200, so not really big, I guess.
I’ll try allocating aa, bb, cc outside the loop. Thanks for the hint.

tim.holy · May 15, 2019, 11:25am

You must be on Windows, where the interval between samples is larger. (On Linux you would have gotten ~300 samples.) Yes, try running it multiple times.
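A sketch of collecting more samples, assuming a stand-in `workload` function (hypothetical; substitute the real entry point). `Profile.init(delay = ...)` sets the sampling interval in seconds, and wrapping several runs in one `@profile` call accumulates samples across them:

```julia
using LinearAlgebra
using Profile

# Hypothetical stand-in workload: repeatedly solve a small dense system.
function workload(n, reps)
    A = rand(n, n) + n*I          # diagonally dominant, so well conditioned
    b = rand(n)
    s = 0.0
    for _ in 1:reps
        s += sum(A \ b)
    end
    return s
end

Profile.init(delay = 0.001)       # request a sample roughly every 1 ms

workload(20, 10)                  # compile first so JIT time is not sampled
Profile.clear()
@profile for _ in 1:20            # repeat the run to accumulate more samples
    workload(20, 1_000)
end
Profile.print(mincount = 5)       # hide frames with fewer than 5 samples
```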

c42f · May 15, 2019, 11:43am

You can check how much fox_goodwin_step! allocates using @time fox_goodwin_step!(...) (just keep in mind that the first run is contaminated by JIT compilation overhead, both in time and in allocations).
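A minimal sketch of that check, using placeholder random 20×20 matrices. The second `@time` shows the steady-state cost, and `@allocated` returns the bytes allocated by a single call:

```julia
using LinearAlgebra

# The step function from the question.
function fox_goodwin_step!(w_p1, w_0, w_m1, r)
    aa = 2*I + 10*w_0
    bb = (I - w_m1)*r
    cc = I - w_p1
    r .= (aa .- bb) \ cc
    w_m1 .= w_0
    w_0 .= w_p1
    r
end

# Placeholder data: random 20×20 matrices.
n = 20
w_p1, w_0, w_m1, r = rand(n, n), rand(n, n), rand(n, n), rand(n, n)

@time fox_goodwin_step!(w_p1, w_0, w_m1, r)   # 1st call: includes compilation
@time fox_goodwin_step!(w_p1, w_0, w_m1, r)   # 2nd call: steady-state cost

# @allocated returns the number of bytes allocated while evaluating the call.
bytes = @allocated fox_goodwin_step!(w_p1, w_0, w_m1, r)
println("bytes allocated per step: ", bytes)
```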

Making it allocation-free will help, but by the looks of it, it will uglify the implementation. You’ll need:

- some named working arrays (aa, bb, cc at least)
- more in-place broadcasting with .=
- LinearAlgebra.mul! for in-place matrix multiplication
- probably lu! plus ldiv! to replace the \
- maybe some manual loops for adding to the diagonals in place; I couldn’t see how to do this with the LinearAlgebra stdlib
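The steps above might look roughly like this. It is a sketch, not the HASlib.jl code: the `FGWorkspace` struct and the `add_diag!` helper are hypothetical names, the 5-argument `mul!` requires Julia 1.3 or later, and `lu!` still allocates a small pivot vector, so the step is only nearly allocation-free:

```julia
using LinearAlgebra

# Hypothetical workspace holding the temporaries, created once outside the loop.
struct FGWorkspace{T}
    A::Matrix{T}   # aa .- bb, later overwritten by its LU factors
    B::Matrix{T}   # I - w_m1
    C::Matrix{T}   # I - w_p1, later overwritten by the solution
end
FGWorkspace(n::Integer) = FGWorkspace(zeros(n, n), zeros(n, n), zeros(n, n))

# Add s to every diagonal entry of M, in place (manual loop).
function add_diag!(M::AbstractMatrix, s)
    for i in 1:minimum(size(M))
        M[i, i] += s
    end
    return M
end

function fox_goodwin_step!(ws::FGWorkspace, w_p1, w_0, w_m1, r)
    A, B, C = ws.A, ws.B, ws.C

    B .= .-w_m1                  # B = -w_m1
    add_diag!(B, 1)              # B = I - w_m1

    A .= 10 .* w_0               # A = 10*w_0
    add_diag!(A, 2)              # A = 2I + 10*w_0
    mul!(A, B, r, -1, 1)         # A = -1*(B*r) + 1*A, i.e. aa .- bb

    C .= .-w_p1
    add_diag!(C, 1)              # C = I - w_p1

    F = lu!(A)                   # factor A in place (small pivot vector allocated)
    ldiv!(F, C)                  # C = A \ C, overwriting C
    r .= C

    w_m1 .= w_0
    w_0 .= w_p1
    return r
end

# Usage: build the workspace once and reuse it for every step.
n = 20
w_p1, w_0, w_m1, r = rand(n, n), rand(n, n), rand(n, n), rand(n, n)
ws = FGWorkspace(n)
fox_goodwin_step!(ws, w_p1, w_0, w_m1, r)      # warm-up / compile
@time fox_goodwin_step!(ws, w_p1, w_0, w_m1, r)
```

`LinearAlgebra.diagind` returns the linear indices of a matrix's diagonal, so `view(M, diagind(M)) .+= s` may work as a stdlib alternative to the loop in `add_diag!`.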
