Tooling for line-by-line memory allocation analysis

I am tracking down an allocation problem in a large codebase. I generated the allocation information as described in the manual, and managed to get all those MallocInfo objects, but I am unsure what to do next.

That is, I could just manually look up line numbers and track down the biggest culprits, but I am wondering if there is tooling to help me visualize the information I have obtained coupled to the source. I am on Linux, Julia 1.7.

11 Likes

Such a tool would be great.
FWIW Rstudio provides this for the time profiling (little bars in front of source code lines). I do not remember if it also the case for memory.

2 Likes

This exists in Juno as well:

https://docs.junolab.org/stable/man/juno_frontend/#Juno.@profiler

but as you say it’s for time spent on each line, not memory allocated. Afaik VSCode will get a similar flamegraph viewer soon (:tm:), but I haven’t seen anythong on the allocation side.

I went back to using

Perhaps I am barking up the wrong tree, do people analyze allocations with a different tool? I am not talking about macros for single calls like @allocated, but something systematic, analogous to profiling.

2 Likes

This PR seems related: https://github.com/JuliaLang/julia/pull/42768

Like you, this is what I use to profile allocations (provided that the analyzed code is not too large).

I have a macro lying somewhere in a file on my hard drive, that automatically annotates each line of code (instead of letting you call @timeit at each line).

A single macro call sets everything up to individually time each source code line:

# File foo.jl

@timeit_all to function foo(n)
    for i = 1:n
        begin
            A = randn(100,100,20)
            m = maximum(A)
        end
        if i < 10
            Am = mapslices(sum, A; dims=2)
            B = A[:,:,5]
            Bsort = mapslices(B; dims=1) do col
                sort(col)
            end
        elseif i < 15
            sleep(0.01)
        else
            sleep(0.02)
        end
        let j
            j = i
            while j<5
                b = rand(100)
                C = B.*b
                j += 1
            end
        end
    end

    sleep(0.5)
end

and the timer output looks like this:

julia> const to = TimerOutput()
julia> foo(20)
julia> println(to)
 ─────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                  Time                   Allocations      
                                                          ──────────────────────   ───────────────────────
                     Tot / % measured:                         744ms / 99.0%           38.7MiB / 100%     

 Section                                          ncalls     time   %tot     avg     alloc   %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────────────────────────────
 foo.jl:3   function foo(n)                           1    737ms   100%   737ms   38.7MiB  100%   38.7MiB
   foo.jl:30  sleep(0.5)                              1    502ms  68.1%   502ms      144B  0.00%     144B
   foo.jl:4   for i = 1:n                             1    235ms  31.9%   235ms   38.7MiB  100%   38.7MiB
     foo.jl:18  sleep(0.02)                           6    127ms  17.3%  21.2ms   1.14KiB  0.00%     195B
     foo.jl:16  sleep(0.01)                           5   56.0ms  7.60%  11.2ms      720B  0.00%     144B
     foo.jl:5   begin                                20   36.5ms  4.95%  1.82ms   30.5MiB  78.9%  1.53MiB
       foo.jl:6   A = randn(100, 100, 20)            20   33.1ms  4.49%  1.66ms   30.5MiB  78.9%  1.53MiB
       foo.jl:7   m = maximum(A)                     20   3.31ms  0.45%   165ΞΌs     0.00B  0.00%    0.00B
     foo.jl:10  Am = mapslices(sum, A; dims =         9   12.3ms  1.68%  1.37ms   5.10MiB  13.2%   581KiB
     foo.jl:12  Bsort = mapslices(B; dims = 1)        9   2.61ms  0.35%   290ΞΌs   1.60MiB  4.13%   182KiB
     foo.jl:11  B = A[:, :, 5]                        9    123ΞΌs  0.02%  13.7ΞΌs    704KiB  1.78%  78.2KiB
     foo.jl:20  let j                                20    116ΞΌs  0.02%  5.82ΞΌs    794KiB  2.00%  39.7KiB
       foo.jl:22  while j < 5                        20   85.3ΞΌs  0.01%  4.26ΞΌs    793KiB  2.00%  39.6KiB
         foo.jl:24  C = B .* b                       10   65.1ΞΌs  0.01%  6.51ΞΌs    782KiB  1.97%  78.2KiB
         foo.jl:23  b = rand(100)                    10   5.34ΞΌs  0.00%   534ns   8.75KiB  0.02%     896B
         foo.jl:25  j += 1                           10   1.69ΞΌs  0.00%   169ns     0.00B  0.00%    0.00B
       foo.jl:21  j = i                              20   8.65ΞΌs  0.00%   433ns     0.00B  0.00%    0.00B
 ─────────────────────────────────────────────────────────────────────────────────────────────────────────

If you think it is of some general interest, please let me know and we can try pushing this as a new feature of TimerOutputs

18 Likes

Conceptually… it looks pretty easy to display this in front of the source code like this:

(from https://support.rstudio.com/hc/en-us/articles/218221837-Profiling-R-code-with-the-RStudio-IDE)

… conceptually :wink:

4 Likes

yes, this would be a great addition, please make a PR if you can.

4 Likes

It would be a useful feature

I recently had to look over a fairly large codebase to figure out where allocations were occurring. My particular case was trying to eliminate them everywhere as they were bugs :slight_smile:

Here’s a quick set of tools I developed to run and analyze the results. Would be fairly easy to sort them by the allocated value given the output. (Which is just the raw file and line information…)

#=
usage:

write the code you want to track allocations for into a single file, say "myfile.jl"

    using LinearAlgebra
    using Profile # to allow reseting allocation to get your target function 

    function myfun(n::Int)
      return  randn(n,n)
    end 
    Z = myfun(100)
    Profile.clear_malloc_data()
    Z = myfun(100)

then run 

    include("allocations.jl")
    lines = report_allocations("myfile.jl")
    println.(lines); 

Or the code is smart enough to be able to run from the same file.

    using LinearAlgebra
    using Profile # to allow reseting allocation to get your target function 

    function myfun(n::Int)
      return  randn(n,n)
    end 
    Z = myfun(100)
    Profile.clear_malloc_data()
    Z = myfun(100)

    ## 
    include("allocations.jl")
    lines = report_allocations(@__FILE__) # this will auto-exit 
    println.(lines);
=#  

function check_file_allocation(filename::AbstractString; alines=Vector{Pair{String, String}}())
  lines = readlines(filename)
  for line in lines
    line = lstrip(line)
    if startswith(line, "- ")
      # not run...
    elseif startswith(line, "0 ", )
      # not allocating
    else
      push!(alines, (filename => String(line)))
    end 
  end 
  return alines
end 
""" Add files matching pidstring to the allfiles list """
function _make_filelist!(maindir, allfiles, pidstring)
  if isdir(maindir)
    allfileinfo = walkdir(maindir)
    for (root, dirs, files) in allfileinfo
      for file in files
        if endswith(file, pidstring)
          push!(allfiles, joinpath(root, file))
        end
      end
    end
  end
end 
# TODO, add .julia/julia main dir list
function show_allocations(maindir::AbstractString; pid=nothing, depotpaths=false, cleanup=true)
  # make a list of all files
  
  allfiles = Vector{String}()
  pidstring = ".mem"
  if pid !== nothing
    pidstring = "$(pid).mem"
  end 
  _make_filelist!(maindir, allfiles, pidstring)
  if depotpaths
    for depotdir in DEPOT_PATH
      _make_filelist!(depotdir, allfiles, pidstring)
    end 
  end
  alines = Vector{Pair{String, String}}()
  for afile in allfiles
    check_file_allocation(afile; alines)
  end 
  if cleanup
    for afile in allfiles
      rm(afile)
    end 
  end 
  alines 
end 

function report_allocations(file_to_run; depotpaths=true, cleanup=true)
  # check if we are running from the file we are reporting on! 
  if haskey(ENV, "JULIA_REPORT_ALLOCS")
    exit(0)
  end 
  curenv = dirname(Base.active_project())
  jlpath = joinpath(Sys.BINDIR, "julia")
  cmd = Cmd(`$jlpath --project=$(curenv) --track-allocation=user $file_to_run`, env=("JULIA_REPORT_ALLOCS"=>"1",))
  # ripped from run(...) and adaopted
  ps = Base._spawn(cmd, Base.spawn_opts_inherit())
  pid = getpid(ps)
  if success(ps)
    alines = show_allocations("."; depotpaths, cleanup, pid)
    if length(alines) == 0
      println("No allocations found")
    end
  else
    # cleanup if error
    alines = show_allocations("."; depotpaths, cleanup, pid)
    Base.pipeline_error(ps)
  end 
  return alines
end

Lots of edge cases that aren’t handled here, obviously. I call this β€œallocated.jl” :slight_smile:

5 Likes

I also need this feature. Could you give me a copy of this macro?

1 Like

Not line-by-line, but FWIW I just created a proof-of-principle package GitHub - carstenbauer/TimerOutputsTracked that lets you track certain functions (via Cassette overdubbing) and time them with TimerOutputs.jl without having to manually modify the source code of these functions. So, not line-by-line, but on a function level. If there is enough interest I might also support automatically tracking all β€œuser functions” that are, e.g., defined in a certain module or similar.

10 Likes

Is it possible with the current setup to do something like

using MyPkg
TimerOutputsTracked.track([MyPkg.hotloop_not_exported])
@timetracked MyPkg.exported_api()

and see the allocations in the hotloop in deep in the belly of MyPkg ?
Also it may be of interest to see the number of allocations along with (or instead of) the amount of memory involved.

1 Like

This should work, yes. Note though that I have only created this package a few hours ago so it is far from battle-tested :slight_smile: Feel free to play around with it and file issues (feature requests / bugs).

UPDATE: Nevermind, it doesn’t work yet. I tried to support with 10 minutes work but I did hit Overdubbing things using llvmcall fail Β· Issue #138 Β· JuliaLabs/Cassette.jl Β· GitHub.

I think this isn’t supported by TimerOutputs.jl (yet) and would therefore be an upstream issue.

Thanks @carstenbauer for starting this.

fix args expansion - add more features by t-bltg Β· Pull Request #4 Β· carstenbauer/TimerOutputsTracked Β· GitHub would allow the following (non-intrusive):

julia> using UnicodePlots, TimerOutputsTracked
julia> TimerOutputsTracked.reset();
julia> TimerOutputsTracked.track(UnicodePlots; all = true);  # track all `UnicodePlots` methods (even unexported ones)
julia> @timetracked lineplot(1:2) functionloc=true;
julia> timings_tracked();
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                                                                        Time                    Allocations      
                                                                                                               ───────────────────────   ────────────────────────
                                               Tot / % measured:                                                    1.47s /  18.6%           5.26MiB /  99.3%    

 Section                                                                                               ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 lineplot at ~/.julia/dev/UnicodePlots.jl/src/interface/lineplot.jl:78                                      1    273ms  100.0%   273ms   5.22MiB  100.0%  5.22MiB
   #lineplot#150 at ~/.julia/dev/UnicodePlots.jl/src/interface/lineplot.jl:78                               1    273ms  100.0%   273ms   5.22MiB   99.9%  5.22MiB
     lineplot at ~/.julia/dev/UnicodePlots.jl/src/interface/lineplot.jl:67                                  1    273ms  100.0%   273ms   5.21MiB   99.8%  5.21MiB
       lineplot at ~/.julia/dev/UnicodePlots.jl/src/interface/lineplot.jl:67                                1    273ms  100.0%   273ms   5.21MiB   99.7%  5.21MiB
         #lineplot#149 at ~/.julia/dev/UnicodePlots.jl/src/interface/lineplot.jl:67                         1    273ms   99.9%   273ms   5.20MiB   99.6%  5.20MiB
           lineplot! at ~/.julia/dev/UnicodePlots.jl/src/interface/lineplot.jl:80                           1    272ms   99.8%   272ms   5.17MiB   99.0%  5.17MiB
             #lineplot!#151 at ~/.julia/dev/UnicodePlots.jl/src/interface/lineplot.jl:80                    1    272ms   99.8%   272ms   5.16MiB   98.9%  5.16MiB
               lines! at ~/.julia/dev/UnicodePlots.jl/src/plot.jl:510                                       1    272ms   99.8%   272ms   5.15MiB   98.7%  5.15MiB
                 #lines!#88 at ~/.julia/dev/UnicodePlots.jl/src/plot.jl:510                                 1    272ms   99.7%   272ms   5.14MiB   98.5%  5.14MiB
                   lines! at ~/.julia/dev/UnicodePlots.jl/src/canvas.jl:183                                 1    272ms   99.7%   272ms   5.13MiB   98.3%  5.13MiB
                     lines! at ~/.julia/dev/UnicodePlots.jl/src/canvas.jl:115                               1    272ms   99.7%   272ms   5.13MiB   98.2%  5.13MiB
                       lines! at ~/.julia/dev/UnicodePlots.jl/src/canvas.jl:115                             1    272ms   99.7%   272ms   5.12MiB   98.1%  5.12MiB
                         pixel! at ~/.julia/dev/UnicodePlots.jl/src/canvas/braillecanvas.jl:76             81    264ms   96.8%  3.26ms   4.47MiB   85.6%  56.5KiB
                           pixel_to_char_point_off at ~/.julia/dev/UnicodePlots.jl/src/canvas.jl:307       81    236ms   86.5%  2.91ms   1.46MiB   27.9%  18.4KiB
                             x_pixel_per_char at ?:?                                                      162    429ΞΌs    0.2%  2.65ΞΌs     0.00B    0.0%    0.00B
                             y_pixel_per_char at ?:?                                                      162    371ΞΌs    0.1%  2.29ΞΌs     0.00B    0.0%    0.00B
                             pixel_width at ~/.julia/dev/UnicodePlots.jl/src/canvas.jl:12                  81   48.6ΞΌs    0.0%   600ns   6.33KiB    0.1%    80.0B
                             pixel_height at ~/.julia/dev/UnicodePlots.jl/src/canvas.jl:11                 81   37.5ΞΌs    0.0%   464ns   6.33KiB    0.1%    80.0B
[...]  # rest truncated
1 Like

This is my solution: Ipaper.jl/timeit_all.jl at master Β· jl-spatial/Ipaper.jl Β· GitHub

julia> @timeit_all function foo(n)
         for i = 1:n
           begin
             A = randn(100, 100, 20)
             m = maximum(A)
           end
           if i < 10
             Am = mapslices(sum, A; dims=2)
             B = A[:, :, 5]
             Bsort = mapslices(B; dims=1) do col
               sort(col)
             end
           elseif i < 15
             sleep(0.01)
           else
             sleep(0.02)
           end
           let j
             j = i
             while j < 5
               b = rand(100)
               C = B .* b
               j += 1
             end
           end
         end
         sleep(0.5)
       end
foo (generic function with 1 method)

julia> foo(20)

julia> show(to, sortby = :firstexec)
 ───────────────────────────────────────────────────────────────────────────────────────────────────
                                                           Time                    Allocations      
                                                  ───────────────────────   ────────────────────────
                 Tot / % measured:                     11.8s /  13.8%            272MiB /  43.4%

 Section                                  ncalls     time    %tot     avg     alloc    %tot      avg
 ───────────────────────────────────────────────────────────────────────────────────────────────────
 L5: for                                       1    1.12s   69.0%   1.12s    118MiB  100.0%   118MiB
   L6: block                                  20    149ms    9.1%  7.43ms   30.5MiB   25.8%  1.53MiB
     L7: A = randn(100, 100, 20)              20    140ms    8.6%  6.99ms   30.5MiB   25.8%  1.53MiB
     L8: m = maximum(A)                       20   2.22ms    0.1%   111ΞΌs      640B    0.0%    32.0B
   L10: if                                    20    974ms   59.8%  48.7ms   86.9MiB   73.5%  4.34MiB
     L11: Am = mapslices(sum, A; di ...        9    435ms   26.7%  48.3ms   61.8MiB   52.3%  6.87MiB
     L12: B = A[:, :, 5]                       9    431ΞΌs    0.0%  47.8ΞΌs    704KiB    0.6%  78.2KiB
     L13: Bsort = mapslices(B; dims ...        9    278ms   17.1%  30.9ms   24.3MiB   20.6%  2.70MiB
   L21: let                                   20    506ΞΌs    0.0%  25.3ΞΌs    797KiB    0.7%  39.8KiB
     L22: j = i                               20   18.8ΞΌs    0.0%   940ns      640B    0.0%    32.0B
     L23: while                               20    411ΞΌs    0.0%  20.5ΞΌs    794KiB    0.7%  39.7KiB
       L24: b = rand(100)                     10   10.9ΞΌs    0.0%  1.09ΞΌs   9.06KiB    0.0%     928B
       L25: C = B .* b                        10    335ΞΌs    0.0%  33.5ΞΌs    782KiB    0.6%  78.2KiB
       L26: j += 1                            10   6.70ΞΌs    0.0%   670ns      320B    0.0%    32.0B
 L30: sleep(0.5)                               1    505ms   31.0%   505ms      176B    0.0%     176B
 ───────────────────────────────────────────────────────────────────────────────────────────────────
1 Like