Tooling for line-by-line memory allocation analysis

Like you, this is what I use to profile allocations (provided that the analyzed code is not too large).

I have a macro lying somewhere in a file on my hard drive, that automatically annotates each line of code (instead of letting you call @timeit at each line).

A single macro call sets everything up to individually time each source code line:

# File foo.jl

@timeit_all to function foo(n)
    for i = 1:n
        begin
            A = randn(100,100,20)
            m = maximum(A)
        end
        if i < 10
            Am = mapslices(sum, A; dims=2)
            B = A[:,:,5]
            Bsort = mapslices(B; dims=1) do col
                sort(col)
            end
        elseif i < 15
            sleep(0.01)
        else
            sleep(0.02)
        end
        let j
            j = i
            while j<5
                b = rand(100)
                C = B.*b
                j += 1
            end
        end
    end

    sleep(0.5)
end

and the timer output looks like this:

julia> const to = TimerOutput()
julia> foo(20)
julia> println(to)
 ─────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                  Time                   Allocations      
                                                          ──────────────────────   ───────────────────────
                     Tot / % measured:                         744ms / 99.0%           38.7MiB / 100%     

 Section                                          ncalls     time   %tot     avg     alloc   %tot      avg
 ─────────────────────────────────────────────────────────────────────────────────────────────────────────
 foo.jl:3   function foo(n)                           1    737ms   100%   737ms   38.7MiB  100%   38.7MiB
   foo.jl:30  sleep(0.5)                              1    502ms  68.1%   502ms      144B  0.00%     144B
   foo.jl:4   for i = 1:n                             1    235ms  31.9%   235ms   38.7MiB  100%   38.7MiB
     foo.jl:18  sleep(0.02)                           6    127ms  17.3%  21.2ms   1.14KiB  0.00%     195B
     foo.jl:16  sleep(0.01)                           5   56.0ms  7.60%  11.2ms      720B  0.00%     144B
     foo.jl:5   begin                                20   36.5ms  4.95%  1.82ms   30.5MiB  78.9%  1.53MiB
       foo.jl:6   A = randn(100, 100, 20)            20   33.1ms  4.49%  1.66ms   30.5MiB  78.9%  1.53MiB
       foo.jl:7   m = maximum(A)                     20   3.31ms  0.45%   165μs     0.00B  0.00%    0.00B
     foo.jl:10  Am = mapslices(sum, A; dims =         9   12.3ms  1.68%  1.37ms   5.10MiB  13.2%   581KiB
     foo.jl:12  Bsort = mapslices(B; dims = 1)        9   2.61ms  0.35%   290μs   1.60MiB  4.13%   182KiB
     foo.jl:11  B = A[:, :, 5]                        9    123μs  0.02%  13.7μs    704KiB  1.78%  78.2KiB
     foo.jl:20  let j                                20    116μs  0.02%  5.82μs    794KiB  2.00%  39.7KiB
       foo.jl:22  while j < 5                        20   85.3μs  0.01%  4.26μs    793KiB  2.00%  39.6KiB
         foo.jl:24  C = B .* b                       10   65.1μs  0.01%  6.51μs    782KiB  1.97%  78.2KiB
         foo.jl:23  b = rand(100)                    10   5.34μs  0.00%   534ns   8.75KiB  0.02%     896B
         foo.jl:25  j += 1                           10   1.69μs  0.00%   169ns     0.00B  0.00%    0.00B
       foo.jl:21  j = i                              20   8.65μs  0.00%   433ns     0.00B  0.00%    0.00B
 ─────────────────────────────────────────────────────────────────────────────────────────────────────────

If you think it is of some general interest, please let me know and we can try pushing this as a new feature of TimerOutputs