Like you, this is what I use to profile allocations (provided that the analyzed code is not too large).
I have a macro lying somewhere in a file on my hard drive, that automatically annotates each line of code (instead of letting you call @timeit at each line).
A single macro call sets everything up to individually time each source code line:
# File foo.jl
@timeit_all to function foo(n)
for i = 1:n
begin
A = randn(100,100,20)
m = maximum(A)
end
if i < 10
Am = mapslices(sum, A; dims=2)
B = A[:,:,5]
Bsort = mapslices(B; dims=1) do col
sort(col)
end
elseif i < 15
sleep(0.01)
else
sleep(0.02)
end
let j
j = i
while j<5
b = rand(100)
C = B.*b
j += 1
end
end
end
sleep(0.5)
end
and the timer output looks like this:
julia> const to = TimerOutput()
julia> foo(20)
julia> println(to)
─────────────────────────────────────────────────────────────────────────────────────────────────────────
Time Allocations
────────────────────── ───────────────────────
Tot / % measured: 744ms / 99.0% 38.7MiB / 100%
Section ncalls time %tot avg alloc %tot avg
─────────────────────────────────────────────────────────────────────────────────────────────────────────
foo.jl:3 function foo(n) 1 737ms 100% 737ms 38.7MiB 100% 38.7MiB
foo.jl:30 sleep(0.5) 1 502ms 68.1% 502ms 144B 0.00% 144B
foo.jl:4 for i = 1:n 1 235ms 31.9% 235ms 38.7MiB 100% 38.7MiB
foo.jl:18 sleep(0.02) 6 127ms 17.3% 21.2ms 1.14KiB 0.00% 195B
foo.jl:16 sleep(0.01) 5 56.0ms 7.60% 11.2ms 720B 0.00% 144B
foo.jl:5 begin 20 36.5ms 4.95% 1.82ms 30.5MiB 78.9% 1.53MiB
foo.jl:6 A = randn(100, 100, 20) 20 33.1ms 4.49% 1.66ms 30.5MiB 78.9% 1.53MiB
foo.jl:7 m = maximum(A) 20 3.31ms 0.45% 165μs 0.00B 0.00% 0.00B
foo.jl:10 Am = mapslices(sum, A; dims = 9 12.3ms 1.68% 1.37ms 5.10MiB 13.2% 581KiB
foo.jl:12 Bsort = mapslices(B; dims = 1) 9 2.61ms 0.35% 290μs 1.60MiB 4.13% 182KiB
foo.jl:11 B = A[:, :, 5] 9 123μs 0.02% 13.7μs 704KiB 1.78% 78.2KiB
foo.jl:20 let j 20 116μs 0.02% 5.82μs 794KiB 2.00% 39.7KiB
foo.jl:22 while j < 5 20 85.3μs 0.01% 4.26μs 793KiB 2.00% 39.6KiB
foo.jl:24 C = B .* b 10 65.1μs 0.01% 6.51μs 782KiB 1.97% 78.2KiB
foo.jl:23 b = rand(100) 10 5.34μs 0.00% 534ns 8.75KiB 0.02% 896B
foo.jl:25 j += 1 10 1.69μs 0.00% 169ns 0.00B 0.00% 0.00B
foo.jl:21 j = i 20 8.65μs 0.00% 433ns 0.00B 0.00% 0.00B
─────────────────────────────────────────────────────────────────────────────────────────────────────────
If you think it is of some general interest, please let me know and we can try pushing this as a new feature of TimerOutputs