Should we improve the text output of Profile.print()?

Currently, if you profile the function mentioned in ProfileView.jl’s README and Profile.print() you get the following output.

using Profile
function profile_test(n)
    for i = 1:n
        A = randn(100,100,20)
        m = maximum(A)
        Am = mapslices(sum, A; dims=2)
        B = A[:,:,5]
        Bsort = mapslices(sort, B; dims=1)
        b = rand(100)
        C = B.*b
    end
end
profile_test(1);
@profile profile_test(10)
Profile.print()
julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
   ╎105 @Base/client.jl:495; _start()
   ╎ 105 @Base/client.jl:309; exec_options(opts::Base.JLOptions)
   ╎  105 @Base/client.jl:379; run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, ...
   ╎   105 @Base/essentials.jl:714; invokelatest
   ╎    105 @Base/essentials.jl:716; #invokelatest#2
   ╎     105 @Base/client.jl:394; (::Base.var"#919#921"{Bool, Bool, Bool})(REPL::Module)
   ╎    ╎ 105 .../build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:346; run_repl(repl::REPL.AbstractREPL, consumer::Any)
   ╎    ╎  105 ...build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:359; run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
   ╎    ╎   105 ...build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:226; start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
   ╎    ╎    105 ...uild/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:241; repl_backend_loop(backend::REPL.REPLBackend)
   ╎    ╎     105 ...uild/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:150; eval_user_input(ast::Any, backend::REPL.REPLBackend)
  2╎    ╎    ╎ 105 @Base/boot.jl:373; eval
   ╎    ╎    ╎  8   REPL[3]:3; profile_test(n::Int64)
   ╎    ╎    ╎   8   ...usr/share/julia/stdlib/v1.7/Random/src/normal.jl:244; randn
   ╎    ╎    ╎    8   ...sr/share/julia/stdlib/v1.7/Random/src/normal.jl:238; randn
   ╎    ╎    ╎     4   ...sr/share/julia/stdlib/v1.7/Random/src/normal.jl:223; randn!(rng::Random.TaskLocalRNG, A::Array{Float64, 3})
   ╎    ╎    ╎    ╎ 4   ...sr/share/julia/stdlib/v1.7/Random/src/Random.jl:267; rand!
   ╎    ╎    ╎    ╎  4   ...r/share/julia/stdlib/v1.7/Random/src/Random.jl:267; rand!
   ╎    ╎    ╎    ╎   4   ...re/julia/stdlib/v1.7/Random/src/XoshiroSimd.jl:298; rand!
   ╎    ╎    ╎    ╎    4   ...e/julia/stdlib/v1.7/Random/src/XoshiroSimd.jl:140; xoshiro_bulk
   ╎    ╎    ╎    ╎     4   ...e/julia/stdlib/v1.7/Random/src/XoshiroSimd.jl:141; xoshiro_bulk
   ╎    ╎    ╎    ╎    ╎ 1   ...e/julia/stdlib/v1.7/Random/src/XoshiroSimd.jl:252; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int...
  1╎    ╎    ╎    ╎    ╎  1   ...e/julia/stdlib/v1.7/Random/src/XoshiroSimd.jl:76; _xor
   ╎    ╎    ╎    ╎    ╎ 1   ...e/julia/stdlib/v1.7/Random/src/XoshiroSimd.jl:253; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int...
  1╎    ╎    ╎    ╎    ╎  1   ...e/julia/stdlib/v1.7/Random/src/XoshiroSimd.jl:76; _xor
   ╎    ╎    ╎    ╎    ╎ 2   ...e/julia/stdlib/v1.7/Random/src/XoshiroSimd.jl:256; xoshiro_bulk_simd(rng::Random.TaskLocalRNG, dst::Ptr{UInt8}, len::Int...
   ╎    ╎    ╎    ╎    ╎  2   @Base/pointer.jl:118; unsafe_store!
  2╎    ╎    ╎    ╎    ╎   2   @Base/pointer.jl:118; unsafe_store!
   ╎    ╎    ╎     4   ...sr/share/julia/stdlib/v1.7/Random/src/normal.jl:226; randn!(rng::Random.TaskLocalRNG, A::Array{Float64, 3})
  1╎    ╎    ╎    ╎ 1   @Base/array.jl:877; setindex!
   ╎    ╎    ╎    ╎ 1   ...sr/share/julia/stdlib/v1.7/Random/src/normal.jl:61; _randn
   ╎    ╎    ╎    ╎  1   @Base/int.jl:491; >>
  1╎    ╎    ╎    ╎   1   @Base/int.jl:485; >>
   ╎    ╎    ╎    ╎ 1   ...sr/share/julia/stdlib/v1.7/Random/src/normal.jl:63; _randn
  1╎    ╎    ╎    ╎  1   @Base/int.jl:85; -
   ╎    ╎    ╎    ╎ 1   ...sr/share/julia/stdlib/v1.7/Random/src/normal.jl:65; _randn
  1╎    ╎    ╎    ╎  1   ...sr/share/julia/stdlib/v1.7/Random/src/normal.jl:78; randn_unlikely(rng::Random.TaskLocalRNG, idx::Int64, rabs::Int64, x::Flo...
   ╎    ╎    ╎  1   REPL[3]:4; profile_test(n::Int64)
   ╎    ╎    ╎   1   @Base/reducedim.jl:889; maximum
   ╎    ╎    ╎    1   @Base/reducedim.jl:889; #maximum#730
   ╎    ╎    ╎     1   @Base/reducedim.jl:893; _maximum
   ╎    ╎    ╎    ╎ 1   @Base/reducedim.jl:893; #_maximum#732
   ╎    ╎    ╎    ╎  1   @Base/reducedim.jl:894; _maximum
   ╎    ╎    ╎    ╎   1   @Base/reducedim.jl:894; #_maximum#733
   ╎    ╎    ╎    ╎    1   @Base/reducedim.jl:322; mapreduce
   ╎    ╎    ╎    ╎     1   @Base/reducedim.jl:322; #mapreduce#715
   ╎    ╎    ╎    ╎    ╎ 1   @Base/reducedim.jl:330; _mapreduce_dim
   ╎    ╎    ╎    ╎    ╎  1   @Base/reduce.jl:417; _mapreduce(f::typeof(identity), op::typeof(max), #unused#::IndexLine...
   ╎    ╎    ╎    ╎    ╎   1   @Base/reduce.jl:629; mapreduce_impl(f::typeof(identity), op::typeof(max), A::Array{Float...
   ╎    ╎    ╎    ╎    ╎    1   @Base/range.jl:747; iterate
  1╎    ╎    ╎    ╎    ╎     1   @Base/promotion.jl:424; ==
   ╎    ╎    ╎  91  REPL[3]:5; profile_test(n::Int64)
   ╎    ╎    ╎   91  @Base/abstractarray.jl:2661; mapslices##kw
   ╎    ╎    ╎    91  @Base/abstractarray.jl:2718; mapslices(f::typeof(sum), A::Array{Float64, 3}; dims::Int64)
   ╎    ╎    ╎     1   @Base/abstractarray.jl:2726; inner_mapslices!(safe_for_reuse::Bool, indices::Base.Iterators.Drop{Cart...
   ╎    ╎    ╎    ╎ 1   @Base/abstractarray.jl:2743; replace_tuples!
  1╎    ╎    ╎    ╎  1   @Base/essentials.jl:479; setindex!
  2╎    ╎    ╎     4   @Base/abstractarray.jl:2727; inner_mapslices!(safe_for_reuse::Bool, indices::Base.Iterators.Drop{Cart...
   ╎    ╎    ╎    ╎ 2   @Base/multidimensional.jl:875; _unsafe_getindex!(::Vector{Float64}, ::Array{Float64, 3}, ::Int64, ::Bas...
   ╎    ╎    ╎    ╎  2   @Base/multidimensional.jl:862; macro expansion
   ╎    ╎    ╎    ╎   1   @Base/cartesian.jl:64; macro expansion
  1╎    ╎    ╎    ╎    1   @Base/multidimensional.jl:865; macro expansion
   ╎    ╎    ╎    ╎   1   @Base/cartesian.jl:65; macro expansion
   ╎    ╎    ╎    ╎    1   @Base/indices.jl:370; iterate
   ╎    ╎    ╎    ╎     1   @Base/range.jl:747; iterate
  1╎    ╎    ╎    ╎    ╎ 1   @Base/promotion.jl:424; ==
 78╎    ╎    ╎     86  @Base/abstractarray.jl:2728; inner_mapslices!(safe_for_reuse::Bool, indices::Base.Iterators.Drop{Cart...
  2╎    ╎    ╎    ╎ 8   @Base/abstractarray.jl:2747; concatenate_setindex!(::Array{Float64, 3}, ::Float64, ::Int64, ::Vararg...
  1╎    ╎    ╎    ╎  6   @Base/broadcast.jl:1238; dotview(::Array{Float64, 3}, ::Int64, ::Base.OneTo{Int64}, ::Vararg{Any})
  5╎    ╎    ╎    ╎   5   @Base/views.jl:146; maybeview(::Array{Float64, 3}, ::Int64, ::Base.OneTo{Int64}, ::Vararg{...
   ╎    ╎    ╎  2   REPL[3]:7; profile_test(n::Int64)
   ╎    ╎    ╎   2   @Base/abstractarray.jl:2661; mapslices##kw
   ╎    ╎    ╎    2   @Base/abstractarray.jl:2718; mapslices(f::typeof(sort), A::Matrix{Float64}; dims::Int64)
  1╎    ╎    ╎     1   @Base/abstractarray.jl:2727; inner_mapslices!(safe_for_reuse::Bool, indices::Base.Iterators.Drop{Cart...
   ╎    ╎    ╎     1   @Base/abstractarray.jl:2728; inner_mapslices!(safe_for_reuse::Bool, indices::Base.Iterators.Drop{Cart...
   ╎    ╎    ╎    ╎ 1   @Base/sort.jl:770; sort
   ╎    ╎    ╎    ╎  1   @Base/sort.jl:770; #sort#9
   ╎    ╎    ╎    ╎   1   @Base/sort.jl:711; sort!
   ╎    ╎    ╎    ╎    1   @Base/sort.jl:723; #sort!#8
   ╎    ╎    ╎    ╎     1   @Base/sort.jl:1232; sort!
   ╎    ╎    ╎    ╎    ╎ 1   @Base/sort.jl:1224; fpsort!(v::Vector{Float64}, a::Base.Sort.QuickSortAlg, o::Base.Order...
   ╎    ╎    ╎    ╎    ╎  1   @Base/sort.jl:575; sort!(v::Vector{Float64}, lo::Int64, hi::Int64, a::Base.Sort.QuickSo...
   ╎    ╎    ╎    ╎    ╎   1   @Base/sort.jl:561; partition!(v::Vector{Float64}, lo::Int64, hi::Int64, o::Base.Sort.F...
   ╎    ╎    ╎    ╎    ╎    1   @Base/operators.jl:425; >=
  1╎    ╎    ╎    ╎    ╎     1   @Base/int.jl:471; <=
   ╎    ╎    ╎  1   REPL[3]:9; profile_test(n::Int64)
   ╎    ╎    ╎   1   @Base/broadcast.jl:904; materialize
   ╎    ╎    ╎    1   @Base/broadcast.jl:929; copy
   ╎    ╎    ╎     1   @Base/broadcast.jl:957; copyto!
   ╎    ╎    ╎    ╎ 1   @Base/broadcast.jl:1004; copyto!
  1╎    ╎    ╎    ╎  1   @Base/simdloop.jl:75; macro expansion
Total snapshots: 660

While this isn’t bad at all, arguably, it is not super easy to identify the most relevant lines in here (the ones with REPL[a]:b) and make sense out of the output (i.e. answer a question like “which line / function call is the most time consuming”). Despite the fact that one can influence the output of Profile.print via keyword arguments (docs), I couldn’t find a way to readily highlight the relevant bits a sort the output in a way that would have the most expensive line (the one with mapslices(sum, ...)) at the top (sortedby=:count sort of gets the job done but I’d prefer decreasing order which doesn’t seem to be available).

To give you in impression of what I’d like to have, I’ve dumped the output to a text file to get

➜  ~  cat test.txt | grep 'REPL\[' | tac
   ╎    ╎    ╎  91  REPL[3]:5; profile_test(n::Int64)
   ╎    ╎    ╎  8   REPL[3]:3; profile_test(n::Int64)
   ╎    ╎    ╎  2   REPL[3]:7; profile_test(n::Int64)
   ╎    ╎    ╎  1   REPL[3]:4; profile_test(n::Int64)
   ╎    ╎    ╎  1   REPL[3]:9; profile_test(n::Int64)

Here, I immediately see that line 5 of my profile_test is the performance bottleneck and needs my attention.

Wouldn’t it be great to have something like this in Julia itself? Perhaps even with the option to interactively unfold the subtrees of these lines (to not only see the top level function and line but also the mapslices(sort...) information? (I imagine one could use TerminalMenus.jl here?)

Note that I’m aware of the great graphical visualizations out there. Actually, I can’t wait for inline bars to highlight the profiling output in julia-vscode (similar to what Juno offers / offered). But sometimes text profiling is useful as well. For example, I often work remotely on a cluster where trying to open a GTK window (or similar) isn’t always ideal.

Before opening an issue (or PR attempt) over at GitHub I thought it might be a good idea to hear other people’s opinions and ideas first. So,

  • How do you feel about the current profiling output?
  • How could it be improved?
  • Are there already any attempts in this direction that I’m not aware of (I couldn’t find any)?

Best,
Carsten

PS: Note that the idea to filter “user code” from the tree output has been raised by @viralbshah already in 2015.

7 Likes

I work mostly from a command line and spawning a GUI is often not possible (headless environments :man_shrugging:). I really liked the console plotting improvement in BenchmarkTools, maybe something similar could be done with ProfileView here? Though I don’t know how easy the selection & hovering mechanics would translate… Would also need a lot of ANSI control codes maybe, if it’s to update in place :thinking:

I basically don’t use the Prodile stdlib at the moment because I can’t process its output in a meaningful way.

Dito. That’s precisely why I think we need to improve it.

2 Likes