PackageCompiler.jl: display functions seem to use a lot of memory

MWE:

  1. Create a package with PkgTemplates.
  2. Add the following packages: PackageCompiler, Revise, TimerOutputs. Revise is not used; I am just in the habit of adding it to every package.
  3. Add the following code to SelfTest.jl:
using TimerOutputs

# Write your package code here.
function julia_main()::Cint
    to = TimerOutput()
    @timeit to "print" println("wow")
    show(to)
    return 0
end
  4. Add a test/precompile.jl file with the following content:
using SelfTest

SelfTest.julia_main()
  5. Create two apps with PackageCompiler, using the following commands:
create_app(".", "bin/"; force=true, include_lazy_artifacts=true)
create_app(".", "bin_pre/"; precompile_execution_file=["test/precompile.jl"], force=true, include_lazy_artifacts=true)

Then run the test:

  1. Open a fresh Julia REPL and call the function. The first call includes compilation time, so I call it twice:
julia> using SelfTest
[ Info: Precompiling SelfTest [5c836c24-3dcc-4429-8ed9-23232f434ad3]

julia> SelfTest.julia_main()
wow
 ────────────────────────────────────────────────────────────────────
                            Time                    Allocations      
                   ───────────────────────   ────────────────────────
 Tot / % measured:      602ms /   0.0%           46.9MiB /   0.0%    

 Section   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────
 print          1   52.0μs  100.0%  52.0μs      144B  100.0%     144B
 ────────────────────────────────────────────────────────────────────0

julia> SelfTest.julia_main()
wow
 ────────────────────────────────────────────────────────────────────
                            Time                    Allocations      
                   ───────────────────────   ────────────────────────
 Tot / % measured:     78.2μs /  81.4%           1.69KiB /   8.3%    

 Section   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────
 print          1   63.6μs  100.0%  63.6μs      144B  100.0%     144B
 ────────────────────────────────────────────────────────────────────0
  2. Run the compiled binaries:
(base) [user@machine selftest]$ ./bin/bin/SelfTest 
wow
 ────────────────────────────────────────────────────────────────────
                            Time                    Allocations      
                   ───────────────────────   ────────────────────────
 Tot / % measured:      602ms /  22.3%           81.0MiB /  17.9%    

 Section   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────
 print          1    134ms  100.0%   134ms   14.5MiB  100.0%  14.5MiB
 ────────────────────────────────────────────────────────────────────
(base) [user@machine selftest]$ ./bin_pre/bin/SelfTest 
wow
 ────────────────────────────────────────────────────────────────────
                            Time                    Allocations      
                   ───────────────────────   ────────────────────────
 Tot / % measured:     63.2ms / 100.0%           7.79MiB / 100.0%    

 Section   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────
 print          1   63.2ms  100.0%  63.2ms   7.79MiB  100.0%  7.79MiB

So even the binary created with precompile_execution_file allocates quite a lot of memory (7.79MiB for a single println). How can I improve this?

Something that may be useful:
I ran the binary like this to get compilation information:

./bin_pre/bin/SelfTest --julia-args --trace-compile=stderr

I got this:

precompile(Tuple{typeof(Base.Sys.which), String})
precompile(Tuple{typeof(Base.something), Nothing, String})
precompile(Tuple{typeof(Base.setindex!), Base.Dict{Base.PkgId, Revise.PkgData}, Revise.PkgData, Base.PkgId})
precompile(Tuple{typeof(Base.uv_timercb), Ptr{Nothing}})
precompile(Tuple{Revise.var"#107#108"})
precompile(Tuple{typeof(Printf.format), Printf.Format{Base.CodeUnits{UInt8, String}, Tuple{Printf.Spec{Base.Val{Char(0x66000000)}}}}, Float64})
precompile(Tuple{typeof(Base.notify), Base.GenericCondition{Base.Threads.SpinLock}, Any, Bool, Bool})
precompile(Tuple{typeof(Base._uv_hook_close), Base.Timer})
precompile(Tuple{typeof(Base.uvfinalize), Base.Timer})
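
To see at a glance which packages a trace like this involves, the statements can be tallied by their leading module name. This is only a minimal sketch and not part of PackageCompiler: the helper name `modules_in_trace` and the regex are my own assumptions, and the sample lines are copied from the output above.

```julia
# Tally the leading module name of every dotted identifier in
# `--trace-compile` output, counting each module at most once per statement.
# `modules_in_trace` is a hypothetical helper, not a PackageCompiler API.
function modules_in_trace(lines)
    counts = Dict{String,Int}()
    for line in lines
        startswith(line, "precompile(") || continue
        seen = Set{String}()
        # Capture `Base` in `Base.Sys.which`, but not the nested `Sys`.
        for m in eachmatch(r"(?<![\w.])([A-Z]\w*)\.", line)
            push!(seen, String(m[1]))
        end
        for name in seen
            counts[name] = get(counts, name, 0) + 1
        end
    end
    return counts
end

# Three statements taken from the trace above.
trace = [
    "precompile(Tuple{typeof(Base.Sys.which), String})",
    "precompile(Tuple{typeof(Base.setindex!), Base.Dict{Base.PkgId, Revise.PkgData}, Revise.PkgData, Base.PkgId})",
    "precompile(Tuple{typeof(Base._uv_hook_close), Base.Timer})",
]
counts = modules_in_trace(trace)
```

When the trace is written to a file, as with `--trace-compile=compile.log`, the same function should accept `eachline("compile.log")`. Any module in the result that the app does not actually need is a candidate for removal.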

It seems to be related to Revise. After removing it from the package and rebuilding, everything is fine:

[user@machine selftest]$ ./bin_pre_new/bin/SelfTest --julia-args --trace-compile=compile.log
wow
 ────────────────────────────────────────────────────────────────────
                            Time                    Allocations      
                   ───────────────────────   ────────────────────────
 Tot / % measured:      147μs /  89.6%           1.64KiB /   5.7%    

 Section   ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────
 print          1    132μs  100.0%   132μs     96.0B  100.0%    96.0B

and compile.log contains this:

precompile(Tuple{typeof(Base.Sys.which), String})
precompile(Tuple{typeof(Base.something), Nothing, String})
precompile(Tuple{typeof(Printf.format), Printf.Format{Base.CodeUnits{UInt8, String}, Tuple{Printf.Spec{Base.Val{Char(0x66000000)}}}}, Float64})
precompile(Tuple{typeof(Printf.format), Printf.Format{Base.CodeUnits{UInt8, String}, Tuple{Printf.Spec{Base.Val{Char(0x66000000)}}}}, Int64})

Lesson learned: remember to exclude dev/debug utilities such as Revise from the package's dependencies when compiling release binaries.
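
One way to act on this is to check the project's dependencies before building. Below is a minimal sketch, assuming a hand-maintained list of development-only package names. `dev_only_deps` and its `dev_only` keyword are hypothetical names, and the UUIDs in the sample are placeholders, not the real ones.

```julia
using TOML

# Return the names listed under [deps] that also appear in `dev_only`.
# The default list is an assumption; extend it with whatever tools you
# consider development-only.
function dev_only_deps(project::AbstractDict; dev_only=("Revise",))
    deps = get(project, "deps", Dict{String,Any}())
    return sort!([name for name in keys(deps) if name in dev_only])
end

# Sample Project.toml contents with placeholder UUIDs.
project = TOML.parse("""
name = "SelfTest"

[deps]
PackageCompiler = "00000000-0000-0000-0000-000000000001"
Revise = "00000000-0000-0000-0000-000000000002"
TimerOutputs = "00000000-0000-0000-0000-000000000003"
""")

offenders = dev_only_deps(project)
isempty(offenders) || @warn "Development-only packages in [deps]" offenders
```

In a build script, the same check could be run on `TOML.parsefile("Project.toml")` before calling `create_app`, aborting if the result is non-empty.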