Analyzing startup cost of julia itself (to possibly make better before the feature freeze)

What’s the best way to analyze startup? I suppose I want to profile (with gdb?). Strace gets me some even higher level view.

I suppose what needs to happen, at a minimum, is open the sysimage, and mmap involved in that? It’s not clear to me how to see with strace timing of each individual mmap or other operation, and wat it’s involving in each case.

Does Julia have to open (or try) 200+ files? If I subtract the errors it’s still likely opening 200-142=58 files, when doing “nothing”. How many of those could easily be skipped, and give the most bang for the buck? I suspect, libopenblas64_.0.3.20.so. Can some of them be deferred (opening and initializing, not just opening and starting with one thread, as in PR just about to be merged for that).

execve (I seen it up to 500), clone3 are single most expensive, but likely futex (most expensive) etc, that are more expensive cumulative should first be looked at. Or mmap most expensive depending on how I run strace, and sometimes rt_sigprocmask.

$ strace -c --summary-wall-clock ./julia --startup-file=no -e ""  2>&1 |less

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 25,68    0,007409          19       377           futex
 16,32    0,004709          23       204           mmap
 14,80    0,004270          21       200       142 openat
  8,31    0,002397          17       134        73 newfstatat
  5,22    0,001505          23        65           mprotect
  4,46    0,001286          25        50           brk
  4,41    0,001271          23        54           munmap
  3,80    0,001095          19        56           rt_sigprocmask
  3,67    0,001059          17        60           close
  2,78    0,000802          50        16           clone3
  2,76    0,000798          19        40           read
  1,21    0,000348          18        19        15 readlink
  1,11    0,000319         319         1           execve
  0,78    0,000226          18        12           rt_sigaction
  0,52    0,000151          16         9           pread64
  0,49    0,000141          28         5           pwrite64
  0,46    0,000132          26         5         1 statx
  0,43    0,000125          13         9         3 ioctl
  0,36    0,000105          21         5           fstat
  0,34    0,000098          19         5         3 epoll_ctl
  0,28    0,000082          13         6           fcntl
  0,27    0,000079          19         4           getrandom
  0,23    0,000067          16         4           prlimit64
  0,18    0,000052          26         2           getdents64
  0,12    0,000036          17         2           pipe2
  0,12    0,000034          34         1           uname
  0,11    0,000031          15         2         1 arch_prctl
  0,07    0,000021          20         1           sched_getaffinity
  0,07    0,000021          20         1         1 lseek
  0,07    0,000020          20         1         1 access
  0,06    0,000017          17         1           epoll_create1
  0,06    0,000017          17         1           epoll_wait
  0,06    0,000017          16         1           eventfd2
  0,06    0,000016          16         1           write
  0,05    0,000016          15         1           rseq
  0,05    0,000016          15         1           set_tid_address
  0,05    0,000015          15         1           set_robust_list
  0,05    0,000015          15         1           faccessat2
  0,05    0,000014          14         1           sigaltstack
  0,05    0,000014          14         1           dup3
------ ----------- ----------- --------- --------- ------------------
100,00    0,028848          21      1360       240 total

$ strace -c ./julia --startup-file=no -e ""  2>&1 |less

The many errors are likely false alarms, or are they slowing down?

$ strace ./julia --startup-file=no -e ""  2>&1 |less
[..]
openat(AT_FDCWD, "/home/pharaldsson/.julia/juliaup/julia-1.8.1+0.x64/bin/../lib/glibc-hwcaps/x86-64-v2/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/home/pharaldsson/.julia/juliaup/julia-1.8.1+0.x64/bin/../lib/glibc-hwcaps/x86-64-v2", 0x7ffcb2c69c20, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/pharaldsson/.julia/juliaup/julia-1.8.1+0.x64/bin/../lib/tls/haswell/x86_64/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[..]

sched_yield is 92% of the time if I run with (which is likely wrong to do, see total time is excessive):

$ strace -c -f ./julia --startup-file=no -e ""  2>&1 |less
 92,10    0,304994          63      4818           sched_yield
  1,70    0,005634          25       219           mmap
[..]
------ ----------- ----------- --------- --------- ------------------
100,00    0,459765          71      6472       239 total