What’s the best way to analyze startup? I suppose I want to profile (with gdb?). Strace gets me some even higher level view.
I suppose what needs to happen, at a minimum, is open the sysimage, and mmap involved in that? It’s not clear to me how to see with strace timing of each individual mmap or other operation, and wat it’s involving in each case.
Does Julia have to open (or try) 200+ files? If I subtract the errors it’s still likely opening 200-142=58 files, when doing “nothing”. How many of those could easily be skipped, and give the most bang for the buck? I suspect, libopenblas64_.0.3.20.so. Can some of them be deferred (opening and initializing, not just opening and starting with one thread, as in PR just about to be merged for that).
execve (I seen it up to 500), clone3 are single most expensive, but likely futex (most expensive) etc, that are more expensive cumulative should first be looked at. Or mmap most expensive depending on how I run strace, and sometimes rt_sigprocmask.
$ strace -c --summary-wall-clock ./julia --startup-file=no -e "" 2>&1 |less
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ------------------
25,68 0,007409 19 377 futex
16,32 0,004709 23 204 mmap
14,80 0,004270 21 200 142 openat
8,31 0,002397 17 134 73 newfstatat
5,22 0,001505 23 65 mprotect
4,46 0,001286 25 50 brk
4,41 0,001271 23 54 munmap
3,80 0,001095 19 56 rt_sigprocmask
3,67 0,001059 17 60 close
2,78 0,000802 50 16 clone3
2,76 0,000798 19 40 read
1,21 0,000348 18 19 15 readlink
1,11 0,000319 319 1 execve
0,78 0,000226 18 12 rt_sigaction
0,52 0,000151 16 9 pread64
0,49 0,000141 28 5 pwrite64
0,46 0,000132 26 5 1 statx
0,43 0,000125 13 9 3 ioctl
0,36 0,000105 21 5 fstat
0,34 0,000098 19 5 3 epoll_ctl
0,28 0,000082 13 6 fcntl
0,27 0,000079 19 4 getrandom
0,23 0,000067 16 4 prlimit64
0,18 0,000052 26 2 getdents64
0,12 0,000036 17 2 pipe2
0,12 0,000034 34 1 uname
0,11 0,000031 15 2 1 arch_prctl
0,07 0,000021 20 1 sched_getaffinity
0,07 0,000021 20 1 1 lseek
0,07 0,000020 20 1 1 access
0,06 0,000017 17 1 epoll_create1
0,06 0,000017 17 1 epoll_wait
0,06 0,000017 16 1 eventfd2
0,06 0,000016 16 1 write
0,05 0,000016 15 1 rseq
0,05 0,000016 15 1 set_tid_address
0,05 0,000015 15 1 set_robust_list
0,05 0,000015 15 1 faccessat2
0,05 0,000014 14 1 sigaltstack
0,05 0,000014 14 1 dup3
------ ----------- ----------- --------- --------- ------------------
100,00 0,028848 21 1360 240 total
$ strace -c ./julia --startup-file=no -e "" 2>&1 |less
The many errors are likely false alarms, or are they slowing down?
$ strace ./julia --startup-file=no -e "" 2>&1 |less
[..]
openat(AT_FDCWD, "/home/pharaldsson/.julia/juliaup/julia-1.8.1+0.x64/bin/../lib/glibc-hwcaps/x86-64-v2/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/home/pharaldsson/.julia/juliaup/julia-1.8.1+0.x64/bin/../lib/glibc-hwcaps/x86-64-v2", 0x7ffcb2c69c20, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/pharaldsson/.julia/juliaup/julia-1.8.1+0.x64/bin/../lib/tls/haswell/x86_64/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[..]
sched_yield is 92% of the time if I run with (which is likely wrong to do, see total time is excessive):
$ strace -c -f ./julia --startup-file=no -e "" 2>&1 |less
92,10 0,304994 63 4818 sched_yield
1,70 0,005634 25 219 mmap
[..]
------ ----------- ----------- --------- --------- ------------------
100,00 0,459765 71 6472 239 total