Julia/Pkg manager and (Linux) OOM behaviour

Logan wrote a good blog post the other day about how Julia’s package manager is great, better than others, at least those in the Python world.

What I know happened to me (facts in A.): Julia seemingly “crashed” the machine as a result of adding a package, and I would understand if people blamed Julia for it (perhaps unfairly). The machine was unresponsive for at least 10 min. (or it felt that long), so I drove to a store hoping it would recover. It had recovered by the time I checked, an hour after the OOM event; I’m not sure how quickly it did.

Have others had such memory/thrashing issues with Julia? (There’s a possibility Julia wasn’t the cause; maybe it was the web browser.) On Windows (macOS too?), as I understand it, the program that fills up memory is the one that gets killed (unlike on Linux), so I would like to know the behavior there too, to compare. Even ignoring memory issues, I still have some other issues, and potential solutions, discussed in B.

A.

(@v1.7) pkg> add https://github.com/tshort/StaticCompiler.jl
[…]
Updating ~/.julia/environments/v1.7/Manifest.toml
[2a0fbf3d] ↑ CPUSummary v0.1.7 ⇒ v0.1.8
[35d6a980] ↑ ColorSchemes v3.16.0 ⇒ v3.17.1
[31c24e10] ↑ Distributions v0.25.46 ⇒ v0.25.48
[6a86dc24] ↑ FiniteDiff v2.10.0 ⇒ v2.10.1
[86223c79] ↑ Graphs v1.5.1 ⇒ v1.6.0
[033835bb] + JLD2 v0.4.20
[ef3ab10e] ↑ KLU v0.2.3 ⇒ v0.3.0
[ba0b0d4f] ↑ Krylov v0.7.11 ⇒ v0.7.12
[7ed4a6bd] ↑ LinearSolve v1.11.2 ⇒ v1.11.3
[961ee093] ↑ ModelingToolkit v8.3.2 ⇒ v8.4.0
[d8a4904e] ↑ MutableArithmetics v0.3.2 ⇒ v0.3.3
[f517fe37] ↑ Polyester v0.6.3 ⇒ v0.6.4
[47a9eef4] ↑ SparseDiffTools v1.20.0 ⇒ v1.20.1
[81625895] + StaticCompiler v0.4.0 https://github.com/tshort/StaticCompiler.jl#master
[c3572dad] ↑ Sundials v4.9.1 ⇒ v4.9.2
[0796e94c] ↓ Tokenize v0.5.22 ⇒ v0.5.21
[1986cc42] ↑ Unitful v1.10.1 ⇒ v1.11.0
[cc8bc4a8] ↑ Widgets v0.6.4 ⇒ v0.6.5
[0ee61d77] + Clang_jll v12.0.1+3
[8f36deef] + libLLVM_jll
Precompiling project…
Progress [=============> ] 25/80
◒ Unitful
◐ JLD2
✗ UnicodePlots
◐ VectorizationBase
◒ MathOptInterface
◐ JSExpr
◑ PlotlyBase
◑ PlotUtils
◓ SymbolicUtils
◓ SparseDiffTools
◐ JuliaFormatter
◒ MessyTimeSeries
◓ KernelDensity
◐ NLsolve
◓ Optim
^C Interrupted: Exiting precompilation…

I didn’t notice any issue (such as the fan, which later got loud) before I pressed CTRL-C to stop the precompilation, but I suspect the problem had already started, since stopping the precompilation is innocent enough and supported (although, as a known issue, you can lose the blinking cursor).

Note that systemd (or lxd or BgSchPool, as later in the day) is not the cause of the OOM; it is only the process whose allocation happened to invoke the oom-killer. More likely, Julia’s allocations brought Julia plus non-Julia processes combined close to the memory limit (I have 32 GB RAM), and then systemd allocated a little more, pushing the kernel over the edge.

What I saw (an hour later) in /var/log/syslog:

Feb 11 12:35:29 SYMLINUX011 kernel: [320135.398211] systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[…]
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.398371] Tasks state (memory values in pages):
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.398372] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.398383] [ 407] 0 407 32327 436 262144 201 0 systemd-journal
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.398385] [ 438] 0 438 11781 316 122880 146 -1000 systemd-udevd
[…]
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.398773] [ 14655] 1000 14655 796106 83153 2846720 15976 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.398775] [ 24810] 112 24810 119343 1251 425984 2513 0 whoopsie
[…]
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399050] [ 23233] 1000 23233 455924 50226 1159168 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399054] [ 23244] 1000 23244 473542 57409 1228800 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399057] [ 23412] 1000 23412 396798 46837 1056768 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399061] [ 23494] 1000 23494 468689 58342 1306624 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399065] [ 23548] 1000 23548 427638 62058 1204224 1 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399069] [ 23557] 1000 23557 467178 51754 1171456 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399074] [ 23558] 1000 23558 468072 54010 1196032 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399078] [ 23575] 1000 23575 482362 71866 1339392 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399082] [ 23584] 1000 23584 482079 53395 1404928 7 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399087] [ 23593] 1000 23593 466188 44039 1159168 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399091] [ 23594] 1000 23594 481576 52861 1306624 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399094] [ 23596] 1000 23596 498307 79526 1421312 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399097] [ 23620] 1000 23620 498292 90749 1495040 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399100] [ 23621] 1000 23621 498104 89401 1490944 0 0 julia
[…]
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399124] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=lxc.payload.infldaily,mems_allowed=0,global_oom,task_memcg=/user.slice,task=brave,pid=17443,uid=1000
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399183] Out of memory: Killed process 17443 (brave) total-vm:25764548kB, anon-rss:3973132kB, file-rss:0kB, shmem-rss:28kB, UID:1000 pgtables:10096kB oom_score_adj:300
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.667156] oom_reaper: reaped process 17443 (brave), now anon-rss:0kB, file-rss:0kB, shmem-rss:28kB

NB. The web browser, brave, that got killed (one of its many processes) need not be the cause. The Linux kernel picks some victim to kill, which can be a process other than the one that filled up the memory, and it uses a heuristic involving oom_score_adj.
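
To put numbers on the task table above, here is a minimal Python sketch that sums the rss column per process name. It assumes 4 KiB pages (typical on x86-64, but an assumption here), and the function name and regex are mine, not part of any tool. Only a few rows are pasted in; summing all 15 julia rows shown above gives 945,626 pages, roughly 3.6 GiB, but the log is elided and rss excludes swapped-out pages, so that need not be the whole picture.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages (typical on x86-64)

# A few rows copied from the task table above; feed the full log for real totals.
LOG = """\
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.398773] [ 14655] 1000 14655 796106 83153 2846720 15976 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.398775] [ 24810] 112 24810 119343 1251 425984 2513 0 whoopsie
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399097] [ 23620] 1000 23620 498292 90749 1495040 0 0 julia
Feb 11 12:35:29 SYMLINUX011 kernel: [320135.399100] [ 23621] 1000 23621 498104 89401 1490944 0 0 julia
"""

# Columns: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<pgtables>\d+)\s+(?P<swapents>\d+)\s+(?P<adj>-?\d+)\s+"
    r"(?P<name>\S+)\s*$"
)


def rss_pages_by_name(log_text):
    """Sum the rss column (in pages) per process name over all task rows."""
    totals = {}
    for line in log_text.splitlines():
        m = ROW.search(line)
        if m:  # header and non-table lines simply do not match
            totals[m["name"]] = totals.get(m["name"], 0) + int(m["rss"])
    return totals


if __name__ == "__main__":
    for name, pages in sorted(rss_pages_by_name(LOG).items()):
        print(f"{name}: {pages} pages = {pages * PAGE_SIZE / 2**30:.2f} GiB")
```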

Brave, based on Chromium (and Slack, which is an Electron app and so also Chromium-based), both have processes with the default oom_score_adj of 0 (like julia), but also some with 200 and 300. Since 300 is the highest value among my processes, those are the ones that get killed, which most often means whatever web browser I use at the time.
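
For anyone who wants to check these values on their own machine: on Linux each process exposes its value in /proc/<pid>/oom_score_adj. A small Python sketch follows (the function names are hypothetical, and the proc_root parameter exists only so it can be pointed at a test directory):

```python
from pathlib import Path


def read_oom_score_adj(pid="self", proc_root="/proc"):
    """Return the oom_score_adj of one process as an int (Linux only)."""
    return int(Path(proc_root, str(pid), "oom_score_adj").read_text().strip())


def adj_by_pid(proc_root="/proc"):
    """Map every visible pid to its oom_score_adj, skipping unreadable entries."""
    result = {}
    for entry in Path(proc_root).iterdir():
        if entry.name.isdigit():
            try:
                result[int(entry.name)] = read_oom_score_adj(entry.name, proc_root)
            except (OSError, ValueError):
                pass  # process exited in the meantime, or is not readable
    return result


if __name__ == "__main__":
    # Print the processes the kernel would prefer to kill (highest values first).
    for pid, adj in sorted(adj_by_pid().items(), key=lambda kv: -kv[1])[:10]:
        print(pid, adj)
```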

B.
Julia spawns 14-15 processes that each precompile some package. Each takes some memory, and combined it may be a lot(?). I’m not sure spawning that many is advisable.

It’s not clear why adding this one package needed to upgrade so many others, but OK, let’s put that aside and say they are all its dependencies. That doesn’t explain why a lot of other packages need to get precompiled. I’m guessing they share dependencies. In theory, those packages could keep their dependencies as is, at the versions they were at. I’m guessing there might be issues with that solution, but at least deferring the precompilation of those packages seems like it might be an option.

C.
Later in the day there were further incidents (three appear in the log below), but I believe I was at my desk and noticed no thrashing or crash:

Feb 11 17:21:44 SYMLINUX011 kernel: [337310.178346] lxd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[…]
Feb 11 17:21:44 SYMLINUX011 kernel: [337310.179551] Out of memory: Killed process 17475 (brave) total-vm:25764548kB, anon-rss:5038068kB, file-rss:0kB, shmem-rss:100kB, UID:1000 pgtables:12236kB oom_score_adj:300
[…]

Feb 11 17:45:18 SYMLINUX011 kernel: [338724.115568] lxd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[…]
Feb 11 17:45:18 SYMLINUX011 kernel: [338724.116453] Out of memory: Killed process 17411 (brave) total-vm:25765016kB, anon-rss:5752956kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:13984kB oom_score_adj:300

Feb 11 18:40:48 SYMLINUX011 kernel: [342054.319544] BgSchPool invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[…]
Feb 11 18:40:48 SYMLINUX011 kernel: [342054.320459] Out of memory: Killed process 17372 (brave) total-vm:25743952kB, anon-rss:7040796kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:17208kB oom_score_adj:300

So… how much memory do you have? Can you give your system 2 GB of swap?

It has 32 GB RAM (I was editing the post as you answered) plus 2 GB swap. Might it help if the precompile processes Julia spawns had oom_score_adj set to 300 or 400?
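
As a rough illustration of that idea from the outside (not something Pkg does, as far as I know): a process may raise its own oom_score_adj without privileges, and the value is inherited by its children, so a launcher can set it before starting Julia. This Python sketch uses hypothetical helper names; the path parameter is there only so the writer can be tried on an ordinary file.

```python
import subprocess


def raise_oom_score_adj(score, path="/proc/self/oom_score_adj"):
    """Return a callable that writes `score` to `path` when invoked."""
    def _apply():
        with open(path, "w") as f:
            f.write(str(score))
    return _apply


def run_as_preferred_victim(cmd, score=300):
    """Start `cmd` with a raised oom_score_adj (Linux/POSIX only).

    preexec_fn runs in the child between fork and exec, so only the child
    (and whatever it spawns afterwards) gets the higher value.
    """
    return subprocess.Popen(cmd, preexec_fn=raise_oom_score_adj(score))
```

Usage would be something like run_as_preferred_victim(["julia", "-e", "using Pkg; Pkg.precompile()"]). Note that Python documents preexec_fn as unsafe in multi-threaded programs, so this is only suitable for a simple launcher script.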

I have seen A on an HPC system; it basically boils down to too many parallel compilation processes:
https://github.com/JuliaLang/Pkg.jl/issues/2404

Thanks for the link with: ENV["JULIA_NUM_PRECOMPILE_TASKS"]

I vaguely remembered something like this, and I see the default maximum has been lowered to 16. How was that number chosen? As much as I liked (and still like) the precompilation spinners when I first saw them, I think 16 might be excessive. Ideally N processes give an N-times speedup, but is that for sure? Memory consumption (and cache use) seems to be ignored in the calculation, and I think something like 4 or 8 might be a better default, since many people have less memory than I do (32 GB). Power users could always go higher with the ENV variable.
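
To make the "take memory into account" idea concrete, here is a hypothetical heuristic in Python. It is not how Pkg picks its default: the per-task memory figure of 1.5 GiB is a placeholder I made up, and only the cap of 16 and the variable name JULIA_NUM_PRECOMPILE_TASKS come from this thread.

```python
import os


def suggest_precompile_tasks(cpu_threads, available_gib, per_task_gib=1.5, hard_cap=16):
    """Pick a task count limited by CPU threads, memory, and a hard cap.

    per_task_gib is an assumed average footprint of one precompile worker;
    measure it on your own machine before trusting the result.
    """
    by_mem = int(available_gib // per_task_gib)
    return max(1, min(cpu_threads, by_mem, hard_cap))


def precompile_env(n_tasks, base_env=None):
    """Return a copy of the environment with JULIA_NUM_PRECOMPILE_TASKS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["JULIA_NUM_PRECOMPILE_TASKS"] = str(n_tasks)
    return env
```

With 32 threads but only 6 GiB free, suggest_precompile_tasks(32, 6.0) gives 4; the resulting environment can be passed as env= when launching julia from a script.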

[I did have 128 GB of RAM before I took 3 of 4 DIMMs out, so that’s one solution, putting them back in… For some reason it takes forever to boot with them, and I was debugging the issue; it’s tolerable now, though still not fast. I’m guessing the RAM check takes a while…]