Base.addprocs() affects other software

question
multithreading
linux

#1

I tried this don’t-do-this-at-home command (well, maybe you should try it at home rather than somewhere others would not appreciate it) on my laptop with 4 processors:

addprocs(80)

This affected several things on my system.

  • Computer became slow (maybe expected);
  • Firefox crashed (not expected);
  • login prompts on vc’s started crashing (not expected).

After failed attempts to log in on a virtual console (vc), I could switch back to X, run killall julia in a running xterm and restart my web browser, and everything was fine again. I guess what Julia is doing, launching the processes you asked for, is fine, and that the unexpected things are due to issues in other software. Firefox crashing seems like a bug in Firefox: this should not happen even if some other program exhausts the system’s resources. The login prompts on my vc’s should also be more stable, I think.

Do others agree? Should these issues be reported at Mozilla and other places?

What if I tried this on the login node of a supercomputer instead of on my workstation/laptop? (That one is rhetorical: I could possibly cause similar problems, and many people would not be happy.)

Should I expect similar behaviour on the BSDs? I guess that on GNU/Linux this issue can be mitigated by setting limits (e.g. with ulimit or SELinux et al.).

I am using an Ubuntu 16.04 system running with Linux 4.13.0, systemd-logind, Julia 0.6.2, Firefox 58.0.1 and other software.


#2

This kind of crashing happens on every OS with pretty much all software if you vastly overutilize your resources. I think the answer is, just don’t do it.

Login nodes specifically tell you not to do this. Never ever should a parallel command be run on a login node. In fact, this is how login nodes go down… and that happens quite often…


#3

This isn’t a bug, it’s a feature. I’d rather not be protected from this kind of thing. We should be glad to have software that is low-level enough to take down a system. I’ve run into similar problems by using up all memory.

If you want to prevent your program from causing this, you can do something like @assert nmappers < 2 * Sys.CPU_CORES, where nmappers is the number of workers you later pass to addprocs(). There’s no useful way to keep users from typing stuff into the REPL though.
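
A minimal sketch of such a guard, assuming Julia 1.x, where addprocs lives in the Distributed standard library and Sys.CPU_CORES has been renamed Sys.CPU_THREADS. The helper name checked_nprocs and the factor of 2 are illustrative choices, not part of any API.

```julia
using Distributed  # provides addprocs on Julia 1.x (it was in Base on 0.6)

# Hypothetical helper: reject a worker count that is not below `factor` times
# the number of logical CPUs. On Julia 0.6, use Sys.CPU_CORES for `ncpus`.
function checked_nprocs(requested::Integer; factor::Integer = 2,
                        ncpus::Integer = Sys.CPU_THREADS)
    requested >= 1 || throw(ArgumentError("requested must be at least 1"))
    limit = factor * ncpus
    requested < limit ||
        throw(ArgumentError("refusing $requested workers on $ncpus CPUs (must be below $limit)"))
    return requested
end

# Usage (commented out so that loading this sketch spawns nothing):
# addprocs(checked_nprocs(4))
```

With ncpus = 4, a request for 80 workers throws an ArgumentError instead of reaching addprocs. Throwing an exception rather than using @assert is a design choice here: assertions are meant for debugging and are not a reliable way to validate input.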


#4

Fair enough. Thank you both for the tips.


#5

Good questions here.
Regarding HPC setups, on the HPC system which I manage we have implemented cgroups for jobs, so your job is constrained to the CPU cores and memory it has been allocated. Also, on GPU nodes you are given access to only one (or more) GPUs.
The login nodes do not at the moment have cgroups for users, but we will be moving towards that.


#6

This raises the question of what resources a Julia instance requires. I see about 100 MiB of unshared memory per process, which seems bloated. (So OOM is not surprising for 80 procs on a typical laptop with other stuff running.)
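
To put a rough number on that: at about 100 MiB each, 80 workers need around 7.8 GiB of unshared memory. The sketch below does this arithmetic and compares the result with the machine's physical RAM as reported by Sys.total_memory(). The 100 MiB figure is only the observation above, not a measured constant, and the helper names are illustrative.

```julia
const MiB = 2^20

# Estimated unshared memory for `nprocs` workers, in bytes.
# `per_proc` defaults to the roughly 100 MiB per process observed above.
estimated_bytes(nprocs; per_proc = 100 * MiB) = nprocs * per_proc

# Largest worker count whose estimate still fits in `available` bytes.
max_procs(available; per_proc = 100 * MiB) = fld(available, per_proc)

need = estimated_bytes(80)
println("80 workers: about ", round(need / 2^30, digits = 1), " GiB")
println("fits in physical RAM: ", need <= Sys.total_memory())
println("workers that fit in physical RAM: ", max_procs(Sys.total_memory()))
```

This ignores memory used by everything else on the system, so the real ceiling is lower than what max_procs reports.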