I’m struggling to understand where to even start debugging this one! I have a workstation that I usually only access over the network, so it does not normally have a monitor or keyboard/mouse attached. At some point in the last week or two a problem appeared: Julia hangs while importing the package I work on. While trying to find the issue, I connected a monitor, and the problem disappeared. Even with the monitor connected but just sitting at the login screen, the problem is gone when I ssh into the machine.

I don’t think the problem is (directly) Julia or any of the packages, because I get the same problem using a slightly older Julia (1.10.7) and a Manifest.toml recovered from February 8th that I’m pretty sure was working at the time. It also happens on several different kernel versions (including ones I was using previously that seemed to work). The (Linux) system has updated a bunch of packages, including firmware, since the last time I know things were working. I’d like to do something less drastic than roll the whole system back to an old restore point…

Does anyone have any guess what could cause this kind of behaviour? Thanks!
Update:
- I restored the workstation (using TimeShift) back to a point when I am pretty sure everything was working correctly, and I still have the same problem?!?
- The hang seems to be happening in `MPI_Init_thread`, inside `MPI.Init()`. I am baffled by how that could be affected by whether a display is connected or not. I must be missing something!
Sort of progress: this issue has nothing to do with my package in particular, which might make it easier to diagnose. Doing just
julia> using MPI
julia> MPI.Init()
hangs.
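Since the hang reproduces without the package, a small wrapper can tell a hang apart from an ordinary failure without tying up the terminal. This is only a sketch under assumptions: `check_hang` is a made-up helper name, the time limit is arbitrary, and it relies on GNU coreutils `timeout`, which exits with status 124 when it has to kill the command.

```shell
#!/usr/bin/env bash
# check_hang SECONDS CMD...: run CMD, stop it after SECONDS, report the outcome.
# (check_hang is a hypothetical helper name, not part of Julia or MPI.)
check_hang() {
    local limit="$1"; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG"               # timeout(1) exits with 124 when the limit was hit
    elif [ "$status" -eq 0 ]; then
        echo "OK"
    else
        echo "FAILED($status)"    # the command exited by itself with an error
    fi
}

# Example (not run here; assumes julia is on PATH and 60 s is long enough to load MPI):
# check_hang 60 julia -e 'using MPI; MPI.Init()'
```

Running the commented example once with the monitor attached and once without should give `OK` in one case and `HANG` in the other, if the behaviour is as described above.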
I seem to have found a workaround. I have another problem: when I start the workstation without a monitor attached and plug a monitor in later, there is no signal output to the monitor. While looking for solutions for that, I came across the following:
replace
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
with
GRUB_CMDLINE_LINUX_DEFAULT="nomodeset text"
in `/etc/default/grub`, then do `sudo update-grub`. I think this disables some graphical something in grub, that presumably prevents some error which otherwise puts my system into a strange state where MPI fails, but really why this works and what is going on is totally beyond me. This change doesn’t fix my monitor problem, but does seem to avoid the `MPI.Init()` hang!
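For anyone wanting to script the change, here is a rough sketch of the edit and of a check that the new option actually reached the kernel after a reboot. The helper names `set_grub_cmdline` and `has_kernel_param` are invented for illustration, and the sketch assumes the file contains a single `GRUB_CMDLINE_LINUX_DEFAULT=` line; trying it on a copy of the file first seems prudent.

```shell
#!/usr/bin/env bash
# set_grub_cmdline FILE VALUE: rewrite the GRUB_CMDLINE_LINUX_DEFAULT line in FILE.
# The previous version of FILE is kept as FILE.bak. (Hypothetical helper.)
set_grub_cmdline() {
    local file="$1" value="$2"
    sed -i.bak "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"${value}\"|" "$file"
}

# has_kernel_param NAME [FILE]: succeed if NAME appears as a whole word on the
# kernel command line (FILE defaults to /proc/cmdline). (Hypothetical helper.)
has_kernel_param() {
    local name="$1" file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$file" | grep -qxF -- "$name"
}

# Example (not run here):
#   cp /etc/default/grub /tmp/grub.test
#   set_grub_cmdline /tmp/grub.test "nomodeset text"
#   diff /etc/default/grub /tmp/grub.test
```

Editing the real `/etc/default/grub` needs root, and the change only takes effect after `sudo update-grub` and a reboot; after that, `has_kernel_param nomodeset && echo active` shows whether the running kernel was booted with the option.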
I think this means that this was never a Julia issue, but rather a Linux+MPI one.
If anyone can add any explanation of what’s been going on here, I’d be very grateful!!