I have been stumped by this for a while now, so I thought I’d put it up here.

I developed a script that calculates some properties of a simulation "self-consistently": the system is set up as a matrix and diagonalized, then some properties are calculated from the eigenvectors and plugged back into the matrix. This is repeated until convergence (or until some `n_max_iter` is reached). I then want to do this for a bunch of different parameter sets, so I distribute them over threads using `Threads.@threads`.
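The sweep itself is nothing fancy; it looks roughly like this (the names `sweep` and `run_one` are illustrative, not my actual code):

```julia
# Hypothetical sketch of the parameter sweep: each thread handles
# one parameter set and writes its result to a distinct slot.
function sweep(params)
    results = Vector{Float64}(undef, length(params))
    Threads.@threads for j in eachindex(params)
        # each iteration runs the full self-consistent calculation
        results[j] = run_one(params[j])
    end
    return results
end

run_one(p) = p^2   # stand-in for the real self-consistent solver
```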

The script works fine (fast, and the results look right) on my desktop, and for the smaller system sizes I used while developing the code. When I run the same small system size (800×800 matrices) on our Slurm cluster (96 threads per node; I usually request 72), **it works fine**.

Increasing the system size to 1800×1800 matrices, the script **still works fine** on my desktop (16 threads, 32 GB memory). However, when I run it on the cluster and monitor with `htop`, all threads initially run at 100%, until suddenly, after a minute or so, they all drop into `S` (sleep) state. They do not recover.

In fact, this already occurs at 20 threads, but **doesn't** at 16 threads on the same machine. I'm stumped.

Things I tried include:

- Making sure it's not an out-of-memory issue. This is "easy" to rule out, since the cluster has up to 256 GB of memory available.
- Forcing garbage collection. I know my code allocates a lot (a consequence of running `LinearAlgebra.eigen()` hundreds of times).
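The GC forcing was just a periodic manual collection inside the allocation-heavy loop; a stand-alone sketch of that pattern (the loop body and the interval here are illustrative):

```julia
# Hypothetical reconstruction of the GC-forcing attempt: call GC.gc()
# every few iterations of a loop that allocates a lot.
function allocating_loop(n_iter)
    total = 0.0
    for i in 1:n_iter
        A = randn(50, 50)        # stand-in for the allocations eigen() makes
        total += sum(abs2, A)
        if i % 5 == 0            # the interval is arbitrary
            GC.gc()              # force a full garbage collection
        end
    end
    return total
end
```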

I made myself a sort of minimal example, where I diagonalize a bunch of random matrices in the same kind of loop structure as in my real code, and that does *not* cause the sleeping. So I conclude that the problem is somehow related to the way I update the matrix. For completeness, this is the exact code:

```julia
function update_H!(H, erray, N, E, uv, U, β, μ0, t, ϵ)
    erray[1] = 0.0
    erray[2] = 0.0
    for i in 1:N
        δi = 0 + 0im
        mi = μ0
        for (k, e) in enumerate(E)
            δi += U[i] * uv[2i-1, k] * conj(uv[2i, k]) * tanh(β * e / 2)
            mi -= U[i] * abs2(uv[2i-1, k]) * fermi_dirac(e, β)
        end
        if abs(H[2i-1, 2i] - δi) > erray[1]
            erray[1] = abs(H[2i-1, 2i] - δi)
        end
        if abs(H[2i-1, 2i-1] - (4t + ϵ[i] - mi)) > erray[2]
            erray[2] = abs(H[2i-1, 2i-1] - (4t + ϵ[i] - mi))
        end
        H[2i-1, 2i]   = δi
        H[2i, 2i-1]   = conj(δi)
        H[2i-1, 2i-1] = 4t + ϵ[i] - mi
        H[2i, 2i]     = -4t - ϵ[i] + mi
    end
    return nothing
end

function get_self_consistent(n, m, t, μ, μ0, ϵ, Δ, U, pbc_x, pbc_y, β; epsa=1e-4, N_iter=40)
    i = 0
    erra = 1.0
    erray = [0.0, 0.0]
    H = H_BdG(n, m, t, μ, ϵ, Δ, pbc_x=pbc_x, pbc_y=pbc_y)
    E = Array{Float64}(undef, n*m)
    uv = similar(H)
    while (i < N_iter) && (erra > epsa)
        E, uv = eigen(Hermitian(H))
        update_H!(H, erray, n*m, E, uv, U, β, μ0, t, ϵ)
        erra = max(erray...)
        i += 1
        print("i = ", i, " | ϵ(abs) = ", round.(erray, digits=5), "\n")
        flush(stdout)
    end
    Δ = calc_Δ(n*m, E, uv, U, β)
    μ = calc_μ(n*m, E, uv, U, β, μ0)
    # print("i = ", i, " | ϵ(abs) = ", round(erra, digits=5), "\n")
    return Δ, μ
end
```

So `H` is the matrix being diagonalized; `update_H!` updates `H` in place and keeps track of the difference with the previous iteration (in `erray`). There are also a bunch of parameters which I won't explain, but left in for completeness. The `print` statement is only there for debugging, and it informed me that most workers don't make it past the first few iterations of the self-consistent loop. The bug is also present without printing anything.
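For reference, the minimal example that does *not* reproduce the hang looked roughly like this (sizes and iteration counts here are illustrative, not the ones I actually used):

```julia
using LinearAlgebra

# A threaded loop that repeatedly diagonalizes random Hermitian matrices,
# mimicking the loop structure of the real code — but building a fresh
# matrix each time instead of updating one in place.
function minimal_example(n_params; msize=200, n_iter=10)
    results = Vector{Float64}(undef, n_params)
    Threads.@threads for j in 1:n_params
        acc = 0.0
        for _ in 1:n_iter
            A = randn(ComplexF64, msize, msize)
            H = Hermitian(A + A')       # A + A' has a real diagonal, so this is valid
            E, uv = eigen(H)
            acc += E[1]                 # use the result so the loop isn't trivial
        end
        results[j] = acc
    end
    return results
end
```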

I'm sorry if this is a bit much, but I would appreciate any insight into what's going on!