Unexpected OutOfMemory error on HPC

Hello,
I am using an HPC cluster to run a code that solves two huge sparse linear systems. I was able to test the code on my Windows machine with 16 GB of memory. I used the `@time` macro to monitor the memory allocations at the critical places (matrix assembly, matrix multiplication, system solving, etc.):

0.573989 seconds (143.03 k allocations: 710.180 MiB, 22.39% gc time)
0.568460 seconds (71.21 k allocations: 1.865 GiB, 0.75% gc time)
9.861626 seconds (4.70 M allocations: 1.018 GiB)
0.177318 seconds (275.76 k allocations: 120.454 MiB, 19.67% gc time)
0.220135 seconds (82.13 k allocations: 39.772 MiB)
0.005179 seconds (123 allocations: 8.219 KiB)

I was also able to reduce the number of allocations by using dropzeros!(...) and other features of sparse matrices. In general, the code consumed up to 7 GB of memory on Windows. However, when I moved to the cluster and allocated 10 GB for the same code, I got an OutOfMemory error. I increased it to 20 GB and it also crashed! It only works when I use 25 GB. I plotted the memory consumption in kB, as measured by a tool on our cluster, and got this graph.
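For reference, dropzeros! removes explicitly stored zeros in place, which shrinks nnz (and hence memory) without changing the matrix's values. A minimal sketch on a toy matrix (not the real data):

```julia
using SparseArrays

# A 3×3 sparse matrix where one stored entry is an explicit zero
A = sparse([1, 2, 3], [1, 2, 3], [1.0, 0.0, 2.0])
nnz(A)          # 3 stored entries, one of which is an explicit zero

dropzeros!(A)   # drop stored zeros in place
nnz(A)          # 2 stored entries; the matrix values are unchanged
```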

Any suggestion?

Can you post the code that’s doing the bulk of the work?

This may not be your issue, but note that if you’re running with addprocs() on the same node, you will be copying all data to each process. This means that, if you have an 8-core node, and you run addprocs(7) on it, you will have 8 separate copies of your data as each process will get its own.
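To illustrate the point above, a minimal sketch (toy data, not the poster's code) of how each worker process ends up holding its own full copy:

```julia
using Distributed

addprocs(2)                    # each worker is a separate OS process

# @everywhere runs this on the master and on every worker, so each process
# allocates its own ~8 MB array; total memory scales with the process count.
@everywhere data = rand(10^6)
```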

Without knowing more about how you set up your cluster jobs and your multiprocessing environment, it’s hard to say what the problem is.

Thanks for your prompt answer. I even tried using 1 core only on a single node, and allocated 20 GB to this core only … didn’t work!

If you have code and more details of your setup, that would help.

@time(NNN = α^(-2) * C * C_t)
@time(A__ = B * B_t + NNN)
@time(W = A__ \ b_)
@time(V = -B_t * W)
@time(η = C \ -(b_ + B * V))
@time(y = reshape(V, N, M))

where:
C : sparse matrix of size (659934, 9999); C_t : its transpose
B : sparse matrix of size (659934, 660000); B_t : its transpose
α : scalar
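As a rough sanity check (my own arithmetic, not from the thread), the dimensions above rule out any dense fallback: even C alone held densely would need far more than the 25 GB observed, so every intermediate product must stay sparse.

```julia
# Dense Float64 storage needed for the matrices quoted above
dense_C   = 659_934 * 9_999 * 8      # C as a dense matrix: ≈ 49 GiB
dense_CCt = 659_934^2 * 8            # C * C_t as a dense matrix: ≈ 3.2 TiB

println(dense_C / 2^30, " GiB, ", dense_CCt / 2^40, " TiB")
```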

… and how have you set up your distributed environment?

Do you happen to know which metric of “mem consumption” the cluster is reporting? Is it resident set size, virtual memory reservation, or …?

Good questions from @traktofon.
Also ask your systems managers if the job is being run within a cgroup or container.
It looks like it is, since you are allocated a fixed amount of memory.

Yes, that’s indeed right. The way I allocate my resources is as follows:

#PBS -l walltime=....
#PBS -l pmem=....    # memory per core; it is also possible to use mem=..., which allocates memory per node

I don’t really know exactly, as I don’t have access to this data in real time. All I have is the total vmem and mem usage reported in the stdout file. One example from a job that crashed is the following:

Resource List: nodes=1:ppn=1,pmem=15gb,walltime=00:05:00,neednodes=1:ppn=1
Resources Used: cput=00:02:28,vmem=10906160kb,walltime=00:02:39,mem=7072720kb,energy_used=0

Really, really sorry to ask this.
Can you cut and paste the OutOfMemory error? I think it is the PBS mechanism that is terminating the job: the job here is not being run within a cgroup, and it is the PBS daemon that monitors memory use and kills the job.

You can get a detailed log output from the PBS job log on the first compute node, but you probably have to be a root user to do this.

But I must say this does not really help with your underlying Julia issue of higher memory use on a Linux cluster than on Windows.

Can you share your PBS job submission script?
Also on the compute nodes how many cores are there and how much memory?
Do you know if hyperthreading is enabled? As far as I know, PBS is not aware of hyperthreading.
I could be wrong!

I forgot to clarify the distinction between two scenarios. In the first, when I (think I) allocated sufficient memory based on what I tested on Windows, the job shows the OutOfMemory error as:

[ Info:  started timer at: 2020-04-01T15:14:48.981
ERROR: LoadError: OutOfMemoryError()
Stacktrace:
 [1] Array at ./boot.jl:404 [inlined]
 [2] spmatmul(::SparseMatrixCSC{Float64,Int64}, ::SparseMatrixCSC{Float64,Int64}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/SparseArrays/src/linalg.jl:212
 [3] *(::SparseMatrixCSC{Float64,Int64}, ::SparseMatrixCSC{Float64,Int64}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/SparseArrays/src/linalg.jl:187
 [4] LSS() at /ddn1/vol1/site_scratch/../../../..//1500/LSS.jl:130
 [5] top-level scope at /ddn1/vol1/site_scratch//../../../../1500/Main.jl:80
 [6] include at ./boot.jl:328 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1105
 [8] include(::Module, ::String) at ./Base.jl:31
 [9] exec_options(::Base.JLOptions) at ./client.jl:287
 [10] _start() at ./client.jl:460
in expression starting at /ddn1/vol1/site_scratch/../../../../1500/Main.jl:80

I scaled down my problem (smaller linear systems), and I was able to run this version, but with much more memory allocated than expected (i.e. 25 GB). When I try to make pmem smaller (e.g. pmem=10gb), the scheduler automatically kills my job without showing any error message from Julia itself (the second scenario).

The common problem is that Julia is overusing memory in a bizarre way, which leads Julia to kill itself in the first case, and leads the scheduler to kill the job when the over-usage is detected in the second case.
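One way to narrow this down (a diagnostic sketch, not the poster's code) is to log the peak resident set size (RSS) between the expensive steps, since that high-water mark is close to the mem figure PBS reports. A jump between two reports pinpoints the step that blows up.

```julia
# Sys.maxrss() returns the process's peak RSS in bytes.
report(step) = println(step, ": maxrss = ",
                       round(Sys.maxrss() / 2^30; digits = 2), " GiB")

report("start")
x = rand(2_000, 2_000)        # placeholder for one of the real steps
report("after allocating x")
```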

Regarding the job script, here is an example:

#!/bin/bash -l

#PBS -l nodes=1:ppn=1 
#PBS -l pmem=15gb
#PBS -l walltime=00:05:00 


module purge
module load Julia/1.3.1 
module load monitor

cd $PBS_O_WORKDIR

monitor -d 1 julia Main.jl 2 200

I do not think that user limits are the problem here. You could put this in the start of the job script to check:
ulimit -a

Also put this in your job script at the start:
free
sysctl -a | grep mem

I may well be leading everyone up a wrong path here. However, memory overcommit may be DISABLED on an HPC cluster - for good reasons.
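If sysctl is not permitted, a possible workaround (a Linux-specific sketch, assuming the usual world-readable /proc) is to read the same values from /proc directly, e.g. from within Julia:

```julia
# Overcommit policy: 0 = heuristic, 1 = always allow, 2 = strict accounting
println("vm.overcommit_memory = ",
        strip(read("/proc/sys/vm/overcommit_memory", String)))

# CommitLimit / Committed_AS show the commit headroom under strict accounting
for line in eachline("/proc/meminfo")
    startswith(line, "Commit") && println(line)
end
```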


Alright! This is what I got (alongside some permission-denied error messages in stderr; I don’t think I have the rights to run sysctl).

core file size          (blocks, -c) 62500
data seg size           (kbytes, -d) 26214400
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 770478
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 26214400
open files                      (-n) 16384
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 770478
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
              total        used        free      shared  buff/cache   available
Mem:      197734444     7752876   169395028         308    20586540   189197472
Swap:       2097148      247784     1849364
net.core.optmem_max = 20480
net.core.rmem_default = 212992
net.core.rmem_max = 67108864
net.core.wmem_default = 212992
net.core.wmem_max = 67108864
net.ipv4.igmp_max_memberships = 20
net.ipv4.tcp_mem = 4631163	6174885	9262326
net.ipv4.tcp_rmem = 4096	87380	33554432
net.ipv4.tcp_wmem = 4096	65536	33554432
net.ipv4.udp_mem = 4633461	6177950	9266922
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096
vm.lowmem_reserve_ratio = 256	256	32
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.nr_hugepages_mempolicy = 0
vm.overcommit_memory = 0

I found this topic that might be related …