Loading data sometimes very slow on HPC system

Hi there,
I am processing a bunch of JLD2 files (O(10 GB) each) on my local cluster. Sometimes (not always), loading a file takes ages (>10 min), but sometimes it only takes ~10 seconds. Usually after loading it slowly once, the second time loads fast. I don't think the issue is JLD2 alone. Has anyone experienced something similar? Is it related to the hardware infrastructure, or am I doing something wrong?

df -T says the filesystem is gpfs. I didn't find anything here or on Google that describes such behaviour.

Clearly, I am not an expert. Thanks for any help!

Could you please clarify the sentence “Usually after loading it slowly once, the second time loads fast”? Do you load one file multiple times?

Sorry, yes, exactly: I just load the exact same file again, with the motivation of benchmarking.

Hard to say what the issue here is, but parallel file systems can have all kinds of issues :slight_smile:

that was not the response I was hoping for :sob:

The GPFS documentation says:
To exploit disk parallelism when reading a large file from a single-threaded application, whenever it can recognize a pattern, GPFS intelligently prefetches data into its buffer pool, ...
Maybe the reason is that sometimes the data are already buffered and sometimes not?
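If that is what is happening, it should show up even without JLD2: reading the raw bytes of the same file twice on a fresh compute node should be slow the first time and fast the second. A minimal sketch in Julia (the file name is a hypothetical placeholder):

path = "some_run.jld2"   # hypothetical: a file that has not been read on this node yet

for attempt in 1:2
    t = @elapsed begin
        open(path, "r") do io
            buf = Vector{UInt8}(undef, 1 << 20)   # 1 MiB chunks, so the whole file never sits in RAM
            while !eof(io)
                readbytes!(io, buf)               # just pull the bytes, no parsing
            end
        end
    end
    println("attempt $attempt: $(round(t; digits = 1)) s")
end

If the second pass is much faster, the data were still sitting in the buffer pool / page cache after the first one.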

This definitely sounds like the behaviour of buffers.

However, if you are using a hierarchical storage management system, it could be that old files are being pulled back from a slow tier, maybe tape.
Time to bring cookies to the lair of your system admins.


I really don't think that tapes are involved here. These are not old data files produced months or years ago (in fact, the cluster didn't exist a year ago). But maybe the read speeds I am seeing are indeed close to what 50-100 MB/s hard drives would deliver.

It should be easy to test if the infrastructure is to blame, right? For example, you could simply cat each file and maybe count the bytes:

for file in *.jld2
do
    # time one sequential read of each file; wc -c prints the byte count
    time cat "$file" | wc -c
done

If the time goes up sometimes, the problem is in the infrastructure.

Thanks for this check. It seems like the infrastructure is not the issue:

21115272060

real	0m28.891s
user	0m0.075s
sys	0m14.022s
18912694826

real	0m24.032s
user	0m0.048s
sys	0m12.997s
18828809688

real	0m23.701s
user	0m0.052s
sys	0m12.657s
20063077414

real	0m28.950s
user	0m0.069s
sys	0m12.961s
19977486641

real	0m27.586s
user	0m0.063s
sys	0m13.536s
20651053712

real	0m27.257s
user	0m0.072s
sys	0m13.589s

So could it be that the data structure is the culprit? Is having a bunch (<10) of dataframes stored in a namedtuple a bad idea?

I am not sure. Long tuples are generally discouraged, but <10 is not long. Do all the namedtuples share the same field names? I could see it being a problem if the field names change (calling a function with a namedtuple with differently named fields triggers compilation each time), but my understanding is only superficial.
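For illustration, here is a tiny sketch of what I mean: namedtuples with different field names are distinct types, so a function that takes them as arguments is compiled anew for each set of names:

a = (x = 1, y = 2.0)
b = (u = 1, v = 2.0)
typeof(a)               # NamedTuple{(:x, :y), Tuple{Int64, Float64}}
typeof(a) == typeof(b)  # false, even though the field types are identical

But if all your files use the same field names, this should not be the problem.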

It's just one namedtuple per file, and in each file the namedtuple has the same field names. Here's the type of the namedtuple that is saved in one .jld2 file:

@NamedTuple{esol::DataFrames.DataFrame, is1::Vector{Int64}, is2::Vector{Int64}, 
modes_observe::DataFrames.DataFrame, corrs_observe::Vector{Float64}, 
br_rms_observe::Vector{Vector{Float64}}, br_rms_ϕ_observe::Vector{Vector{Float64}}, 
uϕ_rms_observe::Vector{Vector{Float64}}, modes_geomag::DataFrames.DataFrame, 
corrs_geomag::Vector{Float64}, modes_u::DataFrames.DataFrame, corrs_u::Vector{Float64}}

The dataframes all have the same structure:

typeof.(eachcol(filtered_modes.esol))

4-element Vector{DataType}:
 Vector{ComplexF64} (alias for Array{Complex{Float64}, 1})
 Vector{Float64} (alias for Array{Float64, 1})
 Vector{Float64} (alias for Array{Float64, 1})
 Vector{Vector{ComplexF64}} (alias for Array{Array{Complex{Float64}, 1}, 1})

I do not see an issue with such a data structure. Nothing about it is particularly obscure or type-unstable.
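One way to rule the structure out completely would be a small synthetic roundtrip with the same column layout but tiny arrays: if the first load of such a tiny file is also slow, the slowness is compilation/structure-related rather than I/O. A rough sketch, with hypothetical sizes and only a subset of your fields:

using JLD2, DataFrames

n = 1_000
df = DataFrame(c1 = rand(ComplexF64, n), c2 = rand(n), c3 = rand(n),
               c4 = [rand(ComplexF64, 10) for _ in 1:n])
nt = (esol = df, is1 = rand(1:100, n), corrs_observe = rand(n))

jldsave("synthetic_test.jld2"; data = nt)
@time JLD2.load("synthetic_test.jld2", "data")   # first load
@time JLD2.load("synthetic_test.jld2", "data")   # second load, for comparison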

Me neither! Do you think this could be related: Slow performance on Tuple{Type1, Type2} · Issue #2 · JuliaIO/JLD2.jl · GitHub? If so, it was addressed a month ago in Add @nospecializeinfer around worst offenders by JonasIsensee · Pull Request #527 · JuliaIO/JLD2.jl · GitHub.

That looks related, but I updated to the latest JLD2 version and the issue remains. When I load a file that has not been loaded before, it takes ages. Is the filesystem check with cat really comparable here? If I use two different Julia sessions in one job (i.e. in one environment on one node), then once one session has loaded the file, the other session also loads it fast on its first attempt. If I start another SLURM job on another node, the first load of that same file is slow again. It seems like it really has to do with the distributed file system, but somehow cat does not capture this?

Are you perhaps running cat on the login node or something? Maybe the issue only arises when the compute node accesses the data for the first time?

Hmm, no, that does not seem to be the case. cat gives the same timing from anywhere, on files that have not been accessed before. I can cat a file in 20 seconds, but if JLD2 hasn't loaded it before, JLD2 takes >400 seconds to load it. The second time (even if I restart the Julia session), JLD2 loads it in 20 seconds.
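Maybe a fairer comparison is to warm the cache with one raw sequential read (roughly what cat does) and only then time the JLD2 load of the same file, something like this (hypothetical file name, on a file the job has not touched yet):

using JLD2

path = "cold_file.jld2"   # hypothetical: not yet read in this job

t_raw  = @elapsed read(path)        # one sequential pass over the raw bytes (allocates the whole file; fine as a one-off test)
t_jld2 = @elapsed JLD2.load(path)   # the JLD2 load, now with the bytes already cached
println("raw read: $(round(t_raw; digits = 1)) s, JLD2 load afterwards: $(round(t_jld2; digits = 1)) s")

If the JLD2 load is fast once the bytes are cached, the deserialization itself is fine, and the difference is JLD2's access pattern on a cold GPFS mount (as far as I understand it memory-maps the file by default, which can turn into many small scattered reads) versus the single sequential pass that cat does.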

When you say “local”, do you mean something literally on your laptop (hosted on a Docker/k8s instance or swarm), or do you mean a cluster on your network?

If the latter: have you asked your admins whether there is any auto shutdown/restart going on, or shared services/queues/activities taking place?

Regards,

“local” means a university HPC cluster, not something like Azure, AWS etc. It's SLURM-managed, with a few hundred nodes and lots of storage (thousands of TB). I have not asked the admins about data-access issues yet; I'm just at the stage of trying to narrow down what is happening.

I've had a similar experience in the past; just sharing here. I didn't have the chance to debug it in detail, since it was a sporadic issue and I don't have the computer-science background to inspect file systems.

I think you should try to figure out if this is related to JLD2 or not. The easy way to do this is to save a copy of the data in another format and repeat your reading experiments.
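For example, something along these lines with the Serialization standard library (hypothetical file names; run the timed reads in a fresh job so that neither file is already cached):

using JLD2, Serialization

data = JLD2.load("some_file.jld2")   # Dict of everything stored in the JLD2 file
serialize("some_file.bin", data)     # write the same content in Julia's built-in serialization format

# later, in a fresh job on a node that has not touched either file:
@time deserialize("some_file.bin")
@time JLD2.load("some_file.jld2")

If the Serialization read stays fast on cold files while the JLD2 read is slow, the way JLD2 reads the file is the issue; if both are slow, it is the filesystem after all.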
