Drop in performance after saving data with jldsave

J_B_R · September 19, 2024, 10:33am

Hi,
I run my Julia code on a cluster and frequently save the output data to a JLD2 file using jldsave. On my desktop computer, performance remains constant, but on the cluster it drops significantly each time the data is stored (overwriting the previous file). I am new to Julia and suspect this is an issue with the computing environment rather than the code, but I am not at all a specialist. I hope someone here can help.
Many thanks in advance, J

eldee · September 21, 2024, 10:51am

Hi,

Seeing as your issue depends on the computer environment, it would help if you could post more information about this environment (e.g. OS, filesystem), and in particular how it differs from the one where the code does work. When you use System Monitor / Task Manager, do you see anything out of the ordinary (e.g. 100% disk usage)?

You should also post (the relevant part of) this code, ideally condensed down to a MWE. See also "Please read: make it easier to help you" (points 4 and 5).

J_B_R · September 22, 2024, 9:49am

Hi eldee
Thanks for the reply.

On the cluster, there is a batch system (Slurm) running on Ubuntu. CPU usage is close to 100%, and memory use increases slightly after saving the file I mentioned in the main post. The increase is about 200 MB, while the file stays the same size, around 2 GB. Total memory use is ~9 GB.
On the desktop I use VS Code on Windows. There, one timestep consistently takes around 30 seconds, while on the cluster it takes around 17 seconds, increasing to 46 after saving files.
The Julia versions are 1.10.3 on the cluster and 1.10.4 on the desktop.
I run a garbage collection after each time step, but it makes no difference whether I do so or not. Still, I think there may be a memory leak somewhere, again without knowing much about the topic.
For now, I use a workaround: I stop the run when saving a file and start a new run from that time step, which then runs at the initial performance again.
Sorry for not being very clear. I tried to make a MWE, but the code is huge and, without knowing where the problem might be, I don’t know where to start…
Many thanks

eldee · September 22, 2024, 2:13pm

If restarting the run does indeed work faster than just keeping the code running (i.e. takes less than 46 s per timestep), then it seems to me that it’s not just an issue with the computer environment, as it shows the OS is capable of more computations and writes. So I would look more into the concrete Julia code or JLD2.jl itself.
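
To check the suspected memory growth, one option would be to log memory right after a forced collection at each step. This is only a rough sketch and not taken from the original code: `memory_snapshot` and `buffer` are hypothetical names, and `Base.gc_live_bytes` is an unexported Base function, so treat it as a diagnostic aid rather than a stable API.

```julia
# Snapshot of memory use after a full garbage collection.
# `memory_snapshot` is a hypothetical helper, not part of any package.
function memory_snapshot()
    GC.gc()  # full collection, so mostly live objects remain
    return (live_mb   = Base.gc_live_bytes() / 2^20,  # memory held by Julia objects
            maxrss_mb = Sys.maxrss() / 2^20)          # peak resident set size of the process
end

before = memory_snapshot()
buffer = rand(10^6)   # hypothetical stand-in for data kept alive between timesteps
after  = memory_snapshot()

println("live memory change: ",
        round(after.live_mb - before.live_mb; digits = 1), " MB")
println("peak RSS so far:    ", round(after.maxrss_mb; digits = 1), " MB")
```

If the live memory stays flat across saves while the process footprint keeps growing, that would point away from the Julia objects themselves and more towards buffers or the file layer.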

Assuming the title of the topic is correct, the logical place to start would be to isolate the part with jldsave. If it’s too difficult to reduce the full code into a MWE, i.e. reduce the complexity while retaining the problems, you could conversely also start with simple working code, and add more complexity until you encounter the same problems.

Some things you could try, if you haven’t already:

- If you keep the full code, but comment out the jldsave line, does everything then work fine?
  - What if you do save, but just some random new data?
- What happens if you don’t overwrite the same file every time, but instead always create a new one?
- Does switching from JLD2.jl to another package for saving help?
- If you don’t use SLURM (and I guess your code then only runs on a single node), do you encounter the same issues?
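
Several of these checks can be combined in a small timing harness that measures the steps and the saves separately. The following is a minimal sketch under assumptions: `fake_timestep!` and `time_run` are hypothetical names, the workload is a toy stand-in for the real simulation, and the standard-library `Serialization` plays the role of "another package" so that the example runs without extra dependencies.

```julia
using Serialization  # standard library; stands in for "another package"

# Hypothetical stand-in for one timestep of the real simulation.
function fake_timestep!(state)
    state .= sin.(state) .+ 1.0
    return nothing
end

# Time every step and every save separately, so a slowdown can be attributed.
# `save` is either `nothing` (no saving) or a function (path, state) -> anything.
function time_run(save; nsteps = 6, save_every = 2, fresh_file = false,
                  dir = mktempdir(), n = 10^5)
    state = rand(n)
    step_times = Float64[]
    save_times = Float64[]
    for step in 1:nsteps
        push!(step_times, @elapsed fake_timestep!(state))
        if save !== nothing && step % save_every == 0
            # Either overwrite one file or create a new file per save.
            name = fresh_file ? "out_$(step).dat" : "out.dat"
            push!(save_times, @elapsed save(joinpath(dir, name), state))
        end
    end
    return (; step_times, save_times)
end

baseline  = time_run(nothing)                                  # no saving at all
overwrite = time_run((p, s) -> serialize(p, s))                # same file every time
fresh     = time_run((p, s) -> serialize(p, s); fresh_file = true)

# With JLD2 installed, the same harness can wrap the call from this thread:
#   using JLD2
#   time_run((p, s) -> jldsave(p; state = s))
```

Comparing `step_times` before and after the first save in each variant should show whether the slowdown follows the save itself, the overwriting, or the package used.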

J_B_R · September 22, 2024, 3:02pm

Hi Eldee
Many thanks for the reply and the great advice. I will try saving much less data and go step by step through the other tips mentioned!
Great, thanks
jbr
