Running out of memory saving files with HDF5

Hey, I am trying to run something quite large in Julia, and I need to store a large number of arrays (for constructing sparse matrices). Below is a simplified version of what I need to do: it creates a random array and saves it to a file. The dimensions are approximately what I need to use.

(Running Julia 1.1.1 on Ubuntu 18.04)

using HDF5

for j in 1:1200
    file = h5open("example$(j).h5","w")
    write(file, "X", rand(1,700000))
    close(file)
    GC.gc()
end

I don’t see why it is eating up memory while running. My machine has 16 GB of RAM and runs out of memory before the loop above completes. I have tried lots of ways to get around it. I was originally using JLD, which didn’t have this problem when writing the files but ran into the same issue when reading them.
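For scale, each `rand(1,700000)` array holds 700,000 `Float64` values at 8 bytes each, about 5.6 MB, so even keeping all 1200 of them in memory at once would take roughly 6.7 GB. Exhausting 16 GB therefore suggests that memory is being retained beyond the arrays themselves. A quick check of that arithmetic (the variable names are just for illustration):

```julia
# One array from the example: 700_000 Float64 values, 8 bytes each.
bytes_per_array = sizeof(rand(1, 700_000))

# Hypothetical worst case: all 1200 arrays kept in memory at the same time.
total_bytes = 1200 * bytes_per_array

println(bytes_per_array / 1e6, " MB per array")
println(total_bytes / 1e9, " GB if all 1200 were kept in memory")
```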

Thanks in advance,
Phil.


Rather than allocating a new array in each iteration, can you pre-allocate?

I can change it to:

for j in 1:1200
    x = rand(1,700000)
    file = h5open("example$(j).h5","w")
    write(file, "X", x)
    close(file)
    GC.gc()
end

But I need to calculate 1200 different arrays.

In the actual problem there are 1200 sparse matrices that need to be used in another loop, and keeping all of them in memory takes up too much RAM. So I want to create and save them all, then load each one as it’s needed, so that only one matrix is loaded at a time rather than all 1200.
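One way to do the save-then-load-one-at-a-time pattern, sketched under a few assumptions: HDF5.jl is installed, the matrices are `SparseMatrixCSC{Float64,Int}`, and each matrix is written as its three compressed-sparse-column arrays (`colptr`, `rowval`, `nzval`) plus its dimensions. The helpers `save_sparse` and `load_sparse` and the dataset names are hypothetical, not part of HDF5.jl. Only the stored nonzeros go to disk, and `load_sparse` rebuilds a single matrix on demand.

```julia
using HDF5, SparseArrays

# Hypothetical helper: store the CSC arrays and the dimensions of A.
function save_sparse(filename::AbstractString, A::SparseMatrixCSC{Float64,Int})
    h5open(filename, "w") do file          # file is closed when the block exits
        write(file, "dims", [size(A, 1), size(A, 2)])
        write(file, "colptr", A.colptr)
        write(file, "rowval", A.rowval)
        write(file, "nzval", A.nzval)
    end
end

# Hypothetical helper: rebuild one matrix from the stored arrays.
function load_sparse(filename::AbstractString)
    h5open(filename, "r") do file          # returns the value of the block
        m, n = read(file, "dims")
        SparseMatrixCSC(m, n, read(file, "colptr"),
                        read(file, "rowval"), read(file, "nzval"))
    end
end

# Usage: write a few small random matrices, then load them back one at a time.
for j in 1:3
    save_sparse("sparse$(j).h5", sprand(100, 100, 0.05))
end
for j in 1:3
    S = load_sparse("sparse$(j).h5")       # only this matrix is held here
    println("matrix $j: ", nnz(S), " stored nonzeros")
end
```

Storing the raw CSC arrays keeps the file format simple and avoids writing the zeros; whether this sidesteps the memory growth discussed below depends on the underlying cause, which this sketch does not address.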

For the following code, I do not see memory usage growing steadily:

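using HDF5, Random  # rand! lives in the Random standard library

A = Vector{Float64}(undef, 700000)  # allocated once, refilled in place each iteration
for j in 1:1200
    file = h5open("example$(j).h5","w")
    rand!(A)
    write(file, "A", A)
    close(file)
end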

Can you confirm this, @PhillipBC? Nevertheless, I do not understand why the code in the initial post shows a steady increase in memory usage. Does anyone know?
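To confirm whether memory keeps growing, one option is to record the process's peak resident set size after each iteration. This is a sketch assuming HDF5.jl and a Julia version that provides `Sys.maxrss()` (it may not be available in 1.1); the iteration count is illustrative. It also uses the do-block form of `h5open`, which closes the file even if `write` throws.

```julia
using HDF5

nfiles = 20            # illustrative; the thread uses 1200
peak_mb = Float64[]    # peak resident set size after each iteration, in MiB

for j in 1:nfiles
    # do-block form: the file is closed when the block exits, even on error
    h5open("example$(j).h5", "w") do file
        write(file, "X", rand(1, 700_000))
    end
    GC.gc()
    push!(peak_mb, Sys.maxrss() / 2^20)
end

# If memory is released each iteration, the peak should level off after the
# first few iterations instead of climbing steadily.
println("peak RSS grew by ", round(last(peak_mb) - first(peak_mb); digits = 1), " MiB")
```

Because `Sys.maxrss()` reports a peak, it can only stay flat or rise; a steady rise across iterations is the signal to look for.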

Julia 1.0.4 LTS, Windows 7, 8 GB RAM: both snippets run without any problems.
Paul

I can confirm that

for j in 1:1200
    file = h5open("example$(j).h5","w")
    write(file, "X", rand(1,700000))
    close(file)
    GC.gc()
end

and

for j in 1:1200
    x = rand(1,700000)
    file = h5open("example$(j).h5","w")
    write(file, "X", x)
    close(file)
    GC.gc()
end

both show memory usage growing by roughly 50 MB per second on my machine (Ubuntu 16.04 LTS).

julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

This is probably caused by https://github.com/JuliaLang/julia/issues/30888 or a variant of it, since you say you’re on Ubuntu. Can you try this on a nightly build of Julia and see if it’s still a problem?


@jpsamaroo I have now tried this on Julia 1.2.0 and it is not happening anymore, thanks!
