OutOfMemory() when loading a database of images

Ivan · October 27, 2021, 11:30pm

I am trying to work with a database of images (2767 images in total), but I run into an OutOfMemory() error when loading the data. This is how I am doing it:

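using FileIO  # provides load (an image backend such as ImageIO must be installed)

path = joinpath(pwd(), "Data")
list = readdir(path, join = true)
imgs = load.(list)  # eagerly loads every image; this is the line that throws OutOfMemory()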

If I change the last line to imgs = load.(list[1:100]) it works as desired, but then I get the OutOfMemory() error when I do imgs_2 = load.(list[101:200]).

I have seen this topic discussing a similar issue, and tried the solution proposed by Sukera on Nov 19 (memory mapping), but when trying to implement it, I got an error saying something like the IOStream could not read RGB nor N0f8 values.

What is the preferred way of loading data like this? My images are divided into different subfolders according to the image labels, and I would like to reconstruct a file with a single data frame containing all pictures and their labels.

How can I -sequentially- load all my images to push them into the final data frame?
What file type would be the most appropriate? I was thinking of a CSV file with a table, where one column “Image” contains the pixel arrays as elements (eltype Matrix{RGB{N0f8}}).

jling · October 27, 2021, 11:55pm

But how large are they in total? Try running this:

total_size = sum(filesize, list)

Ivan · October 28, 2021, 9:17am

julia> total_size = sum(filesize, list)
11328360809

where list is a 2767-element Vector containing the full path to all images.

JohnnyChen94 · October 28, 2021, 9:43am

The easiest solution is probably via MappedArrays.jl

using MappedArrays

# End the line with ; in the REPL so the result is not displayed (displaying it would load the images eagerly).
imgs = mappedarray(load, list);

But this is lazy-loading, so if you do, say, imgs[1] multiple times, you load the same image multiple times from the disk.
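
One way to soften the repeated disk reads mentioned above is to wrap the loader in a small bounded cache. The following is only a minimal sketch: `CachedLoader` and `maxitems` are hypothetical names introduced here, not part of MappedArrays.jl or FileIO, and the eviction policy is plain first-in, first-out.

```julia
# Hypothetical helper (not part of MappedArrays.jl): remembers the most
# recently loaded images so repeated access does not hit the disk again.
struct CachedLoader{F}
    loader::F                  # any function path -> image, e.g. FileIO.load
    cache::Dict{String,Any}    # path => decoded image
    order::Vector{String}      # cached paths, oldest first
    maxitems::Int              # maximum number of cached images (assumed >= 1)
end

CachedLoader(loader; maxitems = 16) =
    CachedLoader(loader, Dict{String,Any}(), String[], maxitems)

function (c::CachedLoader)(path::AbstractString)
    key = String(path)
    haskey(c.cache, key) && return c.cache[key]   # cache hit: no disk access
    img = c.loader(key)
    if length(c.order) >= c.maxitems              # cache full: evict the oldest entry
        delete!(c.cache, popfirst!(c.order))
    end
    c.cache[key] = img
    push!(c.order, key)
    return img
end
```

Assuming `load` and `list` from the posts above, this could be used as `imgs = mappedarray(CachedLoader(load; maxitems = 8), list);`. Keep `maxitems` small: every cached image occupies its full decoded size in RAM.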

JohnnyChen94 · October 28, 2021, 9:46am

Also, I’d recommend FileTrees.jl (shashi/FileTrees.jl on GitHub: “Parallel file processing made easy for complicated folder structure”). It also has a built-in lazy-loading strategy, but I’ve never tried it.

Ivan · October 28, 2021, 12:41pm

This did it! Thanks a lot.

jling · October 28, 2021, 12:43pm

So you’re trying to load about 10 GB of images into RAM, and you probably have an 8 GB or 16 GB computer? I believe the data inflates further once the images are decoded into matrices of RGB pixels, so at any rate you shouldn’t try to load on the order of 10 GB of images into RAM.
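
To put a rough number on that inflation: the 11328360809 bytes reported above are about 10.5 GiB of (typically compressed) files, or about 4 MB per file on average, whereas a decoded RGB{N0f8} image takes 3 bytes per pixel regardless of how well it compressed. The sketch below estimates the decoded footprint. The image dimensions are purely hypothetical, since the thread does not state them, and `decoded_bytes` is a name made up for this example.

```julia
# Decoded size in bytes of one image stored as RGB{N0f8}:
# one byte per colour channel, three channels per pixel.
decoded_bytes(height, width; bytes_per_pixel = 3) = height * width * bytes_per_pixel

# Hypothetical dimensions for illustration only (not given in the thread).
height, width = 3000, 4000
per_image = decoded_bytes(height, width)    # 36_000_000 bytes per decoded image

n_images = 2767                             # number of images, from the thread
total_gib = n_images * per_image / 2^30     # estimated decoded total, in GiB

println("decoded size per image: ", per_image, " bytes")
println("estimated decoded total: ", round(total_gib, digits = 1), " GiB")
```

To get the real figure, load a single image and check `sizeof(img)` or `size(img)` instead of guessing the dimensions.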

Ivan · October 28, 2021, 12:47pm

Yes, I’m on an 8GB laptop. I did think about that, but I don’t come from a computer science background and didn’t really know how to investigate a better method.

Oscar_Smith · October 28, 2021, 12:57pm

Can you solve your problem while only loading one image at a time? What is the actual problem you are trying to solve with the images?

jpsamaroo · October 28, 2021, 2:40pm

FileTrees.jl probably won’t work in lazy mode unless you limit how many files you load at once. I’m going to improve this behavior with some changes to Dagger.jl (the library supporting FileTrees’ lazy mode), but it’ll be a while before that’s available.

Generally, either memory mapping (Mmap) or loading only a subset of the images at a time is the best strategy for now.
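
The "subset at a time" strategy can be sketched as follows, using only Base Julia. Both function names are hypothetical, and the sketch assumes the layout described in the original question: one subfolder per label, directly under a root folder. The important part is that only a small per-image result is kept, never the decoded images themselves. In the original attempt, the second `load.(list[101:200])` presumably failed because `imgs` still referenced the first hundred images.

```julia
# Collect (path, label) pairs; the label is the name of the containing subfolder.
# Assumes one level of label folders under `root`.
function labelled_paths(root::AbstractString)
    entries = Tuple{String,String}[]
    for (dir, _, files) in walkdir(root)
        dir == root && continue             # skip files sitting directly in root
        label = basename(dir)
        for f in files
            push!(entries, (joinpath(dir, f), label))
        end
    end
    return entries
end

# Apply `process` to every image while holding at most `batchsize` decoded
# images in memory. `loader` is any function path -> image, e.g. FileIO.load.
function process_in_batches(process, loader, entries; batchsize = 100)
    results = []
    for batch in Iterators.partition(entries, batchsize)
        imgs = [loader(path) for (path, _) in batch]   # only this batch is decoded
        for (img, (path, label)) in zip(imgs, batch)
            push!(results, (path = path, label = label, value = process(img)))
        end
        # `imgs` is rebound on the next iteration, so this batch can be garbage collected
    end
    return results
end
```

For example, `process_in_batches(size, load, labelled_paths(path); batchsize = 50)` would record each image's dimensions next to its path and label. For the final table, storing paths and labels rather than pixel arrays keeps it small enough for a CSV file.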
