I am trying to work with a database of images (2767 images in total), but I run into an OutOfMemory() error when loading the data. This is how I am doing it:
If I change the last line to imgs = load.(list[1:100]) it works as desired, but then I get the OutOfMemory() error when I do imgs_2 = load.(list[101:200]).
I have seen this topic discussing a similar issue, and tried the solution proposed by Sukera on Nov 19 (memory mapping), but when trying to implement it, I got an error saying something like the IOStream could not read RGB nor N0f8 values.
What is the preferred way of loading data like this? My images are divided into different subfolders according to the image labels, and I would like to reconstruct a file with a single data frame containing all pictures and their labels.
How can I -sequentially- load all my images to push them into the final data frame?
What file type would be the most appropriate? I was thinking of a CSV file with a table, where one column “Image” contains the pixel arrays as elements (eltype Matrix{RGB{N0f8}}).
So you’re trying to load 10GB of images into RAM, probably on a computer with 8GB or 16GB? I also believe the data would inflate further once it is loaded into matrices of RGB pixels, so at any rate you shouldn’t try to load O(10GB) of images into RAM.
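For a rough sense of scale, here is a back-of-the-envelope estimate of the decoded size. It assumes every image is stored as RGB{N0f8}, i.e. 3 bytes per pixel. The 1080×1920 dimensions are purely hypothetical, since the thread does not say how large the images are, and `decoded_bytes` is just an illustrative helper name.

```julia
# Bytes needed to hold `n` decoded RGB{N0f8} images of size height×width.
# RGB{N0f8} uses one byte per channel, so 3 bytes per pixel.
decoded_bytes(n, height, width) = n * height * width * 3

n_images = 2767               # image count from the original post
height, width = 1080, 1920    # hypothetical dimensions, not from the thread

total = decoded_bytes(n_images, height, width)

# Report the estimate in GiB (2^30 bytes).
println(round(total / 2^30; digits = 1), " GiB")
```

With these assumed dimensions the decoded data would already be larger than the RAM of an 8GB or 16GB machine, before counting anything else the session needs.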
Yes, I’m on an 8GB laptop. I did think about that, but I don’t come from a computer science background and didn’t really know how to investigate a better method.
FileTrees.jl probably won’t work in lazy mode unless you limit how many files you load at once. I’m going to improve this behavior with some changes to Dagger.jl (the library supporting FileTrees’ lazy mode), but it’ll be a while before that’s available.
Generally, either memory mapping (Mmap) or loading only a subset of the images at a time is the best strategy for now.
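As a minimal sketch of the "only a set of images at once" approach, something like the following might work. It assumes the layout described in the question (one subfolder per label under a root folder). The names `list_labeled_files` and `foreach_batch` are hypothetical helpers, not part of any package, and the function that reads a single file is passed in as `loader` so the sketch itself only needs Base Julia.

```julia
# Collect (path, label) pairs, assuming one subfolder per label under `root`.
# Only file paths are stored here, no pixel data is read yet.
function list_labeled_files(root::AbstractString)
    entries = Tuple{String,String}[]
    for label in sort(readdir(root))
        dir = joinpath(root, label)
        isdir(dir) || continue
        for name in sort(readdir(dir))
            path = joinpath(dir, name)
            isfile(path) && push!(entries, (path, label))
        end
    end
    return entries
end

# Visit all files in fixed-size batches, so that at most `batchsize`
# loaded images are referenced at any one time.
#   process(img, label): what to do with each loaded image
#   loader(path):        reads one file (for images, e.g. `load` from FileIO)
function foreach_batch(process, loader, entries; batchsize::Integer = 100)
    for batch in Iterators.partition(entries, batchsize)
        imgs = [loader(path) for (path, _) in batch]
        for (img, (_, label)) in zip(imgs, batch)
            process(img, label)
        end
        # `imgs` is rebound on the next iteration, so the previous batch
        # becomes eligible for garbage collection.
    end
    return nothing
end
```

With FileIO installed, usage could look like `foreach_batch(load, list_labeled_files("path/to/images"); batchsize = 100) do img, label ... end`, where the body computes whatever is needed from `img` (a feature vector, a thumbnail, a summary statistic) and keeps only that result. Pushing the full pixel arrays into one data frame would bring back the original memory problem, so the table would more likely hold paths, labels, and derived values.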