Should I use HDF5 to speed up reading image data repeatedly?

Hi all,

I’m trying to speed up the following piece of code for an art project.
I have a collection of PNG images; all of them together add up to 25 GB+.

In a loop, I need to keep copying an arbitrary pixel from one of the PNG images (simplified code below). Right now it takes running overnight (~9 hours) to complete, even though I preload images as much as I can.

How can I do this much faster, perhaps by preprocessing the images into a faster-to-access format, or by accessing the desired pixel without loading the entire image?

EDIT: Right now I’m also thinking about preprocessing it all into HDF5 to see whether that speeds it up.

Here’s the basic code:

using FileIO, Printf   # provides load/save and @sprintf

for frame = 1:Ntotalframes
    newimg = copy(preloaded_img_of_proper_dims)
    for j = 1:ysize
        for i = 1:xsize
            # Access and copy a single pixel from one of the PNG images in the 25 GB collection
            file_get_to_from, pixel_coords_in_file = getDesiredFilenameAndPixelIndex(i, j, frame)
            newimg[i, j] = getpixelvalue(file_get_to_from, pixel_coords_in_file)

            # Basically I'm doing the code below (but with some preloading). How should I speed it up?
            # I can't preload the entire 25 GB+ of images into RAM
            # getpixelvalue = load(file_get_to_from)[pixel_coords_in_file[1], pixel_coords_in_file[2]]
        end
    end
    newimg_file_name = @sprintf("%04d.png", frame)
    save(newimg_file_name, newimg)
end

Thank you,
Kim

Are you reading one single picture ysize * xsize times?

Can you share more information about the order in which pixels/files are retrieved? Memory-mapping or chunking (with HDF5) would be faster than loading the whole PNG, of course, but if the frame order isn’t completely random, you can keep a large number of images in memory (maybe with LRUCache.jl) to cut down on storage-thrashing.
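For the caching idea, here is a rough sketch with LRUCache.jl, assuming FileIO’s load; cached_load and the cache size of 200 are just placeholders to adapt to your RAM:

using FileIO, LRUCache

# Keep up to 200 decoded images in memory; the least-recently-used images
# are evicted automatically when the cache fills up.
const IMG_CACHE = LRU{String, Any}(maxsize = 200)

cached_load(path::AbstractString) = get!(() -> load(path), IMG_CACHE, path)

# Then in the inner loop:
# newimg[i, j] = cached_load(file_get_to_from)[pixel_coords_in_file[1], pixel_coords_in_file[2]]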

2 Likes

You can benchmark your code on a small number of pictures with Juno.@profiler in Atom or @profview in VS Code. There can be bottlenecks in different places, like reading from disk or decoding the picture format, but I think the main problem is that nested loop - you should keep it as small as possible, with all data already read from disk outside the loop. Or maybe rearrange the loop order - first find all the pixels needed for a single output file, and then write that file in one operation.
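For example (a minimal sketch; render_frames here is a placeholder for a function wrapping your frame loop, cut down to a few frames for the test):

# @profview is provided by the VS Code Julia extension (or by ProfileView.jl);
# in Atom/Juno the equivalent is Juno.@profiler.
render_frames(2)              # run once first so compilation doesn't dominate the profile
@profview render_frames(10)   # the flame graph shows whether disk reads or PNG decoding dominate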

2 Likes

Are you reading one single picture ysize * xsize times?

I’m reading different images from all over the 25GB collection and collecting them into one image.

@profview in vscode.

Thank you! I’ll try this benchmarking macro.

I think the main problem is that nested loop - you should keep it as small as possible, with all data already read from disk outside the loop.

That is what I was thinking too. I tried preloading all the potential images at the start of the frame iteration before the for j, i loop, but it still took really long.

Can you share more information about the order in which pixels/files are retrieved?

It’s for an art project, so there’s no consistent way I’m grabbing the images from the 25GB+ collection.
Technically inside that inner-most loop, it could be any one of the images from the 25GB+ collection.
(Then I change the parameters, and see what ends up with a result that I like)

Memory-mapping or chunking (with HDF5) would be faster than loading the whole PNG,

Yeah, I’m trying out this method now to see how well it performs. Since the pixel accesses across the 25 GB+ collection are pretty arbitrary, memory-mapping or chunking with HDF5 seems like the best way to get fast access to any part of the collection (hopefully).

but if the frame order isn’t completely random, you can keep a large number of images in memory (maybe with LRUCache.jl) to cut down on storage-thrashing.

Thank you! I didn’t know this was a possible technique; I’ll keep it in mind for other projects.

With GDAL you can read just a small chunk of an image (PNG or other formats).

2 Likes

One thing that looks expensive is that you allocate one array for each iteration when you do

newimg = copy(preloaded_img_of_proper_dims)

Have you tried preallocating just one newimg outside of your function and reusing the memory? That could save you some garbage-collection time if the number of frames is large.
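Something like this (a sketch that reuses the names from your simplified code; the two helper functions are still your placeholders):

using FileIO, Printf

function render_frames(Ntotalframes, preloaded_img_of_proper_dims, xsize, ysize)
    newimg = similar(preloaded_img_of_proper_dims)       # allocated once
    for frame = 1:Ntotalframes
        copyto!(newimg, preloaded_img_of_proper_dims)    # reuse the buffer instead of copy()
        for j = 1:ysize, i = 1:xsize
            file, idx = getDesiredFilenameAndPixelIndex(i, j, frame)
            newimg[i, j] = getpixelvalue(file, idx)
        end
        save(@sprintf("%04d.png", frame), newimg)
    end
end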

Other than the regular things (putting code inside functions, avoiding globals, etc.), I wonder if you can parallelize some portion of the code with coroutines, or use GDAL as @joa-quim suggests.

1 Like

With HDF5 you can save arrays in chunks, so only one chunk needs to be read when you access a single pixel, and that chunk stays cached as long as the file is open. So you could try loading the whole collection into one HDF5 file, with each image as a separate chunked dataset.
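A minimal sketch of what that could look like, assuming each image has already been converted to a plain numeric matrix (grayscale UInt8 here for simplicity); the file and dataset names are placeholders:

using HDF5

function pack_collection(images::Vector{Matrix{UInt8}}, h5path::AbstractString)
    h5open(h5path, "w") do f
        for (k, img) in enumerate(images)
            # chunk = (32, 32): a single-pixel read only touches one 32×32 tile
            dset = create_dataset(f, "img$k", datatype(UInt8), dataspace(size(img));
                                  chunk = (32, 32))
            write(dset, img)
        end
    end
end

# Later, a single-pixel access reads (and caches) just one chunk:
# h5open(h5path, "r") do f
#     f["img7"][row, col]
# end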

1 Like

I would try chunk sizes from very small to larger values and check the timings for each, e.g. 8×8, 16×16, 32×32, and so on. (But don’t use compression for the chunks.)
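Something along these lines (a rough timing sketch on one synthetic image; the sizes and file name are arbitrary):

using HDF5

img = rand(UInt8, 2048, 2048)   # stand-in for one of the PNGs

for chunk in ((8, 8), (16, 16), (32, 32), (64, 64), (128, 128))
    h5open("chunktest.h5", "w") do f
        dset = create_dataset(f, "img", datatype(UInt8), dataspace(size(img));
                              chunk = chunk)          # no compression
        write(dset, img)
    end
    t = h5open("chunktest.h5", "r") do f
        d = f["img"]
        @elapsed for _ in 1:10_000
            d[rand(1:2048), rand(1:2048)]             # random single-pixel reads
        end
    end
    println("chunk $chunk: $(round(t, digits = 3)) s for 10_000 random pixel reads")
end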

1 Like

Stupid question from me… Clearly getting an efficient file format and choosing your data layout is the way forward - and is good experience for other projects.

However, you could be lazy - why not hire a large-memory cloud instance?

1 Like

I think you may be able to avoid loading an image multiple times per run. In the inner x,y loop, just compute which pixels you need from each image. Try storing the results as a Dict mapping image filenames to arrays of pixel coordinates. Then, after you’ve computed which pixels come from which image, loop over the image files, read each image only once, and extract the pixels you need from it.
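In code, roughly (a sketch built on the placeholder helpers from the original post):

using FileIO

function render_frame!(newimg, frame, xsize, ysize)
    # Pass 1: plan which pixels we need from which source file.
    plan = Dict{String, Vector{NTuple{2, CartesianIndex{2}}}}()
    for j = 1:ysize, i = 1:xsize
        file, idx = getDesiredFilenameAndPixelIndex(i, j, frame)
        push!(get!(plan, file, NTuple{2, CartesianIndex{2}}[]),
              (CartesianIndex(i, j), CartesianIndex(idx[1], idx[2])))
    end

    # Pass 2: load each source image exactly once and copy its pixels over.
    for (file, pixels) in plan
        src = load(file)
        for (dst, srcidx) in pixels
            newimg[dst] = src[srcidx]
        end
    end
    return newimg
end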

2 Likes

Maybe you should look into the Parquet file format as well; if my understanding is correct, you can read/write just the part you need to change.