Loading lots of images

I have a lot of folders, each with a lot of images. All images have the same size.
I want to compute the “average image” of each folder, that is, the pixel-wise average of the images in the folder.

With the Images package, the following code gets the job done, where files is a vector of file names:

using Images

function average_image(files, sz)
    image_sum = zeros(Gray{Float64}, sz)

    for file in files
        image_sum += load(file)    # loads each image, then allocates a new sum array
    end

    return image_sum / length(files)
end

Unfortunately, this is slow and uses a lot of memory.
I suspect that every image is loaded into its own chunk of memory and soon after discarded by the garbage collector. But since they all have the same size I wonder if it’s possible to load each image into the same chunk of memory?

Take a look at the @code_warntype output of your function - there’s a bunch of type conversion happening and thus a bunch of Any. In particular, image_sum += load(..) and image_sum / length(files) are the offenders - broadcasting those operations (i.e., .+= and ./ respectively) should make it faster, since .+= updates image_sum in place instead of allocating a new array on every iteration. I don’t think you’re going to be able to remove all of the Any, though, because of that load.
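
Untested, but something along these lines is what I mean (same files and sz as in your post):

function average_image(files, sz)
    image_sum = zeros(Gray{Float64}, sz)

    for file in files
        image_sum .+= load(file)    # in-place broadcast: reuses image_sum's memory
    end

    return image_sum ./ length(files)
end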

It’s difficult to give concrete advice though - do you have some benchmarking results you can compare to, maybe some small subset of the images as a testing ground?

Thanks! I feel pretty stupid for not @code_warntypeing.

One thing I find peculiar: if I follow your advice with .+= and ./, the memory usage goes down, but @code_warntype still reports an Any for an intermediate result, even though the final result has a known type.
I can remove this by explicitly type-annotating the load.
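
For the record, the annotation looks something like this (Matrix{Gray{N0f8}} is just an example - the right type depends on what the files actually decode to):

image_sum .+= load(file)::Matrix{Gray{N0f8}}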

But I suppose it’s difficult for load to know the type of the file it loads?

Precisely! At compile time the return type of load cannot really be known in the general case - maybe there is some way to give that function a hint, courtesy of Images.jl? You’ll have to check the docs on that.
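
Another standard trick, besides the annotation, is a function barrier: keep the untyped load in the outer loop and hand its result to a small inner function, which then gets compiled for the concrete image type. A rough sketch:

add_image!(image_sum, img) = image_sum .+= img    # specializes on the concrete type of img

for file in files
    add_image!(image_sum, load(file))
end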

Those changes alone should already be pretty much the best you can do without too much work, I think.


Also, if you ever need more than just the mean (via sum), then check this out. I needed something similar too…


I’m having one more problem: I would like to run the computation for the folders in “parallel”, and reading the documentation about channels, it seems channels are well suited for such an I/O-intensive task.

However, since I have a lot of folders, I would like to control the number of “concurrent” tasks. The documentation I link to above has an example where sleeping tasks run 4 at a time, but I cannot figure out how to adapt that to my situation.
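
For reference, the pattern from the documentation, as I understand it, is roughly this (with sleep standing in for the real work):

jobs = Channel{Int}(32)

function do_work()
    for job_id in jobs    # iterating a channel takes items until it is closed
        sleep(rand())     # placeholder for the actual computation
    end
end

@sync begin
    @async begin          # producer: put the job ids, then close the channel
        for i in 1:12
            put!(jobs, i)
        end
        close(jobs)
    end

    for _ in 1:4          # exactly 4 concurrent workers
        @async do_work()
    end
end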

With an average_image function like the one in my first post, I have a wrapper that saves the output:

function save_average(dir, sz)
    files = joinpath.(dir, readdir(dir))
    avgimg = average_image(files, sz)

    savename = string(dir, ".png")
    save(savename, avgimg)
end

Processing all folders at once, as explained in this question, would look like the following, which I fear is too aggressive:

@sync for dir in dirs
    @async save_average(dir, sz)    # one task per folder, all started at once
end