Image IO and processing efficiency: sequential versus bulk

jakewilliami · October 25, 2020, 12:51pm

I have an algorithm that loads and processes images. There are two ways of doing this:

1. Load all images together and process them all together (bulk); or
2. Load and process the images one by one (sequential).
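The two strategies can be sketched in Julia roughly as follows. This is a minimal illustration, not the poster's actual MWEs. `fake_load` and `get_vote_stub` are hypothetical stand-ins for an image loader (such as FileIO's `load`) and for `get_vote`, and they use synthetic arrays so no image files are needed.

```julia
# Hypothetical stand-ins: a synthetic 64×64 "image" and a trivial per-image score.
fake_load(i::Int) = fill(Float32(i % 7), 64, 64)
get_vote_stub(img::AbstractMatrix) = sum(img)

# 1. Bulk: load everything first, then process everything.
function run_bulk(ids)
    imgs = [fake_load(i) for i in ids]        # all images are held in memory at once
    return [get_vote_stub(img) for img in imgs]
end

# 2. Sequential: load one image, process it, then move on to the next.
function run_seq(ids)
    out = Vector{Float32}(undef, length(ids))
    for (k, i) in enumerate(ids)
        img = fake_load(i)                    # only the current image needs to stay reachable
        out[k] = get_vote_stub(img)
    end
    return out
end

println(run_bulk(1:100) == run_seq(1:100))
```

Both functions compute the same results. They differ only in how many images are reachable at the same time, which is what drives peak memory use.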

I have constructed MWEs of 1. and 2. For reproduction, any image dataset will do; the one I used for testing has around 10,000 images.

Bulk processing is very memory inefficient, but it processes much faster. Sequential processing is much more memory efficient (as it does not have to load all images into memory at once), but it seems to take a lot longer. The benchmark times from the MWEs are shown below (in my main algorithm, get_vote is a considerably more complex function, so the gap in time efficiency is much more pronounced):

julia> @btime mwe_bulk()
  2.852 s (2217448 allocations: 973.27 MiB)

julia> @btime mwe_seq()
  3.356 s (2217426 allocations: 972.87 MiB)
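Note that the MiB figure reported by @btime is the total memory allocated over a run, not the peak amount held at once. That is likely why both versions report roughly 973 MiB even though their peak usage differs. A minimal sketch of the difference in retained memory follows. It uses the hypothetical `fake_load` stand-in again and `Base.summarysize` to compare what the bulk approach keeps alive against a single image.

```julia
# Hypothetical stand-in for an image loader: a 64×64 Float32 "image".
fake_load(i::Int) = fill(Float32(i % 7), 64, 64)

ids = 1:1000

# Bulk: every loaded image stays reachable until processing is done.
imgs = [fake_load(i) for i in ids]
bulk_bytes = Base.summarysize(imgs)

# Sequential: only the current image needs to be reachable at any moment.
single_bytes = Base.summarysize(fake_load(first(ids)))

println("bulk keeps ~", bulk_bytes ÷ 2^20, " MiB live; sequential ~", single_bytes ÷ 2^10, " KiB")
```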

Does anyone have a good solution to image IO and processing that optimises both time and memory?

kimikage · October 25, 2020, 1:30pm

In general terms, file accesses are significantly slower than RAM accesses, so it’s strange that the difference grows with the complexity of functions that do not access files. In other words, your code may not be an MWE for the problem you’re actually facing.

Why not do some profiling first?

jakewilliami · October 25, 2020, 1:34pm

@kimikage I can update the MWEs to include the real functions, but they will be much longer.

What do you mean by profiling?

kimikage · October 25, 2020, 1:41pm

I think it’s better to first identify whether the bottleneck is in the file I/O or not.
If the bottleneck in your actual code’s profiling results differs from the one in the MWEs’ profiling results, then advice based on the MWEs may not be helpful.
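A low-tech way to check whether file I/O is the bottleneck, before reaching for the profiler, is to time the loading and processing phases separately. Here is a sketch in which `load_image` and `get_vote_stub` are hypothetical stand-ins to be swapped for the real loader and `get_vote`.

```julia
# Hypothetical stand-ins: replace with the real image loader and get_vote.
load_image(i::Int) = rand(Float32, 256, 256)
get_vote_stub(img) = count(>(0.5f0), img)

# Accumulate wall-clock time spent loading vs. processing, one image at a time.
function time_phases(ids)
    t_io = 0.0
    t_proc = 0.0
    votes = Int[]
    for i in ids
        t0 = time_ns()
        img = load_image(i)
        t1 = time_ns()
        push!(votes, get_vote_stub(img))
        t2 = time_ns()
        t_io += (t1 - t0) / 1e9      # nanoseconds → seconds
        t_proc += (t2 - t1) / 1e9
    end
    return (io = t_io, processing = t_proc, votes = votes)
end

r = time_phases(1:100)
println("I/O: ", r.io, " s, processing: ", r.processing, " s")
```

Running it once on a small subset first keeps compilation time out of the measurement.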

jakewilliami · October 25, 2020, 1:49pm

@kimikage good call. I will do some profiling and post the results soon. The actual algorithm may take a moment to run.

kimikage · October 25, 2020, 2:40pm

Also, note that @btime (with repetition) is likely to be inconsistent with actual use case results, since disk cache and memory cache have a large impact on this kind of processing.

jakewilliami · October 25, 2020, 3:33pm

@kimikage Interesting about @btime, thanks for the information. It probably shows in my question, but I don’t know very much about performance/this lower level programming stuff.

I don’t exactly know what you mean by profiling, but if you meant to use @profile, then here are some results from that:

main_bulk()
main_seq()
mwe_bulk(): gist jakewilliami/169b476e8d4b257c5716f3e1f73fe6ec
mwe_seq(): gist jakewilliami/b5aeca52e5362e2a08e293c6e9be2af1

I can only embed two links in my comment at a time, sorry.

P.S. When I say main_seq, I am using examples/basic.jl from my FaceDetection.jl package. To give a sense of how much longer seq takes, the “get votes” step took 1.5 hours sequentially, but only 2 minutes and 13 seconds in bulk.

P.P.S. I love ColoredLLCodes

libensvivit · October 26, 2020, 12:54am

I’m also trying to read image files as fast as possible; you can check my question here

Loading 10k images with Images.load() on my computer takes:
21.982461 seconds (2.49 M allocations: 991.897 MiB, 0.47% gc time)

Loading the same images with ArrayFire.load_image() function gives:
7.332682 seconds (229.77 k allocations: 14.024 MiB)

ArrayFire stores the images on the GPU, so I’m not confident about its memory allocation, but it is certainly time-efficient. I am not using bulk reading in my algorithm; loading and processing each image in turn is fast enough when combined with @distributed parallelism.
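The load-and-process-per-image pattern combined with @distributed can be sketched with the Distributed standard library. This uses synthetic arrays rather than ArrayFire or real files, and `fake_load` and `score` are hypothetical stand-ins.

```julia
using Distributed
# addprocs(4)   # add worker processes; without workers the loop runs on the main process

# Definitions must exist on every process, hence @everywhere.
# Hypothetical stand-ins for an image loader and a per-image score.
@everywhere fake_load(i::Int) = fill(Float32(i % 7), 32, 32)
@everywhere score(img) = count(>(3f0), img)

# Each iteration loads and processes one image; partial results are combined with (+).
total = @distributed (+) for i in 1:100
    score(fake_load(i))
end
println(total)
```

Because each worker only holds the image it is currently processing, this keeps the memory profile of the sequential approach while spreading the work across processes.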

kimikage · October 27, 2020, 1:17am

Sorry for the late reply.

As you may know, the profile results can be visualized in ProfileView.jl etc. Also, the profile results can be saved as portable binary files (although some details are lost).

Just to clarify, are those profiles post-compilation results? Also, do they fit into the trace buffer?
I think the code can be made type stable for the most part, even if some dynamic dispatch is necessary.
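To get post-compilation profiles that fit in the buffer, one approach is to call the function once before profiling and enlarge the sample buffer with `Profile.init`. A minimal sketch, where `work` is a hypothetical workload to be replaced by e.g. `mwe_seq()`:

```julia
using Profile

# Hypothetical workload; substitute the real pipeline here.
work(n) = sum(sqrt(i) for i in 1:n)

work(10)                                # warm-up call so compilation is not profiled
Profile.init(n = 10^7, delay = 0.001)   # larger sample buffer so long runs are not truncated
Profile.clear()                         # discard samples from earlier runs
@profile work(10^7)
Profile.print(maxdepth = 10)            # text report; ProfileView.jl can plot the same data
```

Separately, `@code_warntype` (from InteractiveUtils, loaded by default in the REPL) on the hot functions is a quick way to spot the type instabilities mentioned above.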

I don’t think the I/O issue alone would make such a difference. At the very least, I can’t find any cause for such a difference in the MWEs. I think we should look for the bottleneck once more (with fewer image files).

Thanks!

jakewilliami · October 27, 2020, 1:25am

You’re exactly right. I have just found what was causing the sequential algorithm to be so slow. It is a problem independent of the kind of processing. Thank you for your help!

Neat, I didn’t know about ProfileView.jl, I’ll look into that
