Image IO and processing efficiency: sequential versus bulk

I have an algorithm that loads and processes images. There are two ways of doing this:

  1. Load all images together, and process them all together (bulk); or
  2. One-by-one, load and process the images (sequential).

I have constructed MWEs of 1. and 2. For reproduction, any image dataset will do. The size of the dataset I have used for testing is around 10,000 images.
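To make the two strategies concrete, here is a minimal sketch of what I mean by "bulk" versus "sequential". The `load_image` and `get_vote` functions below are deterministic stubs standing in for the real FileIO/Images calls and processing in the MWEs:

```julia
# Stubs standing in for the real image loading and processing.
load_image(path) = fill(0.5, 4, 4)   # real code would use FileIO.load(path)
get_vote(img) = sum(img)             # the real function is considerably more complex

# 1. Bulk: load everything first, then process everything.
function mwe_bulk(paths)
    images = [load_image(p) for p in paths]   # all images held in memory at once
    return [get_vote(img) for img in images]
end

# 2. Sequential: load and process one image at a time.
function mwe_seq(paths)
    votes = Vector{Float64}(undef, length(paths))
    for (i, p) in enumerate(paths)
        img = load_image(p)                   # only one image in memory
        votes[i] = get_vote(img)
    end
    return votes
end
```

Both return the same results; the difference is only in peak memory and (apparently) runtime.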

Bulk processing is very memory-inefficient, but it processes much faster. Sequential processing is much more memory-efficient (as it does not have to hold all images in memory at once), but seems to take a lot longer. The benchmark times from the MWEs are below (in my main algorithm, get_vote is a considerably more advanced function, so this time gap is much more pronounced):

julia> @btime mwe_bulk()
  2.852 s (2217448 allocations: 973.27 MiB)

julia> @btime mwe_seq()
  3.356 s (2217426 allocations: 972.87 MiB)

Does anyone have a good solution to image IO and processing that optimises both time and memory?


In general terms, file accesses are significantly slower than RAM accesses, so it’s strange that the gap grows with the complexity of functions that do not involve file access. In other words, your code may not be a true MWE for the problem you’re actually facing.

Why not do some profiling first?

@kimikage I can extend the MWEs to use the real functions, but they will be much longer.

What do you mean by profiling?

I think it’s better to first identify whether the bottleneck is in the file I/O or not.
If the bottleneck is different between your actual code profiling results and the current MWE’s profiling results, then some advice on the MWEs may not be helpful.
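For reference, a minimal profiling recipe with the `Profile` standard library looks something like this (the `work` function is a stand-in for `mwe_seq` or the real algorithm):

```julia
using Profile

work() = sum(sqrt(i) for i in 1:10^7)   # stand-in for mwe_seq() / the real code

work()              # run once first so compilation is not in the profile
Profile.clear()
@profile work()
Profile.print(format=:flat, sortedby=:count)   # or inspect with ProfileView.jl
```

The flat report shows which lines accumulate the most samples, which should indicate whether the time is going into file I/O or into processing.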


@kimikage good call. I will do some profiling and post the results soon. The actual algorithm may take a moment to run.

Also, note that @btime (with repetition) is likely to be inconsistent with actual use case results, since disk cache and memory cache have a large impact on this kind of processing.
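One way to see this effect without any external packages: time a single cold call against the minimum over repeated warm calls (which is roughly what @btime reports). The `work` function here is a hypothetical stand-in; with real file IO, the cold run would also pay for a cold OS disk cache:

```julia
# The first call pays JIT compilation (and, for file IO, cold OS caches);
# repeated calls run warm, which is what @btime's minimum reflects.
work() = sum(sqrt(i) for i in 1:10^6)

t_first = @elapsed work()                          # cold: includes compilation
t_warm  = minimum(@elapsed(work()) for _ in 1:5)   # warm: best-case timing

println("cold: $(t_first)s, warm: $(t_warm)s")
```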

@kimikage Interesting about @btime, thanks for the information. It probably shows in my question, but I don’t know very much about performance/this lower level programming stuff.

I don’t exactly know what you mean by profiling, but if you meant to use @profile, then here are some results from that:

  • main_bulk()
  • main_seq()
  • mwe_bulk(): gist jakewilliami/169b476e8d4b257c5716f3e1f73fe6ec
  • mwe_seq(): gist jakewilliami/b5aeca52e5362e2a08e293c6e9be2af1

I can only embed two links in my comment at a time, sorry.

P.S. When I say main_seq, I am using examples/basic.jl from my FaceDetection.jl package. To give a sense of how much longer the sequential version takes: the “get votes” step took 1.5 hours, whereas bulk took 2 minutes and 13 seconds.

P.P.S., I love ColoredLLCodes :slight_smile:

I’m also trying to read image files as fast as possible; you can check my question here.

Loading 10k images with Images.load() on my computer takes:
21.982461 seconds (2.49 M allocations: 991.897 MiB, 0.47% gc time)

Loading the same images with ArrayFire.load_image() function gives:
7.332682 seconds (229.77 k allocations: 14.024 MiB)

ArrayFire stores the images on the GPU, so I’m not confident about the memory-allocation figures, but it is certainly time-efficient. I am not using bulk reading in my algorithm, but load-and-process is fast enough when combined with @distributed parallelism.
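For anyone curious what that parallel load-and-process pattern looks like, here is a hedged sketch using the Distributed standard library. The `load_image` and `get_vote` functions are deterministic stubs (the real code would load actual image files):

```julia
using Distributed
addprocs(2)   # hypothetical worker count; tune to your machine

@everywhere begin
    # Stubs standing in for the real image load + processing on each worker.
    load_image(path) = fill(1.0, 4, 4)
    get_vote(img) = sum(img)
end

paths = ["img_$i.png" for i in 1:8]

# Each worker loads and processes its own share of the files; results are
# reduced with vcat into a single vector on the master process.
votes = @distributed (vcat) for p in paths
    [get_vote(load_image(p))]
end
```

Because each iteration loads and immediately processes one image, peak memory stays low per worker while the work itself runs in parallel.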


Sorry for the late reply.

As you may know, the profile results can be visualized in ProfileView.jl etc. Also, the profile results can be saved as portable binary files (although some details are lost).
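A sketch of saving raw profile samples for later inspection, assuming `Profile.retrieve` plus the Serialization standard library (note that serialized files are not portable across Julia versions, and some detail such as C frames is lost):

```julia
using Profile, Serialization

work() = sum(sqrt(i) for i in 1:10^7)   # stand-in for the real workload

work()                                  # compile first
Profile.clear()
@profile work()

# Save the raw samples and line-info dictionary for later inspection.
data, lidict = Profile.retrieve()
serialize("profile.jls", (data, lidict))

# Later, possibly in a fresh session:
data2, lidict2 = deserialize("profile.jls")
Profile.print(data2, lidict2)
```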

Just to clarify, are those profiles post-compilation results? Also, do they fit into the trace buffer?
I think the codes can be made type stable for the most part, even if some dynamic dispatch is necessary.

I don’t think the I/O issue alone would cause such a difference. At the very least, I can’t find any cause for such a difference in the MWEs. I think we should look for the bottleneck once more (with fewer image files).

Thanks! :smile:


You’re exactly right. I have just found what was causing the sequential algorithm to be so slow. It is a problem independent of the kind of processing. Thank you for your help!

Neat, I didn’t know about ProfileView.jl, I’ll look into that :slight_smile: