I have a collection of relatively large videos (~10k frames), and I need to perform some operation on each of these frames, independently.
In an ideal world I would have each frame as a separate TIFF, but these are .avi videos. Loading a full video at once has a significant memory footprint that my laptop is not always equipped to handle, so I played around a bit to find a way to make this process smooth and fast, but I'm not totally satisfied. I don't have much experience handling large video data, so I'm not sure whether there is some obvious solution I am missing.
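For reference, the low-memory baseline is to stream the video one frame at a time with VideoIO's `read`/`read!`, so only a single decoded frame is resident at once. A minimal sketch (`dummy_func`-style processing is represented by a generic `f`; needs a real video file to run):

```julia
using VideoIO

# Stream frames one at a time: only one decoded frame is ever in memory.
function process_sequential(filename, f)
    vid = VideoIO.openvideo(filename)
    frame = read(vid)          # first frame; also fixes the buffer's type and size
    out = [f(frame)]
    while !eof(vid)
        read!(vid, frame)      # decode the next frame into the same buffer
        push!(out, f(frame))
    end
    close(vid)
    return out
end
```

This is what I effectively benchmark the chunked version against.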
The best approach I have found so far is to parallelize access to chunks of the video, operate sequentially on the frames within each chunk, and finally collect all the chunks together at the end:
```julia
using VideoIO
using ThreadTools

dummy_func(x) = nothing  # stand-in for the real per-frame operation

filename = "video.avi"
nframes = VideoIO.get_number_frames(filename)
chunk_size = 100
chunks = Iterators.partition(1:nframes, chunk_size)
maxthreads = 3

out_chunks = tmap(maxthreads, chunks, 1:length(chunks)) do chunk, i
    # each task opens its own reader and seeks to the start of its chunk
    vid = VideoIO.openvideo(filename; target_format=Int32(169)) # 16-bit grayscale
    offset = (i - 1) * chunk_size
    skipframes(vid, offset)  # move to the desired position in the video
    # operate on each frame in the chunk individually
    output = map(vid, chunk) do frame, _
        dummy_func(frame)
    end
    close(vid)
    output
end

out = vcat(out_chunks...)
```
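Since the offset arithmetic is easy to get subtly wrong, here is the same bookkeeping isolated as plain Julia (no video involved), checking that the chunks tile `1:nframes` exactly and that `vcat` restores the original frame order:

```julia
# Toy "video" of 10 frames: each chunk starts at offset (i-1)*chunk_size + 1,
# and concatenating the per-chunk results preserves frame order.
nframes = 10
chunk_size = 4
chunks = collect(Iterators.partition(1:nframes, chunk_size))

out_chunks = map(enumerate(chunks)) do (i, chunk)
    offset = (i - 1) * chunk_size
    @assert first(chunk) == offset + 1  # skipframes(vid, offset) lands here
    collect(chunk)                      # stand-in for per-frame processing
end

out = vcat(out_chunks...)
@assert out == collect(1:nframes)
```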
The unexpected issue I have encountered is that the performance of this piece of code depends in some way on `chunk_size` as well as on the number of parallel threads `maxthreads`.
For a single thread (`maxthreads=1`) an individual chunk is processed fastest (call that a reference processing time of 1), and as the number of threads increases, per-chunk processing slows down (`maxthreads=2` → time ≈ 1.5, `maxthreads=3` → time ≈ 2, …). For 2 or 3 threads this slowdown is outweighed by the parallelism, but beyond that the overall processing becomes significantly slower. In the end, for my case, `maxthreads=3` and `chunk_size=100` gave the best performance.
I suppose the problem is that, even though I create a separate video-reader object per task, there is still some bottleneck in reading the data from the hard drive, so this approach may be flawed overall. However, I could not think of any other way to speed it up.
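One direction I have considered but not benchmarked: if sequential disk/decoder access really is the bottleneck, a single reader task could feed frames to worker threads through a `Channel`, so the file is decoded once, in order, while the per-frame work still runs in parallel. A sketch with synthetic frames standing in for the VideoIO reader, so it runs standalone:

```julia
# One producer pushes (index, frame) pairs into a Channel; worker tasks
# pull from it, apply `f`, and the results are re-sorted by frame index.
function process_channel(frames, f; nworkers = 3, buffer = 16)
    ch = Channel{Tuple{Int,eltype(frames)}}(buffer)
    producer = @async begin
        for (i, frame) in enumerate(frames)  # stand-in for sequential decoding
            put!(ch, (i, frame))
        end
        close(ch)
    end
    workers = map(1:nworkers) do _
        Threads.@spawn begin
            out = Tuple{Int,Any}[]
            for (i, frame) in ch             # take!s on a Channel are thread-safe
                push!(out, (i, f(frame)))
            end
            out
        end
    end
    results = reduce(vcat, fetch.(workers))
    wait(producer)
    sort!(results; by = first)               # restore frame order
    return last.(results)
end

# toy check: "frames" are small matrices, f just sums them
frames = [fill(i, 2, 2) for i in 1:10]
@assert process_channel(frames, sum) == [4i for i in 1:10]
```

Whether this actually beats the chunked approach would need measuring on the real videos, and I have not done that.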
Any help or suggestion greatly appreciated.