Fastest way to access individual frames from .avi video?

I have a collection of relatively large videos (~10k frames), and I need to perform some operation on each of these frames, independently.

In an ideal world I would have each frame as a separate tiff, but these are .avi videos. Loading a full video at once has a significant memory footprint, which my laptop cannot always handle. I played around a bit to find a way to make this smooth and fast, but I’m not totally satisfied with the result. I don’t have much experience with handling large video data, so I’m not sure whether there is some obvious solution I am missing.

The best thing I have got so far was to parallelize access to video chunks, then operate sequentially on frames within each chunk, and finally collect all the chunks together at the end:

using VideoIO
using ThreadTools

dummy_func(x) = nothing
filename = "video.avi"
nframes = VideoIO.get_number_frames(filename)
chunk_size = 100
chunks = Iterators.partition(1:nframes, chunk_size) # the Base module is `Iterators`, not `Iterator`
maxthreads = 3
out_chunks = tmap(maxthreads, chunks, 1:length(chunks)) do chunk, i
    vid = VideoIO.openvideo(filename; target_format=Int32(169)) # 16bit grayscale
    offset = (i-1) * chunk_size
    skipframes(vid, offset) # move to desired position in the video
    # operate on each frame in the chunk individually
    output = map(vid, chunk) do frame, _
        dummy_func(frame)
    end
    close(vid)
    output
end
out = vcat(out_chunks...)

The unexpected issue I have encountered is that the performance of this code depends on chunk_size as well as on the number of threads, maxthreads.
With a single thread (maxthreads=1) an individual chunk is processed the fastest (call this a reference processing time of 1), and the per-chunk time grows as threads are added (maxthreads=2 → time~1.5, maxthreads=3 → time~2, …). For 2 or 3 threads this slowdown is compensated by the parallelism, but with more threads the overall processing becomes significantly slower. In the end, for my case I found maxthreads=3 and chunk_size=100 to give the best performance.

I suppose the problem is that, even though I create a separate video reader object for each chunk, there is still some bottleneck in reading the data from the hard drive, so this approach may be flawed overall.
However, I could not think of any other way to speed this up.

Any help or suggestion greatly appreciated.

I guess you can’t go faster than the hard drive, and reading is probably fastest if you just go from beginning to end sequentially. If a single thread can’t keep up with the reading speed, you could process the serially read frames with multiple threads.

So I’d try reading all the frames in sequence and pushing them to a channel. From this channel, a bunch of threads read those frames when they become available and work on them individually, pushing their own results to another channel that does the result aggregation.
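
Here is a rough sketch of that reader/worker/aggregator layout using only Base Julia. It is not tested against VideoIO: `fake_frames` is a made-up stand-in for the video, and `process_frame` and `process_in_order` are hypothetical names. With a real reader you would loop over the opened video in the producer instead, and I am assuming each frame arrives as its own array (copy it if the reader reuses a buffer).

```julia
# Hypothetical stand-in for a video: yields 4x4 UInt16 "frames" in order.
fake_frames(n) = (fill(UInt16(i), 4, 4) for i in 1:n)

# Placeholder for the real per-frame operation.
process_frame(frame) = sum(Int, frame)

function process_in_order(frames; nworkers = 3, buffer = 32)
    # Single producer: walks the source front to back, so reads stay sequential.
    # The channel is bounded, so the reader blocks when the workers fall behind,
    # which caps the number of frames held in memory at once.
    jobs = Channel{Tuple{Int,Matrix{UInt16}}}(buffer; spawn = true) do ch
        for (i, frame) in enumerate(frames)
            put!(ch, (i, frame))
        end
    end  # the channel is closed automatically when the producer task ends

    results = Channel{Tuple{Int,Int}}(buffer)

    # Workers: each takes frames as they become available and pushes its result.
    workers = map(1:nworkers) do _
        Threads.@spawn for (i, frame) in jobs
            put!(results, (i, process_frame(frame)))
        end
    end

    # Close the results channel once every worker has finished.
    Threads.@spawn begin
        foreach(wait, workers)
        close(results)
    end

    # Aggregation: results arrive out of order, so sort by frame index.
    collected = collect(results)
    sort!(collected; by = first)
    return last.(collected)
end

out = process_in_order(fake_frames(50); nworkers = 3)
```

Julia has to be started with several threads (e.g. `julia -t 4`) for the workers to actually run in parallel. Whether this beats the chunked version depends on how expensive the per-frame work is compared to decoding.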

Depending on how VideoIO implements seeking, this operation might involve decoding all the skipped frames as well. I don’t know whether it is clever enough to jump to the nearest key frame and only start decoding from there.

Could it be an advantage to convert once to tiff stacks via ffmpeg and then work with that? (I know, not elegant but since ffmpeg is quite optimized you might still get better results that way.)
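
For what it’s worth, the one-off conversion could be driven from Julia along these lines. This is only a sketch: it assumes an `ffmpeg` binary is on your PATH, `tiff_export_cmd` is a made-up helper name, and it writes one numbered TIFF per frame rather than a single multi-page stack. I have not checked which pixel format ffmpeg picks for 16-bit grayscale input, so inspect a few output frames before relying on it.

```julia
# Build (but do not run) an ffmpeg command that writes one TIFF per frame.
# `tiff_export_cmd` is a hypothetical helper, not part of any package.
function tiff_export_cmd(input::AbstractString, outdir::AbstractString)
    # %05d is ffmpeg's frame-number placeholder: frame_00001.tiff, frame_00002.tiff, ...
    pattern = joinpath(outdir, "frame_%05d.tiff")
    return `ffmpeg -i $input $pattern`
end

cmd = tiff_export_cmd("video.avi", "frames")
```

To actually convert, create the output folder and run the command: `mkpath("frames"); run(cmd)`. After that each frame can be loaded on its own, at the cost of extra disk space.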