Triple buffering

cscherrer · September 14, 2020, 11:16pm

Hi, I’m curious if anyone has tried triple buffering. This is a generally useful framework, but I’m specifically thinking allocation-free MCMC (or maybe more accurately, iteration-independent allocation count).

I think it’s something like this:

function triple_buffer!(init, write!, log!, num_iterations)
    reading = init
    writing = deepcopy(init)
    logging = deepcopy(init)

    # Fill the pipeline
    write!(writing, reading)
    (reading, writing, logging) = (writing, logging, reading)
    write!(writing, reading)

    for i in 1:(num_iterations-2)
        (reading, writing, logging) = (writing, logging, reading)

        @sync begin
            @async write!(writing, reading)
            @async log!(logging)
        end       
    end

    # Empty the pipeline
    (reading, writing, logging) = (writing, logging, reading)
    log!(logging)
    (reading, writing, logging) = (writing, logging, reading)
    log!(logging)

    return nothing
end

Here write!(dest, src) takes a step in the whatever space, e.g. an HMC step, and log! outputs the results. I’m assuming we might not need to store the entire state at each step, and also that write! and log! might be closures, so they could have some fixed “scratch space” set aside to use for intermediate computations as needed. Oh, and both could themselves be parallel as well, of course.

Any thoughts? Too simple? Too complicated?

Especially interested if performance-oriented folks like @Elrod and @mohamed82008 think this approach could get good performance, or if there’s generally a better way to go about this.

Elrod · September 15, 2020, 2:41am

Why the @async?
This seems rather general.
I think I’d have to try it out before I can make more informative comments.
My first comment would be that I think you could generally do better than 3x copies of init.
But anything heavy should of course be in the write! closure (or whatever the appropriate closure is).

For HMC I had just one, and copied the contents into the saved samples. The need to copy is the price you pay for having zero-allocation trees; you can’t just store the last state if that state is part of a preallocated buffer and about to be over written.
I think this is generally a lower cost with HMC, especially when sampling chains from multiple threads.
I think this is a similar idea.

cscherrer · September 15, 2020, 2:52am

The idea is to keep reading, writing, and logging from slowing each other down. Separating them makes it so we can do all three at the same time. For example, if I only had one copy I’d have to take a step and then wait for information to be copied out. Double-buffering should do better. Triple might be a bit extreme, guess it may depend whether we end up bound by IO or computation.

I haven’t done much with parallel programming in Julia (mostly OpenMP years ago), but the @async is just there to allow the parallelism.

Topic		Replies	Views
Feedback on a Threadsafe Buffer Manager General Usage	0	214	October 10, 2023
Statistical allocation profiling Internals & Design proposal , performance , memory-allocation	1	655	May 4, 2019
How to preallocate and reuse buffers for oft-repeated computation? General Usage question , performance , memory-allocation , profiling , speed-optimization	15	1475	July 12, 2023
Blog post: Asynchronous programming in Julia Community blog-post	16	1375	February 24, 2025
Within-chain parallelization with Turing.jl Probabilistic programming turing	6	481	September 1, 2023

Triple buffering

Related topics