Hi, I’m curious whether anyone has tried triple buffering. It’s a generally useful pattern, but I’m specifically thinking of allocation-free MCMC (or, more accurately, MCMC whose allocation count is independent of the number of iterations).
I think it’s something like this:
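```julia
function triple_buffer!(init, write!, log!, num_iterations)
    # The schedule below always performs at least two writes and two logs.
    num_iterations >= 2 || throw(ArgumentError("num_iterations must be at least 2"))

    # Three buffers whose roles rotate; the states themselves are never reallocated.
    reading = init
    writing = deepcopy(init)
    logging = deepcopy(init)

    # Fill the pipeline
    write!(writing, reading)
    (reading, writing, logging) = (writing, logging, reading)
    write!(writing, reading)

    for i in 1:(num_iterations - 2)
        (reading, writing, logging) = (writing, logging, reading)
        # Compute state i + 2 while the already-finished state i is logged.
        # Note: @async tasks run on the current thread (Threads.@spawn is needed
        # for real multithreading), and each task creation itself allocates.
        @sync begin
            @async write!(writing, reading)
            @async log!(logging)
        end
    end

    # Empty the pipeline
    (reading, writing, logging) = (writing, logging, reading)
    log!(logging)
    (reading, writing, logging) = (writing, logging, reading)
    log!(logging)
    return nothing
end
```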
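As a sanity check on the rotation order, here is a minimal sequential sketch of the same schedule. It is not part of the proposal itself: `trace_rotation`, `step!`, and `record!` are hypothetical names, and each buffer is just a `Ref{Int}` holding the index of the step that produced it, so the logger's view can be read off directly.

```julia
# Sequential trace of the rotation schedule used by triple_buffer!.
# Each buffer is a Ref{Int}; its value is the index of the step that wrote it.
function trace_rotation(num_iterations::Integer)
    num_iterations >= 2 || throw(ArgumentError("need at least 2 iterations"))
    reading, writing, logging = Ref(0), Ref(0), Ref(0)
    logged = Int[]                              # states seen by the logger, in order
    step!(dest, src) = (dest[] = src[] + 1)     # stand-in for write!
    record!(buf) = push!(logged, buf[])         # stand-in for log!

    # Fill the pipeline
    step!(writing, reading)                     # produces state 1
    reading, writing, logging = writing, logging, reading
    step!(writing, reading)                     # produces state 2

    for i in 1:(num_iterations - 2)
        reading, writing, logging = writing, logging, reading
        step!(writing, reading)                 # produces state i + 2
        record!(logging)                        # logs state i
    end

    # Empty the pipeline
    reading, writing, logging = writing, logging, reading
    record!(logging)                            # logs state n - 1
    reading, writing, logging = writing, logging, reading
    record!(logging)                            # logs state n
    return logged
end

println(trace_rotation(5))
```

In this trace the logger sees every state from 1 to `num_iterations` exactly once and in order, and inside the loop the buffer being logged is never the one being written or read.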
In `triple_buffer!`, `write!(dest, src)` takes a step in whatever space is being sampled, e.g. an HMC step, and `log!` outputs the results. I’m assuming we might not need to store the entire state at each step, and also that `write!` and `log!` might be closures, so they could have some fixed “scratch space” set aside to use for intermediate computations as needed. Oh, and both could themselves be parallel as well, of course.
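To make the closure idea concrete, here is a rough sketch of what such a `write!` and `log!` pair could look like. Everything in it is an assumption for illustration: a random-walk Metropolis step stands in for HMC, the target is a toy standard normal, and `make_write!`, `make_log!`, `run_chain!`, and `allocated_bytes` are hypothetical helpers. `run_chain!` is a plain sequential two-buffer driver used only so the block runs on its own.

```julia
using Random

# Toy target (assumption): standard normal log-density, up to a constant.
logdensity_stdnormal(x) = -0.5 * sum(abs2, x)

# Returns a write!(dest, src) closure that owns its RNG and a proposal buffer.
function make_write!(logdensity, stepsize::Float64, dim::Int; seed::Int = 1)
    rng = MersenneTwister(seed)
    proposal = zeros(dim)                    # scratch space, allocated once
    function step!(dest, src)
        randn!(rng, proposal)                # fill scratch with N(0, 1) draws
        @. proposal = src + stepsize * proposal
        accept = log(rand(rng)) < logdensity(proposal) - logdensity(src)
        copyto!(dest, accept ? proposal : src)
        return dest
    end
    return step!
end

# Returns a log!(state) closure that keeps a running sum instead of the full
# chain, plus a function to read the running mean back out.
function make_log!(dim::Int)
    total = zeros(dim)
    n = Ref(0)
    function record!(state)
        total .+= state
        n[] += 1
        return nothing
    end
    running_mean() = total ./ n[]
    return record!, running_mean
end

# Sequential stand-in driver (not the triple-buffered one): two buffers, swapped.
function run_chain!(init, write!, log!, num_iterations)
    current, next = init, similar(init)
    for _ in 1:num_iterations
        write!(next, current)
        log!(next)
        current, next = next, current
    end
    return nothing
end

# Bytes allocated by one run; closures are built outside the measured call.
function allocated_bytes(num_iterations)
    step! = make_write!(logdensity_stdnormal, 0.5, 2)
    record!, _ = make_log!(2)
    init = zeros(2)
    return @allocated run_chain!(init, step!, record!, num_iterations)
end

allocated_bytes(10)                          # warm-up so compilation is not measured
bytes_short = allocated_bytes(1_000)
bytes_long = allocated_bytes(10_000)
println((bytes_short, bytes_long))
```

Comparing the two byte counts is one way to check the "iteration-independent allocation count" property. The same two closures should be usable as the `write!` and `log!` arguments of `triple_buffer!`, provided they share no scratch space, since the two calls may overlap there.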
Any thoughts? Too simple? Too complicated?
Especially interested if performance-oriented folks like @Elrod and @mohamed82008 think this approach could get good performance, or if there’s generally a better way to go about this.