but it does not allow me to flush io to the file after a write without closing the file. Any idea whether this is supported? Or is there a different format that allows me to do this?
```julia
using Arrow
using Tables

row_A = (field=[1.0, 2.0], temp=[1.0, 1.0], energy=[-0.0, -0.0])
row_B = (field=[3.0, 2.0], temp=[1.0, 1.0], energy=[-0.0, -0.0])

io = open("test.arrow", "w")
Arrow.append(io, row_A)
flush(io)
tbl = Arrow.Table("test.arrow") # this has two rows

Arrow.append(io, row_B)
flush(io)
tbl = Arrow.Table("test.arrow") # still only two rows; row_B is missing
close(io)
```
ASDF.jl should work, but I am no longer using it. I switched to ADIOS2.jl as my file format; it has many more features.
I use ADIOS2 when running PDE simulations: every few iterations, I write some variables to the file and flush them. ADIOS2 handles this use case very efficiently. In other respects, ADIOS2 is similar to HDF5, in that it is designed to hold multi-dimensional arrays with attributes.
I don’t know whether Arrow.jl is at fault here (i.e., our implementation is bad) or whether it’s a general Arrow design issue; crash recovery may simply not be one of Arrow's design goals.
OK, I ended up doing this on my own: a custom data format, which is quite simple given the kind of data I want to flush to disk. I don’t think Arrow will work out for me in the end. Still, thanks to everyone for the replies here.
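The custom format itself wasn't shared, so what follows is only a hypothetical sketch of the idea in plain Julia (Base only, no Arrow). Each row of the `(field, temp, energy)` example above is stored as three raw `Float64` values in native byte order, with no header or schema. Because appending only writes bytes and then calls `flush`, another reader can re-read the file while the writer still has it open. The names `append_rows!` and `read_rows` are illustrative and do not come from any package.

```julia
# Hypothetical append-only format (not the poster's actual format):
# each record = (field, temp, energy) as three raw Float64 values,
# native byte order, no header.
const NCOLS = 3
const RECBYTES = NCOLS * sizeof(Float64)  # 24 bytes per record

# Append rows column-wise, then flush so readers can see the bytes.
function append_rows!(io::IO, field::Vector{Float64},
                      temp::Vector{Float64}, energy::Vector{Float64})
    length(field) == length(temp) == length(energy) ||
        throw(DimensionMismatch("columns must have equal length"))
    for i in eachindex(field, temp, energy)
        write(io, field[i], temp[i], energy[i])
    end
    flush(io)  # push buffered bytes to the OS; the file stays open
    return nothing
end

# Read every complete record; a partial trailing record
# (e.g. left by a crash mid-write) is ignored.
function read_rows(path::AbstractString)
    bytes = read(path)
    nrec = length(bytes) ÷ RECBYTES
    vals = reinterpret(Float64, bytes[1:nrec * RECBYTES])
    m = reshape(vals, NCOLS, nrec)
    return (field = m[1, :], temp = m[2, :], energy = m[3, :])
end

# Usage mirroring the Arrow example: re-read while the writer is still open.
path = tempname()
io = open(path, "w")
append_rows!(io, [1.0, 2.0], [1.0, 1.0], [-0.0, -0.0])
after_A = read_rows(path)
append_rows!(io, [3.0, 2.0], [1.0, 1.0], [-0.0, -0.0])
after_B = read_rows(path)
close(io)
println(length(after_A.field), " ", length(after_B.field))  # prints "2 4"
```

The fixed-width, headerless layout is what makes flushing cheap: nothing earlier in the file ever has to be rewritten, and a crash can leave at most one incomplete trailing record, which `read_rows` drops. The trade-off is that the schema lives only in the code, and files written in native byte order are not portable between little- and big-endian machines.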