I ended up building the big Arrow file with pyarrow, using the code below:
```python
import glob

import pyarrow as pa

with pa.output_stream("path/big.arrow") as sink:
    # `schema` must match the schema of every input file
    with pa.ipc.new_file(sink, schema) as writer:
        for arrowfile in glob.glob("path/to/files/*.arrow", recursive=False):
            with pa.input_stream(arrowfile) as source:
                with pa.ipc.open_file(source) as reader:
                    for i in range(reader.num_record_batches):
                        writer.write_batch(reader.get_batch(i))
```
That led me to another issue: how well is Apache Arrow's zero-copy methodology supported?