Does CSV.read allow another process to keep writing to file?

I’m running a pretty expensive job on a cluster. It generates plaintext output files by appending new lines of data regularly (roughly once a second).

I’ve written some Julia code to perform quick visualizations (also on the cluster!). But before I accidentally mess anything up, I want to double-check:

Is there any chance that CSV.read("filename.dat", DataFrame) could interfere with the other process writing to these files?
The CSV docs on input use a lot of words that I’m unfamiliar with.

In principle, nothing prevents a different process from opening a file you’ve already opened and writing to it at the same time as you’re reading from it. This is irrespective of CSV.jl.
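You can see this even within a single process. Here's a self-contained sketch (the path and data are made up) showing one handle appending while another handle reads:

```julia
# Single-process demo: a file can be appended to while another handle has it open.
path = tempname()
write(path, "a,b\n1,2\n")

reader = open(path, "r")       # open for reading first...
open(path, "a") do w           # ...then append through a second handle
    println(w, "3,4")
end

contents = read(reader, String)  # the read sees all bytes present now, including the appended row
println(contents)
close(reader)
```

The same applies across processes: the OS doesn't lock the file for you, so the reader simply sees whatever bytes exist at the moment it reads.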

I’m worried about the “in principle” part here ;-). But I guess if I’m rsyncing the files over to another drive while the job is running, that’s probably doing something similar…

You might want to use a front-end IO handler in the reader to make sure the CSV parser doesn’t choke on incomplete records.
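For example, something like this (an untested sketch — `read_complete_rows` is a made-up helper, not part of CSV.jl): snapshot the file's bytes and drop any trailing partial line before handing them to `CSV.read`:

```julia
using CSV, DataFrames

# Made-up helper: parse only the complete lines currently in the file,
# so a half-written last record can't trip up the CSV parser.
function read_complete_rows(path)
    raw = read(path)                     # snapshot of the file's bytes right now
    i = findlast(==(UInt8('\n')), raw)   # complete data ends at the last newline
    i === nothing && return DataFrame()  # nothing complete yet
    return CSV.read(IOBuffer(raw[1:i]), DataFrame)
end
```

If the writer appends whole lines at a time this will usually just return everything, but it protects you on the occasions when you catch a line mid-write.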

In the end I had to copy the data to a different drive anyway for more permanent storage. I’m making the figures now by reading the data from there.

In the future, if you want consistency and simultaneous read+write access, you might try DuckDB or SQLite.
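With SQLite, appends are transactional, so a reader never sees a half-written record. A rough sketch using SQLite.jl (the filename and schema here are invented for illustration):

```julia
using SQLite, DataFrames
import DBInterface

# Made-up database path and table layout, just to show the pattern.
db = SQLite.DB(joinpath(mktempdir(), "results.db"))
DBInterface.execute(db, "CREATE TABLE IF NOT EXISTS results (t REAL, value REAL)")

# Writer side: each INSERT is an atomic, committed transaction.
DBInterface.execute(db, "INSERT INTO results VALUES (?, ?)", (0.0, 1.23))

# Reader side: sees only committed rows, never a partial one.
df = DataFrame(DBInterface.execute(db, "SELECT * FROM results"))
```

Both processes would open the same database file; SQLite handles the locking for you.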

CSV.read will not mess with the process writing to the file, but the “quick visualizations” are not guaranteed to see the latest writes. That depends on whether and when the writing process flushes/syncs to the file system.
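If the writer is under your control, flushing after each record makes appends visible to readers promptly. A minimal Julia-side sketch, reusing the `filename.dat` name from the question (the data row is made up):

```julia
# Writer side: flush after each appended record so readers see it promptly.
open("filename.dat", "a") do io
    println(io, "0.1,0.2,0.3")  # made-up data row
    flush(io)                   # push Julia's buffer to the OS so other processes can read it
end
```

Without the `flush`, the line may sit in the writer's userspace buffer for a while, and a concurrent `CSV.read` would simply not see it yet.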