I’m running a pretty expensive job on a cluster. It generates plaintext output files by appending new lines of data at regular intervals (roughly once a second).
I’ve written some Julia code to perform quick visualizations (also on the cluster!). But before I accidentally mess anything up, I want to double-check:
Is there any chance that `CSV.read("filename.dat", DataFrame)` could interfere with the other process writing to these files?
The CSV.jl docs on input use a lot of terminology I’m unfamiliar with.
In principle, nothing prevents a different process from opening a file you’ve already opened and writing to it at the same time as you’re reading from it. This holds irrespective of CSV.jl.
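In practice, the main hazard on the reading side is catching the last line mid-append. If you want the quick look to be robust to that, you can parse everything except the final line, along these lines (just a sketch: it always lags one row behind, and it assumes the file already holds a header plus at least one complete row):

```julia
using CSV, DataFrames

# Snapshot the file, then parse all but the last line, which may
# still be mid-append by the writing process.
lines = readlines("filename.dat")
df = CSV.read(IOBuffer(join(lines[1:end-1], '\n')), DataFrame)
```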
I’m worried about the “in principle” part here ;-). But I guess if I’m rsyncing the files over to another drive while the job is running, rsync is probably doing something similar…
`CSV.read` will not mess with the process writing to the file, but the “quick visualizations” are not guaranteed to see the latest writes. That depends on whether and when the writing process flushes/syncs to the file system.
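In Julia terms, a writer that flushes after every row would look roughly like this (a sketch of the pattern only; `next_row` is a made-up stand-in, and your job may well be written in something else entirely):

```julia
# Schematic writer: append one CSV row per step, flushing after each.
# flush hands the buffered bytes to the OS so that another process
# reading the file can see them; without it, rows can sit in this
# process's buffer for a long time.
next_row() = (time(), rand())   # made-up stand-in for the job's data

open("filename.dat", "a") do io
    for _ in 1:10               # the real job would loop until done
        println(io, join(next_row(), ','))
        flush(io)
        sleep(1)                # roughly one row per second
    end
end
```

Even then, `flush` only hands the bytes to the OS; on a networked file system (common on clusters) there can be an additional delay before other nodes see them.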