Safe overwriting of files

I have a workflow where I repeatedly need to overwrite files with new content. I think the easiest way to overwrite a file is to use write(filename,newcontent).

My question is how safe this function is against data loss when the Julia process calling it gets interrupted unexpectedly during the write operation (e.g. someone hitting CTRL-C, a cluster manager killing the process because of a timeout, etc.). I basically want to make sure that the file always contains either the old content or the new content; it should never end up in a state where both are lost. Is it safer to do something like:

mv(filename, filename * ".old")
write(filename,newcontent)
rm(filename * ".old")

so I could recover the data manually or are there already existing protection mechanisms against data loss in the former short version?

The short version is indeed not safe against data loss (but it is very convenient when that is not a significant issue).

The usual approach to improve safety is to write the new content into a temporary file and once that has succeeded (and possibly been verified by reading it back or computing a hash), move it into its target location.
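A minimal sketch of that write, verify, move sequence, assuming the content is a String and that a sibling file named filename * ".tmp" is free to use. write_verified is a hypothetical helper name, not something in Base. Reading the file back only checks what the operating system returns, which may come from its cache rather than from the disk, and whether the final mv step is atomic is a separate question.

```julia
# Hypothetical helper (not part of Base): write, verify, then move into place.
function write_verified(filename::AbstractString, newcontent::AbstractString)
    # Keep the temporary file next to the target so the final move stays
    # within one directory (and therefore one file system).
    tmpname = filename * ".tmp"
    write(tmpname, newcontent)

    # Verify by reading the temporary file back and comparing with the input.
    if read(tmpname, String) != newcontent
        rm(tmpname; force=true)
        error("verification of $tmpname failed; $filename was left untouched")
    end

    # Only now touch the target file.
    mv(tmpname, filename; force=true)
    return filename
end
```

If the process dies before the mv call, filename still holds the old content and only the .tmp file is affected.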

The mv operation is atomic when used within a file system (e.g. within a directory), so you should use that. The typical pattern is as Gunnar says,

tfilename = filename * ".tmp"
write(tfilename, newcontent)
mv(tfilename, filename, force=true)

This will ensure that filename always contains valid content.

Thanks a lot to both of you, exactly the information I needed.

The mv(...; force=true) function in Julia is not atomic. It calls rm before renaming, so there can be a time where the destination file is missing. See julia/base/file.jl at commit c6732a79494f604e0f320ea451856a1c10511659 in JuliaLang/julia on GitHub.

There is a function in Python called os.replace that tries to be atomic, but I don't know how to do this in Julia (outside of using PythonCall).

Wow, what a mess. Then you can replace the mv with

@ccall rename(tfilename::Cstring, filename::Cstring)::Cint

or

run(`mv $tfilename $filename`)
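The bare @ccall returns the C status code and silently ignores a failure, so it is probably worth checking it. A small sketch, assuming a POSIX system where the C library's rename replaces an existing destination; posix_replace is a hypothetical helper name:

```julia
# Hypothetical helper: POSIX-only replacement via the C library's rename.
function posix_replace(tfilename::AbstractString, filename::AbstractString)
    ret = @ccall rename(tfilename::Cstring, filename::Cstring)::Cint
    # rename returns 0 on success and -1 on failure (with errno set);
    # systemerror throws a SystemError built from errno when the flag is true.
    Base.systemerror("rename", ret != 0)
    return filename
end
```

Used in place of the mv call, this would read posix_replace(tfilename, filename).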

To make everything a bit more messy, both of the methods sgaure suggests above will sadly not work on Windows, as rename does not overwrite an existing file there and mv does not exist.

On Windows you can instead use
run(`cmd /C MOVE /Y $old_filename $new_filename`)

I’m not sure if this is in fact atomic though, as a quick search yields contradictory information.

Another possible strategy is to put your contents into a database, for example SQLite with file-based storage (DuckDB also qualifies, I guess). Databases are often chosen for handling larger amounts of content with non-trivial structure, and may feel like overkill for simpler data. But another advantage is frequently overlooked: SQLite provides safe inserts and updates (ACID transactions) across all supported OSs and is, owing to its enormous popularity, very thoroughly tested. Trying to overwrite files safely but "manually" is to some extent reinventing the wheel.

A possible low-tech workaround is to do a two-step mv dance where the original file is temporarily moved to a backup name and then removed after the new file is in place.
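A sketch of that dance, assuming the names filename * ".old" and filename * ".tmp" are free to use; replace_with_backup is a hypothetical helper name. The plain mv calls (without force) refuse to overwrite, so a leftover .old file from an interrupted earlier run raises an error instead of being clobbered.

```julia
# Hypothetical helper: replace a file while keeping a backup until the end.
function replace_with_backup(filename::AbstractString, newcontent)
    backup  = filename * ".old"
    tmpname = filename * ".tmp"

    write(tmpname, newcontent)   # 1. write the new content completely first
    mv(filename, backup)         # 2. move the original aside (errors if backup exists)
    mv(tmpname, filename)        # 3. move the new file into place
    rm(backup)                   # 4. drop the backup only after step 3 succeeded
    return filename
end
```

At every point either filename or the .old file holds a complete copy, so an interruption can be repaired by hand by moving the .old file back.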

If you are SUPER paranoid, then try this

I made a PR to make mv more atomic for this pattern to work.

I made a new PR to document Base.rename, which is a more cross-platform version of @ccall rename(tfilename::Cstring, filename::Cstring)::Cint.
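For illustration, a sketch of the temp-file pattern built on Base.rename, assuming a Julia version where Base.rename(oldpath, newpath) is available and replaces an existing destination. replace_file is a hypothetical helper name. It uses mktemp in the target's directory so that the temporary name is unique and the rename stays within one file system.

```julia
# Hypothetical helper: write to a unique temp file, then rename over the target.
function replace_file(filename::AbstractString, newcontent)
    dir = dirname(abspath(filename))
    tmppath, io = mktemp(dir)   # unique temporary file next to the target
    try
        write(io, newcontent)
        close(io)               # make sure everything is handed to the OS
        Base.rename(tmppath, filename)
    catch
        close(io)
        rm(tmppath; force=true) # clean up the temp file; the target is untouched
        rethrow()
    end
    return filename
end
```

Note that mktemp creates the file with restrictive permissions, so the replaced file may end up with different permissions than the original.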
