Changing file on disk not affecting IOStream data, is this a bug?

I’m extending my support for memory-mapped TIFF images in my package TiffImages.jl to support writing in addition to reading. One thing I’m running into is that my structs contain a file handle that I read/write to, but this IOStream object behaves unexpectedly when a file changes on disk.

julia> filepath = "test.txt";

julia> io = open(filepath, read=true, append=true);

julia> write(io, "test")
4

julia> flush(io)

julia> seekstart(io);

julia> read(io, String)
"test"

So far, it behaves as expected. But if I now open test.txt on disk in an editor and change the text to test123, the change isn’t reflected in the IOStream:

julia> seekstart(io);

julia> read(io, String)
"test"

Opening a new IOStream shows the changes:

julia> io = open(filepath, read=true, append=true);

julia> seekstart(io);

julia> read(io, String)
"test123\n"

How do I get the IOStream to reflect the file on disk?

This is on Julia 1.5.3

I can’t confirm this behaviour on Windows:

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 9 3900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver2)

The changes I make with an editor show up every time in the REPL when I run:

julia> seekstart(io);read(io, String)
"test123"

The file is on a local SSD (d:\temp), not on a network file system, a RAID, or anything similar where higher-latency buffers might cause such an issue.

I’m on Fedora 33. Writing to a local SSD.

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1* (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

This also happens on a different machine (Ubuntu 20.04.01 with Julia 1.4.2)

julia> filepath = "test.txt";

julia> io = open(filepath, read=true, append=true);

julia> write(io, "test")
4

julia> flush(io)

julia> seekstart(io);

julia> read(io, String)
"test"

shell> cat test.txt
test
shell> vim test.txt

shell> cat test.txt
test123

julia> seekstart(io); read(io, String)
"test"

I can’t reproduce this on Linux (Arch) with Julia 1.5.3.

I was able to reproduce this

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-3470S CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, ivybridge)
Environment:
  JULIA_EDITOR = code

I believe Julia uses fread/fwrite when accessing IOStream files, in which case the standard library would be buffering the data. Why not everyone is seeing the buffering is beyond me.

Another thread that might be helpful:

I was able to reproduce and analyze this behaviour on a debian system.

The reason, in my experiments, is that changing the content of the file with an editor such as vim while it is open in Julia does not modify the file in place: the editor removes the original and writes a new version. Deleting a file only removes the directory entry (the name) that links to the inode. The file descriptor opened within Julia still refers to the original inode, whose data remains on the filesystem. The changed data (the new test.txt) is stored at a different place on the filesystem, and the name test.txt now refers to a new inode.

if any process has the file open when this happens, deletion is postponed until all processes have closed the file
Deleting Files (The GNU C Library)

See the following:

julia> filepath = "test.txt"; io = open(filepath, read=true, append=true);

julia> seekstart(io);read(io, String)
"test\n"

Now we change the file test.txt in another shell and see what happens to the file descriptor:

root:~# ps -efl | grep julia | grep -v grep
4 S root      1001 29194 10  80   0 - 125484 ep_pol 12:30 pts/1   00:00:00 julia-1.5.3/bin/julia

The PID of our Julia process is 1001:

root:~# ll /proc/1001/fd/
...
lrwx------ 1 root root 64 Jan 15 12:31 20 -> /root/test.txt
...

You can see an open file descriptor, 20, which points to the open file (I have removed the other entries here).
Now we edit the content of the file with vim and save it, and we get:

root:~# ll /proc/1001/fd/
...
lrwx------ 1 root root 64 Jan 15 12:31 20 ->  (deleted)/root/test.txt~
...

This means the original file, the one still open in Julia, has been deleted, and what is left is a symbolic link to the original (unchanged) data on the filesystem.
You can try editing /proc/1001/fd/20 (the 20 is specific to my case, your descriptor number will differ), and you will see those changes reflected when reading in Julia.
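
The same effect can probably be reproduced from within Julia, without an editor. This is only a minimal sketch, assuming a POSIX filesystem (Windows behaves differently, as noted in this thread) and assuming that `stat(fd(io))` is accepted for an IOStream's descriptor. The temporary name test.txt.new is made up for illustration. It mimics an editor that saves by writing a new file and moving it over the old name:

```julia
# Sketch: replace a file by rename while a handle to it is still open.
dir  = mktempdir()
path = joinpath(dir, "test.txt")
write(path, "test")

io = open(path, read=true, append=true)

# Mimic an editor that writes a new file and moves it over the old name.
tmp = joinpath(dir, "test.txt.new")   # hypothetical temporary name
write(tmp, "test123")
mv(tmp, path; force=true)

seekstart(io)
via_handle = read(io, String)   # read through the handle opened before the replacement
via_path   = read(path, String) # read through the name, which now refers to the new file

# The name and the open handle now refer to different inodes.
replaced = stat(path).inode != stat(fd(io)).inode

close(io)
```

Here `via_handle` should still hold the old contents while `via_path` holds the new ones, and comparing the two inode numbers is one way to detect that the file was replaced underneath an open handle.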

OK, let’s check what happens if we don’t change the file with an editor, but just append some data:

julia> close(io)

julia> filepath = "test.txt"; io = open(filepath, read=true, append=true);

julia> seekstart(io);read(io, String)
"test\n"

Second shell:

root:~# ll /proc/1001/fd/
...
lrwx------ 1 root root 64 Jan 15 12:31 20 -> /root/test.txt
...
root:~# echo "123" >> test.txt
root:~# ll /proc/1001/fd/
...
lrwx------ 1 root root 64 Jan 15 12:31 20 -> /root/test.txt
...
julia> seekstart(io);read(io, String)
"test\n123\n"

Voila, changes show up.
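
For comparison, the append case can be sketched in Julia alone. This is an illustration under the same assumptions as the shell session above: a second, independent handle stands in for `echo "123" >> test.txt`, and the file is modified in place rather than replaced.

```julia
# Sketch: an in-place append is visible through an already open handle.
dir  = mktempdir()
path = joinpath(dir, "test.txt")
write(path, "test\n")

io = open(path, read=true, append=true)
seekstart(io)
before = read(io, String)

# A second handle appends to the same file, like `echo "123" >> test.txt`.
open(path, "a") do writer
    write(writer, "123\n")
end

seekstart(io)
after = read(io, String)

# Name and handle still refer to the same inode, so no reopen is needed.
same_inode = stat(path).inode == stat(fd(io)).inode

close(io)
```

Because the inode never changes, the handle opened first sees the appended bytes after a `seekstart`.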

Of course, Windows behaves differently, which is why it couldn’t be reproduced there. I don’t know what Linux (Arch) does, but apparently it either does something different as well or @jling didn’t edit with an editor.
The complete story is quite complex; you can find more by searching for the keywords “linux delete unlink inodes”.


I feel I should point out that reading from a file while another process is writing to it is risky at best. Unless you control both programs and can synchronize your reads and writes, there is a very real possibility that a read picks up only half the changes because the other program is still writing them.

I did edit with an editor (vim in my case):

julia> io = open("test.txt", read=true, append=true);

julia> seekstart(io);

julia> read(io, String)
"test123\n"

shell> vim test.txt

julia> seekstart(io);

julia> read(io, String)
"test123\n123\n"

Hi all, thanks for the feedback. Thanks to @oheil’s careful sleuthing, I’m pretty sure the delete/write new file is the reason for the behavior I’m seeing, but it is curious that @jling can’t reproduce on Arch. Maybe they’re doing something different?

In that case, what’s the best strategy for my memory-mapping? I don’t think I can use the built-in memory-mapping facilities because of the complex layout of TIFFs on disk, so my plan was to just read from and write to disk on getindex/setindex calls, but now I’m questioning whether that’s the right strategy given the behavior observed here. Does anyone have thoughts?

I think your best bet will be to watch the file/directory with:

https://docs.julialang.org/en/v1/stdlib/FileWatching/

When you notice a change, reopen the file and read the contents. That will ensure that you get the latest data from the file.
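
A rough sketch of that idea, using `watch_file` from the FileWatching standard library. The helper name `read_after_change` and the timeout values are made up for illustration and are not part of TiffImages.jl. It also does not handle the case where an editor has removed the old file but not yet written the new one, so treat it as a starting point only:

```julia
using FileWatching

# Hypothetical helper: block until `path` changes or `timeout_s` seconds pass,
# then read the file through a fresh handle.
function read_after_change(path::AbstractString; timeout_s::Real = 5)
    event = watch_file(path, timeout_s)
    event.timedout && return nothing   # no change was reported within the window
    # Opening by path again follows the name to whatever file it refers to now,
    # even if an editor replaced the original file.
    return read(path, String)
end

path = joinpath(mktempdir(), "test.txt")
write(path, "test")

# Nothing edits the file here, so this call will most likely just time out.
result = read_after_change(path; timeout_s = 0.5)
```

In a real package the wait would probably run in a background task, and any handle stored in a struct would be closed and replaced with a new one once a change is reported.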