Pipeline writes out of order

Is the following a bug or am I doing something wrong?

julia> open("test.txt", "w") do file
           write(file, "line1\n")
           run(pipeline(`echo line2`, file))
           write(file, "line3\n")
       end;

shell> cat test.txt
line2
line1
line3

Try using flush after each write; see the manual: I/O and Network · The Julia Language

I would assume that the run command finishes before the underlying I/O system is done handling the write operation (i.e., it’s a race condition).

julia> open("test.txt", "w") do file
           write(file, "line1\n")
           flush(file)
           run(pipeline(`echo line2`, file))
           write(file, "line3\n")
       end

is correct with the flush before run (it doesn't have to be after every write), and I'm not even sure it's a race condition without it, meaning you might sometimes get the correct order anyway. Skipping the flush (rather than doing it automatically for you) is an allowed optimization; with the explicit flush, everything comes out in program order.

That said, what you're seeing doesn't look like a race condition to me. Technically, with run you do start a new program/process, and yes, those "two" are competing for the file, but I'm just not sure your main process would ever get to do the flush at that point (I might be wrong; the OS has buffers, and so does the program).

I tried sleep(10) instead of the flush, and the result suggests it's not a race condition, at least in that case. I'm not sure whether, if you're really unlucky and some buffers (both the program's and the OS's) happen to fill at the same time, they might still race.
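For what it's worth, the same buffering interaction can be reproduced deterministically outside Julia. Here is a sketch in Python (file names are made up for illustration), where the parent's userspace buffer and the child's direct write to the inherited file descriptor produce the same out-of-order result, and an explicit flush restores program order:

```python
import subprocess

# Without a flush: "line1" sits in the parent's userspace buffer while
# the child process writes directly to the inherited file descriptor,
# so "line2" lands in the file first.
with open("test_noflush.txt", "w") as f:
    f.write("line1\n")                            # buffered, not yet on the fd
    subprocess.run(["echo", "line2"], stdout=f)   # child writes at offset 0
    f.write("line3\n")                            # still buffered
print(open("test_noflush.txt").read())            # line2, line1, line3

# With a flush before starting the child, everything comes out in order.
with open("test_flush.txt", "w") as f:
    f.write("line1\n")
    f.flush()                                     # push the buffer down to the fd
    subprocess.run(["echo", "line2"], stdout=f)   # child writes at offset 6
    f.write("line3\n")
print(open("test_flush.txt").read())              # line1, line2, line3
```

The point being: there is no timing involved, only the question of whether the parent's buffered bytes reach the file descriptor before the child starts writing to it.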

I'm wondering whether you, and most users, would want an implicit flush in this case for run. Maybe this can be thought of as a bug, or at least a surprise; we could file an issue, or even better a PR to Julia.

Thanks, the flush works. I would expect this kind of behavior if write and pipeline were given a filename, but it's a bit surprising to see it for sequential accesses to a stream object controlled by Julia. It seems to me that an implicit flush before starting the external process shouldn't have a significant performance impact… I'll file an issue.

I would strongly recommend NOT relying on this. I believe the results are undefined. Meaning it might work now on your current machine but it might not work later or on another OS.

I didn't think write checks whether someone else wrote to the file since the last write. I'm actually extremely surprised that this works. I thought write had a "current offset", so when you write 6 bytes with the first command and then 6 bytes with the last command, it would write immediately after the previous 6 bytes, not check where the "current" end of the file is and write there.

On a slightly related note, I believe write has a 4 KB buffer, so it won't actually hand the bytes to the OS without a flush until you've written that much.
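The userspace-buffering part is easy to observe directly (I don't know Julia's exact IOStream buffer size offhand, so treat 4 KB as a guess; this analogous Python sketch uses a made-up file name). A short write isn't visible in the file until the buffer is flushed:

```python
import os

f = open("buf_demo.txt", "w")             # block-buffered by default
f.write("hi")                             # lands in the userspace buffer only
before = os.path.getsize("buf_demo.txt")  # 0: nothing has reached the fd yet
f.flush()                                 # push the buffer down to the fd
after = os.path.getsize("buf_demo.txt")   # 2: now visible to other readers
f.close()
print(before, after)                      # 0 2
```

Note that flush only hands the bytes to the OS; committing them to the physical disk is a separate matter (fsync and friends).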

As I understand it, all the writes in the sequence end up as writes on the underlying file descriptor, which maintains its own state, and the IOStream doesn't manipulate the fd offset explicitly in this case. If the stream gets out of sync because of the out-of-band write from run, I would see that as a bug, since I'm only using the high-level API for writing to the stream, and only in a sequence of synchronous method calls.

Since run can run arbitrary commands, I think it’d be practically impossible to detect, in general, that you’re writing to the same file as write.

In general, that's true. But in OP's specific case, the pipeline object has a reference to a stream object that ultimately owns an actual file descriptor. For the child process to write to it, the two processes literally need to share the same file descriptor (it persists across the fork and exec syscalls; there's also a syscall that passes an fd over IPC, IIRC).

To me, that means (1) it's definitely possible for run to call flush before fork, and (2) it probably should, because we are not talking about different file descriptors pointing at the same file on disk, so there's well-defined control flow / causality at least up to the point where fork happens.
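The fd-sharing point can be demonstrated with raw, unbuffered OS-level writes (again a Python sketch with a made-up file name): the child's stdout is a duplicate of the parent's descriptor, and because the two descriptors share one open file description, the child's write advances the offset the parent sees:

```python
import os
import subprocess

fd = os.open("share_demo.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
os.write(fd, b"parent\n")                      # raw write, offset now 7
# The child's stdout is a dup of fd: same open file description, same offset.
subprocess.run(["echo", "child"], stdout=fd)   # child appends at offset 7
os.write(fd, b"parent again\n")                # parent's offset was advanced too
os.close(fd)
print(open("share_demo.txt").read())
```

With no userspace buffering in the picture, the interleaving is exactly program order, which supports the view that the only thing missing in the original example is a flush before the child starts.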

Yes please!

This seems wrong to me, because all the writes are in one task, which means it should behave as though all the operations were blocking. I'll look into it.

Issue filed at Pipeline with stdout=IOStream writes out of order · Issue #36069 · JuliaLang/julia · GitHub.

Thank you for filing the issue!
