Inconsistent results based on number of processes


#1

I stumbled across this inconsistency, which I find disconcerting, during a process of upgrading from 0.4.7 to 0.6. (Also, I discovered that the manual’s example of an echo server does not properly handle multiple connections, but perhaps that’s another topic, and is likely related to this example.)

Create a file, test.jl:

@sync begin
    ref = @spawn 1 + 1

    function whatever(ref)
        sleep(5)
        write(STDOUT, string(fetch(ref), "\n"))
    end

    @async whatever(ref)

    ref = @spawn 2 + 2
end

Then:

> julia test.jl
4
> julia -p 2 test.jl
2

What’s the reason for the discrepancy? Is this behavior expected?


#2

The order of the two actions:

  • write(STDOUT, string(fetch(ref), "\n"))
  • ref = @spawn 2 + 2

is unspecified in terms of the concurrency logic of Julia according to the code. When there are two Julia processes available, one order is chosen, and when there is one the reverse.

If for example the ref = @spawn 2+2 was after the @sync's end then it was guaranteed to run after the write and the script would always output 2.


#3

So I added the @sync so I’d get the stdout on the shell. I think the real issue here is when the expression whatever(ref) gets resolved. Let me point out the issue I have with the manual for the TCP server (the one using the nested @async at the bottom of this section). I don’t believe it does what it claims to do by allowing multiple connections. Here:

server = listen(2000)
while true
    sock = accept(server)
    @async begin
        x = readline(sock)
        write(sock, x)
    end
end

Then, in a shell just do > julia test.jl. Pardon my python, but in IPython, I can then do:

In [1]: import socket

In [2]: s = socket.socket()

In [3]: s.connect(('', 2000))

In [4]: s.send('check check\n')
Out[4]: 12

In [5]: s.recv(1024)
Out[5]: 'check check'

Works great. However, there’s an issue with multiple connections.

In [1]: import socket

In [2]: s = socket.socket()

In [3]: t = socket.socket()

In [4]: s.connect(('', 2000))

In [5]: t.connect(('', 2000))

In [6]: s.send('check 1\n')
Out[6]: 8

In [7]: s.recv(1024)

And this hangs, waiting for the response. This is because the first @async task is on the queue, blocking on reading from the s socket. But after opening the t socket, sock gets redefined and instead the first task writes to the t socket, which is not the appropriate client. Simple enough to confirm:

In [1]: import socket

In [2]: s = socket.socket()

In [3]: t = socket.socket()

In [4]: s.connect(('', 2000))

In [5]: t.connect(('', 2000))

In [6]: s.send('check 1\n')
Out[6]: 8

In [7]: t.recv(1024)
Out[7]: 'check 1'

These weren’t issues in 0.4.7 (I suspect they were introduced in 0.5 with some of the changes to function closures, but that’s speculation on my part). Perhaps this is expected, or even desired, behavior, but at the very least, the manual should be much clearer about the perils here.

Maybe I missed it in the language spec, but it would be nice to have tight control over when an expression is evaluated/resolved/etc.


#4

This seems to work appropriately, but is a bit convoluted:

server = listen(2000)
while true
    sock = accept(server)
    ex = quote
        x = readline($sock)
        write($sock, x)
    end
    @async eval(ex)
end

#5

This seems to work as well and is a little less convoluted. I don’t know why it works though…

server = listen(2000)
read_and_write(sock) = write(sock, readline(sock))
while true
    sock = accept(server)
    @async read_and_write(sock)
end

#6

Yeah, it’s not clear to me what makes that fundamentally different from the begin ... end block.

But you’ve expressed my overall concern: asynchronous tasks and parallelism can already be difficult to reason about, and adding an additional layer of uncertainty around when symbols get resolved makes it even harder. Generally speaking, it’s not a good situation where you write code that works (or seems to work…) and don’t know why it does what it does.

At the very least, these issues should be discussed heavily in the docs. It’s been a bumpy ride migrating a production system from 0.4.7 to 0.6, and that’s largely in part because the release notes for 0.5 and 0.6 didn’t, in my opinion, adequately touch on the consequences of the changes.


#7

The original example seems to have stopped working sometime between v0.4.7 and v0.5.0 (just tested on an old installation). So the closure changes might indeed be responsible. I think you should open a github issue about the concerns raised in this thread. The docs could certainly use improvement and at the very least that example should be fixed.


#8

A begin block doesn’t introduce a new scope and therefore doesn’t capture sock. You could use a closure or a let block and the code works fine:

server = listen(2001)
while true
    sock = accept(server)
    @async let sock = sock
        println(STDOUT, readline(sock))
    end
    # or:
    # @async (() -> println(STDOUT, readline(sock)))()
end

#9

But isn’t @async already wrapping its argument in a closure?


#10

It is. No idea why that isn’t enough.

Also

@async let sock = sock
    x = readline(sock)
    println(sock, x)
end

works, while

@async begin
    x = readline(sock)
    println(sock, x)
end

doesn’t. println(sock, readline(sock)) works without being wrapped though…


#11

Regardless of @async wrapping…

@async let sock = sock
    x = readline(sock)
    println(sock, x)
end

This isn’t clear to me when the let block actually gets executed. It seems like this may be the same as the previous race condition, where if the task sits on the queue, it’s not clear when sock is resolved. Semantically, wouldn’t this be clearer? (Haven’t tested this yet.)

let sock = sock
    @async begin
        x = readline(sock)
        println(sock, x)
    end
end

#12

At the risk of bifurcating the discussion, I created a GitHub issue: https://github.com/JuliaLang/julia/issues/23130