Reliable networking between Julia processes?

I’m trying to set up communication between two Julia processes using TCPSocket, but I’m having trouble making it work reliably, and I’m not sure my design is correct (I don’t know much about networking).

Essentially, I start a server on each process that listens for incoming data and processes it:

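```julia
using Sockets, Serialization   # listenany/accept come from Sockets; (de)serialize from Serialization

function start_server()
    port, server = listenany(8000)   # tries ports starting at 8000
    @async begin
        while true
            sock = accept(server)          # wait for a peer to connect
            @async process_client(sock)    # one task per connection
        end
    end
    port, server
end

function process_client(sock)
    while isopen(sock)
        data = try
            deserialize(sock)
        catch err
            ...
        end

        response = process_message(data...)
        serialize(sock, response)
    end
end
```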

And then to send data from process1 to process2 I do something like:

```julia
serialize(process1, data)
response = deserialize(process1)
```

I’ve got more try ... catch blocks to deal with errors, but I’m still running into issues, for example when trying to deserialize data types that are not defined on one of the processes (I know that should fail; I just want it to fail gracefully).
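One reason such failures are hard to handle gracefully is that when `deserialize(sock)` throws partway through a message (for example on a type that is not defined on the receiving side), the rest of that message can still be sitting in the socket, so the next read starts mid-stream. A minimal sketch of one workaround, assuming both ends agree on a simple length-prefixed framing; `send_framed` and `recv_framed` are hypothetical helpers, not part of any library. Each message is serialized into memory, sent with its byte length, and deserialized only after the whole frame has been read:

```julia
using Serialization

# Hypothetical helper: serialize into memory first, then send an 8-byte
# length prefix (network byte order) followed by the payload bytes.
function send_framed(io::IO, msg)
    buf = IOBuffer()
    serialize(buf, msg)
    payload = take!(buf)
    write(io, hton(UInt64(length(payload))))
    write(io, payload)
    flush(io)
    return nothing
end

# Hypothetical helper: read one whole frame, then deserialize it from memory.
# Returns (true, value) on success or (false, exception) on failure. Either way
# the full frame has already been consumed, so the next call starts cleanly at
# the next message.
function recv_framed(io::IO)
    n = Int(ntoh(read(io, UInt64)))
    payload = read(io, n)                      # blocks until n bytes or EOF
    length(payload) == n || throw(EOFError())  # connection dropped mid-frame
    try
        return (true, deserialize(IOBuffer(payload)))
    catch err
        return (false, err)                    # e.g. a type not defined on this side
    end
end
```

In `process_client`, the `deserialize(sock)` call could then become `ok, data = recv_framed(sock)`, replying with an error message instead of calling `process_message` when `ok` is false, while the connection stays usable.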

Is that design correct? I looked a bit at the Distributed code, but it’s rather complicated. I see it uses two streams (one to read, one to write) instead of one. Is that a better way to do it?

Also, are there any other examples or resources around?

Thanks.

What you have described so far is two Julia processes, both sitting and listening, but no one starting a conversation yet. One side needs to act as the client and call connect to reach the server.

Typically it is enough to have one process listening as a server, as you describe (it must start first), and the other acting as a client that calls connect. You then have a TCPSocket object on both sides and can use it to both read and write.
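A minimal sketch of this one-server/one-client layout, with both ends in a single process purely so it can be tried quickly. It assumes the plain serialize/deserialize exchange from the question; `handle` is a hypothetical stand-in for `process_message`, and `start_echo_server`/`request` are illustrative names:

```julia
using Sockets, Serialization

# Hypothetical stand-in for the poster's process_message.
handle(msg) = msg isa Tuple ? sum(msg) : msg

# Server side: accept connections and answer each request on the same socket.
function start_echo_server()
    port, server = listenany(ip"127.0.0.1", 8000)   # first free port from 8000 up
    @async while true
        sock = try
            accept(server)
        catch
            break                                     # server was closed
        end
        @async try
            while !eof(sock)                          # blocks until data or peer hangs up
                serialize(sock, handle(deserialize(sock)))
            end
        finally
            close(sock)
        end
    end
    return port, server
end

# Client side: one connect, then the same TCPSocket carries requests and replies.
function request(sock, msg)
    serialize(sock, msg)
    return deserialize(sock)
end

port, server = start_echo_server()
sock = connect(ip"127.0.0.1", port)
r1 = request(sock, (1, 2, 3))   # 6
r2 = request(sock, "hello")     # "hello"
close(sock)
close(server)
```

Because the client waits for each reply before sending the next request, one socket is enough here; a second stream mainly helps when both sides need to start messages independently.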

In some scenarios it makes sense for both sides to run a listening server and also connect to each other as clients, and then use two one-way TCPSockets (read from the socket your server accepted, write to the socket you opened as a client), but that is a bit wasteful. It may make sense if the code is exactly the same on both sides and either side can start the communication, for example when triggered by some external event.
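A sketch of the symmetric two-socket variant, assuming fixed, known ports on localhost (9301 and 9302 here are arbitrary) and a hypothetical `open_link` helper. Each peer runs identical code, which is the situation where this layout pays off; both peers run in one process here only for demonstration:

```julia
using Sockets, Serialization

# Hypothetical symmetric setup: each peer listens on its own port for the
# inbound stream and dials the peer's port for the outbound stream.
# Afterwards it only reads from `inbound` and only writes to `outbound`.
function open_link(myport, peerport; retries = 100)
    server = listen(ip"127.0.0.1", myport)
    inbound_task = @async accept(server)        # the peer will dial in here
    outbound = nothing
    for _ in 1:retries
        try
            outbound = connect(ip"127.0.0.1", peerport)
            break
        catch
            sleep(0.1)                          # peer not listening yet; retry
        end
    end
    outbound === nothing && error("could not reach peer on port $peerport")
    inbound = fetch(inbound_task)
    close(server)                               # exactly one peer is expected
    return inbound, outbound
end

# Both peers in one process, purely for demonstration.
ta = @async open_link(9301, 9302)
tb = @async open_link(9302, 9301)
a_in, a_out = fetch(ta)
b_in, b_out = fetch(tb)

serialize(a_out, "ping from A")
msg_at_b = deserialize(b_in)
serialize(b_out, "pong from B")
msg_at_a = deserialize(a_in)
foreach(close, (a_in, a_out, b_in, b_out))
```

The cost is visible here: two listening sockets and two connections where the single-socket layout needs one of each.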

But taking a step back, do you really need to work at the TCP level, or can you operate at a higher level, for example using Distributed, Julia channels, or MPI (see the Parallel Computing section of the Julia manual for details on all of them)? I haven’t set up the higher-level networking myself, but from what I have read, Julia has so many tools available that going down to raw TCPSockets only makes sense if you are communicating with non-Julia endpoints or need something very specific.
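A minimal sketch of the channel-based alternative, assuming it is acceptable to add a worker with `addprocs`. Two `RemoteChannel`s carry requests and replies, and a hypothetical `serve` loop runs on the worker; the doubling inside `serve` is only a placeholder for real message processing:

```julia
using Distributed
addprocs(1)                      # a local worker plays the role of "the other process"

# Both channels live on process 1; any process holding a handle can put!/take!.
const requests = RemoteChannel(() -> Channel{Any}(32))
const replies  = RemoteChannel(() -> Channel{Any}(32))

# Defined on every process so the worker can run it.
@everywhere function serve(req, rep)
    while true
        msg = take!(req)
        msg === :stop && break
        put!(rep, msg * 2)       # placeholder for real message processing
    end
end

w = first(workers())
remote_do(serve, w, requests, replies)   # start the loop on the worker; returns at once

put!(requests, 21)
result = take!(replies)          # 42
put!(requests, :stop)
rmprocs(w)
```

Serialization, connection setup and message boundaries are all handled by Distributed here, at the price of tying the two processes into the worker system.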

OK, thanks for the answer. I’ve got the connect part, but I left it out. I’m trying to put a minimal example together, but I was still a bit confused about the proper architecture.

Maybe I could reuse some of the Distributed code, but I want something independent of the worker system (my two Julia processes should still be able to manage workers normally).