Data exchange between Julia sessions (same or different machines)

Julia is great for implementing distributed computing. I was wondering whether it is possible to take a decentralized approach to it. Instead of calling workers from a Julia (master) session (for example through @spawnat), I would like two “independent” Julia sessions to interact with each other and exchange data.

One could for example establish a connection with Sockets:

Machine A

# Local example
using Sockets

l = listen(5000)  # Port 5000
a = accept(l)

Machine B

using Sockets

c = connect(5000)  # Connect to port 5000 (local example; for a remote host use connect(host, port))

(I’m not sure how to implement sockets properly, so I would also be grateful for some help with that.)

Then, with the socket connection established, exchange data through some kind of Channel/RemoteChannel:

Machine A

# establish connection AB/BA
# create channel result_a
# pre-processing

while statement
 
b = fetch(result_b)

a = foo(b)

put!(result_a, a)  # put!(channel, value)

end

Machine B

# establish connection AB/BA
# create channel result_b
# pre-processing

while statement
 
a = fetch(result_a)

b = bar(a)

put!(result_b, b)  # put!(channel, value)

end

Any ideas on how to bind a channel with a socket, or how to implement such decentralized computing?
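One possible way to "bind" a channel to a socket is to let a background task deserialize incoming values from the socket and put! them into a local Channel. The following is only a minimal sketch under assumptions: both ends run in one session for demonstration, the port number 5001 is arbitrary, and the names `peer`, `inbox` and `received` are made up for illustration.

```julia
using Sockets, Serialization

port = 5001                 # arbitrary port, assumed to be free
server = listen(port)

# Stand-in for the other session: send three values, then close the connection.
peer = @async begin
    s = accept(server)
    for x in 1:3
        serialize(s, x^2)
    end
    close(s)
end

sock = connect(port)

# Channel(f, size) runs f(channel) in its own task and closes the channel
# when f returns, so the loop below ends once the peer closes the socket.
inbox = Channel{Any}(32) do ch
    while !eof(sock)
        put!(ch, deserialize(sock))
    end
end

received = collect(inbox)   # iterate until the channel is closed
wait(peer)
close(sock)
close(server)
```

In two real sessions, the `peer` part would live in one process and the `connect`/`Channel` part in the other; the rest of the program then only sees `take!(inbox)`.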

You could serialize/deserialize functions and arguments between processes, with the function only being serialized by name. I believe this is how Julia’s Distributed library works from a high level.
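As a rough illustration of that idea (not a description of how Distributed is actually implemented), a request can be sent as a function name plus arguments, and the receiving side can look the name up in a table of functions it is willing to expose. The table `exported`, the port 5002 and the function names are assumptions made for this sketch.

```julia
using Sockets, Serialization

# Only functions listed here can be called by the peer; everything else stays private.
exported = Dict{Symbol,Function}(:double => (x -> 2x), :total => sum)

rpc_port = 5002             # arbitrary port, assumed to be free
server = listen(rpc_port)

# Serving side: read one (name, args) request, call the function, send the result.
server_task = @async begin
    s = accept(server)
    name, args = deserialize(s)
    serialize(s, exported[name](args...))
    close(s)
end

# Calling side: only the Symbol and the arguments travel over the socket.
sock = connect(rpc_port)
serialize(sock, (:total, ([1, 2, 3],)))
result = deserialize(sock)
close(sock)
wait(server_task)
close(server)
```

A request for a name that is not in `exported` throws a KeyError on the serving side in this sketch; a real implementation would need to report that back to the caller.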

For local data exchange, consider using shared memory (e.g. /dev/shm on Linux) instead of sockets. Look up Apache Arrow and its subprojects (Arrow Flight, Arrow Plasma Store) for an example of low-overhead data transfer and serialization.
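For plain numeric arrays on one machine, a memory-mapped file is one simple way to try this with the standard library alone. The sketch below assumes a Linux-style /dev/shm (falling back to the temp directory otherwise) and uses a made-up file name; both "sides" are shown in one session, and no synchronization between processes is handled.

```julia
using Mmap

# Assumption: /dev/shm is a RAM-backed directory (Linux); otherwise use the temp dir.
dir = isdir("/dev/shm") ? "/dev/shm" : tempdir()
path = joinpath(dir, "julia_shared_demo.bin")   # hypothetical file name

# Writer side: map four Float64 values and fill them in.
io_w = open(path, "w+")
shared_w = Mmap.mmap(io_w, Vector{Float64}, 4)
shared_w .= [1.0, 2.0, 3.0, 4.0]
Mmap.sync!(shared_w)        # flush the mapping to the underlying file
close(io_w)

# Reader side (normally a different process): map the same file read-only.
io_r = open(path, "r")
shared_r = Mmap.mmap(io_r, Vector{Float64}, 4)
total = sum(shared_r)
close(io_r)

rm(path; force=true)        # clean up the demo file
```

Both processes need to agree on the element type and length out of band, since the file itself carries no metadata.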

@jpsamaroo @risingganymede Thank you for your responses. That’s not exactly what I’m looking for. My goal is not exclusively to increase performance through multithreading/clustering, but to have two independent machines collaborate by passing each other some data while keeping most of it private.

Right, both proposed approaches would do that; when starting two separate Julia processes manually, you can then use either sockets or shared memory (plus some form of serialization) to pass only specific pieces of data between processes, or even to make remote procedure calls. Neither approach relies on Julia’s Distributed package, therefore each process can choose which data is shared and which data is private.

Thanks @jpsamaroo! The documentation on serialization is somewhat vague, but now that I’ve tried it out, it seems to be what I was looking for.

@risingganymede I’ll also take a closer look at shared memory to see whether it fits my needs. Thanks!

It would be nice if there were a package that wrapped all this together for fast and convenient sharing of data and execution of code between 2 independent Julia processes (where one was not started as a worker of the other), on the same or different machines.

I miss having the convenient IPC from kdb+/q:
https://code.kx.com/v2/basics/ipc/

@martincornejo, if you write it, I’d help test and use it. :wink:

I wrote a first version with some of the functionality I’d like to have:

It’s still a bit buggy. Evaluation errors seem to be swallowed up and cause the socket to break. Any tips on making it more robust would be appreciated.
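One generic pattern that may help here (a sketch only, not taken from the package) is to wrap the evaluation in try/catch on the serving side and always send a reply, so the connection stays in a known state even when the call fails. The helper name `safe_call` and the reply format are assumptions.

```julia
# Hypothetical helper: run a call and always return a reply tuple,
# so the serving loop can send something back instead of dying.
function safe_call(f, args...)
    try
        return (:ok, f(args...))
    catch err
        # Reply with a plain string; exception objects may reference types
        # that the peer cannot deserialize.
        return (:error, sprint(showerror, err))
    end
end

ok_reply  = safe_call(sqrt, 4.0)     # succeeds
err_reply = safe_call(sqrt, -1.0)    # sqrt throws a DomainError for negative reals
```

The caller can then check the first element of the reply and rethrow or log the message locally.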

I was thinking about it, but I don’t find the current procedure that inconvenient, and I’m also short on free time right now. Nice to see you’ve taken the initiative. I’m not familiar with kdb+ or q, but I’ll take a look at your package.

I’ve pushed a fix that should solve the problem with evaluation errors.

Performance is acceptable for some purposes (1-3 milliseconds per function call), but slower than it could be because deserialization isn’t type-stable.
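To illustrate what the type instability means: `deserialize` can return any type, so the compiler cannot infer the result. When the caller knows what to expect, one possible mitigation is a type assertion at the call site, which keeps the code after the call type-stable (it does not make deserialization itself faster). The helper name `read_floats` is an assumption for this sketch, which uses an IOBuffer in place of a socket.

```julia
using Serialization

# An in-memory buffer stands in for the socket.
io = IOBuffer()
serialize(io, [1.0, 2.0, 3.0])
seekstart(io)

# The assertion throws a TypeError if the peer sends something else,
# and lets callers of read_floats be inferred as Vector{Float64}.
read_floats(stream::IO) = deserialize(stream)::Vector{Float64}

v = read_floats(io)
```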

I plan on adding a type-stable function call that should get performance closer to 100 microseconds. At that point, I might add named pipe support as well.