Data exchange between Julia sessions (same or different machines)

martincornejo · April 24, 2020, 4:27pm

Julia is great for implementing distributed computing. I was wondering whether a decentralized approach to distributed computing is possible: instead of calling workers from a (master) Julia session (for example through @spawnat), have two “independent” Julia sessions interact with each other and exchange data.

One could for example establish a connection with Sockets:

Machine A

# Local example
using Sockets

l = listen(5000)  # Port 5000
a = accept(l)

Machine B

using Sockets

c = connect(5000) # Connect to port 5000 (local example, for remote: connect(host, port))

(I’m not sure how to implement sockets properly, so I would also be grateful for some help.)
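As a minimal sketch of the socket part, the listing below runs both ends in one session so it can be tried locally. The port hint 5001, the loopback address, and the upper-casing reply are assumptions made for illustration; on two machines the client would use connect(host, port) with the real host.

```julia
using Sockets

# listenany picks the first free port at or above the hint (5001 is arbitrary).
port, server = listenany(ip"127.0.0.1", 5001)

# "Machine A": accept one connection and answer a single line.
server_task = @async begin
    sock = accept(server)
    line = readline(sock)            # blocks until a full line arrives
    println(sock, uppercase(line))   # reply with the upper-cased line
    close(sock)
end

# "Machine B": connect, send one line, read the reply.
client = connect(ip"127.0.0.1", port)
println(client, "hello from B")
reply = readline(client)
close(client)

wait(server_task)
close(server)
```

The accept call blocks, which is why the listening side is wrapped in @async here; in two separate sessions each side can simply block in its own process.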

Then with the socket connection established, exchange data with some kind of Channel/RemoteChannel

Machine A

# establish connection AB/BA
# create channel result_a
# pre-processing

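while statement  # placeholder for the loop condition
    b = fetch(result_b)  # read B's latest result
    a = foo(b)
    put!(result_a, a)    # put! takes the channel first, then the value
end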

Machine B

# establish connection AB/BA
# create channel result_b
# pre-processing

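while statement  # placeholder for the loop condition
    a = fetch(result_a)  # read A's latest result
    b = bar(a)
    put!(result_b, b)    # put! takes the channel first, then the value
end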

Any ideas on how to bind a channel with a socket, or how to implement such decentralized computing?

jpsamaroo · April 24, 2020, 5:08pm

You could serialize/deserialize functions and arguments between processes, with the function only being serialized by name. I believe this is how Julia’s Distributed library works from a high level.
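A rough sketch of that idea using the Serialization and Sockets standard libraries: the caller sends a function name as a Symbol plus a tuple of arguments, and the serving side looks the name up in a table it controls. The `allowed` table, the port hint, and the request layout are hypothetical choices for this example, not the protocol Distributed actually uses.

```julia
using Sockets, Serialization

# Hypothetical whitelist: only these names can be called remotely.
allowed = Dict{Symbol,Function}(:sum => sum, :length => length)

port, server = listenany(ip"127.0.0.1", 5002)

# Serving process: read one (name, args) request, run it, send the result.
serve_task = @async begin
    sock = accept(server)
    name, args = deserialize(sock)     # request is a (Symbol, Tuple) pair
    result = allowed[name](args...)
    serialize(sock, result)
    close(sock)
end

# Calling process: the function travels by name only, the data by value.
sock = connect(ip"127.0.0.1", port)
serialize(sock, (:sum, ([1, 2, 3],)))
answer = deserialize(sock)
close(sock)

wait(serve_task)
close(server)
```

Sending only the name means both processes must already define the function, and the lookup table keeps the serving side in charge of what can be executed.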

risingganymede · April 24, 2020, 5:16pm

For local data exchange, consider using shared memory (e.g. /dev/shm on Linux) instead of sockets. Look up Apache Arrow and its subprojects (Arrow Flight, Arrow Plasma Store) for an example of low-overhead data transfer and serialization.
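One way to try the shared-memory route without Arrow is the Mmap standard library, mapping a file that lives under /dev/shm. This is only a sketch: the file name and array length are made up, both "processes" are simulated in a single session, and the fallback to tempdir() is there so the example still runs where /dev/shm does not exist.

```julia
using Mmap

# /dev/shm is RAM-backed on Linux; fall back to a temp directory elsewhere.
shm_dir = isdir("/dev/shm") ? "/dev/shm" : tempdir()
path = joinpath(shm_dir, "julia_shared_demo.bin")   # hypothetical file name
n = 4

# Process A: create the file and write through the mapping.
io_a = open(path, "w+")
a = Mmap.mmap(io_a, Vector{Float64}, (n,))
a .= [1.0, 2.0, 3.0, 4.0]
Mmap.sync!(a)                # flush the mapped pages to the file

# Process B (simulated here): map the same file read-only.
io_b = open(path, "r")
b = Mmap.mmap(io_b, Vector{Float64}, (n,))
total = sum(b)

close(io_a)
close(io_b)
rm(path)
```

Both sides have to agree on the element type and length out of band, and some signalling (a socket, a lock file) is still needed to tell the reader when new data is ready.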

martincornejo · April 24, 2020, 5:52pm

@jpsamaroo @risingganymede Thank you for your responses. That’s not exactly what I’m looking for. My goal is not exclusively to increase performance by multithreading/clustering, but to have two independent machines collaborate by passing each other some data while keeping most of it private.

jpsamaroo · April 24, 2020, 10:04pm

Right, both proposed approaches would do that; when starting two separate Julia processes manually, you can then use either sockets or shared memory (plus some form of serialization) to pass only specific pieces of data between processes, or even to make remote procedure calls. Neither approach relies on Julia’s Distributed package, therefore each process can choose which data is shared and which data is private.
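To make the "share some, keep the rest private" point concrete, here is a small sketch in which one side sends only derived values and the other side pumps whatever arrives on the socket into a local Channel. The names `private_a`, `inbox`, and the port hint are illustrative assumptions, and both sides run in one session purely for demonstration.

```julia
using Sockets, Serialization

port, server = listenany(ip"127.0.0.1", 5003)

# Side A: private_a never leaves this process; only partial sums are sent.
a_task = @async begin
    sock = accept(server)
    private_a = collect(1:1000)
    for step in 1:3
        serialize(sock, sum(private_a[1:step]))
    end
    close(sock)
end

# Side B: a background task moves incoming values into a local Channel.
sock_b = connect(ip"127.0.0.1", port)
inbox = Channel{Any}(32)
pump = @async begin
    while !eof(sock_b)               # eof returns true once A closes
        put!(inbox, deserialize(sock_b))
    end
    close(inbox)
end

received = collect(inbox)            # iterates until the channel is closed
close(sock_b)
wait(a_task)
close(server)
```

The rest of side B can then use take! or fetch on `inbox` as if it were an ordinary local Channel, which is roughly the channel-over-socket binding asked about at the top of the thread.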

martincornejo · April 24, 2020, 10:23pm

Thanks @jpsamaroo ! The documentation on serialization is somewhat vague, but now that I’ve tried it out, it seems to be what I was looking for.

@risingganymede I’ll also take a deeper look at shared memory to see if it also fits my needs. Thanks.

robsmith11 · April 25, 2020, 12:12am

It would be nice if there were a package that wrapped all this together for fast and convenient sharing of data and execution of code between 2 independent Julia processes (where one was not started as a worker of the other), on the same or different machines.

I miss having the convenient IPC from kdb+/q:

@martincornejo, if you write it, I’d help test and use it.

robsmith11 · April 25, 2020, 9:03am

I wrote a first version with some of the functionality I’d like to have:

It’s still a bit buggy. Evaluation errors seem to be swallowed up and cause the socket to break. Any tips on making it more robust would be appreciated.
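One common way to keep a connection alive across evaluation errors is to catch the exception on the serving side and send back a tagged reply instead of letting the task die. The sketch below is a generic illustration of that pattern, not the package's code; the `(ok, value)` reply shape and the port hint are assumptions, and evaluating expressions from a peer is only reasonable when that peer is trusted.

```julia
using Sockets, Serialization

port, server = listenany(ip"127.0.0.1", 5004)

# Serving side: every request gets a reply, even when evaluation throws.
serve_task = @async begin
    sock = accept(server)
    while !eof(sock)
        expr = deserialize(sock)
        reply = try
            (ok = true, value = Core.eval(Main, expr))
        catch err
            # Send the error as text so it always serializes cleanly.
            (ok = false, value = sprint(showerror, err))
        end
        serialize(sock, reply)
    end
    close(sock)
end

sock = connect(ip"127.0.0.1", port)
serialize(sock, :(1 + 1))
good = deserialize(sock)
serialize(sock, :(sqrt(-1.0)))       # throws a DomainError on the server
bad = deserialize(sock)
serialize(sock, :(2 * 3))            # the connection is still usable
later = deserialize(sock)
close(sock)

wait(serve_task)
close(server)
```

Because the error is reported as a normal reply, the caller can decide whether to rethrow, log, or retry, and the request/reply framing on the socket never gets out of step.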

martincornejo · April 25, 2020, 10:25am

I was thinking about it, but I don’t find the current procedure that inconvenient and I’m also short on free time right now. Nice to see you’ve taken the initiative. I’m not familiar with kdb+ or q, but I’ll take a look at your package.

robsmith11 · April 26, 2020, 8:31am

I’ve pushed a fix that should solve the problem with evaluation errors.

Performance is acceptable for some purposes (1-3 milliseconds per function call), but slower than it could be because deserialization isn’t type-stable.

I plan on adding a type-stable function call that should get performance closer to 100 microseconds. At that point, I might add named pipe support as well.

Thomas_Markusic · February 24, 2023, 4:04pm

Thanks for creating SimpleIPC.jl. I am having problems calling user-defined functions in the remote process (as opposed to built-in Julia functions), for example with ipc_eval. Can you provide some guidance?
