How to share a DataFrame across machines on a wired WAN

I want to share a real-time dataframe across a wired WAN so others can READ the data ONLY. It’s my data and I don’t want it polluted by my Python “colleagues”; I have seen what they can do :slight_smile:

They are envious of my delight in making the leap to Julia, which gives me pleasure :slight_smile:

I am TRYING to avoid the noise of a web solution to this.

I am loving DataFrames.jl, especially now that I am reading @bkamins’ Julia for Data Analysis to help me consider things more rationally.

One thing is bothering me though.

I can build a dataframe and maintain it in real time with a ZMQ data stream, BUT that’s on MY machine and I am a happy bunny. BUT others are whining that I am being a data hog and not sharing. I coded up a web app using Stipple.jl but the real-time data feed keeps freezing randomly.
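For context, the listener side I run locally looks roughly like this minimal sketch (the publisher address, port, and comma-separated message format are made-up placeholders):

```julia
using ZMQ, DataFrames

# local frame that the incoming stream keeps up to date
df = DataFrame(timestamp = String[], symbol = String[], price = Float64[])

sub = Socket(SUB)
connect(sub, "tcp://127.0.0.1:5556")  # placeholder publisher address
subscribe(sub, "")                    # accept every message

while true
    msg = recv(sub, String)           # e.g. "2023-01-05T14:31:02,AAPL,191.2"
    ts, sym, px = split(msg, ',')
    push!(df, (timestamp = String(ts), symbol = String(sym), price = parse(Float64, px)))
end
```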

I LOVE the new Pluto initiative to provide for real-time population of a cell. That, married with the Pluto API ability, is exciting as well, but before heading down that route, which I will, I was wondering IF there is a way to share my dataframe with others across our wired WAN. I am familiar with SharedArrays but would like to investigate sharing the dataframe itself if possible.
thanks for ANY help
a LONELY Chicago-based Julia amateur developer :slight_smile:

How do you want to share the data frame exactly? I.e., how do you want the Python processes to be allowed to query the Julia process for data? Do you want to create a web service that allows other users to query your data? (That could make sense if your data is small.) If so, the pattern you could use is shown, for example, in JuliaForDataAnalysis/ch14_server.jl at main · bkamins/JuliaForDataAnalysis · GitHub (you just need to change the code so that it serves the data frame as a JSON payload).
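A minimal sketch of that pattern (not the exact code from the repository; the port and the example data frame are placeholders), using HTTP.jl and JSON3.jl:

```julia
using HTTP, JSON3, Tables, DataFrames

df = DataFrame(symbol = ["AAPL", "MSFT"], price = [191.2, 410.5])  # stand-in for the live frame

# every request returns the current contents of df as a JSON array of row objects
HTTP.serve("0.0.0.0", 8080) do req
    HTTP.Response(200, JSON3.write(Tables.rowtable(df)))
end
```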


Hi there @bkamins, thanks for the reply.

Conceptually, all that would happen is that a dataframe would MAGICALLY appear on each of the remote machines on the network.

I can achieve this easily using ZMQ by just running the Julia script on each machine (loading from cron to make my life easy) and sending the message stream out to multiple listeners so they can do what they like with it. This means that the dataframe would be constructed on all their machines, thus NOT shared. It would be the simple approach (I don’t want to have any web pages here).
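The publisher side of that approach would be roughly this sketch (the port is a placeholder and stdin stands in for the real feed):

```julia
using ZMQ

pub = Socket(PUB)
bind(pub, "tcp://*:5556")        # placeholder port; listeners connect to this machine

for tick in eachline(stdin)      # stand-in for the real data feed
    send(pub, tick)              # every connected SUB socket receives a copy
end
```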

Another approach would be for the dataframe to be an in-memory construct, possibly Parquet, but I haven’t thought that far ahead. Again, attractive to me.

I haven’t gotten to chapter 14 in your WONDERFUL book. Thanks for the SPOILER :slight_smile: I like that idea and will certainly look into it next week. I wanted to avoid using anything to do with the web, to be honest, as I REALLY want to look into PlutoHooks.jl from @fonsp and the splendid @lungben.

I also considered just using a SharedArray and making the Python luddites (just kidding) do some work.

Looking forward to attending your course in Chicago later in 2023. Thanks again for your wonderful blog, which is a must-read for me while munching breakfast in the house of pancakes.

dent (lonely Julia person in Chicago)


You could convert it to the Arrow format with Arrow.jl, as it will be easy for Python users to read.
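Something along these lines (the path is a placeholder; if the file sits on a shared or network drive, Python users can read the same file with pyarrow or pandas):

```julia
using Arrow, DataFrames

df = DataFrame(symbol = ["AAPL", "MSFT"], price = [191.2, 410.5])  # stand-in for the live frame

# write to a location the other machines can see
Arrow.write("/mnt/shared/live_data.arrow", df)

# Julia readers can do this; Python readers can use pyarrow.feather or pandas.read_feather
df2 = DataFrame(Arrow.Table("/mnt/shared/live_data.arrow"))
```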


Hmm,
I forgot about Arrow, thanks for the reminder. I was checking out Parquet2.jl; I might try all of the approaches.
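If I try the Parquet route, I am assuming something like this sketch would do (the path and data are placeholders):

```julia
using Parquet2, DataFrames

df = DataFrame(symbol = ["AAPL", "MSFT"], price = [191.2, 410.5])  # stand-in for the live frame

Parquet2.writefile("/mnt/shared/live_data.parquet", df)              # write for others to read

df2 = DataFrame(Parquet2.Dataset("/mnt/shared/live_data.parquet"))   # read it back
```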

Awesome! :tada: Just want to say that PlutoHooks.jl was mainly developed by our collaborator Paul Berg, who would also be more than happy to answer your questions! (paul@plutojl.org)


Hey fons, THANK YOU for such an inspirational application. I am looking forward to Paul’s wonderful real-time cell generation. I have a couple of issues with Julia but NOTHING to do with Pluto. Paul has already helped me out tremendously.
thanks again
