Sharing Python memory

So I am going to be populating a Python data structure with financial instrument data. I want to use Julia to analyze the information. What is the most effective way to share the data between Python and Julia?
Thank you.

PyCall? I don’t know if they’re able to literally share objects in memory, but this makes it quite simple to pass data back and forth.

Is there some data structure that exists in python but does not in julia? Or is it just the library that does the work of building the data structure that you need?

Thank you for the reply, most kind.

I don’t think PyCall will help me, as I couldn’t see how it would share a Python data structure. So to answer your question, I want to have the Julia script access the Python memory directly.

Here is a better description of the use case.

A Python script sets up an asyncio stream to a brokerage via an API.

It requests a stream of data for 300 financial instruments, e.g. LAST and implied vol.

The data populates a structure in Python memory, updating the relevant locations as new data comes in.

The Julia script takes a periodic “snapshot” of the Python data structure.

PyCall can easily share data structures in memory with Python (libpython is loaded and running in the same process as libjulia).

It can??? Could you point me at an example in the docs, please? I haven’t really started using Julia yet, but 2021 will be the make-or-break year. What I would appreciate is an example that does something simple: just the Python script which creates a dataframe and the Julia code that accesses it. I can’t seem to find a SIMPLE example that would help me learn by doing.

Every Python object (PyObject) in PyCall is shared in memory with Python.

Python lists and arrays are converted by default when they are returned to Julia, but you can suppress the conversion by passing a specific return type (e.g. PyObject) to the pycall function or by accessing attributes as o."attribute" rather than o.attribute. If you want to represent a Python list as a Julia array type while sharing memory, ask for it as a PyVector. Similarly for a Python dictionary with PyDict and a NumPy array with PyArray. Going the other direction, when a Julia array is passed to Python, it is automatically passed as a NumPy array that shares memory.
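To make that concrete, here is a minimal sketch (the quotes dict and its contents are just made up for illustration) of a Python dict wrapped as a PyDict, so that both languages operate on the same memory:

using PyCall

# Run some Python in the same process; py"""...""" executes Python statements.
py"""
quotes = {"AAPL": 0.0, "MSFT": 0.0}
"""

# Wrap the existing Python dict without copying; PyDict shares memory with Python.
quotes = PyDict(py"quotes"o)

# An update made on the Python side...
py"""
quotes["AAPL"] = 131.97
"""

# ...is immediately visible from Julia, with no transfer step.
quotes["AAPL"]          # 131.97

# Writes from Julia are likewise visible to Python.
quotes["MSFT"] = 222.42
py"quotes['MSFT']"      # 222.42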

Hi there
Thank you so much for such a considered reply. I edited my reply to your previous post before reading this one. What I was hoping for was a simple worked example, as I don’t really want to know how the sausage is made :wink: BUT I am reading the docs tomorrow!!

Thank you again for your wonderful guidance. Happy New Year!

Here is Julia code that creates a pandas.DataFrame:

julia> using PyCall

julia> pd = pyimport("pandas")
PyObject <module 'pandas' from '/Users/stevenj/.julia/conda/3/lib/python3.7/site-packages/pandas/__init__.py'>

julia> df = pd.DataFrame([["tom",10], ["nick",15], ["julie",14]], columns=["Name", "Age"])
PyObject     Name  Age
0    tom   10
1   nick   15
2  julie   14

julia> typeof(df)
PyObject

The df object in Julia is shared with Python.
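To see the sharing in action, you can mutate the frame in place from Julia and then ask Python about the very same object (continuing the session above; the extra column and its values are made up for illustration):

# Call an in-place pandas method on the shared object from Julia...
df.insert(2, "Score", [85, 92, 78])

# ...and evaluate Python code against the same object (the $ interpolates the Julia variable).
py"list($(df).columns)"    # ["Name", "Age", "Score"]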

(A somewhat nicer Julia interface to Pandas can be found in the Pandas.jl package, which is built on top of PyCall and still shares data.)

It is pretty hard to program unless your mental model is at least a rough facsimile of what is actually happening. (As Knuth once wrote on a related note, “People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird.”)

I think that many people have some notion that PyCall must be passing data to/from Python by constructing strings of Python code and running it with the python executable, which would make it impossible to share data, among many other problems. That’s not what it does — it loads the libpython library in memory and directly accesses Python objects via Python’s C API.

I agree that I have been too lax in understanding the rudiments of Julia. Isn’t my mental model a fair representation of what is actually happening?
I create a dataframe in Python.
Magically I access the dataframe in Julia.
The dataframe in Python changes.
I see the same change in the Julia dataframe.
I’d say that’s an exact facsimile of what I would expect to happen. I am not being funny, it’s a serious question. Do I really have to understand how a compiler works to use it? How a REPL functions?

As to how PyCall functioned, your original pointer to the docs cleared that up for me. I still think that the docs would benefit from a simple example showing both sides of the coin.

I just conferenced with a chum who is NOT making the switch to Julia, as he finds the lack of simple worked examples (Python-to-Julia interrelationship) a problem. They have about 20 people using Python and thousands of lines of Python script.

I’ll see if I can come up with something using PyCall.

I will certainly look into this tomorrow, and thank you for taking the time.

I think you need to distinguish between running separate Python and Julia processes that share data (much more difficult, and it’s not clear why you would want to do that) and running a single process with inter-language calls (either a Julia process calling Python code or vice versa).

Let’s say that there is an existing, internally tested, and ratified library of Python scripts, and they wish to consider using Julia to handle the parallel-processing aspect of the next-generation data visualization system. Wouldn’t it be more prudent to leave the existing code base in place and have Julia access the existing memory without modifying the tried-and-tested code?

In MY case I am happy to test out the single-process call approach. It’s my chum who has the problem.

Sure. Just put them into Python modules and call them from Julia.

(Conversely, if you are running a bunch of Python “scripts” as individual python runs, then you aren’t even sharing memory in Python between one script and another. You can also exec them from Julia if you haven’t organized them into re-usable modules, but organizing code into separately run “scripts” is a long-term headache in any language. Unless I’m misunderstanding your use of the term “scripts”?)
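For what it’s worth, calling into an existing, unmodified Python module from Julia is only a couple of lines (the path, module, and function names below are hypothetical placeholders for your own code):

using PyCall

# Make the existing code base importable (the documented PyCall idiom for extending sys.path).
pushfirst!(PyVector(pyimport("sys")."path"), "/path/to/existing/python/code")

# Import an existing module, untouched, and call into it from Julia.
pricing = pyimport("pricing")               # hypothetical module name
greeks  = pricing.compute_greeks("AAPL")    # hypothetical function call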

Makes sense AND clearly illustrates your point that my ignorance of what PyCall does and how it does it is the problem here :slight_smile:

Lesson learnt. TO THE DOCUMENTATION, LADS!!

Thank you for all your help, and have an excellent new year.

Here is a simple Pluto.jl notebook I prepared some time ago showing some Python interoperability; maybe it is useful for you.

https://gist.github.com/lungben/b2766fb37d57cd429da05892fd26e364

You can open the Gist link directly in Pluto.

Hi there
Thank you so much for taking the time; it’s a great help. I can get the gist of what is going on, though I think my situation is much simpler than yours :slight_smile: Thank you again.

Have an excellent new year.

Hi there
We looked over PyCall and it’s a great idea BUT, there is always a but, my chum thinks that taking that path would cause too much disruption to the existing code base. They are using modules, but their internal code auditors would need all of them to be retested post PyCall code insertion. Apparently it would cost $100,000+ for recertification. This is money that could be allocated to facilitating the move of the developer base to Julia.

There is another route I was looking at, using Apache Arrow; I was going to suggest it to my chum.

If you have any thoughts about the concerns about PyCall, or about taking the Apache Arrow route to integrate Python and Julia, I am happy to read them.

I think there may be a misunderstanding here. PyCall does not require you to edit existing Python code. If they have pretty much everything in modules already, it’s just a matter of importing from those modules in Julia via PyCall.

Arrow is a great format for sharing data between different languages, but it’s a very different paradigm. You won’t be able to send an array from Python → Julia, modify an element (in Julia) and see that change reflected in the original data (in Python). In that sense it’s better to think of Arrow as an efficient IPC format instead of a big block of shared memory. If that works for your use-case though, I think it’s a pretty viable path. At least a couple of people have done something similar.
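If the Arrow route fits, the Julia side of such a snapshot exchange is short (assuming the Python process periodically writes a snapshot file with pyarrow, and that Arrow.jl and DataFrames.jl are installed; the file name is hypothetical):

using Arrow, DataFrames

# Read a snapshot previously written by the Python process (e.g. with pyarrow).
# Arrow.Table memory-maps the file, so opening it is cheap even for large tables.
tbl = Arrow.Table("snapshot.arrow")   # hypothetical file name

# Materialize a copy as a DataFrame for analysis; the Arrow data itself stays read-only.
df = DataFrame(tbl)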

Hi there
No misunderstanding, but I can see why you would think that given the way I wrote it. In both our cases (mine and my chum’s) the modules are the atomic actions in a finite state machine. Each atomic is part of an ACTION list which is called as a result of an EVENT/STATE interaction. They have about 100 state machines. I have 3.

In my case it would be a doddle to implement PyCall, BUT I don’t have the internal audit people breathing down my neck armed with hashing values which detect ANY change in the state machines. So ANY change in the environment would kick off a revisiting of the audit, and adding a new language to the mix would certainly cause issues. One of the things that caused me to suggest Julia to my chum is that it’s IDEALLY suited to the whole FSM concept we base our systems on. ALSO we can see the HUGE advantages of Julia’s use in the Nvidia GPU world.

Neither of us wants Julia to modify anything; it would just be READ-only. Again, that’s MY fault for asking the question in such a dumb way. Sorry about that, everyone. I think Arrow is going to be our choice; can you point me at anyone in the Julia community who has trodden this path, please? Thank you so much for your help.