Passing DataFrame via ZMQ


#1

I want to pass a DataFrame via ZMQ. What would be the most effective way to do that?

I was thinking about serializing with JSON, but I'm not sure how to deserialize at the other end without assuming the internal structure of DataFrame. I also lose the column type information: every column comes back with eltype Any, as the showcols output below shows. Any suggestions?

julia> y = JSON.json(DataFrame(a=1:10, b=rand(10), c=["abc$i" for i in 1:10]))
"{\"columns\":[[1,2,3,4,5,6,7,8,9,10],[0.33594662097541916,0.2867361571846303,0.31531071798857035,0.06042841515321973,0.10010429483452854,5.7254177610932544e-5,0.3638153944324909,0.3119557802963746,0.6338407676412947,0.7744933397995217],[\"abc1\",\"abc2\",\"abc3\",\"abc4\",\"abc5\",\"abc6\",\"abc7\",\"abc8\",\"abc9\",\"abc10\"]],\"colindex\":{\"lookup\":{\"a\":1,\"b\":2,\"c\":3},\"names\":[\"a\",\"b\",\"c\"]}}"

julia> x = JSON.parse(y)
Dict{String,Any} with 2 entries:
  "colindex" => Dict{String,Any}(Pair{String,Any}("names", Any["a", "b", "c"]),Pair{String,Any}("lookup", Dict{…
  "columns"  => Any[Any[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], Any[0.335947, 0.286736, 0.315311, 0.0604284, 0.100104, …

julia> z = DataFrame(x["columns"], Symbol.(x["colindex"]["names"]))
10×3 DataFrames.DataFrame
│ Row │ a  │ b          │ c     │
├─────┼────┼────────────┼───────┤
│ 1   │ 1  │ 0.335947   │ abc1  │
│ 2   │ 2  │ 0.286736   │ abc2  │
│ 3   │ 3  │ 0.315311   │ abc3  │
│ 4   │ 4  │ 0.0604284  │ abc4  │
│ 5   │ 5  │ 0.100104   │ abc5  │
│ 6   │ 6  │ 5.72542e-5 │ abc6  │
│ 7   │ 7  │ 0.363815   │ abc7  │
│ 8   │ 8  │ 0.311956   │ abc8  │
│ 9   │ 9  │ 0.633841   │ abc9  │
│ 10  │ 10 │ 0.774493   │ abc10 │

julia> showcols(z)
10×3 DataFrames.DataFrame
│ Col # │ Name │ Eltype │ Missing │ Values                │
├───────┼──────┼────────┼─────────┼───────────────────────┤
│ 1     │ a    │ Any    │ 0       │ 1  …  10              │
│ 2     │ b    │ Any    │ 0       │ 0.335947  …  0.774493 │
│ 3     │ c    │ Any    │ 0       │ abc1  …  abc10        │



#2

Looks like Base.serialize / Base.deserialize are exactly what I was looking for. I should have RTFM :)

Any other ideas welcome.
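
For reference, a minimal sketch of the round trip, assuming Julia 1.x and a recent DataFrames.jl. On those versions serialize and deserialize live in the Serialization standard library rather than in Base. The ZMQ calls are left out on purpose: the sketch only shows how to turn the DataFrame into a byte vector for the message body and back again, and the variable names are illustrative.

```julia
using Serialization   # stdlib on Julia 0.7+; on 0.6 the same functions are in Base
using DataFrames

df = DataFrame(a = 1:10, b = rand(10), c = ["abc$i" for i in 1:10])

# Sender side: serialize into an in-memory buffer and take the raw bytes.
io = IOBuffer()
serialize(io, df)
payload = take!(io)   # Vector{UInt8}, to be sent as the message body

# Receiver side: wrap the received bytes in a buffer and deserialize.
df2 = deserialize(IOBuffer(payload))

# Unlike the JSON route, the column element types survive the round trip.
@assert df2 == df
@assert eltype.(eachcol(df2)) == eltype.(eachcol(df))
```

Both ends need to run compatible versions of Julia and DataFrames.jl for this to work, since the receiver rebuilds the actual DataFrame type.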


#3

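If the data is only passed between processes, and never saved to disk or to a database, then serialize/deserialize should be fine. The serialized format is not guaranteed to stay readable across Julia versions, so it is a poor fit for long-term storage.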


#4

Yes, the data is transient for IPC only. Thanks, Scott.