Here is an example of what I understand you need:
julia> using DataFrames
julia> robots = [DataFrame(x=rand(2), y=rand(2)) for i in 1:3] # 3 robots
3-element Vector{DataFrame}:
2×2 DataFrame
Row │ x y
│ Float64 Float64
─────┼────────────────────
1 │ 0.832353 0.395642
2 │ 0.250032 0.403825
2×2 DataFrame
Row │ x y
│ Float64 Float64
─────┼────────────────────
1 │ 0.308752 0.560986
2 │ 0.420184 0.595701
2×2 DataFrame
Row │ x y
│ Float64 Float64
─────┼────────────────────
1 │ 0.860213 0.633597
2 │ 0.785208 0.829677
julia> vcat(robots..., source=:robot_id)
6×3 DataFrame
Row │ x y robot_id
│ Float64 Float64 Int64
─────┼──────────────────────────────
1 │ 0.832353 0.395642 1
2 │ 0.250032 0.403825 1
3 │ 0.308752 0.560986 2
4 │ 0.420184 0.595701 2
5 │ 0.860213 0.633597 3
6 │ 0.785208 0.829677 3
julia> res = reduce(vcat, robots, source=:robot_id) # alternative syntax that might be faster if you have millions of robots
6×3 DataFrame
Row │ x y robot_id
│ Float64 Float64 Int64
─────┼──────────────────────────────
1 │ 0.832353 0.395642 1
2 │ 0.250032 0.403825 1
3 │ 0.308752 0.560986 2
4 │ 0.420184 0.595701 2
5 │ 0.860213 0.633597 3
6 │ 0.785208 0.829677 3
Now, if you want to pick a robot by its :robot_id, do:
julia> gdf = groupby(res, :robot_id)
GroupedDataFrame with 3 groups based on key: robot_id
First Group (2 rows): robot_id = 1
Row │ x y robot_id
│ Float64 Float64 Int64
─────┼──────────────────────────────
1 │ 0.832353 0.395642 1
2 │ 0.250032 0.403825 1
⋮
Last Group (2 rows): robot_id = 3
Row │ x y robot_id
│ Float64 Float64 Int64
─────┼──────────────────────────────
1 │ 0.860213 0.633597 3
2 │ 0.785208 0.829677 3
julia> gdf[(robot_id=1,)] # passing a NamedTuple with robot id
2×3 SubDataFrame
Row │ x y robot_id
│ Float64 Float64 Int64
─────┼──────────────────────────────
1 │ 0.832353 0.395642 1
2 │ 0.250032 0.403825 1
julia> gdf[(1,)] # the same, but passing a Tuple with robot id
2×3 SubDataFrame
Row │ x y robot_id
│ Float64 Float64 Int64
─────┼──────────────────────────────
1 │ 0.832353 0.395642 1
2 │ 0.250032 0.403825 1
In this case the group number and the robot id are the same, so you could write:
julia> gdf[1]
2×3 SubDataFrame
Row │ x y robot_id
│ Float64 Float64 Int64
─────┼──────────────────────────────
1 │ 0.832353 0.395642 1
2 │ 0.250032 0.403825 1
to get the first group, but in general the robot id could be any identifier, e.g.:
julia> reduce(vcat, robots, source=:robot_random_id => rand(UInt64, length(robots)))
6×3 DataFrame
Row │ x y robot_random_id
│ Float64 Float64 UInt64
─────┼──────────────────────────────────────────
1 │ 0.832353 0.395642 8303239618947412160
2 │ 0.250032 0.403825 8303239618947412160
3 │ 0.308752 0.560986 2175806809747680279
4 │ 0.420184 0.595701 2175806809747680279
5 │ 0.860213 0.633597 12898194048670657353
6 │ 0.785208 0.829677 12898194048670657353
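With arbitrary identifiers like these, looking a robot up by key is the safer choice, since the position of a group no longer tells you which robot it holds. Here is a minimal sketch of that; the ids 30, 10, 20 and the names ids, res2 and gdf2 are made up for illustration:

```julia
using DataFrames

robots = [DataFrame(x=rand(2), y=rand(2)) for i in 1:3]

# Hypothetical ids that are not 1, 2, 3 (chosen only for this example)
ids = [30, 10, 20]
res2 = reduce(vcat, robots, source=:robot_id => ids)
gdf2 = groupby(res2, :robot_id)

# Lookup by key: always returns the rows of the robot with id 10
by_name = gdf2[(robot_id=10,)]  # NamedTuple key
by_tuple = gdf2[(10,)]          # Tuple key, same group

# Lookup by position: returns whichever group happens to be first,
# so it should not be used when you mean "the robot with this id"
first_group = gdf2[1]

# The keys of all groups can be inspected with keys(gdf2)
group_keys = keys(gdf2)
```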
This is a solution if you want to store all robots in a single data frame (this has many benefits, e.g. later aggregation is much easier if you do so). But maybe you would prefer something else. If this is the case, please comment.
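To illustrate the aggregation benefit, here is a small sketch that computes per-robot summaries from the combined data frame. The output column names mean_x, mean_y and n_points are just names I picked for the example:

```julia
using DataFrames
using Statistics  # provides mean

robots = [DataFrame(x=rand(2), y=rand(2)) for i in 1:3]
res = reduce(vcat, robots, source=:robot_id)

# One row per robot: mean position and number of recorded points
agg = combine(groupby(res, :robot_id),
              :x => mean => :mean_x,
              :y => mean => :mean_y,
              nrow => :n_points)
```

With a Vector of separate data frames you would have to loop over the robots and collect the results yourself.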
On the other hand, if you want to use robot information for compute-intensive operations that are not aggregations (e.g. in a simulation, where you would access individual elements of vectors), then you will most likely be better off not using a DataFrame to store robot information, but rather e.g. a Tables.columntable (i.e. a NamedTuple of vectors), as it will be type stable and faster in such applications. DataFrame is designed for working efficiently with whole columns.
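A minimal sketch of the NamedTuple approach, assuming Tables.jl is available in your environment (it is a dependency of DataFrames.jl). The function path_length is a hypothetical example of element-wise work, not something from a package:

```julia
using DataFrames
import Tables

robots = [DataFrame(x=rand(2), y=rand(2)) for i in 1:3]

# Convert each robot to a NamedTuple of vectors, e.g. (x = [...], y = [...])
robot_tables = [Tables.columntable(df) for df in robots]

# Example of element-wise work: total length of the path through the points.
# The column types are part of the NamedTuple type, so the compiler can
# specialize this loop on them.
function path_length(r)
    total = 0.0
    for i in 2:length(r.x)
        total += hypot(r.x[i] - r.x[i-1], r.y[i] - r.y[i-1])
    end
    return total
end

lengths = [path_length(r) for r in robot_tables]
```

You can always go back with DataFrame(robot_tables[1]) when you need data frame functionality again.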