What I want to do is something like “unstack” but I do not know how to do it.
A simple example:
l = DataFrame(a = ["a", "a", "b", "b"], c = ["ZA", "ZM", "BW", "ZA"])
4×2 DataFrame
│ Row │ a │ c │
│ │ String │ String │
├─────┼────────┼────────┤
│ 1 │ a │ ZA │
│ 2 │ a │ ZM │
│ 3 │ b │ BW │
│ 4 │ b │ ZA │
I want to convert l to a dataframe that looks something like this:
| Row | x1 | x2 |
|-----+----+--------------|
| 1 | a | ["ZA", "ZM"] |
| 2 | b | ["BW", "ZA"] |
In the end I want to work with l[:x2]
.
In the real world the array in x2 will be of unpredictable length.
At the moment I am doing something very inefficient like this on a DataFrame with more than 400000 rows where z is the dataframe and values an array with the unique values that would be l[:a]
in the above example and combinations
would be the equivalent of l[:x2]
in the example above.
combinations = []
@time for v in values
push!(combinations, Set(filter(row -> row[:v] == v, z)[:code]))
end
There must be a more efficient way of doing this.