DataFrame transformation question

johann.spies · March 11, 2019, 2:37pm

What I want to do is something like “unstack” but I do not know how to do it.

A simple example:

l = DataFrame(a = ["a", "a", "b", "b"], c = ["ZA", "ZM", "BW", "ZA"]) 
4×2 DataFrame
│ Row │ a      │ c      │
│     │ String │ String │
├─────┼────────┼────────┤
│ 1   │ a      │ ZA     │
│ 2   │ a      │ ZM     │
│ 3   │ b      │ BW     │
│ 4   │ b      │ ZA     │

I want to convert l to a dataframe that looks something like this:

| Row | x1 | x2           |
|-----+----+--------------|
|   1 | a  | ["ZA", "ZM"] |
|   2 | b  | ["BW", "ZA"] |

In the end I want to work with l[:x2].
In the real world the array in x2 will be of unpredictable length.

At the moment I am doing something very inefficient, like the code below, on a DataFrame with more than 400,000 rows. Here z is the DataFrame, values is an array of the unique values (the equivalent of unique(l[:a]) in the example above), and combinations is the equivalent of l[:x2] in the example above.

combinations = []

@time for v in values
    push!(combinations, Set(filter(row -> row[:v] == v, z)[:code]))
end

There must be a more efficient way of doing this.
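Most of the cost in the loop above comes from `filter`, which scans every row of z once per unique value, so the total work grows roughly as (number of rows) × (number of unique values). A single pass over the two columns with a `Dict` avoids that. The following is a minimal sketch in plain Julia, assuming the grouping column and the value column are available as vectors; `group_unique` is a hypothetical helper name, not part of DataFrames.jl.

```julia
# Hypothetical helper: one pass over the data, collecting the unique values
# of `vals_col` for each distinct entry of `keys_col`, in first-seen order.
function group_unique(keys_col::AbstractVector{K}, vals_col::AbstractVector{V}) where {K,V}
    groups = Dict{K,Vector{V}}()  # key => unique values, in order of appearance
    seen   = Dict{K,Set{V}}()     # key => values already stored, for O(1) lookup
    for (k, v) in zip(keys_col, vals_col)
        vs = get!(() -> V[], groups, k)     # create the vector on first sight of k
        s  = get!(() -> Set{V}(), seen, k)
        if !(v in s)
            push!(s, v)
            push!(vs, v)
        end
    end
    return groups
end

a = ["a", "a", "b", "b"]
c = ["ZA", "ZM", "BW", "ZA"]
g = group_unique(a, c)
g["a"]  # ["ZA", "ZM"]
```

The extra `Set` per key keeps membership checks cheap when a group has many values; if order does not matter, a `Dict{K,Set{V}}` on its own (closer to the original loop's `Set`) would be enough.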

amellnik · March 11, 2019, 3:19pm

by(l, :a, c_values = :c => x -> [unique(x)])
2×2 DataFrame
│ Row │ a      │ c_values     │
│     │ String │ Array…       │
├─────┼────────┼──────────────┤
│ 1   │ a      │ ["ZA", "ZM"] │
│ 2   │ b      │ ["BW", "ZA"] │

Will do the trick.
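`by` was later deprecated and removed in DataFrames.jl 1.0, and single-column indexing such as `l[:x2]` became `l.x2` or `l[!, :x2]`. A sketch of the same answer for DataFrames 1.x, assuming that version is installed:

```julia
using DataFrames

l = DataFrame(a = ["a", "a", "b", "b"], c = ["ZA", "ZM", "BW", "ZA"])

# Group by :a; sort = true keeps the groups in sorted key order.
gl = groupby(l, :a; sort = true)

# Wrapping unique(x) in a one-element vector makes combine store the whole
# vector of unique values in a single cell instead of spreading it over rows.
res = combine(gl, :c => (x -> [unique(x)]) => :c_values)

res.c_values  # [["ZA", "ZM"], ["BW", "ZA"]]
```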

davidanthoff · March 11, 2019, 4:30pm

A slightly more concise version with Query.jl would be this:

l |> @groupby(_.a) |> @map({a=key(_), cv=unique(_.c)}) |> DataFrame
