Suppose I have this DataFrame:
```julia
using DataFrames

df1 = DataFrame(
    field = repeat(["A", "B", "C", "D", "E"], 12),
    data = rand(60),
    group = repeat(1:4, outer = 15))
```
If I try the following, I get an error suggesting `allowduplicates`:
```julia
julia> unstack(df1,:group,:data)
ERROR: ArgumentError: Duplicate entries in unstack at row 21 for key ("A",) and variable 1. Pass allowduplicates=true to allow them.
```
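The error seems to come from the fact that the row key (`field`) and the column key (`group`) never identify a row uniquely: `field` cycles every 5 rows and `group` every 4, so each (field, group) pair recurs every 20 rows, i.e. three times in 60 rows. A quick check, assuming DataFrames.jl is installed (the `:n` column name is just an illustrative choice):

```julia
using DataFrames

df1 = DataFrame(
    field = repeat(["A", "B", "C", "D", "E"], 12),
    data = rand(60),
    group = repeat(1:4, outer = 15))

# Count how many rows share each (field, group) combination.
counts = combine(groupby(df1, [:field, :group]), nrow => :n)

# All 5 × 4 = 20 combinations occur, each one 3 times,
# so unstack has 3 candidate values for every cell.
println(nrow(counts))      # 20
println(unique(counts.n))  # [3]
```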
Following the suggestion, I get this result:
```julia
julia> unstack(df1,:field,:data,allowduplicates=true)
4×6 DataFrame
 Row │ group  A         B         C         D         E
     │ Int64  Float64?  Float64?  Float64?  Float64?  Float64?
─────┼─────────────────────────────────────────────────────────
   1 │     1  0.375523  0.341876  0.360489  0.925469  0.593533
   2 │     2  0.310565  0.21974   0.121332  0.730402  0.787124
   3 │     3  0.37437   0.164914  0.350874  0.857969  0.291186
   4 │     4  0.856412  0.905269  0.132294  0.168782  0.232174
```
As far as I can tell, this is equivalent to the following explicit code, which keeps the last value in each (group, field) cell:
```julia
cgl = combine(groupby(df1, [:group, :field]), :data => last)
ucgl = unstack(cgl, :field, :data_last)
```
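To check that equivalence directly, one can build both tables and compare them. This is a sketch assuming a DataFrames.jl release where `unstack` still accepts `allowduplicates` (later releases may flag it as deprecated); both tables are sorted by `group` and their columns aligned by name, so neither row nor column order affects the comparison:

```julia
using DataFrames

df1 = DataFrame(
    field = repeat(["A", "B", "C", "D", "E"], 12),
    data = rand(60),
    group = repeat(1:4, outer = 15))

# Built-in route: keep the last duplicate encountered.
u1 = unstack(df1, :field, :data, allowduplicates = true)

# Explicit route: reduce each (group, field) cell with `last`, then reshape.
cgl = combine(groupby(df1, [:group, :field]), :data => last)
ucgl = unstack(cgl, :field, :data_last)

# Align column order with u1 and sort rows by group before comparing.
same = sort(u1, :group) == sort(select(ucgl, names(u1)), :group)
println(same)  # true
```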
I'm sure this has been discussed before, but I haven't found where or how.
I wonder whether, in cases like this, it would be more useful for `unstack` to accept an aggregation function instead of the `allowduplicates` keyword.
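Until something like that exists, the explicit pattern above can be wrapped in a small helper. `unstack_agg` below is a hypothetical name, not a DataFrames.jl function; it collapses duplicates with an arbitrary reducer `f` and then calls the ordinary `unstack`. A sketch, assuming DataFrames.jl and the Statistics standard library are available:

```julia
using DataFrames, Statistics

# Hypothetical helper (not part of DataFrames.jl): reshape `df` to wide
# format, combining duplicate (rowkey, colkey) entries with `f`.
function unstack_agg(df::AbstractDataFrame, rowkey, colkey, value, f)
    # Collapse duplicates first, so every (rowkey, colkey) cell is unique...
    agg = combine(groupby(df, [rowkey, colkey]), value => f => value)
    # ...then a plain unstack no longer needs allowduplicates.
    return unstack(agg, rowkey, colkey, value)
end

df1 = DataFrame(
    field = repeat(["A", "B", "C", "D", "E"], 12),
    data = rand(60),
    group = repeat(1:4, outer = 15))

# One row per group, one column per field, each cell the mean of 3 values.
means = unstack_agg(df1, :group, :field, :data, mean)
```

Passing `last` as `f` should reproduce the `allowduplicates=true` result; `sum`, `maximum` or `length` give other summaries of the duplicated cells.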