Hi all-
I am seeking solutions for two problems I have encountered while manipulating dataframes. I've been struggling to think of a good solution. Although I can probably duct-tape something together, I was wondering whether there are utilities for these operations or more elegant solutions.
Problem 1
I need to recode multiple columns in a dataframe into a single variable, such that each unique combination of values in the old columns is assigned a unique value in the new variable. In the following example, unique combinations of values in columns a and b are recoded into the column new_indicator:
using DataFrames
df = DataFrame(a=[1,1,2,2,1],b=[1,2,1,2,1],new_indicator=[1,2,3,4,1])
Output:
5×3 DataFrame
│ Row │ a     │ b     │ new_indicator │
│     │ Int64 │ Int64 │ Int64         │
├─────┼───────┼───────┼───────────────┤
│ 1   │ 1     │ 1     │ 1             │
│ 2   │ 1     │ 2     │ 2             │
│ 3   │ 2     │ 1     │ 3             │
│ 4   │ 2     │ 2     │ 4             │
│ 5   │ 1     │ 1     │ 1             │
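For reference, one approach that may work here is to group by the columns and read off each row's group index. This is only a sketch, assuming a recent DataFrames.jl release (1.x) in which `groupindices` and the `sort` keyword of `groupby` are available; it starts from a dataframe that has only columns a and b:

```julia
using DataFrames

# Input with only the columns to be recoded.
df = DataFrame(a=[1,1,2,2,1], b=[1,2,1,2,1])

# Group rows by the combination of values in a and b.
# sort=true orders the groups by their key values, so the numbering
# does not depend on the order in which combinations first appear.
# groupindices returns, for every row of df, the index of its group.
df.new_indicator = groupindices(groupby(df, [:a, :b]; sort=true))
```

Rows sharing the same (a, b) pair receive the same integer. As far as I can tell the resulting column allows `missing` in its element type, which only matters if groups are dropped with `skipmissing=true`.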
Problem 2
I have a second problem in which I want to remove duplicate rows (defined by a set of columns) and create a new column holding the number of times each row occurred. Here is an example:
Current data:
using DataFrames
df = DataFrame(a=[1,1,2,2,3,3],b=[1,1,2,2,1,1],c=[1,1,2,2,1,2])
6×3 DataFrame
│ Row │ a     │ b     │ c     │
│     │ Int64 │ Int64 │ Int64 │
├─────┼───────┼───────┼───────┤
│ 1   │ 1     │ 1     │ 1     │
│ 2   │ 1     │ 1     │ 1     │
│ 3   │ 2     │ 2     │ 2     │
│ 4   │ 2     │ 2     │ 2     │
│ 5   │ 3     │ 1     │ 1     │
│ 6   │ 3     │ 1     │ 2     │
Desired data:
df = DataFrame(a=[1,2,3,3],b=[1,2,1,1],c=[1,2,1,2],counts=[2,2,1,1])
4×4 DataFrame
│ Row │ a     │ b     │ c     │ counts │
│     │ Int64 │ Int64 │ Int64 │ Int64  │
├─────┼───────┼───────┼───────┼────────┤
│ 1   │ 1     │ 1     │ 1     │ 2      │
│ 2   │ 2     │ 2     │ 2     │ 2      │
│ 3   │ 3     │ 1     │ 1     │ 1      │
│ 4   │ 3     │ 1     │ 2     │ 1      │
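For reference, the collapse-and-count step might be expressed as a grouped aggregation. This is a sketch under the assumption of a recent DataFrames.jl release in which `combine` accepts `nrow => :counts`; the name `counted` is just illustrative:

```julia
using DataFrames

df = DataFrame(a=[1,1,2,2,3,3], b=[1,1,2,2,1,1], c=[1,1,2,2,1,2])

# Group by the columns that define a duplicate, then emit one row per
# group; nrow gives the number of original rows in each group.
# sort=true orders the output by the values of a, b, and c.
counted = combine(groupby(df, [:a, :b, :c]; sort=true), nrow => :counts)
```

To treat rows as duplicates based on only some of the columns, the column list passed to `groupby` could be narrowed accordingly; the remaining columns would then need their own aggregation rule.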
Thanks in advance.