Group by Multi Column Dataframe to share ID? Maybe?

You can create a dictionary that maps each currency to an ID, outside of the DataFrame:

julia> df = DataFrame(currencyA = rand(["USD", "EUR", "JPY"], 8), currencyB = rand(["USD", "EUR", "JPY"], 8))
8×2 DataFrame
 Row │ currencyA  currencyB 
     │ String     String    
─────┼──────────────────────
   1 │ JPY        JPY
   2 │ USD        JPY
   3 │ JPY        JPY
   4 │ USD        EUR
   5 │ JPY        EUR
   6 │ EUR        EUR
   7 │ USD        USD
   8 │ JPY        JPY

julia> curr_ids = Dict(map(enumerate(union(df.currencyA, df.currencyB))) do (idx, currency)
         currency => idx
       end)
Dict{String, Int64} with 3 entries:
  "EUR" => 3
  "JPY" => 1
  "USD" => 2
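If a currency appears only in `currencyB`, a mapping built from `currencyA` alone has no entry for it, and the lookup `curr_ids[c]` throws a `KeyError`. Taking the `union` of both columns avoids that, because `union` on vectors keeps values in order of first appearance (first column, then second). Here is a minimal Base-only sketch. The vectors `a` and `b` are made-up stand-ins for the two columns:

```julia
# Hypothetical example data: "GBP" appears only in the second column.
a = ["JPY", "USD", "JPY", "EUR"]
b = ["JPY", "EUR", "GBP", "USD"]

# union keeps first-appearance order across a, then b, so the IDs are
# "JPY" => 1, "USD" => 2, "EUR" => 3, "GBP" => 4.
curr_ids = Dict(c => i for (i, c) in enumerate(union(a, b)))

# Every value in either column now has an ID, so no lookup can fail.
idA = [curr_ids[c] for c in a]  # [1, 2, 1, 3]
idB = [curr_ids[c] for c in b]  # [1, 3, 4, 2]
```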

and then use it to generate both ID columns in one go:

julia> transform(df, [:currencyA, :currencyB] .=> ByRow(c -> curr_ids[c]) .=> [:idA, :idB])
8×4 DataFrame
 Row │ currencyA  currencyB  idA    idB   
     │ String     String     Int64  Int64 
─────┼────────────────────────────────────
   1 │ JPY        JPY            1      1
   2 │ USD        JPY            2      1
   3 │ JPY        JPY            1      1
   4 │ USD        EUR            2      3
   5 │ JPY        EUR            1      3
   6 │ EUR        EUR            3      3
   7 │ USD        USD            2      2
   8 │ JPY        JPY            1      1
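You could also wrap both steps in a small function. This is only a sketch: `add_shared_ids` is a hypothetical name, and it assumes the values in the two columns can be used as `Dict` keys. It builds the mapping from both columns and then uses the same `transform(..., cols .=> ByRow(f) .=> outs)` pattern as above:

```julia
using DataFrames

# Hypothetical helper: give each distinct value across two columns one shared
# integer ID (numbered by first appearance) and add the ID columns.
function add_shared_ids(df::AbstractDataFrame, colA::Symbol, colB::Symbol,
                        outA::Symbol, outB::Symbol)
    ids = Dict(v => i for (i, v) in enumerate(union(df[!, colA], df[!, colB])))
    return transform(df, [colA, colB] .=> ByRow(v -> ids[v]) .=> [outA, outB])
end

# Made-up data where "GBP" only occurs in currencyB.
df = DataFrame(currencyA = ["JPY", "USD", "EUR"], currencyB = ["GBP", "JPY", "USD"])
out = add_shared_ids(df, :currencyA, :currencyB, :idA, :idB)
```

Numbering by first appearance matches the REPL session above. If you want IDs in alphabetical order instead, sorting the `union` result before `enumerate` is one way to do it.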