Joining on a DataFrame on an argument and column

Hello,
Is it possible to join DataFrames on an argument and a column header? E.g.:

New_DF = innerjoin(DF1,DF2, on = ID => :Store_ID)

where “ID” is an argument of a function and “:Store_ID” is a specific column within DF2?

For more context, I have user input data that contains a DataFrame and I do not know the column name of the ID / Key. There is another DataFrame created in a separate function that I will eventually need to join with the user input data. Because the second DataFrame is created it will always have a consistent column header and thus I know what key to use.

I would expect this to work:


function join_the_df(;data_1::DataFrame,key::String)
    DF2 = DataFrame(ID=[20, 40], Job=["Lawyer", "Doctor"])
    New_DF = innerjoin(data_1,DF2, on = key => :ID)
    return(New_DF)
end

join_the_df(;
    data_1 = User_DF,
    key = "Identification"
    )

but it throws the error:
ERROR: TypeError: in keyword argument on, expected Union(AbstractVector, var"#s493"} where var"#s493"<:Union{AbstractString, Pair{Symbol, Symbol}, Symbol, Tuple{Symbol, Symbol}, Tuple{Symbol, Symbol}, Pair{<:AbstractString, <:AbstractString}}, got a value of type Pair{String, Symbol}

(btw how do I copy out of the Terminal in VSCode so I don’t have to type this error message? :slight_smile: )

If I modify the code and hard-code the value in, it works… but this would defeat the purpose of the function. E.g.:


function join_the_df(;data_1::DataFrame)
    DF2 = DataFrame(ID=[20, 40], Job=["Lawyer", "Doctor"])
    New_DF = innerjoin(data_1,DF2, on = :Identification => :ID)
    return(New_DF)
end

join_the_df(;
    data_1 = User_DF,
    )

Thank you for your help!

Have you tried:

New_DF = innerjoin(DF1,DF2, on = Symbol(ID) => :Store_ID)

The :somename notation is a Julia literal of type Symbol which can also be written Symbol("somename").

2 Likes

Thanks – This worked. Would advocate for a bit better error message here though to help noobs like me :slight_smile: Earlier in my script I mixed arguments and literals together and DataFrames gave me a message saying, “did you mean to use Cols?” (The error message came from this issue: why `mixing strings with other selectors is not allowed`? · Issue #3241 · JuliaData/DataFrames.jl · GitHub) would be useful to have something similar here.

As the error message indicates, the key column identifiers for the join must be provided as symbol pairs or string pairs. A mixed pair is not accepted.

julia> using DataFrames

julia> function join_the_df(;data_1::DataFrame,key::String)
           DF2 = DataFrame(ID=[20, 40], Job=["Lawyer", "Doctor"])
           New_DF = innerjoin(data_1,DF2, on = key => "ID")
           return(New_DF)
       end
join_the_df (generic function with 1 method)

julia> User_DF = DataFrame(Identification=[10, 20, 30, 40], name=["Ab", "Bc", "Cd", "De"])
4×2 DataFrame
 Row │ Identification  name   
     │ Int64           String
─────┼────────────────────────
   1 │             10  Ab
   2 │             20  Bc
   3 │             30  Cd
   4 │             40  De

julia> join_the_df(;
           data_1 = User_DF,
           key = "Identification"
           )
2×3 DataFrame
 Row │ Identification  name    Job    
     │ Int64           String  String
─────┼────────────────────────────────
   1 │             20  Bc      Lawyer
   2 │             40  De      Doctor