Ifelse and datawrangling

Let’s say I have the dataframe

using DataFrames
a1=DataFrame(A=1:3, B=5:7)

and I want to replace 1 by USA, 2 by MEX, and 3 by CAN in the first column. I can do

a2=hcat(
    ifelse.(a1[!,[:A]].==1,"USA",
    ifelse.(a1[!,[:A]].==2,"MEX",
    ifelse.(a1[!,[:A]].==3,"CAN",
    "ERROR"))),select(a1,:B)
    )

This has exactly the effect that I want. However, I would like to issue this command using two vectors x=[1;2;3] and y=[“USA”,“MEX”,“CAN”] in order to make it more flexible, so I don’t have to change the code when these vectors change. Is there a way to do this?

1 Like

You might want to look at CategoricalArrays.jl, and the recode!() function. I think that interface is really nice and intuitive, and designed for exactly this kind of situation.

Another approach that doesn’t use another package would be to use a dictionary.

d = Dict(k => v for (k, v) in zip(1:3, ("USA", "MEX", "CAN")))

a1.label = [d[n] for n in a1.A]
3 Likes

Thank you @kevbonham – just what I was looking for!

or

d = Dict(enumerate(["USA", "MEX", "CAN"]))
3 Likes

insertcols!(a1,:label => [ "USA", "MEX", "CAN"][a1.A])

3 Likes

Another variant:

using DataFrames

a1 = DataFrame(A=1:4, B=5:8)

x = [1; 2; 3]
y = ["USA", "MEX", "CAN"]
d = Dict(x .=> y)
a1[!,1] = get.(Ref(d), a1[:,1], missing)
a1
1 Like

Ooh, handling the missing key, I dig it!