Let’s say I have a DataFrame and a Dict:
df = DataFrame(
a = 1:3,
b = ["beetle", "fly", "spider"])
arthropod_dict = Dict(
"beetle" => "insect",
"fly" => "insect",
"spider" => "arachnid")
And I’d like to apply the Dict like a map to the DataFrame, producing
df = DataFrame(
a = 1:3,
b = ["beetle", "fly", "spider"],
c = ["insect", "insect", "arachnid"])
Is there a way to do that using transform or another DataFrame function, so that I can apply the Dict within a @chain call?
transform(df, :b => ByRow(x -> arthropod_dict[x]) => :c)
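Since the question asks about @chain specifically, here is a minimal sketch of the same transform inside a chain. It assumes the @chain macro comes from Chain.jl (DataFramesMeta.jl re-exports it); the variable name `result` is only for illustration.

```julia
using DataFrames, Chain

df = DataFrame(a = 1:3, b = ["beetle", "fly", "spider"])
arthropod_dict = Dict(
    "beetle" => "insect",
    "fly" => "insect",
    "spider" => "arachnid")

# With no `_` placeholder, @chain passes the previous result as the
# first argument, so this step is equivalent to transform(df, ...).
result = @chain df begin
    transform(:b => ByRow(x -> arthropod_dict[x]) => :c)
end
```

ByRow applies the lookup to each element of :b, so a key missing from the Dict would throw a KeyError.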
Nice (own) answer.
I really like using the @eachrow macro in DataFramesMeta.jl too:
@eachrow df begin
@newcol c::Vector{String}
:c = arthropod_dict[:b]
end
It is also compatible with a @chain call:
@chain df begin
filter(:a => >(2), _) # for example
@eachrow begin
@newcol c::Vector{String}
:c = arthropod_dict[:b]
end
end
It’s a bit more than a one-liner but handy for workflows with more complicated transforms.
Note that there is no requirement to use transform for everything in DataFrames, and at times I find that it actively reduces code clarity. In my opinion (and this is nothing more!) this is one of those situations. I would write:
df[!, :c] = [arthropod_dict[x] for x ∈ df.b]
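For comparison, the same lookup can be written with broadcasting instead of a comprehension. This is just a sketch of an equivalent spelling, not something proposed in the thread; it wraps the Dict in Ref so that broadcasting treats it as a scalar rather than iterating over it.

```julia
using DataFrames

df = DataFrame(a = 1:3, b = ["beetle", "fly", "spider"])
arthropod_dict = Dict(
    "beetle" => "insect",
    "fly" => "insect",
    "spider" => "arachnid")

# getindex.(Ref(d), v) calls d[x] for every x in v.
df.c = getindex.(Ref(arthropod_dict), df.b)
```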
I would like to add a different solution (which, incidentally, also handles situations where the dictionary is incomplete):
df = DataFrame(
a = 1:4,
b = ["beetle", "fly", "spider","unicorno"])
arthropod_dict = Dict(
"beetle" => "insect",
"fly" => "insect",
"spider" => "arachnid")
dict = DataFrame(from = collect(keys(arthropod_dict)), to = collect(values(arthropod_dict)))
A transpose-like function for DataFrames would have been convenient here.
outerjoin(df, dict, on = :b => :from)
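A possible variation on the join approach, sketched under a few assumptions: it uses leftjoin rather than outerjoin so that only rows of df are kept, names the lookup columns b and c directly (the name `lookup` is hypothetical), and replaces the resulting missing values with "unknown". The final sort is there because the sketch does not rely on the join preserving row order.

```julia
using DataFrames

df = DataFrame(a = 1:4, b = ["beetle", "fly", "spider", "unicorno"])
arthropod_dict = Dict(
    "beetle" => "insect",
    "fly" => "insect",
    "spider" => "arachnid")

# keys and values iterate in the same order, so the pairs stay aligned.
lookup = DataFrame(b = collect(keys(arthropod_dict)),
                   c = collect(values(arthropod_dict)))

# Rows of df without a match get `missing` in column c.
joined = leftjoin(df, lookup, on = :b)

# Replace missing entries with a default label.
joined.c = coalesce.(joined.c, "unknown")

# Restore the original order of df.
sort!(joined, :a)
```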
A variant of @nilshg's solution:
df[!, :c] = [get(arthropod_dict,x,"unknown") for x ∈ df.b]
Just a quick thank you for this. I’ve been wrestling for hours with trying to do just this and coming very close, but eating lots of errors.