How to edit row values of a DataFrame column based on a condition using Query.jl

I have extracted the unique values of the column :CarCompany from a DataFrame:

df |> @map(_.CarCompany) |> @unique() |> collect
The output shows entries with spelling mistakes:
28-element Array{SubString{String},1}:
 "alfa-romero"
 "audi"
 "bmw"
 "chevrolet"
 "dodge"
 "honda"
 "isuzu"
 "jaguar"
 "maxda"
 "mazda"
 "buick"
 "mercury"
 "mitsubishi"

 "plymouth"
 "porsche"
 "porcshce"
 "renault"
 "saab"
 "subaru"
 "toyota"
 "toyouta"
 "vokswagen"
 "volkswagen"
 "vw"
 "volvo"

How can I fix the rows with spelling mistakes using Query.jl?

Please make sure that you quote your code with triple backticks, and ideally include an MWE with your question.

DataFrame columns can be treated as ordinary vectors, so you can broadcast a string replace operation over the column:

julia> using DataFrames

julia> df = DataFrame(CarCompany = rand(["porsche", "porcshce"], 10))

julia> df.CarCompany = replace.(df.CarCompany, Ref("porcshce" => "porsche"))

julia> unique(df.CarCompany)
1-element Array{String,1}:
 "porsche"
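The same broadcasting idea extends to fixing several misspellings in one pass. The following is only a sketch under some assumptions: a plain vector stands in for df.CarCompany, and the `corrections` table is a hypothetical mapping built from the misspellings visible in the question's output, not something from the original dataset.

```julia
# Stand-in for df.CarCompany; the values are taken from the question's output.
companies = ["mazda", "maxda", "porsche", "porcshce", "toyouta", "vokswagen", "audi"]

# Hypothetical lookup table: misspelling => intended spelling.
corrections = Dict(
    "maxda"     => "mazda",
    "porcshce"  => "porsche",
    "toyouta"   => "toyota",
    "vokswagen" => "volkswagen",
)

# get(dict, key, default) returns the correction if one exists,
# otherwise the original value. Ref(corrections) makes the Dict act
# as a single value in the broadcast instead of being broadcast over.
fixed = get.(Ref(corrections), companies, companies)

println(unique(fixed))
```

On an actual DataFrame this would presumably be written as df.CarCompany = get.(Ref(corrections), df.CarCompany, df.CarCompany). Whether "vw" should also map to "volkswagen" is a judgment call about the data, so it is left out here.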

What is the Ref needed for? Is there something in the documentation for this?

Basically, to make the pair look like a scalar to broadcasting, because pairs (perhaps dubiously) have a shape and would otherwise be iterated over element by element:

julia> (1 => 2) .+ [3, 4]
2-element Array{Int64,1}:
 4
 6

Thanks, but I am still (very) confused. The Ref makes what look like a scalar? What happens when I wrap your example in Ref? (Sorry, this is a little off-topic.)

julia> (1 => 2) .+ [3, 4]
2-element Array{Int64,1}:
 4
 6

julia> Ref((1 => 2)) .+ [3, 4]
ERROR: MethodError: no method matching +(::Pair{Int64,Int64}, ::Int64)

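Exactly: wrapped in Ref, the pair is no longer broadcast over, so + is called with the whole pair as one argument, and no such method exists.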

Okay, last question. Why do we need Ref in the replace call? Maybe this is a DataFrames-specific question.

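Because we want the same replacement (which is a pair) to be applied to every element of the vector of strings. Otherwise broadcasting would try to iterate over the pair in lockstep with the vector. This is not specific to DataFrames; it is how broadcasting treats its arguments in general.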
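To make that concrete, here is a small sketch of Ref protecting an argument from being broadcast over. The variable names (`words`, `xs`, `allowed`) are made up for illustration. The REPL output earlier in this thread looks like it comes from an older Julia release; as far as I know, newer releases treat a pair as a scalar in broadcasting, in which case the Ref around the pair is harmless but may no longer be required.

```julia
words = ["porsche", "porcshce", "audi"]

# Ref wraps the pair so that broadcasting treats it as one value and
# passes the whole pair to replace for each element of words.
fixed = replace.(words, Ref("porcshce" => "porsche"))

# The same idea with a collection argument: without Ref, in.(xs, allowed)
# would try to match xs[i] with allowed[i] (and fail here because the
# lengths differ) instead of testing membership in the whole collection.
xs = [1, 2, 3]
allowed = [1, 3]
is_allowed = in.(xs, Ref(allowed))

println(fixed)
println(is_allowed)
```

In both calls the wrapped argument is handed to the function unchanged on every iteration, which is what "looks like a scalar" means above.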
