I have filtered the unique values of the :CarCompany column from a DataFrame:
df |> @map(_.CarCompany) |> @unique() |> collect
The output contains entries with spelling mistakes:
28-element Array{SubString{String},1}:
"alfa-romero"
"audi"
"bmw"
"chevrolet"
"dodge"
"honda"
"isuzu"
"jaguar"
"maxda"
"mazda"
"buick"
"mercury"
"mitsubishi"
⋮
"plymouth"
"porsche"
"porcshce"
"renault"
"saab"
"subaru"
"toyota"
"toyouta"
"vokswagen"
"volkswagen"
"vw"
"volvo"
How can I correct the rows with spelling mistakes using Query.jl?
nilshg
July 17, 2019, 1:26pm
2
Please make sure that you quote your code with triple backticks, and ideally include an MWE with your question.
DataFrames columns can be treated as ordinary vectors, so you can broadcast the replace string function over the column:
julia> using DataFrames
julia> df = DataFrame(CarCompany = rand(["porsche", "porcshce"], 10))
julia> df.CarCompany = replace.(df.CarCompany, Ref("porcshce" => "porsche"))
julia> unique(df.CarCompany)
1-element Array{String,1}:
"porsche"
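If several misspellings need fixing at once, a lookup table may be more convenient than one replace call per typo. Here is a minimal sketch, assuming a plain vector stands in for df.CarCompany. The `fixes` table is my own guess at the intended corrections (in particular, mapping "vw" to "volkswagen" is an assumption), not something from the original data set.

```julia
# Hypothetical correction table, built from the misspellings shown in the question.
fixes = Dict(
    "maxda"     => "mazda",
    "porcshce"  => "porsche",
    "toyouta"   => "toyota",
    "vokswagen" => "volkswagen",
    "vw"        => "volkswagen",  # assumption: treat the abbreviation as the full name
)

# Stand-in for df.CarCompany.
companies = ["audi", "maxda", "mazda", "porcshce", "toyouta", "vw", "vokswagen", "volvo"]

# Look each name up in the table; names without an entry are kept unchanged.
# Ref(fixes) makes broadcasting reuse the whole Dict for every element.
cleaned = get.(Ref(fixes), companies, companies)

# Base.replace on a collection swaps whole elements and gives the same result.
cleaned2 = replace(companies, fixes...)

println(unique(cleaned))
```

Unlike the broadcast string replace above, both variants only match whole entries, so a name that merely contains "vw" as a substring is left alone.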
affans
July 17, 2019, 1:33pm
3
What is the Ref needed for? Is there something in the documentation for this?
Basically, it makes the pair look like a scalar when it comes to broadcasting, because pairs (perhaps dubiously) have a shape:
julia> (1 => 2) .+ [3, 4]
2-element Array{Int64,1}:
4
6
affans
July 17, 2019, 1:44pm
5
Thanks, but I am still (very) confused. The Ref makes what look like a scalar? What happens when I Ref your example? (Sorry, a little off-topic here.)
julia> (1 => 2) .+ [3, 4]
2-element Array{Int64,1}:
4
6
julia> Ref((1 => 2)) .+ [3, 4]
ERROR: MethodError: no method matching +(::Pair{Int64,Int64}, ::Int64)
Exactly: once it is wrapped in Ref, the pair is no longer broadcast over, so + is called with the whole pair and fails.
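To make the "looks like a scalar" part more concrete, here is a small sketch of what a Ref is from the point of view of broadcasting. The helper `addfirst` is a made-up function for illustration only; it stands in for any function that, like replace, expects to receive the whole pair.

```julia
p = 1 => 2
r = Ref(p)

# A Ref is a zero-dimensional container holding exactly one value.
@assert r[] == p        # r[] unwraps the stored pair
@assert size(r) == ()   # no dimensions to broadcast along
@assert length(r) == 1

# Hypothetical helper that needs the pair as a whole.
addfirst(pair, x) = first(pair) + x

# The same pair is passed, unsplit, to every call.
result = addfirst.(r, [3, 4])
println(result)
```

So the Ref does not change the pair itself; it only tells broadcasting to hand it over unchanged for each element of the other argument.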
affans
July 17, 2019, 6:27pm
7
Okay, last question. Why do we need to Ref in the replace function? Maybe this is a DataFrames-specific question.
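Because we want the same replacement (which is a pair) to be applied to every element of the vector of strings. Without Ref, broadcasting would treat the pair as a two-element collection and try to match its elements up with those of the vector. This is general broadcasting behaviour, not something specific to DataFrames.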
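Put differently, the broadcast call with Ref should behave like an explicit loop that reuses one pair for every string. A minimal sketch on a plain vector (no DataFrames involved; the variable names are just for illustration):

```julia
makes = ["porsche", "porcshce", "audi"]
fix = "porcshce" => "porsche"

# Broadcast version: Ref(fix) is reused as-is for each string.
a = replace.(makes, Ref(fix))

# Explicit version: one replace call per string, same pair every time.
b = [replace(s, fix) for s in makes]

# Wrapping the pair in a one-element tuple has the same effect as Ref here.
c = replace.(makes, (fix,))

println(a)
```

As far as I know, newer Julia versions treat pairs as scalars in broadcasting, so the Ref may no longer be strictly required there; keeping it is harmless and works on older versions too.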