Hi, I have a dataset with a city column and a country column. However, some entries in the country column refer to the same place under different names, for example:
rows 7 and 123. I want both country names to be Australia. My preferred approach is to use the Countries package (list of country codes by alpha-2 and alpha-3 code, ISO 3166) together with the join functions (leftjoin, semijoin, …) from the InMemoryDatasets package.
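For context, a left join against a country list amounts to a per-row lookup: every known spelling (full name, alpha-2 code, alpha-3 code, plus any extra aliases) points to one canonical name, and unmatched values are left as they are. The sketch below illustrates that idea in plain Base Julia. It is only a sketch under stated assumptions: COUNTRY_TABLE and EXTRA_ALIASES are hypothetical stand-ins, not the output of Countries.jl's all_countries(), and no field names from that package are assumed.

```julia
# Hypothetical stand-in for a country list: (name, alpha-2, alpha-3).
# This is NOT the structure returned by Countries.jl; it only mimics the idea.
const COUNTRY_TABLE = [
    ("Australia", "AU", "AUS"),
    ("United Kingdom", "GB", "GBR"),
    ("United States", "US", "USA"),
]

# Hypothetical extra spellings that are not ISO 3166 codes (e.g. "UK").
const EXTRA_ALIASES = Dict(
    "UK" => "United Kingdom",
    "United States of America" => "United States",
)

# Build a lookup from every known spelling (uppercased) to the canonical name.
function build_lookup(table, extras)
    lookup = Dict{String,String}()
    for (name, a2, a3) in table
        for key in (name, a2, a3)
            lookup[uppercase(key)] = name
        end
    end
    for (alias, name) in extras
        lookup[uppercase(alias)] = name
    end
    return lookup
end

# Left-join-like behaviour: trimmed, case-insensitive match; unmatched values are kept.
normalize_country(lookup, s) = get(lookup, uppercase(strip(s)), String(strip(s)))

lookup = build_lookup(COUNTRY_TABLE, EXTRA_ALIASES)
normalize_country.(Ref(lookup), ["Australia", "AUS", " uk", "Japan"])
# → ["Australia", "Australia", "United Kingdom", "Japan"]
```

The extra alias table is there because an ISO 3166 list alone will not catch every informal spelling; for example, the alpha-2 code for the United Kingdom is GB, not UK.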
I have tried to use the joins, but they just didn't work.
Here is the code that builds the dataset (named ds) and the country table (named count):
using InMemoryDatasets, DLMReader, Countries
import Downloads
data=Downloads.download("https://raw.githubusercontent.com/akshdfyehd/travel/main/Travel%20details%20dataset.csv")
data=filereader(data, quotechar='"', dtformat=Dict(3:4 .=> dateformat"m/d/y"))
data=data[completecases(data),:]
modify!(data, 11 => x -> parse.(Int, replace.(x, "\$" =>"","USD" =>"",","=>"")))
modify!(data, 13 => x -> parse.(Int, replace.(x, "\$" =>"","USD" =>"",","=>"")))
split_comma_pair(str) = collect(match(r"([^,]+),?\s*(.*)", str))
modify!(data, :Destination => byrow(Tuple∘split_comma_pair),
:Destination => splitter => [:city, :country])
ds=select(data, 14:15)
count=Dataset(all_countries())
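One thing that may be worth checking before any join: joins match keys exactly, so a country value of " Australia" (with a leading space) will not match "Australia". With the pattern r"([^,]+),?(.*)", the second capture keeps the space after the comma. The sketch below demonstrates this in Base Julia and shows a hypothetical helper, split_city_country (not part of any package), that strips whitespace from both parts and returns an empty country when the value has no comma.

```julia
# Pattern r"([^,]+),?(.*)": the second group keeps the space after the comma.
orig_country = match(r"([^,]+),?(.*)", "Sydney, Australia").captures[2]   # " Australia"

# Hypothetical helper: split "City, Country" at the first comma and trim both parts.
# A value without a comma yields an empty country string.
function split_city_country(str::AbstractString)
    m = match(r"^([^,]*),?(.*)$", str)
    city, country = m.captures
    return (String(strip(city)), String(strip(country)))
end

split_city_country("Sydney, Australia")   # ("Sydney", "Australia")
split_city_country("Bali")                # ("Bali", "")
```

Trimming inside the splitting step means every later join or lookup sees clean keys, rather than having to strip them in each join.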
I would really appreciate any advice.