Use joins to change country names

akshdfyehd · April 15, 2023, 6:54am

Hi, I have got a dataset with a city and country column. However, there are some names in country column are actually the same place but different names, for example:

row 7 and 123, I want both country name be Australia, the way I prefer is to use package named Countries List of country codes by alpha-2, alpha-3 code (ISO 3166) and joins function(leftjoins, semijoin…) from inmemorydataset package.
I have tried to use joins but it just didn’t work:

Here is the code to get the dataset which named ds and countries named count:

using InMemoryDatasets,DLMReader, Countries
import Downloads
data=Downloads.download("https://raw.githubusercontent.com/akshdfyehd/travel/main/Travel%20details%20dataset.csv")
data=filereader(data, quotechar='"', dtformat=Dict(3:4 .=> dateformat"m/d/y"))
data=data[completecases(data),:]
modify!(data, 11 => x -> parse.(Int, replace.(x, "\$" =>"","USD" =>"",","=>"")))
modify!(data, 13 => x -> parse.(Int, replace.(x, "\$" =>"","USD" =>"",","=>"")))
split_comma_pair(str) = collect(match(r"([^,]+),?(.*)", str))
modify!(data, :Destination => byrow(Tuple∘split_comma_pair),
                         :Destination => splitter => [:city, :country])
ds=select(data, 14:15)
count=Dataset(all_countries())

really appreciate any advices.

rocco_sprmnt21 · April 15, 2023, 10:07am

I think there are two kinds of problems:

the space after the comma
uppercase characters


split_comma_pair(str) = let m=match(r"([^,]+)(, )?(.*)", str); (first(m.captures),last(m.captures)) end
modify!(data, :Destination => byrow(split_comma_pair),
                         :Destination => splitter => [:city, :country])
ds=select(data, 14:15)
modify!(ds, :country=>byrow(uppercase))
count=Dataset(all_countries())


leftjoin(ds, count, on=:country=>:alpha3)

a more general form of the split_comma… function

split_comma_pair(str)=Tuple(match(r"([^,]+),? *(.*)", str))

akshdfyehd · April 19, 2023, 5:31am

Thanks for the reply!!!

akshdfyehd · April 19, 2023, 6:21am

Hi,
I got the dataset which is correct, but my problem is to replace to the name, for example replace USA to united states in column 17, rest are stay untouched:

Do you know an easy way to replace all these?
really appreciate.

rocco_sprmnt21 · April 19, 2023, 7:00am

this is one way

acount=Dataset(all_countries())

ccn=leftjoin(ds, acount, on=:country=>:alpha3)

coalesce.(ccn[:,:common_name],ccn[:,:country])

coalesce.(ccn[:,:common_name],data[:,:country]) # to get ProperCase

akshdfyehd · April 23, 2023, 4:21am

Hi, I am so sorry to bother you again, my tutor want me to use modify to do this(from inmemorydatasets package) , do you know a function that can perform the same in inmemorydataset package as coalesce? I have tried replace but seems that function can’t do the missing value. Thanks for any advices, really appreciate

rocco_sprmnt21 · April 23, 2023, 6:18am

try something like this

julia> ds
5×2 Dataset
 Row │ x1        x2       
     │ identity  identity
     │ Int64?    Int64?
─────┼────────────────────
   1 │       -1         1
   2 │       10   missing
   3 │        4   missing
   4 │        5        -8
   5 │        1   missing

julia> modify(ds,(:x1,:x2)=>byrow((x1,x2)->ismissing(x2) ? x1 : x2)=>:new_x)
5×3 Dataset
 Row │ x1        x2        new_x    
     │ identity  identity  identity
     │ Int64?    Int64?    Int64?
─────┼──────────────────────────────
   1 │       -1         1         1
   2 │       10   missing        10
   3 │        4   missing         4
   4 │        5        -8        -8
   5 │        1   missing         1

akshdfyehd · April 23, 2023, 6:37am

Thank you!!!

Topic		Replies	Views
How to change the entire column form for different situation? General Usage question , inmemorydatasets	2	431	March 14, 2023
Split country column into a column with city and a column with country General Usage question , package , dataframes , inmemorydatasets	5	308	March 28, 2023
How to change form using InMemoryDatasets package? General Usage question , package , inmemorydatasets	7	715	March 16, 2023
Tidying up a csv file (follow-up to question 53261/4) New to Julia dates , csv	9	1219	January 14, 2021
Rename column if its name matches to a name in given list Data dataframes , csv	26	1608	August 17, 2023

Use joins to change country names

Related topics