Hi, I have a dataset from Kaggle that is related to travel. Here is the link to the raw dataset:
https://raw.githubusercontent.com/akshdfyehd/travel/main/Travel%20details%20dataset.csv
There is a column called “Destination”. In the original file, some destinations are wrapped in double quotes and contain a comma (e.g., “London, UK”), while others are not (e.g., New York).
When I load the data into Julia, it is displayed like this:
julia> using InMemoryDatasets, DLMReader, Chain
julia> import Downloads
julia> path = Downloads.download("https://raw.githubusercontent.com/akshdfyehd/travel/main/Travel%20details%20dataset.csv")
julia> data = filereader(path)
139×13 Dataset
Row │ \ufeffTrip ID Destination Start date End date Duration (days) Traveler name Traveler age Traveler gender Traveler nationality Accommodation type Accommodation cost Tra ⋯
│ identity identity identity identity identity identity identity identity identity identity identity ide ⋯
│ Int64? String? String? String? String? String? String? String? String? String? String? Str ⋯
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 1 "London UK" 5/1/2023 5/8/2023 7 John Smith 35 Male American Hotel 120 ⋯
2 │ 2 "Phuket Thailand" 6/15/2023 6/20/2023 5 Jane Doe 28 Female Canadian Resort 800
3 │ 3 "Bali Indonesia" 7/1/2023 7/8/2023 7 David Lee 45 Male Korean Villa 100
4 │ 4 "New York USA" 8/15/2023 8/29/2023 14 Sarah Johnson 29 Female British Hotel 200
5 │ 5 "Tokyo Japan" 9/10/2023 9/17/2023 7 Kim Nguyen 26 Female Vietnamese Airbnb 700 ⋯
6 │ 6 "Paris France" 10/5/2023 10/10/2023 5 Michael Brown 42 Male American Hotel 150
7 │ 7 "Sydney Australia" 11/20/2023 11/30/2023 10 Emily Davis 33 Female Australian Hostel 500
8 │ 8 "Rio de Janeiro Brazil" 1/5/2024 1/12/2024 7 Lucas Santos 25 Male Brazilian Airbnb 900
9 │ 9 "Amsterdam Netherlands" 2/14/2024 2/21/2024 7 Laura Janssen 31 Female Dutch Hotel 120 ⋯
10 │ 10 "Dubai United Arab Emirates" 3/10/2024 3/17/2024 7 Mohammed Ali 39 Male Emirati Resort 250
11 │ 11 "Cancun Mexico" 4/1/2024 4/8/2024 7 Ana Hernandez 27 Female Mexican Hotel 100
12 │ 12 "Barcelona Spain" 5/15/2024 5/22/2024 7 Carlos Garcia 36 Male Spanish Airbnb 800
13 │ 13 "Honolulu Hawaii" 6/10/2024 6/18/2024 8 Lily Wong 29 Female Chinese Resort 300 ⋯
14 │ 14 "Berlin Germany" 7/1/2024 7/10/2024 9 Hans Mueller 48 Male German Hotel 140
15 │ 15 "Marrakech Morocco" 8/20/2024 8/27/2024 7 Fatima Khouri 26 Female Moroccan Riad 600
16 │ 16 "Edinburgh Scotland" 9/5/2024 9/12/2024 7 James MacKenzie 32 Male Scottish Hotel 900
17 │ 17 Paris 9/1/2023 9/10/2023 9 Sarah Johnson 30 Female American Hotel $900 Pla ⋯
18 │ 18 Bali 8/15/2023 8/25/2023 10 Michael Chang 28 Male Chinese Resort "$1 500
19 │ 19 London 7/22/2023 7/28/2023 6 Olivia Rodriguez 35 Female British Hotel "$1 200
20 │ 20 Tokyo 10/5/2023 10/15/2023 10 Kenji Nakamura 45 Male Japanese Hotel "$1 200
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
120 │ 120 "Rome Italy
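My guess at the cause, as a minimal sketch in plain Julia (no packages): if every comma is treated as a delimiter, a quoted field such as "London, UK" is cut in two and the quote characters stay in the values, which looks similar to the output above. The sample `line` and the helper `split_quoted` are hypothetical names I made up for illustration. They are not part of DLMReader, and this is not necessarily how the reader works internally.

```julia
# Hypothetical sample row shaped like the file: the second field is quoted
# and contains a comma.
line = "1,\"London, UK\",5/1/2023"

# Naive split: every comma is a delimiter, so the quoted field is cut in two
# and the quote characters remain in the pieces.
naive = split(line, ',')

# Quote-aware split (illustrative helper, not a library function):
# commas that appear between a pair of quote characters are kept in the field.
function split_quoted(line::AbstractString; delim::Char = ',', quotechar::Char = '"')
    fields = String[]
    buf = IOBuffer()
    inquote = false
    for c in line
        if c == quotechar
            inquote = !inquote            # entering or leaving a quoted field
        elseif c == delim && !inquote
            push!(fields, String(take!(buf)))  # field boundary outside quotes
        else
            print(buf, c)
        end
    end
    push!(fields, String(take!(buf)))     # last field
    return fields
end

aware = split_quoted(line)

println(length(naive), " fields with the naive split")
println(length(aware), " fields with the quote-aware split")
```

If that is what is going on, the fix would presumably be to tell the reader which character is the quote character. I have not verified whether DLMReader has such an option or what it is called.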
I am not sure why this happens or how to fix it. Any advice would be really appreciated.
Thank you!