How do I fix a column whose values are formatted differently from row to row?

Hi, I have a travel-related dataset from Kaggle. Here is the link to the raw dataset:
https://raw.githubusercontent.com/akshdfyehd/travel/main/Travel%20details%20dataset.csv

There is a column called “Destination”. In the raw file, some destinations are wrapped in double quotes and contain a comma (e.g., "London, UK"), while others are not (e.g., New York).

When I load the data into Julia, it looks like this:

julia> using InMemoryDatasets,DLMReader,Chain
julia> import Downloads
julia> data=Downloads.download("https://raw.githubusercontent.com/akshdfyehd/travel/main/Travel%20details%20dataset.csv")
julia> data=filereader(data)
139×13 Dataset
 Row │ \ufeffTrip ID  Destination      Start date              End date    Duration (days)  Traveler name     Traveler age     Traveler gender  Traveler nationality  Accommodation type  Accommodation cost  Tra ⋯
     │ identity       identity         identity                identity    identity         identity          identity         identity         identity              identity            identity            ide ⋯
     │ Int64?         String?          String?                 String?     String?          String?           String?          String?          String?               String?             String?             Str ⋯
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │             1  "London           UK"                    5/1/2023    5/8/2023         7                 John Smith       35               Male                  American            Hotel               120 ⋯
   2 │             2  "Phuket           Thailand"              6/15/2023   6/20/2023        5                 Jane Doe         28               Female                Canadian            Resort              800
   3 │             3  "Bali             Indonesia"             7/1/2023    7/8/2023         7                 David Lee        45               Male                  Korean              Villa               100
   4 │             4  "New York         USA"                   8/15/2023   8/29/2023        14                Sarah Johnson    29               Female                British             Hotel               200
   5 │             5  "Tokyo            Japan"                 9/10/2023   9/17/2023        7                 Kim Nguyen       26               Female                Vietnamese          Airbnb              700 ⋯
   6 │             6  "Paris            France"                10/5/2023   10/10/2023       5                 Michael Brown    42               Male                  American            Hotel               150
   7 │             7  "Sydney           Australia"             11/20/2023  11/30/2023       10                Emily Davis      33               Female                Australian          Hostel              500
   8 │             8  "Rio de Janeiro   Brazil"                1/5/2024    1/12/2024        7                 Lucas Santos     25               Male                  Brazilian           Airbnb              900
   9 │             9  "Amsterdam        Netherlands"           2/14/2024   2/21/2024        7                 Laura Janssen    31               Female                Dutch               Hotel               120 ⋯
  10 │            10  "Dubai            United Arab Emirates"  3/10/2024   3/17/2024        7                 Mohammed Ali     39               Male                  Emirati             Resort              250
  11 │            11  "Cancun           Mexico"                4/1/2024    4/8/2024         7                 Ana Hernandez    27               Female                Mexican             Hotel               100
  12 │            12  "Barcelona        Spain"                 5/15/2024   5/22/2024        7                 Carlos Garcia    36               Male                  Spanish             Airbnb              800
  13 │            13  "Honolulu         Hawaii"                6/10/2024   6/18/2024        8                 Lily Wong        29               Female                Chinese             Resort              300 ⋯
  14 │            14  "Berlin           Germany"               7/1/2024    7/10/2024        9                 Hans Mueller     48               Male                  German              Hotel               140
  15 │            15  "Marrakech        Morocco"               8/20/2024   8/27/2024        7                 Fatima Khouri    26               Female                Moroccan            Riad                600
  16 │            16  "Edinburgh        Scotland"              9/5/2024    9/12/2024        7                 James MacKenzie  32               Male                  Scottish            Hotel               900
  17 │            17  Paris            9/1/2023                9/10/2023   9                Sarah Johnson     30               Female           American              Hotel               $900                Pla ⋯
  18 │            18  Bali             8/15/2023               8/25/2023   10               Michael Chang     28               Male             Chinese               Resort              "$1                 500
  19 │            19  London           7/22/2023               7/28/2023   6                Olivia Rodriguez  35               Female           British               Hotel               "$1                 200
  20 │            20  Tokyo            10/5/2023               10/15/2023  10               Kenji Nakamura    45               Male             Japanese              Hotel               "$1                 200
  ⋮  │       ⋮               ⋮                   ⋮                 ⋮              ⋮                ⋮                 ⋮                ⋮                  ⋮                    ⋮                   ⋮               ⋱
 120 │           120  "Rome             Italy

I am not sure why this happens or how to fix it. Any advice is really appreciated.
Thank you!

I think you could use CSV.read, which has the option quotechar='"', e.g.:

using CSV, DataFrames
CSV.read(data, DataFrame; quotechar='"', escapechar='"')  # `data` is the path of the downloaded file

See here:
https://csv.juliadata.org/stable/examples.html#quotechar_example
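
To illustrate, here is a minimal sketch of how CSV.jl treats a quoted field that contains a comma. The two-row `sample` string is made up for illustration and only mimics the Destination column; it is not taken from the actual file. As far as I know `'"'` is already the default for `quotechar`, so it is passed here only to make the behaviour explicit.

```julia
using CSV, DataFrames

# Hypothetical sample: one quoted destination with a comma, one unquoted
sample = """
Trip ID,Destination,Start date
1,"London, UK",5/1/2023
2,New York,8/15/2023
"""

# Read from an in-memory buffer; the quoted comma stays inside a single field
df = CSV.read(IOBuffer(sample), DataFrame; quotechar='"')

# Both destinations end up in the one Destination column
println(df.Destination)
```

If the real file reads the same way, "London, UK" should stay in a single Destination cell instead of spilling into the next column.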

julia> using CSV, DataFrames

julia> df=CSV.read("Travel details dataset.csv", DataFrame;dateformat="mm/dd/yyyy")
139×13 DataFrame
 Row │ Trip ID  Destination                  Start date   End date     Duration (days)  Traveler name     Travel ⋯
     │ Int64    String31?                    Dates.Date?  Dates.Date?  Int64?           String31?         Int64? ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │       1  London, UK                   2023-05-01   2023-05-08                 7  John Smith               ⋯
   2 │       2  Phuket, Thailand             2023-06-15   2023-06-20                 5  Jane Doe
   3 │       3  Bali, Indonesia              2023-07-01   2023-07-08                 7  David Lee
   4 │       4  New York, USA                2023-08-15   2023-08-29                14  Sarah Johnson
   5 │       5  Tokyo, Japan                 2023-09-10   2023-09-17                 7  Kim Nguyen               ⋯
   6 │       6  Paris, France                2023-10-05   2023-10-10                 5  Michael Brown
   7 │       7  Sydney, Australia            2023-11-20   2023-11-30                10  Emily Davis
   8 │       8  Rio de Janeiro, Brazil       2024-01-05   2024-01-12                 7  Lucas Santos





julia> describe(df)
13×7 DataFrame
 Row │ variable              mean     min         median      max                nmissing  eltype                ⋯
     │ Symbol                Union…   Any         Any         Any                Int64     Type                  ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Trip ID               70.0     1           70.0        139                       0  Int64                 ⋯
   2 │ Destination                    Amsterdam               Vancouver, Canada         2  Union{Missing, String  
   3 │ Start date                     2021-06-15  2023-06-15  2025-05-21                2  Union{Missing, Date}   
   4 │ End date                       2021-06-20  2023-06-20  2025-05-29                2  Union{Missing, Date}   
   5 │ Duration (days)       7.60584  5           7.0         14                        2  Union{Missing, Int64} ⋯
   6 │ Traveler name                  Adam Lee                William Davis             2  Union{Missing, String  
   7 │ Traveler age          33.1752  20          31.0        60                        2  Union{Missing, Int64}  
   8 │ Traveler gender                Female                  Male                      2  Union{Missing, String  
   9 │ Traveler nationality           American                Vietnamese                2  Union{Missing, String ⋯
  10 │ Accommodation type             Airbnb                  Villa                     2  Union{Missing, String  
  11 │ Accommodation cost             $1,000                  900 USD                   2  Union{Missing, String  
  12 │ Transportation type            Airplane                Train                     3  Union{Missing, String  
  13 │ Transportation cost            $1,000                  900                       3  Union{Missing, String ⋯


 