Loading and reformatting a CSV file

Hello,

I am trying to load a CSV file from a GitHub repo, reformat it, and store it as a DataFrame object. Here is what I tried:

using CSV, HTTP, DataFrames
url = "https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv"
http_response = HTTP.get(url)        # fetch the raw CSV text
file = CSV.File(http_response.body)  # parse the response body (a byte vector)
df = DataFrame(file)

Here is the output I am getting:

julia> df
514572×7 DataFrame
│ Row    │ date       │ province │ country  │ lat        │ long      │ type      │ cases    │
│        │ Dates.Date │ String63 │ String63 │ String15   │ String15  │ String15  │ String15 │
├────────┼────────────┼──────────┼──────────┼────────────┼───────────┼───────────┼──────────┤
│ 1      │ 2020-01-22 │ Alberta  │ Canada   │ 53.9333    │ -116.5765 │ confirmed │ 0        │
│ 2      │ 2020-01-23 │ Alberta  │ Canada   │ 53.9333    │ -116.5765 │ confirmed │ 0        │
│ 3      │ 2020-01-24 │ Alberta  │ Canada   │ 53.9333    │ -116.5765 │ confirmed │ 0        │
│ 4      │ 2020-01-25 │ Alberta  │ Canada   │ 53.9333    │ -116.5765 │ confirmed │ 0        │
│ 5      │ 2020-01-26 │ Alberta  │ Canada   │ 53.9333    │ -116.5765 │ confirmed │ 0        │
│ 6      │ 2020-01-27 │ Alberta  │ Canada   │ 53.9333    │ -116.5765 │ confirmed │ 0        │
⋮
│ 514566 │ 2021-10-02 │ NA       │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA       │
│ 514567 │ 2021-10-03 │ NA       │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA       │
│ 514568 │ 2021-10-04 │ NA       │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA       │
│ 514569 │ 2021-10-05 │ NA       │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA       │
│ 514570 │ 2021-10-06 │ NA       │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA       │
│ 514571 │ 2021-10-07 │ NA       │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA       │
│ 514572 │ 2021-10-08 │ NA       │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA       │

How can I convert the cases column from string to integer? It is parsed as a string because its missing values are encoded as NA. I tried this:

file = CSV.File(http_response.body, null="NA")

but this argument is not available for the CSV.File function.

Any suggestions?

Also, is there a shorter way to load a CSV file from a URL?

Thanks!

You can use download():

using CSV, DataFrames
url = "https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv"
file = CSV.File(download(url))   # download() saves to a temporary file and returns its path
df = DataFrame(file)

It seems like you are using some pretty old versions of things. The keyword argument should be missingstring = "NA". But I would suggest updating to the latest versions of packages first.
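A minimal sketch of the missingstring keyword, assuming a recent CSV.jl and DataFrames.jl. To keep it self-contained, it parses a few made-up rows (modelled on the output above) from an IOBuffer instead of downloading the file; with the real data you would pass download(url) or http_response.body in place of the IOBuffer.

```julia
using CSV, DataFrames

# A few sample rows modelled on the dataset above (not the real file).
data = """
date,province,country,type,cases
2020-01-22,Alberta,Canada,confirmed,0
2021-10-08,NA,Zimbabwe,recovered,NA
"""

# missingstring tells CSV.jl which token means "missing"; once "NA" is
# treated as missing, the cases column can be inferred as integers.
df = DataFrame(CSV.File(IOBuffer(data); missingstring = "NA"))

println(eltype(df.cases))
```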


You can also pass the types argument, which takes a Dict(colname => type). But I think once you use missingstring, it will parse automatically.
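If you prefer to pin the column type rather than rely on inference, types can be combined with missingstring. A minimal sketch, assuming a recent CSV.jl whose types keyword accepts a Dict keyed by column name; the rows are made-up sample data read from an IOBuffer:

```julia
using CSV, DataFrames

# Made-up sample rows; only the columns needed for the example.
data = """
date,country,cases
2020-01-22,Canada,0
2021-10-08,Zimbabwe,NA
"""

# types maps column names to element types; Union{Missing, Int} lets the
# column hold integers as well as the values marked missing by missingstring.
file = CSV.File(IOBuffer(data);
                missingstring = "NA",
                types = Dict(:cases => Union{Missing, Int}))
df = DataFrame(file)

println(eltype(df.cases))
```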

I think the CSV.jl version is new and only the argument is incorrect (it is already using the new inline string types such as String15 and String63).

Ah, good catch. But the DataFrames.jl version is old.


Thanks all for the answers! I added the missingstring = "NA" argument and it works:

file = CSV.File(download(url), missingstring = "NA")
df = DataFrame(file)

df
515394×7 DataFrame
│ Row    │ date       │ province  │ country  │ lat      │ long     │ type      │ cases   │
│        │ Dates.Date │ String63? │ String63 │ Float64? │ Float64? │ String15  │ Int64?  │
├────────┼────────────┼───────────┼──────────┼──────────┼──────────┼───────────┼─────────┤
│ 1      │ 2020-01-22 │ Alberta   │ Canada   │ 53.9333  │ -116.576 │ confirmed │ 0       │
│ 2      │ 2020-01-23 │ Alberta   │ Canada   │ 53.9333  │ -116.576 │ confirmed │ 0       │
│ 3      │ 2020-01-24 │ Alberta   │ Canada   │ 53.9333  │ -116.576 │ confirmed │ 0       │
│ 4      │ 2020-01-25 │ Alberta   │ Canada   │ 53.9333  │ -116.576 │ confirmed │ 0       │
│ 5      │ 2020-01-26 │ Alberta   │ Canada   │ 53.9333  │ -116.576 │ confirmed │ 0       │
│ 6      │ 2020-01-27 │ Alberta   │ Canada   │ 53.9333  │ -116.576 │ confirmed │ 0       │
⋮
│ 515388 │ 2021-10-03 │ missing   │ Zimbabwe │ -19.0154 │ 29.1549  │ recovered │ missing │
│ 515389 │ 2021-10-04 │ missing   │ Zimbabwe │ -19.0154 │ 29.1549  │ recovered │ missing │
│ 515390 │ 2021-10-05 │ missing   │ Zimbabwe │ -19.0154 │ 29.1549  │ recovered │ missing │
│ 515391 │ 2021-10-06 │ missing   │ Zimbabwe │ -19.0154 │ 29.1549  │ recovered │ missing │
│ 515392 │ 2021-10-07 │ missing   │ Zimbabwe │ -19.0154 │ 29.1549  │ recovered │ missing │
│ 515393 │ 2021-10-08 │ missing   │ Zimbabwe │ -19.0154 │ 29.1549  │ recovered │ missing │
│ 515394 │ 2021-10-09 │ missing   │ Zimbabwe │ -19.0154 │ 29.1549  │ recovered │ missing │
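If a file has already been loaded without missingstring (as in the first attempt, where cases came back as strings), the column could also be converted after the fact. A minimal sketch in plain Julia; parse_cases is a hypothetical helper, and it assumes every entry other than "NA" is an integer literal (parse throws an ArgumentError otherwise):

```julia
# Hypothetical helper: convert strings such as "0" or "NA" into an
# integer vector where "NA" becomes `missing`.
function parse_cases(col::AbstractVector{<:AbstractString}; na::AbstractString = "NA")
    # The typed comprehension fixes the element type to Union{Missing, Int}
    # even if the column happens to contain no "NA" entries.
    return Union{Missing, Int}[s == na ? missing : parse(Int, s) for s in col]
end

raw = ["0", "12", "NA", "7"]   # made-up sample values
cases = parse_cases(raw)
println(cases)
```

Applied to the DataFrame from the first attempt, this would be df.cases = parse_cases(df.cases).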