Hello,
I am trying to load a CSV file from a GitHub repo, reformat it, and store it as a DataFrame object. Here is what I tried:
using CSV, HTTP, DataFrames
url = "https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv"
http_response = HTTP.get(url)
file = CSV.File(http_response.body)
df = DataFrame(file)
Here is the output I am getting:
julia> df
514572×7 DataFrame
│ Row │ date │ province │ country │ lat │ long │ type │ cases │
│ │ Dates.Date │ String63 │ String63 │ String15 │ String15 │ String15 │ String15 │
├────────┼────────────┼──────────┼──────────┼────────────┼───────────┼───────────┼──────────┤
│ 1 │ 2020-01-22 │ Alberta │ Canada │ 53.9333 │ -116.5765 │ confirmed │ 0 │
│ 2 │ 2020-01-23 │ Alberta │ Canada │ 53.9333 │ -116.5765 │ confirmed │ 0 │
│ 3 │ 2020-01-24 │ Alberta │ Canada │ 53.9333 │ -116.5765 │ confirmed │ 0 │
│ 4 │ 2020-01-25 │ Alberta │ Canada │ 53.9333 │ -116.5765 │ confirmed │ 0 │
│ 5 │ 2020-01-26 │ Alberta │ Canada │ 53.9333 │ -116.5765 │ confirmed │ 0 │
│ 6 │ 2020-01-27 │ Alberta │ Canada │ 53.9333 │ -116.5765 │ confirmed │ 0 │
⋮
│ 514566 │ 2021-10-02 │ NA │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA │
│ 514567 │ 2021-10-03 │ NA │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA │
│ 514568 │ 2021-10-04 │ NA │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA │
│ 514569 │ 2021-10-05 │ NA │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA │
│ 514570 │ 2021-10-06 │ NA │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA │
│ 514571 │ 2021-10-07 │ NA │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA │
│ 514572 │ 2021-10-08 │ NA │ Zimbabwe │ -19.015438 │ 29.154857 │ recovered │ NA │
How can I convert the cases column from string to integer? It is parsed as a string because its missing values are encoded as NA. I tried this:
file = CSV.File(http_response.body, null="NA")
but this argument is not available for the CSV.File function.
Any suggestions?
Also, is there a shorter way to load a CSV file from a URL?
Thanks!
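One possible approach, as a minimal sketch rather than a definitive answer: CSV.jl has a `missingstring` keyword that names the token to treat as missing, which looks like the counterpart of the `null` argument tried above. The example below assumes a reasonably recent CSV.jl and uses a small made-up in-memory sample in place of the real file, so the two rows are illustrative only.

```julia
using CSV, DataFrames

# Made-up two-row sample that mimics the column layout of coronavirus.csv.
sample = """
date,province,country,lat,long,type,cases
2020-01-22,Alberta,Canada,53.9333,-116.5765,confirmed,0
2021-10-08,NA,Zimbabwe,-19.015438,29.154857,recovered,NA
"""

# missingstring tells the parser that the literal token NA marks a missing value,
# so columns containing it can still be parsed as numbers.
df = CSV.read(IOBuffer(sample), DataFrame; missingstring="NA")

# Inspect the inferred element type of the cases column.
@show eltype(df.cases)
```

With NA handled at parse time, the lat and long columns should also come out numeric instead of String15. For the shorter load, `CSV.read` accepts the response body directly, so something like `CSV.read(HTTP.get(url).body, DataFrame; missingstring="NA")` may do it in one line; on Julia 1.6 or later, `Downloads.download(url)` from the standard library could replace the HTTP call.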