Loading COVID-19 CSV data

Hi Julians,

I am trying to read the latest COVID-19 data to do analysis, from here:

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv

With either CSV or CSVFiles, the data doesn’t load correctly. The 5th and later columns are filled with many missing values, yet the dataset uses 0 (not missing) in the numeric columns.

using DataFrames, CSV, Dates
function read_data()
    data = Dict{Symbol,DataFrame}()
    for t in [:Confirmed, :Deaths, :Recovered]
        p = "./data/archived_data/archived_time_series/time_series_2019-ncov-$t.csv"
        data[t] = CSV.read(p, copycols=true, dateformat="m/dd/yy")
        #data[t] = DataFrame(load(p))
    end
    return data
end

I’m using Julia v1.3.1 and the current stable versions of CSV (and CSVFiles). Any idea what the issue is?

Thanks,

Glen

I can’t replicate that with CSVFiles.jl. When I use this code to load the data:

using CSVFiles, DataFrames

df = load("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv") |> DataFrame

I don’t get any missing values in at least the first 19 columns (I haven’t looked at the other ones).

This is with CSVFiles 0.16.1 and DataFrames 0.20.2.
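
To check every column rather than only the first few, counting the missing values per column is one option. This is a minimal sketch; `missing_counts` is a hypothetical helper name, not part of DataFrames, and the small `DataFrame` below stands in for the loaded COVID-19 table:

```julia
using DataFrames

# Hypothetical helper: map each column name to its number of missing values.
function missing_counts(df::AbstractDataFrame)
    return Dict(n => count(ismissing, df[!, n]) for n in names(df))
end

# Toy stand-in for the real data: column `a` has one missing value, `b` has none.
toy = DataFrame(a = [1, missing, 3], b = [0, 0, 0])
counts = missing_counts(toy)

# Total number of missing values across all columns.
total_missing = sum(values(counts))
```

If `total_missing` is 0 for the real table, the file loaded without any missing values; otherwise `counts` shows which columns are affected.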

This is what I do:

using HTTP, CSV, DataFrames

# Download the raw CSV and save it to a local file
open("time_series_confirmed.csv", "w") do fio
    write(fio, HTTP.request("GET", "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv").body |> String)
end
# Parse the local file into a DataFrame
dfc = CSV.File("time_series_confirmed.csv") |> DataFrame
# Give the two location columns names that are valid identifiers
rename!(dfc, Dict(Symbol("Country/Region") => :Country, Symbol("Province/State") => :State));