I'm trying to read the Credit Card Fraud data set from Kaggle. The first column holds transaction times counted in seconds, which Julia infers as Int64. This is fine until it encounters 1e+05 in line 153759, which is written in floating-point notation and so fails to parse as Int64.
Is there a kwarg to handle this, or a workaround other than specifying a vector of types for all columns?
julia> CSV.read("creditcard.csv")
ERROR: CSV.ParsingException("error parsing a Int64 value on column 1, row 153759; encountered 'e'")
CSVFiles.jl reads the file successfully with the following code:
using FileIO, CSVFiles, DataFrames
df = load("creditcard.csv") |> DataFrame
The Time column ends up with element type Float64, but you can easily convert it afterwards.
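A minimal sketch of that conversion, assuming a recent DataFrames.jl where df.Time property access is available. The toy data frame below is a hypothetical stand-in for the loaded file, and the Amount values are made up for illustration:

```julia
using DataFrames

# Hypothetical stand-in for the loaded data; the real file is not read here.
df = DataFrame(Time = [99999.0, 1e5, 100001.0], Amount = [1.5, 2.0, 3.25])

# Int.(x) throws an InexactError if any value is not a whole number,
# so the conversion cannot silently truncate.
df.Time = Int.(df.Time)

eltype(df.Time)  # Int, which is Int64 on a 64-bit machine
```

Broadcasting Int rather than calling round or trunc is a deliberate choice here: if the column ever held a fractional time, the conversion would fail loudly instead of changing the data.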
I would have thought that load("creditcard.csv", colparsers=Dict(:Time=>Int)) would force the Time column to be of type Int, but that doesn't seem to work. I've opened an issue in the underlying parser repo; let's see what @shashi thinks of this.
@davidanthoff Thanks for the reply; all great solutions.
The most expedient temporary solution turned out to be:
using CSV
df = CSV.read("creditcard.csv"; types=Dict(:Time=>Float64))
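For anyone who wants to try this without downloading the file, here is a small sketch of the same workaround on an in-memory sample. It assumes a recent CSV.jl, where CSV.read takes the sink type (DataFrame) as its second argument. The two-row sample and its Amount column are made up for illustration:

```julia
using CSV, DataFrames

# Hypothetical two-row sample that mimics the problematic value.
sample = "Time,Amount\n99999,1.5\n1e+05,2.0\n"

# Forcing Time to Float64 lets both 99999 and 1e+05 parse.
df = CSV.read(IOBuffer(sample), DataFrame; types=Dict(:Time => Float64))

df.Time  # both values are now Float64
```

Only the Time column is pinned; the remaining columns are still inferred, which avoids spelling out a type for every column.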