CSV.read() faults on exponentially notated integers


#1

I'm trying to read the Credit Card Fraud data set from Kaggle. The first column holds transaction times counted in seconds, which CSV.jl infers as Int64. This works until the parser encounters 1e+05 in row 153759, which is written in exponential notation and so cannot be parsed as Int64, causing an error.

Is there a kwarg to handle this, or a workaround other than specifying a vector of types for all columns?

julia> CSV.read("creditcard.csv")
ERROR: CSV.ParsingException("error parsing a Int64 value on column 1, row 153759; encountered 'e'")
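
The failure can be reproduced in miniature with Base parsing alone. This is only a sketch: it uses `tryparse` and `parse` from Base rather than CSV.jl's internal parser, and the variable names are illustrative. It shows that the field is rejected as an integer but accepted as a float whose value is still a whole number.

```julia
# Base parsing only; CSV.jl's own parser may differ in detail.
field = "1e+05"

# tryparse returns nothing instead of throwing when the text is not a valid Int.
as_int = tryparse(Int, field)

# The same text is a valid floating-point literal.
as_float = parse(Float64, field)

# The float is integer-valued, so converting it to Int loses nothing.
recovered = Int(as_float)
```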


#2

CSVFiles.jl reads the file successfully with the following code:

using FileIO, CSVFiles, DataFrames

df = load("creditcard.csv") |> DataFrame

The Time column ends up with element type Float64, but that is easy to convert to an integer type afterwards.
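
A minimal sketch of that conversion, using a small hypothetical vector in place of the real Time column so it runs without any packages. Broadcasting `Int` over the values throws an `InexactError` if any of them has a fractional part, so nothing is silently truncated.

```julia
# Hypothetical stand-in for the Time column as loaded with element type Float64.
time_float = [0.0, 1.0, 99999.0, 1e+05, 100001.0]

# Convert element-wise; a non-integer value such as 0.5 would raise InexactError.
time_int = Int.(time_float)
```

On a DataFrame, the same idea would presumably be written as `df.Time = Int.(df.Time)`, assuming a recent DataFrames.jl version that supports property-style column access.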

I would have thought that load("creditcard.csv", colparsers=Dict(:Time=>Int)) would force the Time column to be of type Int, but that doesn’t seem to work. I’ve opened an issue in the underlying parser repo; let’s see what @shashi thinks of this.


#3

@davidanthoff

Thanks for the reply; all great solutions.

The most expedient temporary solution turned out to be:

using CSV
df = CSV.read("creditcard.csv"; types=Dict(:Time=>Float64))