CSV.read() faults on exponentially notated integers

r-handsfield · December 23, 2017, 2:15am

I'm trying to read the Credit Card Fraud data set from Kaggle with CSV.read(). The first column represents times of transactions counted in milliseconds, which CSV.read() infers as Int64. This is fine until it encounters 1e+05 in row 153759, which is not a valid Int64 literal (it would only parse as Float64), so parsing fails with an error.

Is there a kwarg to handle this, or a workaround other than specifying a vector of types for all columns?

julia> CSV.read("creditcard.csv")
ERROR: CSV.ParsingException("error parsing a Int64 value on column 1, row 153759; encountered 'e'")
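
For reference, the same failure can be reproduced with Base alone, without CSV.jl. This is a minimal sketch assuming Julia 1.x, where `tryparse` returns `nothing` on failure; the variable names are only illustrative.

```julia
# Base-only reproduction of the parsing problem; no CSV package involved.

# 'e' is not allowed in an integer literal, so the Int parse fails.
int_result = tryparse(Int, "1e+05")

# The same text is a valid floating-point literal.
float_result = parse(Float64, "1e+05")

# The value is integer-valued, so the conversion to Int is exact.
as_int = Int(float_result)
```

So the value itself fits in an Int64; only its textual form rules out the integer parser.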

davidanthoff · December 23, 2017, 5:49am

CSVFiles.jl reads the file successfully with the following code:

using FileIO, CSVFiles, DataFrames

df = load("creditcard.csv") |> DataFrame

The Time column ends up with element type Float64, but you can easily convert it to an integer type afterwards.
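
The conversion afterwards could look roughly like this. It is a sketch on a plain vector with made-up values standing in for the Time column, not output from the actual file.

```julia
# Stand-in for the Float64 Time column; the values are invented for illustration.
time_float = [0.0, 1.0, 99999.0, 1e5]

# Int.(x) throws an InexactError if any value has a fractional part,
# so check first in order to fail with a clearer message.
all(isinteger, time_float) || error("Time column contains non-integer values")

# Broadcast the Int constructor over the vector to get a Vector{Int}.
time_int = Int.(time_float)
```

With a DataFrame, the same idea would be something like `df.Time = Int.(df.Time)` in recent DataFrames.jl versions.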

I would have thought that load("creditcard.csv", colparsers=Dict(:Time=>Int)) would force the Time column to be of type Int, but that doesn’t seem to work. I’ve opened an issue in the underlying parser repo; let’s see what @shashi thinks of this.

r-handsfield · December 28, 2017, 2:16am

@davidanthoff

Thanks for the reply; all great solutions.

The most expedient temporary solution turned out to be:

using CSV
df = CSV.read("creditcard.csv"; types=Dict(:Time=>Float64))
