New behaviour due to an update of the package CSV when using CSV.read

steph_de_paris · July 19, 2019, 10:20pm

Hello,

When the file tmp.csv just contains the string “INFI_”, the code

using CSV
data = CSV.read("tmp.csv",header=false,delim=';');
data

gives the result

1×1 DataFrames.DataFrame
│ Row │ Column1 │
│     │ String  │
├─────┼─────────┤
│ 1   │ INFI_   │

which is what I want.

But when the file tmp.csv contains just the string “INFI”, the same code gives the result

1×1 DataFrames.DataFrame
│ Row │ Column1 │
│     │ Float64 │
├─────┼─────────┤
│ 1   │ Inf     │

That is to say, the string is converted to a float, so INFI seems to be treated as a reserved word. I get the same behaviour when I replace “INFI” with “INF”.

Is there a simple way to avoid this behaviour, i.e. to keep “INFI” as a string? Thanks very much.

NB: This new behaviour occurs with version v0.5.9 of the CSV package.
With version v0.4.3 of the CSV package, I do not have this problem.

Tamas_Papp · July 20, 2019, 5:24am

I would check if there is an existing issue for CSV.jl, and if not, open one. This looks like a bug.

steph_de_paris · July 20, 2019, 3:11pm

Thanks very much, Tamas! This issue seems similar to the one described in “New behaviour due to an update of the package CSV when using CSV.write” (#7 by steph_de_paris).

If there is no solution, is there another package for reading a CSV file with Julia (other than going back to version v0.4.3 of the CSV package)?

Tamas_Papp · July 20, 2019, 3:15pm

There is

steph_de_paris · July 20, 2019, 3:26pm

OK, thanks again! By the way, a possible solution for my issue could be

CSV.read(file; types=[String])

PS: I cannot test this on my Linux machine because I have restricted access from home (it is the weekend…).

quinnj · July 21, 2019, 7:49am

Yes, you can always manually specify what the type of a column should be, either with types=[String] or with a Dict keyed by column name or number: types=Dict(1=>String).
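For reference, here is a minimal sketch of both forms applied to the one-cell file from the original post. It assumes CSV.jl and DataFrames.jl are installed, and it goes through CSV.File followed by DataFrame rather than CSV.read, since the CSV.read call signature has differed between CSV.jl releases. The temporary path is only for illustration.

```julia
using CSV, DataFrames

# Recreate the one-cell file from the original post in a temporary directory
# (the path is illustrative, not part of the original question).
path = joinpath(mktempdir(), "tmp.csv")
write(path, "INFI\n")

# Force the column type positionally: one entry per column.
f1 = CSV.File(path; header=false, delim=';', types=[String])

# Or force it for selected columns only, keyed by column number.
f2 = CSV.File(path; header=false, delim=';', types=Dict(1 => String))

df1 = DataFrame(f1)
df2 = DataFrame(f2)

println(eltype(df1.Column1))  # the element type requested for the column
println(df1.Column1[1])       # the raw text, not Inf
```

The Dict form is handy for wide files where only one or two columns need to be pinned and the rest can still be inferred.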

What’s going on here is that the Parsers.jl package parses both INF and INFINITY as valid Float64 values, and it looks like it considers any prefix in-between as valid as well, so this can ultimately be fixed in Parsers.jl itself.
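To make the described behaviour concrete, here is a rough model of it in plain Julia. This is not the Parsers.jl code; looks_like_infinity is a hypothetical helper that only mimics the rule stated above (INF, INFINITY, and any prefix in between are accepted).

```julia
# Hypothetical helper: a model of the behaviour described above,
# NOT the actual Parsers.jl implementation.
function looks_like_infinity(s::AbstractString)
    # Accept "INF", "INFINITY", and every prefix of "INFINITY" in between.
    return length(s) >= 3 && startswith("INFINITY", s)
end

println(looks_like_infinity("INF"))       # true
println(looks_like_infinity("INFI"))      # true
println(looks_like_infinity("INFINITY"))  # true
println(looks_like_infinity("INFI_"))     # false
```

Under this model “INFI” is read as a float while “INFI_” stays a string, which matches the two results in the original post.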
