New behaviour due to an update of the package CSV when using CSV.read

Hello,

When the file tmp.csv contains only the string “INFI_”, the code

```julia
using CSV
data = CSV.read("tmp.csv", header=false, delim=';');
data
```

gives the result

```
1×1 DataFrames.DataFrame
│ Row │ Column1 │
│     │ String  │
├─────┼─────────┤
│ 1   │ INFI_   │
```

which is what I want.

BUT, when the file tmp.csv contains only the string “INFI”, the same code gives the result

```
1×1 DataFrames.DataFrame
│ Row │ Column1 │
│     │ Float64 │
├─────┼─────────┤
│ 1   │ Inf     │
```

That is to say, the string is converted to a float. So INFI seems to be a reserved word. I get the same behaviour when I replace “INFI” with “INF”.

Is there a simple way to avoid this behaviour, i.e. to keep “INFI” as a string? Thanks very much.

NB: this new behaviour appears with version v0.5.9 of the CSV package.
With CSV v0.4.3, I do not have this problem.

I would check if there is an existing issue for CSV.jl, and if not, open one. This looks like a bug.

Thanks very much, Tamas! This issue looks similar to the one described in “New behaviour due to an update of the package CSV when using CSV.write” (#7 by steph_de_paris).

If there is no solution, is there another package for reading a CSV file in Julia (other than going back to version v0.4.3 of the CSV package)?


There is
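One option that ships with Julia itself (not necessarily the package the reply above had in mind) is the DelimitedFiles standard library. The sketch below assumes a one-cell file like the one in the question and passes String as the element type of readdlm, which asks it to return every cell as text, so no numeric conversion of “INFI” is attempted. The temporary file is only there to make the example self-contained.

```julia
using DelimitedFiles

# Recreate the problematic file (a single cell containing INFI) in a temp location
path, io = mktemp()
write(io, "INFI\n")
close(io)

# Element type String: every cell is returned as text, never parsed as a number
cells = readdlm(path, ';', String)   # 1×1 matrix of strings
println(cells[1, 1])

rm(path)   # clean up the temporary file
```

Note that readdlm returns a plain matrix rather than a DataFrame, so column names and mixed column types have to be handled by hand.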

OK, thanks again! By the way, perhaps a solution to my issue could be:

```julia
CSV.read(file; types=[String])
```

PS: I cannot test this right now, because I only have restricted access to my Linux machine from home (it is the weekend …).

Yes, you can always specify the type of a column manually: either with a vector, types=[String], or with a Dict keyed by column name or number, e.g. types=Dict(1=>String).
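Both forms of the types keyword can be sketched as follows, assuming the CSV and DataFrames packages are installed. The example uses CSV.File plus the DataFrame constructor instead of CSV.read, because in recent CSV.jl releases CSV.read requires an explicit sink (CSV.read(path, DataFrame; ...)); with v0.5.9 the CSV.read(file; types=...) call from the question should behave the same way. The temporary file is an illustration-only stand-in for tmp.csv.

```julia
using CSV, DataFrames

# Recreate the problematic file: a single cell containing INFI
path, io = mktemp()
write(io, "INFI\n")
close(io)

# Vector form: one type per column, in column order
f_vec = CSV.File(path; header=false, delim=';', types=[String])

# Dict form: types keyed by column number (here column 1)
f_dict = CSV.File(path; header=false, delim=';', types=Dict(1 => String))

df_vec  = DataFrame(f_vec)
df_dict = DataFrame(f_dict)
println(df_dict)   # Column1 should hold the text "INFI", not Inf

rm(path)   # clean up the temporary file
```

The Dict form is handy when only a few columns of a wide file need overriding, since every other column keeps its automatically detected type.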

What’s going on here is that the Parsers.jl package parses both INF and INFINITY as valid Float64 values, and it appears to treat every prefix in between (INFI, INFIN, and so on) as valid too, so this can ultimately be fixed in Parsers.jl itself.
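To make the prefix behaviour concrete, here is a hypothetical illustration (the function names are made up and this is NOT the actual Parsers.jl code). It contrasts a lenient check that accepts any prefix of “INFINITY” from “INF” upward, which would explain why “INFI” becomes Inf while “INFI_” stays a string, with a strict check that accepts only the two complete spellings. Case handling is deliberately ignored here.

```julia
# Hypothetical sketch, not Parsers.jl's implementation.
const INF_WORD = "INFINITY"

# Lenient: any prefix of "INFINITY" that is at least "INF" counts as infinity
lenient_is_inf(s::AbstractString) = length(s) >= 3 && startswith(INF_WORD, s)

# Strict: only the two complete spellings count as infinity
strict_is_inf(s::AbstractString) = s in ("INF", "INFINITY")

for s in ("INF", "INFI", "INFIN", "INFINITY", "INFI_")
    println(rpad(s, 9), " lenient=", lenient_is_inf(s), " strict=", strict_is_inf(s))
end
```

Under the lenient rule, “INFI” and “INFIN” are treated as infinity, while “INFI_” is rejected because “INFINITY” does not start with it; a strict rule would leave “INFI” as a string.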
