Bug in CSV.read?

I’m trying to represent infinity in Excel, and have tried with 1e18 and 1e19 (since I haven’t found an Inf symbol). I then export this to a CSV file, and upon checking the resulting file in Notepad++, the numbers are represented correctly in the ASCI file.

I then import the file into Julia using CSV.read(). The results are:

  • 1e18 -> 1.0e18 (correct)
  • 1e19 -> 7.766279631452242e17 (incorrect)

Two questions:

  1. Is there a good way to represent infinity in Excel (I know, I know – this is not an Excel group),
  2. Is the bug in CSV.read() similar to interpreting 1e19 in the CSV file as 10^(19), which gives overflow? [Note: 7.76...e17 is different from what I get if I do 10^(19) in Julia…]

Hm… to answer my first question… If I type either of the text strings “Inf” or “inf” in the Excel cell, this appears as Inf in the csv-file, and is correctly interpreted as Float64 infinity in Julia.

So this solves my immediate problem. I am still worried about the problem to handle numbers larger than 1e18, though.

Might be Excel doing something funny? This works:

using DataFrames, CSV
test = DataFrame(x = [1, 1e18, 1e19, 1e20])
CSV.write("test.csv", test)

I concur with @nilshg — I cannot replicate this:

julia> using CSV

julia> tmp = tempname();

julia> write(tmp, "data\n1e19\n");

julia> CSV.read(tmp)
1×1 DataFrame
│ Row │ data    │
│     │ Float64 │
│ 1   │ 1.0e19  │

Please check the contents of the file to see if it is actually written out correctly.

FWIW, using Excel for anything even remotely connected to numbers is the perfect recipe for constantly dealing with issues like this.

1 Like

No, I don’t think it is Excel that is doing something funny. Here is a screen shot of Excel:
In the second row, I have used Scientific number format. In the Fourth row, I have used Number format with 2 decimals.

Here is the resulting content of the CSV file:


And here is the result of importing the content into Julia using CSV.read:

As you see, CSV.read() appears to handle Scientific number format correctly, but not Number format. [And from here, it seems like long numbers in the Number format are handled somewhat arbitrary… in my other example, 1e18 was handled correctly…

Of course, I can format cells with scientific notation as Scientific number format. But in many cases, my columns contain a mixture of small numbers and infinity (Thermodynamic tables), and it is inconvenient to use Scientific number for small numbers.

Okay I can replicate this when creating the same csv file. It seems the decimal points are throwing the read off, when removing them manually from the csv the numbers read in fine.

1 Like

Yes, that is very likely to be a bug. MWE:

julia> using CSV

julia> tmp = tempname();

julia> write(tmp, "data\n1000000000000000000.00\n10000000000000000000.00");

julia> CSV.read(tmp)
2×1 DataFrames.DataFrame
│ Row │ data       │
│     │ Float64    │
│ 1   │ 7.76628e16 │
│ 2   │ 7.76628e17 │

Maybe check existing issues for that package and open a new one if this is not known.