Can't read a CSV file with empty fields into a DataFrame

I am trying to read a CSV file into a DataFrame. It clearly has some empty fields, and I get an error

ERROR: MethodError: Cannot convert an object of type Missing to an object of type String

I have tried to redefine the conversion for missing values:

Base.convert(::Type{String}, ::Missing) = "missing"

But now I get a different error

ERROR: ArgumentError: missing is not a valid key for type String

What is the standard way to deal with the empty string fields of a CSV file?

I canโ€™t replicate

julia> using DataFrames, CSV, Chain;

julia> t = """
       a, b
       "a1", "b1"
       , "b2"
       """;

julia> @chain t begin
           IOBuffer
           CSV.read(DataFrame)
       end
2ร—2 DataFrame
 Row โ”‚ a          b
     โ”‚ String3?  String3
โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
   1 โ”‚ a1        b1
   2 โ”‚ missing   b2

Can you show us a bit of the file you are using?

You can download it here: https://www-nds.iaea.org/nuclearmoments/magn_mom_recomm.csv

It is a file with the data on nuclear isotopes.

Thereโ€™s something funky going on on row 121 of the file:

julia> using CSV, DataFrames, Downloads

julia> CSV.read(Downloads.download("https://www-nds.iaea.org/nuclearmoments/magn_mom_recomm.csv"), DataFrame; limit = 120)
120ร—12 DataFrame
 Row โ”‚ z      n.n+n.z  symbol   energy [keV]  halflife  spin     magnetic dipole [nm]  method    description                        nsr                journal                            indc          
     โ”‚ Int64  Int64    String3  Int64         String15  String7  String31              String15  String                             String31           String                             String15      
โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
   1 โ”‚     0        1  NN                  0  10.2 m    1/2+     -1.9130427(5)         R         Re-evaluated by N.J. Stone (2013โ€ฆ  2014Ol01           Chin. Phys. C38 090001 (2014)      indc-nds-0794
   2 โ”‚     1        1  H                   0  stable    1/2+     +2.792847351(9)       R         Re-evaluated by N.J. Stone (2013โ€ฆ  2014Ol01           Chin. Phys. C38 090001 (2014)      indc-nds-0794

Opening the plain text file it looks like the next row has a value of "PRL 96 in the journal column, so maybe thatโ€™s the culprit that throws parsing off?

Ok. Yesterday in the evening I was trying to load the file as CSV.File("magn_mom_recomm.csv")|>DataFrame and it gave me the cryptic errors I described.

Now, I have tried to do load("magn_mom_recomm.csv")|>DataFrame, and I see the error is about the row 121 indeed. It looks like an extra double quotes, which throws parsing off the course.

Please file an issue with CSV.jl. It should probably be smart enough to throw this as an error

1 Like

try using kwarg quoted = false

PS

if you use the kwarg quotechar set to something other than โ€™ " โ€™ the df is built regularly (it seems)

julia> df=CSV.read("magn_mom_recomm2.csv",quotechar='^',DataFrame) 
2155ร—12 DataFrame
  Row โ”‚ z      n.n+n.z  symbol   "energy [keV]"  halflife        spin       "magnet โ‹ฏ
      โ”‚ Int64  Int64    String3  String15?       String15?       String15?  String3 โ‹ฏ
โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
    1 โ”‚     0        1  NN       0               "10.2 m"        1/2+       -1.9130 โ‹ฏ
    2 โ”‚     1        1  H        0               stable          1/2+       +2.7928  
    3 โ”‚     1        2  H        0               stable          1+         +0.8574  
    4 โ”‚     1        3  H        0               "12.33 y"       1/2+       +2.9789  
    5 โ”‚     2        3  He       0               stable          1/2+       -2.1276 โ‹ฏ
    6 โ”‚     3        6  Li       0               stable          1+         +0.8220  
    7 โ”‚     3        7  Li       0               stable          3/2-       +3.2564  
    8 โ”‚     3        8  Li       0               "840 ms"        2+         +1.6535  
    9 โ”‚     3        9  Li       0               "178 ms"        3/2-       3.43666 โ‹ฏ
   10 โ”‚     3       11  Li       0               "8.75 ms"       3/2-       +3.6711  
   11 โ”‚     4        9  Be       0               stable          3/2-       -1.1774  
   12 โ”‚     4       11  Be       0               "13.8 s"        1/2+       -1.6816  
   13 โ”‚     5        8  B        0               "0.77 s"        2+         1.0355( โ‹ฏ
   14 โ”‚     5       10  B        718             "0.707 ns"      1+         +0.63(1  
   15 โ”‚     5       10  B        0               stable          3+         1.80046  
   16 โ”‚     5       11  B        0               stable          3/2-       2.68837  
   17 โ”‚     5       12  B        0               "20.2 ms"       1+         +1.003( โ‹ฏ
   18 โ”‚     5       13  B        0               "17.3 ms"       3/2-       +3.1778  
  โ‹ฎ   โ”‚   โ‹ฎ       โ‹ฎ        โ‹ฎ           โ‹ฎ               โ‹ฎ             โ‹ฎ              โ‹ฑ
 2138 โ”‚    93      237  Np       60              "67 ns"         5/2-       +1.85(2  
 2139 โ”‚    93      239  Np       75              "1.39 ns"       5/2-       +2.2(4)  
....
 2154 โ”‚    99      254  Es       84              "39.3 h"        2+         2.90(7)  
 2155 โ”‚   102      253  No       0               "1.62 m"        (9/2-)     -0.53(8  
                                                      6 columns and 2119 rows omitted

omg Iโ€™ve been having issues with this (iโ€™m working with messy text data), and setting quoted=false did the trick

1 Like