CSV problem reading empty string column


#1

How can I read a text data file that has an empty string column? The column is almost always empty; it rarely, if ever, contains a string.

data.dat

12/10/2016 00:00:00 0.004 3564172800.000 23:00:00 SPEC:NH3
time, Range_F 1_L 1, Zero_F 1, Range_F 2_L 1, Zero_F 2, Praw, Traw, AD2, Tref, AD6, AD7, StatusW, ValveW, VICI_W, USBByte, NI6024Byte, SPEFile, T Laser 1, V Laser 1, LW Laser 1,  dT1,  X1,  pos1,  ConvergenceWord,  ChillerT,  CV1_Volts
3564172800.622330 ,264.04,-241.44,0.00,0.00,60.06500,306.52790,0.00000,295.70950,-99.99000,-99.99000,12716,0,0,0,0,,-24.82610,851.59140,0.00000,.000 ,3.838e1 ,134.513 ,1 ,0.0000 ,1.5000 
3564172801.622420 ,264.14,-241.61,0.00,0.00,60.11341,306.52580,0.00000,295.71350,-99.99000,-99.99000,12716,0,0,0,0,,-24.82602,850.31560,0.00000,.000 ,3.624e1 ,134.513 ,1 ,0.0000 ,1.5000 
3564172802.622330 ,263.98,-241.47,0.00,0.00,60.11343,306.52690,0.00000,295.71390,-99.99000,-99.99000,12716,0,0,0,0,,-24.82516,851.21010,0.00000,.000 ,3.568e1 ,134.513 ,1 ,0.0000 ,1.5000 
3564172803.622240 ,263.98,-241.55,0.00,0.00,59.96580,306.52460,0.00000,295.71820,-99.99000,-99.99000,12716,0,0,0,0,,-24.82532,850.68480,0.00000,.000 ,3.932e1 ,134.513 ,1 ,0.0000 ,1.5000

Here is my code.

using CSV
src = "data.dat"
header = ["time", "Range_F_1_L_1", "Zero_F_1", "Range_F_2_L_1", "Zero_F_2", "Praw", "Traw", "AD2", "Tref", "AD6", "AD7", "StatusW", "ValveW", "VICI_W", "USBByte", "NI6024Byte", "SPEFile", "T_Laser_1", "V_Laser_1", "LW_Laser_1", "dT1", "X1", "pos1", "ConvergenceWord", "ChillerT", "CV1_Volts"]
coltypes = fill(Float64,length(header))
coltypes[12] = Int
coltypes[17] = Nullable{String}
D = CSV.read(src;delim=",",types=coltypes,header=header,datarow=3)

ERROR: MissingException: encountered a missing value for a non-null column type on row = 1, col = 17
Stacktrace:
 [1] (::CSV.##4#5)(::Int64, ::Int64) at /home/user/.julia/v0.6/CSV/src/parsefields.jl:122
 [2] parsefield at /home/user/.julia/v0.6/CSV/src/parsefields.jl:233 [inlined]
 [3] parsefield at /home/user/.julia/v0.6/CSV/src/parsefields.jl:321 [inlined]
 [4] parsefield at /home/user/.julia/v0.6/CSV/src/parsefields.jl:129 [inlined] (repeats 2 times)
 [5] streamfrom at /home/user/.julia/v0.6/CSV/src/Source.jl:206 [inlined]
 [6] macro expansion at /home/user/.julia/v0.6/DataStreams/src/DataStreams.jl:507 [inlined]
 [7] stream!(::CSV.Source{Base.AbstractIOBuffer{Array{UInt8,1}},Missings.Missing}, ::Type{DataStreams.Data.Field}, ::DataFrames.DataFrameStream{Tuple{Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Int64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Nullable{String},1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1},Array{Float64,1}}}, ::DataStreams.Data.Schema{true,Tuple{Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Int64,Float64,Float64,Float64,Float64,Nullable{String},Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64}}, ::Int64, ::NTuple{26,Base.#identity}, ::DataStreams.Data.##15#16, ::Array{Any,1}, ::Type{Ref{(:time, :Range_F_1_L_1, :Zero_F_1, :Range_F_2_L_1, :Zero_F_2, :Praw, :Traw, :AD2, :Tref, :AD6, :AD7, :StatusW, :ValveW, :VICI_W, :USBByte, :NI6024Byte, :SPEFile, :T_Laser_1, :V_Laser_1, :LW_Laser_1, :dT1, :X1, :pos1, :ConvergenceWord, :ChillerT, :CV1_Volts)}}) at /home/user/.julia/v0.6/DataStreams/src/DataStreams.jl:579
 [8] #stream!#17(::Bool, ::Dict{Int64,Function}, ::Function, ::Array{Any,1}, ::Array{Any,1}, ::Function, ::CSV.Source{Base.AbstractIOBuffer{Array{UInt8,1}},Missings.Missing}, ::Type{DataFrames.DataFrame}) at /home/user/.julia/v0.6/DataStreams/src/DataStreams.jl:455
 [9] (::DataStreams.Data.#kw##stream!)(::Array{Any,1}, ::DataStreams.Data.#stream!, ::CSV.Source{Base.AbstractIOBuffer{Array{UInt8,1}},Missings.Missing}, ::Type{DataFrames.DataFrame}) at ./<missing>:0
 [10] #read#43(::Bool, ::Dict{Int64,Function}, ::Bool, ::Array{Any,1}, ::Function, ::String, ::Type{T} where T) at /home/user/.julia/v0.6/CSV/src/Source.jl:312
 [11] (::CSV.#kw##read)(::Array{Any,1}, ::CSV.#read, ::String, ::Type{T} where T) at ./<missing>:0 (repeats 2 times)

Even though I declared column 17 as Nullable{String}, I still get this error. Is this related to this comment?


I defined the column types so I assumed it wouldn’t be a problem.


#2

I had to remove the trailing spaces after the values in the data rows of src (the spaces just before the commas), and then this worked:

src = """12/10/2016 00:00:00 0.004 3564172800.000 23:00:00 SPEC:NH3
time, Range_F 1_L 1, Zero_F 1, Range_F 2_L 1, Zero_F 2, Praw, Traw, AD2, Tref, AD6, AD7, StatusW, ValveW, VICI_W, USBByte, NI6024Byte, SPEFile, T Laser 1, V Laser 1, LW Laser 1,  dT1,  X1,  pos1,  ConvergenceWord,  ChillerT,  CV1_Volts
3564172800.622330,264.04,-241.44,0.00,0.00,60.06500,306.52790,0.00000,295.70950,-99.99000,-99.99000,12716,0,0,0,0,,-24.82610,851.59140,0.00000,.000,3.838e1,134.513,1,0.0000,1.5000
3564172801.622420,264.14,-241.61,0.00,0.00,60.11341,306.52580,0.00000,295.71350,-99.99000,-99.99000,12716,0,0,0,0,,-24.82602,850.31560,0.00000,.000,3.624e1,134.513,1,0.0000,1.5000
3564172802.622330,263.98,-241.47,0.00,0.00,60.11343,306.52690,0.00000,295.71390,-99.99000,-99.99000,12716,0,0,0,0,,-24.82516,851.21010,0.00000,.000,3.568e1,134.513,1,0.0000,1.5000
3564172803.622240,263.98,-241.55,0.00,0.00,59.96580,306.52460,0.00000,295.71820,-99.99000,-99.99000,12716,0,0,0,0,,-24.82532,850.68480,0.00000,.000,3.932e1,134.513,1,0.0000,1.5000
"""
D = CSV.read(IOBuffer(src);delim=",",types=coltypes,header=header,datarow=3)

PS. Please move the header definition above the 3rd line of your code! :)
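If editing the file by hand is not practical, the same clean-up can be sketched in plain Julia. This is only an illustration of the whitespace removal described above, not something CSV.jl provides: strip_fields is a hypothetical helper name, and the sample line is a shortened version of the first data row.

```julia
# Hypothetical helper: strip the whitespace around every comma-separated field.
strip_fields(line) = join(strip.(split(line, ',')), ',')

# Shortened sample row with the same trailing spaces as in data.dat.
raw = "3564172800.622330 ,264.04,-241.44,,.000 ,3.838e1 ,1.5000 "

clean = strip_fields(raw)
println(clean)
```

Applying the helper to every line of the file and joining the results with newlines should give a string that can be wrapped in an IOBuffer and passed to CSV.read as in the call above. The empty field for column 17 is left empty, so it is still read as a missing value.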


#3

It turned out to be an issue with the column type I was using. Declaring column 17 as Union{Missing,String} instead of Nullable{String} fixed it:

coltypes = Any[Float64 for i=1:length(header)]
coltypes[12] = Int
coltypes[17] = Union{Missing,String}
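The switch from fill(Float64, ...) to an Any[...] vector is presumably needed because a Union type cannot be stored in the vector that fill creates. The sketch below illustrates that with Base Julia only; it assumes Julia 1.x, where Missing lives in Base (on 0.6 it comes from the Missings package), and the three-column layout is made up for the example.

```julia
# Hypothetical three-column layout, used only for illustration.
n = 3

# fill(Float64, n) builds a Vector{DataType}, because typeof(Float64) is DataType.
narrow = fill(Float64, n)
narrow[1] = Int    # fine: Int is also a DataType

# A Union type is not a DataType, so storing it in that vector throws.
union_rejected = try
    narrow[2] = Union{Missing,String}
    false
catch
    true
end

# A vector with element type Any accepts concrete types and Union types alike.
wide = Any[Float64 for i in 1:n]
wide[1] = Int
wide[2] = Union{Missing,String}

println(eltype(narrow))
println(union_rejected)
println(wide)
```

Nullable{String} is itself a DataType, which would explain why the original coltypes assignment ran without complaint and the failure only appeared later, inside CSV.read.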