Read file with CSV.read

Hello everyone,

I’m having trouble reading a simple file with CSV.read.
I have the following data:

1.0 2.462558e-04 11 -1.18791031e-04 +1.18791031e-04 +8.96777973e+02 +3.88470836e+02
1.0 2.462558e-04 12 +1.18790872e-04 -1.18790872e-04 -8.96777979e+02 -3.88470836e+02 
1.0 2.462558e-04 21 +1.18790871e-04 -1.18790871e-04 +8.40080497e+02 +3.20800442e+02
1.0 2.462558e-04 22 -1.18791028e-04 +1.18791028e-04 -8.40080491e+02 -3.20800447e+02

which I put in a file, test.dat.
When I run

CSV.read("test.dat" ; datarow=1, delim=' ')

I get

ERROR: ArgumentError: data row (1) must come after header row (1)
Stacktrace:
 [1] #Source#12(::String, ::CSV.Options, ::Int64, ::Int64, ::Array{DataType,1}, ::Bool, ::Bool, ::Int64, ::Int64, ::Int64, ::Bool, ::Type{T} where T) at /home/omerchiers/.julia/v0.6/CSV/src/Source.jl:49
 [2] (::Core.#kw#Type)(::Array{Any,1}, ::Type{CSV.Source}) at ./<missing>:0
 [3] #Source#11(::Char, ::UInt8, ::UInt8, ::String, ::Int64, ::Int64, ::Array{DataType,1}, ::Bool, ::Bool, ::DateFormat{Symbol("yyyy-mm-dd"),Tuple{Base.Dates.DatePart{'y'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'m'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'d'}}}, ::Int64, ::Int64, ::Int64, ::Bool, ::Type{T} where T, ::String) at /home/omerchiers/.julia/v0.6/CSV/src/Source.jl:25
 [4] (::Core.#kw#Type)(::Array{Any,1}, ::Type{CSV.Source}, ::String) at ./<missing>:0
 [5] #read#29(::Bool, ::Dict{Int64,Function}, ::Array{Any,1}, ::Function, ::String, ::Type{T} where T) at /home/omerchiers/.julia/v0.6/CSV/src/Source.jl:294
 [6] (::CSV.#kw##read)(::Array{Any,1}, ::CSV.#read, ::String, ::Type{T} where T) at ./<missing>:0 (repeats 2 times)

If instead I change the file to

# Nothing here
1.0 2.462558e-04 11 -1.18791031e-04 +1.18791031e-04 +8.96777973e+02 +3.88470836e+02
1.0 2.462558e-04 12 +1.18790872e-04 -1.18790872e-04 -8.96777979e+02 -3.88470836e+02 
1.0 2.462558e-04 21 +1.18790871e-04 -1.18790871e-04 +8.40080497e+02 +3.20800442e+02
1.0 2.462558e-04 22 -1.18791028e-04 +1.18791028e-04 -8.40080491e+02 -3.20800447e+02

and type in the REPL

CSV.read("test.dat" ; datarow = 2, delim=' ')

I get:

4×2 DataFrames.DataFrame
│ Row │ #Nothing    │ here         │
├─────┼─────────────┼──────────────┤
│ 1   │ 1.0         │ 0.000246256  │
│ 2   │ 11.0        │ -0.000118791 │
│ 3   │ 0.000118791 │ 896.778      │
│ 4   │ 388.471     │ 1.0          │

Neither of which is what I want, obviously.
I’m on Julia v0.6.2
CSV 0.1.5
DataFrames 0.10.1

As a side note, I did Pkg.update(), but somehow the system does not update to DataFrames 0.11.

Many thanks in advance,
Olivier

This works for me with your sample data:

julia> CSV.read("foo.csv"; header=false, delim=' ', types=fill(Float64,7))
4×7 DataFrames.DataFrame
│ Row │ Column1 │ Column2     │ Column3 │ Column4      │ Column5      │
├─────┼─────────┼─────────────┼─────────┼──────────────┼──────────────┤
│ 1   │ 1.0     │ 0.000246256 │ 11.0    │ -0.000118791 │ 0.000118791  │
│ 2   │ 1.0     │ 0.000246256 │ 12.0    │ 0.000118791  │ -0.000118791 │
│ 3   │ 1.0     │ 0.000246256 │ 21.0    │ 0.000118791  │ -0.000118791 │
│ 4   │ 1.0     │ 0.000246256 │ 22.0    │ -0.000118791 │ 0.000118791  │

│ Row │ Column6  │ Column7  │
├─────┼──────────┼──────────┤
│ 1   │ 896.778  │ 388.471  │
│ 2   │ -896.778 │ -388.471 │
│ 3   │ 840.08   │ 320.8    │
│ 4   │ -840.08  │ -320.8   │

This worked for me (Julia 0.6.2) :

julia> using CSV
INFO: Recompiling stale cache file C:\Users\mcallistst\.julia\lib\v0.6\CSV.ji for module CSV.

julia> readdlm("test.dat",' ')
4x7 Array{Float64,2}:
 1.0  0.000246256  11.0  -0.000118791   0.000118791   896.778   388.471
 1.0  0.000246256  12.0   0.000118791  -0.000118791  -896.778  -388.471
 1.0  0.000246256  21.0   0.000118791  -0.000118791   840.08    320.8
 1.0  0.000246256  22.0  -0.000118791   0.000118791  -840.08   -320.8

julia> CSV.read("test.dat",delim = ' ')
3x7 DataFrames.DataFrame. Omitted printing of 2 columns
│ Row │ 1.0 │ 2.462558e-04 │ 11 │ -1.18791031e-04 │ +1.18791031e-04 │
├─────┼─────┼──────────────┼────┼─────────────────┼─────────────────┤
│ 1   │ 1.0 │ 0.000246256  │ 12 │ 0.000118791     │ -0.000118791    │
│ 2   │ 1.0 │ 0.000246256  │ 21 │ 0.000118791     │ -0.000118791    │
│ 3   │ 1.0 │ 0.000246256  │ 22 │ -0.000118791    │ 0.000118791     │

julia> CSV.read("test.dat",delim = ' ',datarow=1)
4x7 DataFrames.DataFrame. Omitted printing of 2 columns
│ Row │ Column1 │ Column2     │ Column3 │ Column4      │ Column5      │
├─────┼─────────┼─────────────┼─────────┼──────────────┼──────────────┤
│ 1   │ 1.0     │ 0.000246256 │ 11      │ -0.000118791 │ 0.000118791  │
│ 2   │ 1.0     │ 0.000246256 │ 12      │ 0.000118791  │ -0.000118791 │
│ 3   │ 1.0     │ 0.000246256 │ 21      │ 0.000118791  │ -0.000118791 │
│ 4   │ 1.0     │ 0.000246256 │ 22      │ -0.000118791 │ 0.000118791  │

Thanks for the reply,

Unfortunately, this does not work either on my machine:

CSV.read("test.dat"; header=false, delim=' ', types=fill(Float64,7))
4×7 DataFrames.DataFrame
│ Row │ Column1 │ Column2     │ Column3     │ Column4      │ Column5      │ Column6      │ Column7  │
├─────┼─────────┼─────────────┼─────────────┼──────────────┼──────────────┼──────────────┼──────────┤
│ 1   │ 1.0     │ 0.000246256 │ 11.0        │ -0.000118791 │ 0.000118791  │ 896.778      │ 388.471  │
│ 2   │ 1.0     │ 0.000246256 │ 12.0        │ 0.000118791  │ -0.000118791 │ -896.778     │ -388.471 │
│ 3   │ #NULL   │ 1.0         │ 0.000246256 │ 21.0         │ 0.000118791  │ -0.000118791 │ 840.08   │
│ 4   │ 320.8   │ 1.0         │ 0.000246256 │ 22.0         │ -0.000118791 │ 0.000118791  │ -840.08  │
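
One possible explanation for the shifted rows: in the data as posted, the second line appears to end with a trailing space. If fields are split strictly on a single ' ', that line would produce an extra, empty eighth field, which fits the #NULL at the start of row 3. The sketch below only illustrates this splitting behavior with Base `split`; it is not CSV.jl's actual parser, and the variable names `strict` and `loose` are made up for the illustration.

```julia
# First two data lines from the post; the second one ends with a trailing space.
lines = [
    "1.0 2.462558e-04 11 -1.18791031e-04 +1.18791031e-04 +8.96777973e+02 +3.88470836e+02",
    "1.0 2.462558e-04 12 +1.18790872e-04 -1.18790872e-04 -8.96777979e+02 -3.88470836e+02 ",
]

# Splitting on exactly one space keeps the empty field after the trailing space.
strict = [split(line, ' ') for line in lines]

# Splitting on runs of whitespace (no delimiter argument) drops empty fields.
loose = [split(line) for line in lines]

println(length.(strict))  # number of fields per line when splitting on ' '
println(length.(loose))   # number of fields per line when splitting on whitespace
```

If the file on disk really has that trailing space, removing it (or stripping trailing whitespace before parsing) would be worth trying.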

On top of that, I would like the third column to be read as Ints.

Many thanks for the reply,

But none of the solutions works, except the readdlm version:

Solution 1:

CSV.read("test.dat",delim = ' ')
ERROR: CSV.CSVError("error parsing a `Int64` value on column 3, row 2; encountered '.'")
Stacktrace:
 [1] checknullend at /home/omerchiers/.julia/v0.6/CSV/src/parsefields.jl:56 [inlined]
 [2] parsefield at /home/omerchiers/.julia/v0.6/CSV/src/parsefields.jl:127 [inlined]
 [3] parsefield at /home/omerchiers/.julia/v0.6/CSV/src/parsefields.jl:107 [inlined]
 [4] streamfrom(::CSV.Source, ::Type{DataStreams.Data.Field}, ::Type{Nullable{Int64}}, ::Int64, ::Int64) at /home/omerchiers/.julia/v0.6/CSV/src/Source.jl:195
 [5] streamto!(::DataFrames.DataFrame, ::Type{DataStreams.Data.Field}, ::CSV.Source, ::Type{Nullable{Int64}}, ::Type{Nullable{Int64}}, ::Int64, ::Int64, ::DataStreams.Data.Schema{true}, ::Base.#identity) at /home/omerchiers/.julia/v0.6/DataStreams/src/DataStreams.jl:173
 [6] stream!(::CSV.Source, ::Type{DataStreams.Data.Field}, ::DataFrames.DataFrame, ::DataStreams.Data.Schema{true}, ::DataStreams.Data.Schema{true}, ::Array{Function,1}) at /home/omerchiers/.julia/v0.6/DataStreams/src/DataStreams.jl:187
 [7] #stream!#5(::Array{Any,1}, ::Function, ::CSV.Source, ::Type{DataFrames.DataFrame}, ::Bool, ::Dict{Int64,Function}) at /home/omerchiers/.julia/v0.6/DataStreams/src/DataStreams.jl:151
 [8] stream!(::CSV.Source, ::Type{DataFrames.DataFrame}, ::Bool, ::Dict{Int64,Function}) at /home/omerchiers/.julia/v0.6/DataStreams/src/DataStreams.jl:145
 [9] #read#29(::Bool, ::Dict{Int64,Function}, ::Array{Any,1}, ::Function, ::String, ::Type{T} where T) at /home/omerchiers/.julia/v0.6/CSV/src/Source.jl:299
 [10] (::CSV.#kw##read)(::Array{Any,1}, ::CSV.#read, ::String, ::Type{T} where T) at ./<missing>:0 (repeats 2 times)

Solution 2:

CSV.read("test.dat",delim = ' ',datarow=1)
ERROR: ArgumentError: data row (1) must come after header row (1)
Stacktrace:
 [1] #Source#12(::String, ::CSV.Options, ::Int64, ::Int64, ::Array{DataType,1}, ::Bool, ::Bool, ::Int64, ::Int64, ::Int64, ::Bool, ::Type{T} where T) at /home/omerchiers/.julia/v0.6/CSV/src/Source.jl:49
 [2] (::Core.#kw#Type)(::Array{Any,1}, ::Type{CSV.Source}) at ./<missing>:0
 [3] #Source#11(::Char, ::UInt8, ::UInt8, ::String, ::Int64, ::Int64, ::Array{DataType,1}, ::Bool, ::Bool, ::DateFormat{Symbol("yyyy-mm-dd"),Tuple{Base.Dates.DatePart{'y'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'m'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'d'}}}, ::Int64, ::Int64, ::Int64, ::Bool, ::Type{T} where T, ::String) at /home/omerchiers/.julia/v0.6/CSV/src/Source.jl:25
 [4] (::Core.#kw#Type)(::Array{Any,1}, ::Type{CSV.Source}, ::String) at ./<missing>:0
 [5] #read#29(::Bool, ::Dict{Int64,Function}, ::Array{Any,1}, ::Function, ::String, ::Type{T} where T) at /home/omerchiers/.julia/v0.6/CSV/src/Source.jl:294
 [6] (::CSV.#kw##read)(::Array{Any,1}, ::CSV.#read, ::String, ::Type{T} where T) at ./<missing>:0 (repeats 2 times)

Could this be a problem of my DataFrames version?
I could use readdlm in the meantime, but I prefer the cleaner CSV option, since it is the one that will be supported in the long run.
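
As a stopgap along those lines, here is a minimal sketch of the readdlm route that also gives integers for the third column. It assumes Julia 0.7 or later, where `readdlm` lives in the `DelimitedFiles` standard library (on 0.6 it is available from Base without the `using` line). The temporary file and the defensive `rstrip` step are additions for illustration, not something from the posts above.

```julia
using DelimitedFiles  # stdlib since Julia 0.7; not needed on 0.6

# Sample data from the post, written to a temporary file so the example is self-contained.
rows = [
    "1.0 2.462558e-04 11 -1.18791031e-04 +1.18791031e-04 +8.96777973e+02 +3.88470836e+02",
    "1.0 2.462558e-04 12 +1.18790872e-04 -1.18790872e-04 -8.96777979e+02 -3.88470836e+02 ",
    "1.0 2.462558e-04 21 +1.18790871e-04 -1.18790871e-04 +8.40080497e+02 +3.20800442e+02",
    "1.0 2.462558e-04 22 -1.18791028e-04 +1.18791028e-04 -8.40080491e+02 -3.20800447e+02",
]
path = tempname()
write(path, join(rows, "\n") * "\n")

# Strip trailing whitespace from every line before parsing (defensive step).
cleaned = join([rstrip(line) for line in eachline(path)], "\n")

# With no delimiter argument, readdlm splits on runs of whitespace.
data = readdlm(IOBuffer(cleaned))

# The third column holds integer-valued floats, so it converts cleanly to Int.
ids = Int.(data[:, 3])

println(size(data))
println(ids)
```

The result is a plain matrix rather than a DataFrame, so column names and per-column types have to be handled by hand.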

Many thanks in advance.
Olivier

I think this works with https://github.com/davidanthoff/CSVFiles.jl:

using FileIO, CSVFiles, DataFrames

load("data.csv", spacedelim=true, header_exists=false) |> DataFrame

Make sure you do a Pkg.update() first; the underlying parser, TextParse.jl (queryverse/TextParse.jl on GitHub), only recently got support for whitespace-delimited files.

DataFrames won’t be upgraded until all packages you have installed that depend on it support version 0.11. Until then, it is better to use readtable or CSVFiles. See the "DataFrames 0.11 released" announcement for more details.

Thanks to all of you for your help!
Olivier

Thanks! Very helpful for reading the data as a DataFrame!