CSV error: numbers read as strings

Hello all,

I have always used CSV.read for my CSV files, but since the new Julia update I have trouble with it: it sometimes reads numbers as integers and sometimes doesn’t.
How do I avoid this problem?

Thanks

You can specify the types of the columns explicitly:

using CSV, DataFrames

types = Dict(
    :columnA => String,
    :columnB => Float64
)

data = DataFrame(CSV.File("data.csv"; types))
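A minimal sketch of the same idea on in-memory data, assuming CSV.jl and DataFrames.jl are installed; the column contents are made up for illustration. Forcing String keeps the leading zeros in columnA, which would otherwise be parsed as integers, and forcing Float64 stores the whole numbers in columnB as floats:

```julia
using CSV, DataFrames

# Made-up CSV contents, read from memory instead of "data.csv"
data = """
columnA,columnB
001,1
002,2
"""

types = Dict(
    :columnA => String,   # keep "001" as text instead of parsing it as the integer 1
    :columnB => Float64,  # store 1 and 2 as 1.0 and 2.0
)

# `; types` is shorthand for `; types = types`
df = CSV.read(IOBuffer(data), DataFrame; types)

df.columnA[1]       # "001"
eltype(df.columnB)  # Float64
```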

More info in the documentation:

Getting started

Providing-Types

Typemap

It worked once, but then I got this error:

Can you post an MWE of your error? The error message seems to suggest that you passed a keyword argument type_buses which doesn’t exist.

type_buses= Dict(
    :bus => Int16,
    :type => String,
    :bh => Float64,
    :v => Float64,
    :delta => Float64,
    :pg => Float64,
    :qg => Float64,
    :pd => Float64,
    :qd => Float64,
    :pgmax => Float64,
    :pgmin => Float64,
    :qgmax => Float64,
    :qgmin => Float64,
)

This is how I defined type_buses in my code, and I also added it to the CSV.File call:

BUSES_DATAFRAME = DataFrame(CSV.File("D:/M A S T E R S S S S S S/chapter_4/imp/14bus/B14.csv"; type_buses))

types is a keyword argument, so you need to pass it by name (types = ...). Here’s a full MWE:

julia> using CSV, DataFrames

julia> df = DataFrame(rand(2, 3), :auto);

julia> CSV.write("out.csv", df);

julia> type_buses = Dict(:x1 => Float32, :x2 => String, :x3 => Float64);

julia> CSV.read("out.csv", DataFrame; types = type_buses)
2×3 DataFrame
 Row │ x1        x2                    x3       
     │ Float32   String                Float64  
─────┼──────────────────────────────────────────
   1 │ 0.539608  0.050591523119873916  0.750995
   2 │ 0.396338  0.48258399391743123   0.463351

Note that I write types = type_buses to specify the kwarg. In the example above, things worked because the dictionary had the same name as the kwarg:

julia> types = type_buses;

julia> CSV.read("out.csv", DataFrame; types)
2×3 DataFrame
 Row │ x1        x2                    x3       
     │ Float32   String                Float64  
─────┼──────────────────────────────────────────
   1 │ 0.539608  0.050591523119873916  0.750995
   2 │ 0.396338  0.48258399391743123   0.463351

This fails if the name of the dict doesn’t match the kwarg name:

julia> CSV.read("out.csv", DataFrame; type_buses)
ERROR: MethodError: no method matching CSV.File(::CSV.Header{false, Parsers.Options{false, true, true, false, Missing, UInt8, Nothing}, Vector{UInt8}}; debug=false, typemap=Dict{Type, Type}(), type_buses=Dict{Symbol, DataType}(:x2 => String, :x3 => Float64, :x1 => Float32))
Closest candidates are:
  CSV.File(::CSV.Header; finalizebuffer, startingbyteposition, endingbyteposition, limit, threaded, typemap, tasks, lines_to_check, maxwarnings, debug) at /home/nils/.julia/packages/CSV/la2cd/src/file.jl:221 got unsupported keyword argument "type_buses"

Which is the error you’re seeing.

I think what you’re seeing here is a side effect of implicit keyword argument names (f(; x) is shorthand for f(; x = x)), introduced in Julia 1.5 (?), so this might not even work on older versions.

In any case, I’d say it’s always best to be explicit with kwargs and spell them out; after all, that’s what they are for.
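The keyword mechanics can be reproduced without CSV.jl at all. In this sketch, read_table is a hypothetical stand-in for CSV.File that only accepts a types keyword; the three calls mirror the three cases in the MWE:

```julia
# Hypothetical stand-in for CSV.File: it accepts only a `types` keyword
read_table(path; types = nothing) = (path, types)

type_buses = Dict(:bus => Int16, :v => Float64)

# 1. Explicit keyword: works whatever the variable is called
r1 = read_table("B14.csv"; types = type_buses)

# 2. Implicit name: `; types` is shorthand for `; types = types`
types = type_buses
r2 = read_table("B14.csv"; types)

# 3. `; type_buses` means `; type_buses = type_buses`, an unsupported keyword
err = try
    read_table("B14.csv"; type_buses)
    nothing
catch e
    e
end

r1 == r2             # true
err isa MethodError  # true
```

The third call fails with the same kind of MethodError ("got unsupported keyword argument") as the CSV.File call above.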

And more to your original question: I haven’t seen any regressions in CSV.jl’s capability of detecting number types automatically in recent releases, and if there are any, that might well be a bug.

Can you share the file for which an older version of CSV.jl successfully detected the correct type, but the latest version fails to do so?
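For what it’s worth, one common way a numeric column ends up as strings is a single non-numeric placeholder somewhere in it. This is an assumption about what might be going on, not a diagnosis of your file: the data below is made up, with "-" standing in for a missing value. CSV.jl then falls back to a string column, and declaring "-" as missing via the missingstring keyword lets it detect the numbers again:

```julia
using CSV, DataFrames

# Made-up data: row 2 uses "-" as a placeholder instead of a number
data = """
bus,v
1,1.06
2,-
3,1.01
"""

# Default parsing: "-" is not a number, so the whole `v` column is read as text
# (String or an inline string type, depending on the CSV.jl version)
df = CSV.read(IOBuffer(data), DataFrame)
eltype(df.v)

# Treat "-" as missing: `v` is detected as Union{Missing, Float64}
df2 = CSV.read(IOBuffer(data), DataFrame; missingstring = "-")
df2.v  # holds 1.06, missing, 1.01
```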

I am now even more confused :worried:… Excuse me, can you make it simpler?

Sorry, that wasn’t the intention, of course! The above is a self-contained example, so you can execute it line by line; if there’s something you don’t understand, feel free to check back in.

Alternatively, just change the function call you posted above:

DataFrame(CSV.File("D:/M A S T E R S S S S S S/chapter_4/imp/14bus/B14.csv"; type_buses))

to

DataFrame(CSV.File("D:/M A S T E R S S S S S S/chapter_4/imp/14bus/B14.csv"; types = type_buses))

and things should work.

I should also note that CSV.read(file, DataFrame) is the same as DataFrame(CSV.File(file)), just in case that added to your confusion!
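A quick check of that equivalence on a small made-up input, assuming CSV.jl and DataFrames.jl are installed (a fresh IOBuffer is created for each call because reading consumes it):

```julia
using CSV, DataFrames

# Made-up CSV contents
data = """
bus,v
1,1.06
2,1.045
"""

df_read = CSV.read(IOBuffer(data), DataFrame)   # one-step form
df_file = DataFrame(CSV.File(IOBuffer(data)))   # two-step form

df_read == df_file  # true
```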

Yes, it worked… Thanks a ton! I have one problem remaining: line 15, which is the last line in the CSV file, is not read properly. All values in that line show up as missing in the DataFrame.

This is very hard to debug without the file itself, but I would assume that in this case you are forcing types by passing types = type_buses, and the last line can’t be read in the format you are specifying (which is probably also why CSV.read didn’t infer the types in the first place).

What happens if you just do:

CSV.read("D:/M A S T E R S S S S S S/chapter_4/imp/14bus/B14.csv", DataFrame)
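To illustrate that guess, here is a sketch with made-up data in which the last row holds a value that can’t be parsed as the forced type. As far as I know, CSV.jl parses non-strictly by default: it prints a warning and stores missing instead of failing:

```julia
using CSV, DataFrames

# Made-up data: the last row's `v` value is not a number
data = """
bus,v
1,1.06
2,1.045
3,n/a
"""

# Forcing Float64 on `v`: "n/a" can't be parsed, so CSV.jl warns and stores missing
df = CSV.read(IOBuffer(data), DataFrame; types = Dict(:v => Float64))

df.v[3]  # missing
```

Passing strict = true makes CSV.jl throw an error on such values instead, which can help locate the offending row.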

When I do this, all numbers are read as strings :frowning:

My file has a header and numbers, nothing else. Can I share part of it here?

From this it seems there is no line 15? You have a header and 14 rows of data, so why are you expecting 15 rows in your DataFrame?
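The row count is easy to check programmatically. A sketch with a made-up three-row file, assuming CSV.jl and DataFrames.jl are installed: countlines counts every line including the header, while nrow counts only the data rows:

```julia
using CSV, DataFrames

# Made-up file contents: one header line followed by three data lines
data = """
bus,v
1,1.06
2,1.045
3,1.01
"""

nlines = countlines(IOBuffer(data))        # 4: header + 3 data lines
df = CSV.read(IOBuffer(data), DataFrame)
nrow(df)                                   # 3: the header is not a data row
```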

Yes, you are right… it seems I became short-sighted after spending hours in front of the computer trying to solve this issue. Now it’s solved. Thanks a million :slight_smile: :slight_smile: :smiley:
