Accessing data in a CSV row?

i’m at a loss. i’m using CSV just as row reader, and i find i can’t actually access any of the data in the row.

ERROR: LoadError: MethodError: no method matching getindex(::CSV.Row{CSV.File{false,true,Parsers.BufferedIO{IOStream},Parsers.Delimited{false,true,Parsers.Quoted{Parsers.Strip{Parsers.Sentinel{typeof(Parsers.defaultparser),Parsers.Trie{0x00,false,missing,2,Tuple{}}}}},Parsers.Trie{0x00,false,missing,8,Tuple{Parsers.Trie{0x2c,true,missing,8,Tuple{}}}}},NamedTuple{(),Tuple{}}}}, ::Int64)

doesn’t work

for x in row
ERROR: LoadError: MethodError: no method matching iterate(::CSV.Row{CSV.File{false,true,Parsers.BufferedIO{IOStream},Parsers.Delimited{false,true,Parsers.Quoted{Parsers.Strip{Parsers.Sentinel{typeof(Parsers.defaultparser),Parsers.Trie{0x00,false,missing,2,Tuple{}}}}},Parsers.Trie{0x00,false,missing,8,Tuple{Parsers.Trie{0x2c,true,missing,8,Tuple{}}}}},NamedTuple{(),Tuple{}}}})

doesn’t work.

Then I realized I don’t actually know how to figure out what a row is and how i can access the elements of a row. I even went to github and looked at the code and i still can’t figure it out…

You access individual values via getproperty, so for example:

# read in csv file w/ two columns named "columnA" and "columnB"
file = CSV.File(filename)
for row in file
    println(row.columnA, row.columnB)

Oh, and you can see all the column names by doing propertynames(row), so you could iterate through all the values of a row by doing:

for column in propertynames(row)
    value = getproperty(row, column)

yes. well. silly me didn’t assign column names. so i thought if i didn’t assign column names, then i would get back something that would simply respond to numeric indices.

so i added the column labels and solved that problem, although not before i tried row[“name”] and found that didn’t work either.

btw, is there anyway to force a missing value to be interpreted as “”, i.e. the empty string ?

thank you!

p.s. how do you handle missing value when trying to assign types ?
if i know column1 is a float and column 2 might be missing OR a string, i would think i could do
types=[Float64, Union{Missing,String}]
but that doesn’t seem to work.

If your file doesn’t have column names, they’ll be automatically generated like Column1,Column2,Column3 etc. You can also just pass them in yourself by doing CSV.File(filename; header=["Column1", "Column2"], data=1).

For treating missing values as empty string, you could do:

using CSV, Tables
file = CSV.File(filename) |> Tables.transform(Column2=x->coalesce(x, ""))
for row in file
    # do stuff with row

What was the error when you tried types=[Float64, Union{Missing, String}]? I think that should work, but there might be a bug. In general, you can use the allowmissing keyword argument to control how things should be treated for an entire file: allowmissing=:all is the default and will make every column a Union{T, Missing} type; alternatively, you can do allowmissing=:auto and the type will only be Union{T, Missing} if null values are detected. The third option is allowmissing=:none which will error if any missing values are detected in the file.

ERROR: LoadError: MethodError: Cannot convert an object of type Missing to an object of type String

I used allowmissing=:auto. The documentation doesn’t say which is default.


p.s. oh, good grief. while creating a MWE, i found that Union{Missing,String} DOES work. i have no idea why it’s crashing on my real file. I’m investigating.

Yeah, let me know if you can boil it down w/ a specific file. We can create an issue at the CSV.jl repo w/ the file or I can provide you w/ a way to share the file w/ me privately.

very sneaky. buried in the file was another missing value in one of the other string columns (i knew one of the columns had missing values, but there were just a scattering of missing values in another column and much farther down a very large file, so i didn’t seem them the first time i checked through the file).

problem solved, everything is as it should be :slight_smile:

thanks for your help!

How can access whole row in a single statement and convert to a single string if necessary?

Can you be more specific? I’m not quite sure what you’re asking for here.