DataFrame has NA: what is best to do?


#1

I read in a DataFrame with CSV.read that contains some “NA” elements. CSV.read typed the columns containing NA’s as String. It’s easy for me to remove the rows with the NA’s by comparing every element with the string “NA”, but the columns then remain of type String. How can I convert them to Float64? Or is there a better way to do this?


#2
x = ["1.61", "1", "2"]
x_float = parse.(Float64, x)
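
To tie this back to the original question, here is a minimal sketch in plain Julia (no packages). The vector `raw` is a made-up stand-in for one String column as CSV.read might return it; it shows two options, dropping the "NA" entries before parsing, or keeping the positions and marking them as `missing`.

```julia
# Hypothetical column contents; "NA" marks an unavailable value
raw = ["1.61", "NA", "2"]

# Option 1: drop the "NA" entries, then parse what is left
kept = filter(s -> s != "NA", raw)
x_float = parse.(Float64, kept)      # Vector{Float64}

# Option 2: keep every position and turn "NA" into missing
x_missing = [s == "NA" ? missing : parse(Float64, s) for s in raw]
# element type is Union{Missing, Float64}
```

Option 2 keeps the rows aligned with the other columns, which matters if the filtering is done later on the whole table.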

#3

Thank you. That was very simple.

I was trying to pass types=Dict(:Col1 => Union{Float64, Missing}) to CSV.read, and then use ismissing to check every element. That seems to work, but I get a bunch of warnings like
warning: failed parsing Float64 on row=234, col=3, error=INVALID: SENTINEL, DELIMITED, INVALID_DELIMITER

Not sure what that means.


#4

Can you try using CSVFiles.jl and FileIO.jl instead? Perhaps they are more robust.

You can also try passing missingstrings = "NA" to CSV.read.
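
A rough sketch of that second suggestion, assuming a recent CSV.jl together with DataFrames.jl. The keyword spelling has varied between CSV.jl releases (`missingstrings` in some, `missingstring` in others), so check the docs for the installed version. The in-memory data and the column names are made up for illustration.

```julia
using CSV, DataFrames

# Hypothetical CSV content standing in for the real file
data = """
Col1,Col2
1.61,a
NA,b
2,c
"""

# Treat "NA" as missing, so Col1 is parsed as Union{Missing, Float64}
# instead of falling back to String
df = CSV.read(IOBuffer(data), DataFrame; missingstring="NA")

# Drop every row that contains a missing value; by default dropmissing
# also narrows the column types, so Col1 ends up as plain Float64
clean = dropmissing(df)
```

With this approach there is no need to compare strings or call parse by hand; the reader does the conversion and `dropmissing` removes the incomplete rows.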


#5

Thank you very much again. missingstrings = "NA" got rid of those warnings.

Can you elaborate on why CSVFiles.jl and FileIO.jl might be more robust?


#6

CSVFiles is not tied to any particular backend, so I think of it as having curated the right CSV reader for you. That makes it more likely to be robust than a reader picked by someone who is new to Julia.


#7

I don’t think this is quite right. It’s tied to FileIO.jl and uses TextParse.jl as its backend. In my experience, both packages work very well for most situations, though they have slightly different interfaces - sometimes I think one of them can’t do something, only to find out later that I was using the wrong keyword.

There are a small number (decreasing all the time) of things that one does better than the other, so it is often worth giving both a shot. I tend to use CSV for everything for the pedantic reason that I don’t like using two packages for one functionality.


#8

What’s to stop CSVFiles from switching to another backend without the user noticing? Why did it choose TextParse? Because it tested a few of them and chose that one. Of course it has to choose a backend, but which one it uses is something the user mostly doesn’t have to know.


#9

That’s true - perhaps I misunderstood your point.