String31 in dataframe

Hello, I have a two-column data frame. One column is Int64, which is fine. The other should be Float64, I think, because it holds numbers such as 0.1254, 0.8458, etc. However, I got strange results when plotting, and I realized that typeof() returns “String31” for that float column. I guess some data points are being interpreted as strings. The problem is that the column contains 2 million data points. How do I find the offending value(s)? I don’t even know where to start.
Thank you

Check out the types section of the CSV.jl documentation. You probably want something like types = [Int64, Float64] as a keyword argument. Failures will be returned as missing, I think.

Sorry, maybe I was not clear. I don’t want to change the type or get rid of those values (at least not at first); I want to find the “offending” values and look at them. They might be errors, or they might be significant. This is what bugs me.

Check which values in the resulting data frame come back as missing and see what rows they are in.
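Here is a minimal sketch of that idea, assuming the file is read with CSV.jl and DataFrames.jl. The inline data, the column names id and VAF, and the variable names are made up for illustration. As far as I know, with the default strict = false CSV.jl replaces values it cannot parse with missing and prints a warning, so missing can also mean a field that was genuinely empty.

```julia
using CSV, DataFrames

# Hypothetical stand-in for the real file: one field holds two values.
data = """
id,VAF
1,0.1254
2,0.8458
3,"0.21659,0.294931"
4,0.3310
"""

# Force the column types; values that cannot be parsed should become missing.
df = CSV.read(IOBuffer(data), DataFrame; types = [Int64, Float64])

# Row indices where parsing failed (or the field was empty).
bad_rows = findall(ismissing, df.VAF)

# Re-read the same column as plain strings to look at the raw values.
raw = CSV.read(IOBuffer(data), DataFrame; types = Dict(:VAF => String))
offending = raw.VAF[bad_rows]
```

On the real file you would pass the file path instead of IOBuffer(data).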

You can also read the column as a String, then broadcast tryparse over it to make a new column that simulates what CSV.jl is doing, then filter for the observations that have nothing in the float variable.
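A rough sketch of the tryparse route, using only Base Julia. The vector vaf is a made-up stand-in for the string column; the same calls should work on a String31 column, since it is an AbstractString.

```julia
# Hypothetical stand-in for the column as it was read (strings, not floats).
vaf = ["0.1254", "0.8458", " 0.21659,0.294931", "0.3310"]

# tryparse returns a Float64 on success and nothing on failure,
# instead of throwing like parse does.
parsed = tryparse.(Float64, vaf)

# Row indices of the values that could not be parsed.
bad_rows = findall(isnothing, parsed)

# The raw offending values, ready to be inspected.
bad_values = vaf[bad_rows]
```

With a data frame this would be something like findall(isnothing, tryparse.(Float64, df2.VAF)), and even with 2 million rows it only needs a single pass over the column.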


Thanks, here is what I did (while you were answering). I figured the parsing would probably fail (okay, it was just a bet).

df2.VAF2=parse.(Float64, df2.VAF)
ArgumentError: cannot parse String31(" 0.21659,0.294931") as Float64

I have TWO values separated by “,”. This actually makes sense, and it means I have a decision to make upstream in my pipeline! I know what’s causing this.

Thanks 🙂