Hi, I’m learning my way around DataFrames.jl and CSV.jl and am having some difficulty understanding the behaviour of the following code:
using DataFrames, CSV
x = rand(5, 5)
A = DataFrame(x, :auto)
A[2:4, 2:4] .= missing
This code raises the error: MethodError: Cannot `convert` an object of type Missing to an object of type Float64.
Elsewhere with real data I’m attempting to replace a block of erroneous entries in a dataframe with nothing to allow for easy interpolation using Impute.interp(). I’ve found it’s possible to do so using nested for loops, but I’m wondering why the broadcasting approach doesn’t work, since it seems more idiomatic.
I don’t see how it would work better with for loops: the problem is that A has columns of type Float64 which cannot contain the missing value. You can fix it by calling allowmissing!(A) before the last line.
(I guess you meant missing instead of nothing? If you really want to use nothing I think you’ll have to construct the columns manually with type Union{Nothing,Float64}.)
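For reference, a minimal sketch of the allowmissing! fix applied to the snippet from the question. The column names x1..x5 are just what DataFrame(x, :auto) generates, and CSV is left out since it isn't needed here.

```julia
using DataFrames

x = rand(5, 5)
A = DataFrame(x, :auto)           # columns x1..x5, each a Vector{Float64}

# Widen every column's element type to Union{Missing, Float64} in place.
allowmissing!(A)

# The broadcast assignment now succeeds because the columns can hold missing.
A[2:4, 2:4] .= missing

eltype(A.x2)                      # Union{Missing, Float64}
count(ismissing, A.x2)            # 3
```

Note that allowmissing!(A) widens all columns, including x1 and x5, which never receive a missing value here.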
Yep, missing is what I had in mind. Thank you. Is there any advantage to returning the type of the columns of A to Float64 once the missing values have been imputed?
How exactly did you intend to do so? If the Vector has missing values then it cannot be of type Vector{Float64}, it must be of type Vector{Union{Missing, Float64}}. An alternative would be using NaN instead of missing, as NaN is a Float64 value, but NaN and missing have different meanings and behaviors.
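To make the NaN versus missing difference concrete, here is a small sketch in plain Julia (no packages needed). The vectors v_nan and v_miss are made up for illustration.

```julia
v_nan  = [1.0, NaN, 3.0]          # Vector{Float64}
v_miss = [1.0, missing, 3.0]      # Vector{Union{Missing, Float64}}

eltype(v_nan)                     # Float64
eltype(v_miss)                    # Union{Missing, Float64}

# Both propagate through arithmetic, but as different values.
sum(v_nan)                        # NaN
sum(v_miss)                       # missing

# Comparisons differ: NaN compares false, missing propagates.
NaN == NaN                        # false
missing == missing                # missing

# Skipping each kind needs a different tool.
sum(skipmissing(v_miss))          # 4.0
sum(filter(!isnan, v_nan))        # 4.0
```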
That’s very simple. Impute.interp() replaces each of the missing values with a real value, determined by linearly interpolating between the known values that bound the missing region. The end result is a dataframe that contains no missing entries.
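In that case the columns can be narrowed back with disallowmissing! once nothing is missing any more. Below is a rough sketch of the round trip. To keep it self-contained, fill_linear! is a hypothetical hand-written stand-in for the interpolation step, not the Impute.jl implementation, and the data is made up.

```julia
using DataFrames

# Hypothetical helper (stand-in for the imputation step): fill interior runs
# of missing by linear interpolation between the nearest known neighbours.
# Leading or trailing missing values are left untouched.
function fill_linear!(v::AbstractVector)
    known = findall(!ismissing, v)
    for (lo, hi) in zip(known[1:end-1], known[2:end])
        for i in lo+1:hi-1
            t = (i - lo) / (hi - lo)
            v[i] = (1 - t) * v[lo] + t * v[hi]
        end
    end
    return v
end

A = DataFrame(x1 = [1.0, 2.0, 3.0, 4.0, 5.0],
              x2 = [10.0, 20.0, 30.0, 40.0, 50.0])
allowmissing!(A)
A[2:4, :x2] .= missing

fill_linear!(A.x2)        # values are filled in, but the eltype still allows missing
eltype(A.x2)              # Union{Missing, Float64}

disallowmissing!(A)       # narrow back; throws an error if any missing remains
eltype(A.x2)              # Float64
```

As for the advantage: as far as I can tell it is mostly about downstream code, since functions that only accept a Vector{Float64} will take the column again, and the error from disallowmissing! doubles as a check that the imputation really filled everything.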