Still another type conversion problem

I have created a DataFrame with a column “C” full of missing values, which I intend to fill in afterwards with Float64 values.

I tried to convert column “C” to Union{Float64, Missing}, but Julia does not convert anything.

The result is that I cannot fill in column “C” with my calculated values.

Can you provide a workaround?

I can see that many other people have had this problem in recent years.

Julia, or more specifically broadcasting, was not successfully informed that the column type should change.

julia> df[!, "C"]
3-element Vector{Missing}:
 missing
 missing
 missing

julia> convert.(Union{Float64, Missing}, df[!, "C"])
3-element Vector{Missing}:
 missing
 missing
 missing

The convert.(...) line is equivalent to broadcast(convert, ...), which infers the output array's element type from the results of the elementwise convert. Since every missing converts to missing, the inferred element type is Missing, so the result is still a Vector{Missing}.
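To make the inference visible, here is a minimal Base-only sketch (no DataFrames needed). The names `src`, `out`, and `mixed` are just illustrative, not from the original post. It compares an all-missing input with an input that actually contains a number:

```julia
# All-missing input: every element converts to `missing`,
# so broadcasting has no reason to widen the element type.
src = [missing, missing, missing]
out = convert.(Union{Float64, Missing}, src)
println(eltype(src))    # element type of the input
println(eltype(out))    # element type after the broadcasted convert

# Mixed input: the integers become Float64, so the result
# needs an element type that can hold both kinds of value.
mixed = convert.(Union{Float64, Missing}, [1, missing, 3])
println(eltype(mixed))
```

The element type follows the values that come out of the elementwise call, not the type that was passed as the first argument to convert.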

You can instead instantiate an empty vector with your intended element type and fill it. In-place broadcasting would also work, but in this case you can simply append the values from the other vector:

julia> append!(Union{Float64, Missing}[], df[!, "C"])
3-element Vector{Union{Missing, Float64}}:
 missing
 missing
 missing
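For completeness, the in-place broadcasting route might look like the sketch below. It is Base-only, and `src` and `dest` are illustrative names standing in for the DataFrame column and its replacement:

```julia
src = [missing, missing, missing]            # stands in for df[!, "C"]

# Allocate the destination with the wanted element type first,
# then broadcast into it; `.=` writes into `dest` and keeps its type.
dest = Vector{Union{Float64, Missing}}(undef, length(src))
dest .= src

# The destination now accepts Float64 values.
dest[1] = 4.6
println(dest)
```

Because the destination is allocated up front, broadcasting has nothing to infer about the output type.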

There is in fact a convert method for arrays that does this for you; it just forwards to the Array constructor that takes an Array input:

julia> convert(Vector{Union{Float64, Missing}}, df[!, "C"])
3-element Vector{Union{Missing, Float64}}:
 missing
 missing
 missing

julia> Vector{Union{Float64, Missing}}(df[!, "C"])
3-element Vector{Union{Missing, Float64}}:
 missing
 missing
 missing

Now that will work for the DataFrame:

julia> df[!, "C"] = convert(Vector{Union{Float64, Missing}}, df[!, "C"])
3-element Vector{Union{Missing, Float64}}:
 missing
 missing
 missing

julia> df
3×3 DataFrame
 Row │ A      B       C
     │ Int64  String  Float64?
─────┼─────────────────────────
   1 │     1  a        missing
   2 │     2  b        missing
   3 │     3  c        missing

julia> df[1,"C"] = 4.6
4.6
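The same behaviour can be reproduced on plain vectors, which may help when debugging outside a DataFrame. This is a Base-only sketch with illustrative names (`narrow`, `wide`, `assignment_failed`); it only checks that the narrow assignment throws, without relying on the exact error type:

```julia
narrow = [missing, missing, missing]                     # Vector{Missing}
wide = convert(Vector{Union{Float64, Missing}}, narrow)  # widened copy

# The widened vector accepts a Float64.
wide[1] = 4.6

# The original vector can only hold `missing`, so this assignment throws.
assignment_failed = try
    narrow[1] = 4.6
    false
catch
    true
end

println(wide)
println(assignment_failed)
```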

You likely want to make a similar adjustment at the instantiation of df so you don’t have a Vector{Missing} in the first place, but I can’t comment on what isn’t shown. In the future, share your code as text that readers can copy and run successfully. This dataframe was simple enough to reproduce from the screenshot, but that won’t usually be the case, so minimal working examples help a lot.


Excellent explanation, Benny … and very complete !!!

I’m sorry for not having shared my code. Even if it is no longer necessary, here it is:

using DataFrames

# Create a sample DataFrame
df = DataFrame(A = 1:3, B = ["a", "b", "c"])

# Add a new column "C"
df[:, "C"] .= missing

# Print the modified DataFrame
println(df)

On creating column “C” you suggest that I can already define the type that I want for my column. This seems to work for me, unless you have other suggestions:

using DataFrames

# Create a sample DataFrame
df = DataFrame(A = 1:3, B = ["a", "b", "c"])

# Add a new column "C"
df[:, "C"] = Vector{Union{Float64, Missing}}(missing, nrow(df))

# Print the modified DataFrame
println(df)

The syntax

Vector{Union{Float64, Missing}}(missing, nrow(df))

… works perfectly, but if I try to use it slightly differently, it does not work:

Vector{Union{Float64, Missing}}(7.3, 3)

I get an error:

MethodError: no method matching Vector{Union{Missing, Float64}}(::Float64, ::Int64)
The type `Vector{Union{Missing, Float64}}` exists, but no method is defined for this combination of argument types when trying to construct it.

I’ll try to understand why it doesn’t work.

Thank you again, Benny

Basically, it doesn’t work because Array has no constructor method that takes the elements as variable arguments. Such a method would be perfectly logical for a constructor, but array literal syntax (implemented by a variety of functions like getindex, typed_hcat, etc.) already exists for this purpose:

julia> Union{Float64, Missing}[7.3, 3]
2-element Vector{Union{Missing, Float64}}:
 7.3
 3.0
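If the intent was instead a 3-element vector filled with 7.3 (that is only a guess at what the call was meant to do), Base's fill and fill! cover it. The form with missing works because Base documents dedicated Array{T}(missing, dims) methods, and there is no matching method for an arbitrary fill value. A small sketch, with `T`, `plain`, and `filled` as illustrative names:

```julia
T = Union{Float64, Missing}

# `fill` picks the element type from the value, so this is a Vector{Float64}.
plain = fill(7.3, 3)

# To keep the wider element type, allocate first and fill in place.
filled = fill!(Vector{T}(undef, 3), 7.3)

# The wider vector can still take `missing` later on.
filled[2] = missing

println(eltype(plain))
println(filled)
```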

As for why there are array literals instead of just calling constructor methods, the syntax allows easier instantiation of multidimensional arrays, and Julia was created to support these from the start, much like MATLAB.
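As a small illustration of the multidimensional case, the typed literal extends to matrices by separating columns with spaces and rows with semicolons. The values and the name `m` are made up for the example:

```julia
# 2×2 matrix with an explicit element type; the integers are converted to Float64.
m = Union{Float64, Missing}[1 missing; 3 4.5]

println(size(m))
println(eltype(m))
println(m[2, 1])
```

Writing the same thing through a constructor would need the shape and the elements passed separately, which is where the literal syntax saves effort.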

The Missings.jl package, which is also used and exported by DataFrames.jl, has a convenience function that lets you specify the other types and mirrors Base’s zeros and ones for the dimensions:

julia> missings(Float64, 3)
3-element Vector{Union{Missing, Float64}}:
 missing
 missing
 missing

I’d generally lean toward these specific convenience functions. The array constructor methods are plenty capable, but that’s kind of the problem; it’s hard to tell what they can or can’t do sometimes, and wider capability requires more writing, which gets repetitive in specific contexts.
