Trapped in an int -> missing -> float loop

joa-quim · December 12, 2022, 11:14pm

Let’s say that I have a dataframe like

 df = DataFrame(a=[ 1, 4, missing], b= [1, 4, 5], c=[1., 2, 3])
3×3 DataFrame
 Row │ a        b      c
     │ Int64?   Int64  Float64
─────┼─────────────────────────
   1 │       1      1      1.0
   2 │       4      4      2.0
   3 │ missing      5      3.0

and want to extract all the data as a matrix of doubles, but I can’t.

Matrix{Float64}(df)
ERROR: ArgumentError: cannot convert a DataFrame containing missing values to Matrix{Float64} (found for column a)

The problem seems to be related to the fact that one cannot (or rather, I cannot find a way to) do either of the following:

x = [1, 2, missing];

x[3] = NaN
ERROR: InexactError: Int64(NaN)

 Float64.(x)
ERROR: MethodError: no method matching Float64(::Missing)

So we can’t replace the missing with a float because the vector has an integer element type, and we can’t convert it to float because of the missing.

Is there any way out of this trap (other than making column copies and looping with ifs my way out of it)?
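
For reference, both errors above come from the element type of x, which is Union{Missing, Int64}: NaN cannot be stored as an Int64, and Float64 has no method for missing. A minimal sketch, assuming only Base Julia (the names y and z are just for illustration), of widening the element type first so that both steps go through:

```julia
x = [1, 2, missing]            # Vector{Union{Missing, Int64}}

# Widen the element type so the vector can hold floats as well as missing.
y = convert(Vector{Union{Missing, Float64}}, x)

y[3] = NaN                     # now allowed: NaN is a Float64
z = Float64.(y)                # no missing left, so the conversion succeeds
```

This still makes a copy of the data, so it is more an explanation of the trap than a shortcut.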

rafael.guerra · December 12, 2022, 11:19pm

Matrix(df) is not working?

joa-quim · December 12, 2022, 11:24pm

Sorry, I forgot a detail. I need to get rid of the missings, and Matrix(df) keeps them.

Matrix(df)
3×3 Matrix{Union{Missing, Float64}}:
 1.0       1.0  1.0
 4.0       4.0  2.0
  missing  5.0  3.0

rafael.guerra · December 12, 2022, 11:25pm

Use coalesce.

joa-quim · December 12, 2022, 11:30pm

Hmm, how?

coalesce(Matrix(df), missing)
3×3 Matrix{Union{Missing, Float64}}:
 1.0       1.0  1.0
 4.0       4.0  2.0
  missing  5.0  3.0

And again sorry, that was still not the full info. I need to replace the missings with NaN because the result is intended to be sent to a C library.

joa-quim · December 12, 2022, 11:34pm

OK, contrived, but I can do

replace!(Matrix(df), missing => NaN)
3×3 Matrix{Union{Missing, Float64}}:
   1.0  1.0  1.0
   4.0  4.0  2.0
 NaN    5.0  3.0
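
The element type stays Union{Missing, Float64} here because replace! works in place on the matrix that Matrix(df) created. As a sketch (the small hand-built matrix M stands in for Matrix(df) and is an assumption for illustration), the non-mutating replace allocates a new array and narrows the element type once missing has been replaced:

```julia
# Stand-in for Matrix(df): a matrix that may hold missing values.
M = Union{Missing, Float64}[1.0 1.0; missing 5.0]

# Non-mutating: returns a new array whose element type drops Missing.
N = replace(M, missing => NaN)

# Mutating: the values change, but M keeps its original element type.
replace!(M, missing => NaN)
```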

rafael.guerra · December 12, 2022, 11:34pm

Try
coalesce.(df, NaN)
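
The difference from the earlier attempt is the dot. coalesce returns its first argument that is not missing, and a matrix as a whole is never missing, so the undotted call hands the matrix back unchanged. A small sketch on a plain matrix (M is a hand-built stand-in, not the DataFrame from the question):

```julia
M = Union{Missing, Float64}[1.0 1.0; missing 5.0]

# Without the dot, coalesce looks at M as a whole; M is not missing,
# so it comes back unchanged.
same = coalesce(M, NaN)

# With the dot, coalesce is applied to each element separately.
filled = coalesce.(M, NaN)
```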

joa-quim · December 12, 2022, 11:35pm

Thanks, that’s better, as the result is directly a plain matrix

Matrix(coalesce.(df, NaN))
3×3 Matrix{Float64}:
   1.0  1.0  1.0
   4.0  4.0  2.0
 NaN    5.0  3.0
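
The same idea applies to the plain vector from the original question. A sketch, assuming only Base Julia: coalesce does not promote the integers to Float64 by itself, so wrapping the broadcast in an explicit Float64.(...) pins the element type. A concrete Float64 array is typically what a C library expects.

```julia
x = [1, 2, missing]

# Replace missing with NaN and convert every element to Float64 in one pass.
filled = Float64.(coalesce.(x, NaN))
```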

blackeneth · December 13, 2022, 6:05am

This is why I’ve said that “missing” is an integer.

Whereas NaN is often used as the missing for floating point.

They represent the same concept but don’t interoperate.

I don’t know if it would be useful or harmful to define conversions like:

Float64(missing) == NaN
(and similar for NaN64, NaN32, NaN16)

Int64(NaN) == missing
(and similar for Int32, Int16)

isnan(missing) == true

ismissing(NaN) == true

uniment · December 13, 2022, 7:53am

Interesting idea. I started this comment disagreeing with you, but as I was typing I realized I was wrong.

NaN and missing are indeed the same concept.

NaN is the result of 0/0 and Inf/Inf. In real life we’d solve such problems using L’Hôpital’s rule and get the numeric value, but the computer doesn’t always get to do this so the numeric value is unknown.

NaN is a misnomer: not a number. But it actually is a number; just a number whose value we don’t know.

Which is exactly what missing means when in numeric contexts.

nilshg · December 13, 2022, 10:40am

Probably not helpful rehashing this here, but there’s been a lot of discussion on this when missing first came around and since then, and the reason it exists is precisely because it is meant to be distinct from NaN (and nothing). Here’s an old SO question:

But if you look around you’ll certainly find loads more.
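
As a rough illustration of how the two currently differ in Base Julia (a sketch only; the variable names are made up for the example):

```julia
# Equality: NaN compares unequal to itself, while missing propagates.
nan_eq  = NaN == NaN             # false
miss_eq = missing == missing     # missing

# The predicates do not treat one as the other.
nan_is_missing = ismissing(NaN)  # false
miss_is_nan    = isnan(missing)  # missing (propagates rather than true)

# missing works with any element type; NaN exists only for floats.
ints  = [1, 2, missing]
total = sum(skipmissing(ints))   # 3
```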
