How to replace all values in an array with missing values?

leon · January 30, 2022, 2:36pm

I have an array A of size 360 x 180 x 100. The current missing value indicator is NaN. My goal is to replace all NaN values with missing values.

Below is my code:
A[isnan.(A)] .= missing;

Unfortunately, I got the following error:

ERROR: LoadError: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/twiceprecision.jl:262
  convert(::Type{T}, ::AbstractChar) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/char.jl:185
  convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/multidimensional.jl:136
  ...
Stacktrace:
 [1] fill!(A::SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false}, x::Missing)
   @ Base ./multidimensional.jl:1062
 [2] copyto!
   @ ./broadcast.jl:921 [inlined]
 [3] materialize!
   @ ./broadcast.jl:871 [inlined]
 [4] materialize!(dest::SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(identity), Tuple{Base.RefValue{Missing}}})
   @ Base.Broadcast ./broadcast.jl:868

lmiq · January 30, 2022, 2:42pm

By default, an array of floats does not accept elements of type Missing. What you get is the same as this:

julia> x = zeros(3);

julia> x[1] = missing
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
...

The array would need to be set up to accept missing values, for example:

julia> x = zeros(Union{Missing,Float64},3)
3-element Vector{Union{Missing, Float64}}:
 0.0
 0.0
 0.0

julia> x[1] = missing
missing

julia> x
3-element Vector{Union{Missing, Float64}}:
  missing
 0.0
 0.0

I guess that has performance implications (bad ones), but that may or may not be relevant for your specific case.

In other words, you need to define a new array, for example with something like:

julia> x
3-element Vector{Float64}:
 NaN
   0.7031732110417592
   0.03867946256935528

julia> x_new = [ isnan(val) ? missing : val for val in x ]
3-element Vector{Union{Missing, Float64}}:
  missing
 0.7031732110417592
 0.03867946256935528

leon · January 30, 2022, 2:57pm

Many thanks!

So if I read that array from somewhere else, I’m out of luck with this approach?

I really hope Julia could fix this fundamental issue.

lmiq · January 30, 2022, 3:00pm

I think there is no way to “fix” that, because you are trying to assign to the array a type of value that the array cannot hold. That is not fixable without reallocating the array, which is what the comprehension does there. You can bind the same name to the new array, letting the previous one be garbage-collected, i.e.:

A = [ isnan(x) ? missing : x for x in A ]
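Applied to an array shaped like the one in the question, the comprehension keeps the dimensions. One caveat: an untyped comprehension picks its element type from the values it actually produces, so if A happens to contain no NaN, the result stays Array{Float64} and a later missing assignment fails again. Writing the element type in front of the brackets avoids that. A minimal sketch, using a small random array as a stand-in for the real 360 x 180 x 100 data:

```julia
A = rand(4, 3, 2)          # small stand-in for the 360×180×100 array
A[1, 2, 1] = NaN
A[3, 1, 2] = NaN

# Untyped comprehension: shape is preserved, eltype comes from the values produced.
B = [isnan(x) ? missing : x for x in A]

# With no NaN present, no missing is produced, so the eltype stays Float64.
C = [isnan(x) ? missing : x for x in zeros(2, 2)]

# Typed comprehension: the eltype admits Missing regardless of the data.
D = Union{Float64, Missing}[isnan(x) ? missing : x for x in zeros(2, 2)]
```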

lawless-m · January 30, 2022, 3:02pm

This is not a fundamental issue; it is not considered broken, and there will be no “fix”.

How did you read that array? A language which creates an array that uses NaN to represent a missing value exposes that language as broken.

Anyway, the various packages which read in data, such as CSV.jl, represent missing values as missing, and DataFrames.jl respects missing.

leon · January 30, 2022, 3:20pm

Many thanks!

I still got the same error, but was able to modify it as below to make it work:

A = replace!(x -> isnan(x) ? missing : x, Array{Union{Float64, Missing}}(A) );

To the Julia designers: I still think it would be much nicer to allow people to do this by writing code like A[isnan.(A)] .= missing;

oheil · January 30, 2022, 3:21pm

A = convert(Array{Union{Float64, Missing}}, A)
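Once the element type admits Missing, the original one-liner from the question is allowed, because missing now fits the array's element type. A minimal sketch on a small stand-in array; it converts to Array{Union{Float64, Missing}} rather than Vector, so the same call also covers the 3-D case. Note that isnan(missing) itself returns missing, so a mask built with isnan.(A) stops being usable as an index once A holds missings; isequal(x, NaN) always returns a Bool, which makes that variant safe to rerun.

```julia
A = [1.0 NaN; 3.0 4.0]                          # small Float64 stand-in for the real data

# Widen the element type once; this allocates a new array.
A = convert(Array{Union{Float64, Missing}}, A)

# Now the original one-liner is allowed, since missing fits the eltype.
A[isnan.(A)] .= missing

# isnan(missing) returns missing, so the mask above cannot be rebuilt once A
# holds missings; isequal(x, NaN) always returns a Bool and is safe to rerun.
A[isequal.(A, NaN)] .= missing
```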

lmiq · January 30, 2022, 3:27pm

There was a typo in my answer; it's fixed now and should work.

Jeff_Emanuel · January 30, 2022, 4:27pm

You can do that if you create all of your arrays as Vector{Any}, but that will defeat the performance benefits that Julia offers, as well as the type safety that keeps you from shooting yourself in the foot.
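To make that trade-off concrete: an Any-typed array accepts missing, but it silently accepts everything else too, while a Union{Float64, Missing} array accepts exactly floats and missing. A small sketch:

```julia
x = Any[1.0, 2.0, 3.0]
x[1] = missing          # allowed...
x[2] = "not a number"   # ...but so is this, with no complaint

y = Union{Float64, Missing}[1.0, 2.0, 3.0]
y[1] = missing          # allowed: Missing is part of the element type

# A String cannot be converted to Float64, so setindex! throws a MethodError.
rejected = try
    y[2] = "not a number"
    false
catch err
    err isa MethodError
end
```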

pdeffebach · January 30, 2022, 4:30pm

There is the function allowmissing(x) in Missings.jl for this.

I would just use replace:

julia> x = [1, 2, 3, NaN]
4-element Vector{Float64}:
   1.0
   2.0
   3.0
 NaN

julia> replace(x, NaN => missing)
4-element Vector{Union{Missing, Float64}}:
 1.0
 2.0
 3.0
  missing
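This works even though NaN == NaN is false: replace compares with isequal, which treats NaNs as equal to each other, and it allocates a new array whose element type also admits missing. The same call works on arrays of any dimension. A short check:

```julia
x = [1, 2, 3, NaN]

println(NaN == NaN, " ", isequal(NaN, NaN))   # prints "false true"

y = replace(x, NaN => missing)     # new Vector{Union{Missing, Float64}}

A = fill(NaN, 2, 2, 2)             # 3-D example
A[1] = 0.5
B = replace(A, NaN => missing)     # shape is preserved, A is untouched
```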

lawless-m · January 30, 2022, 5:30pm

Try to think of it another way: there is nothing special about missing; it is just a value of type Missing.

A = Float64[0, 1.0, NaN, 20.1]
A[isnan.(A)] .= "It's a Nan!"

Why would we want to allow people to do that?
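Running that example (with NaN spelled correctly) shows the point: the string fails for exactly the same reason missing did in the original question. The array's element type is Float64, and neither value can be converted to it, so the array is left unchanged.

```julia
A = Float64[0, 1.0, NaN, 20.1]

# Each attempt needs convert(Float64, val), which has no method for either value.
errors = map(("It's a Nan!", missing)) do val
    try
        A[isnan.(A)] .= val
        nothing
    catch err
        err
    end
end
```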

oheil · January 30, 2022, 5:37pm

There has been quite a lot of discussion about this. What is special about missing is that it is not just a misinterpreted data point or some dirty input, but a missing observation in the statistical sense. (I didn't search for the earlier discussions.)
If I remember right, there was a package “Missings”. The gist was that this is important enough to be made part of Julia: not the package itself, but the concept of missing.

pdeffebach · January 30, 2022, 5:41pm

That’s not quite what OP is saying. missing is certainly special from a statistical perspective and through its propagation, which is accomplished by overloading basic operators.

But it is implemented very transparently: you could make your own missing type very easily with just

struct MyMissing end

and defining +, -, ==, etc.
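A minimal sketch of that idea, with hypothetical names (MyMissing, mymissing) standing in for Base's Missing and missing; only a few operators are defined here, whereas Base covers many more. A Vector{Float64} rejects this value exactly as it rejects missing, which is the point being made: nothing in the array machinery treats either type specially.

```julia
# Hypothetical stand-in for Base.Missing: an ordinary singleton struct.
struct MyMissing end
const mymissing = MyMissing()

# Propagation: arithmetic involving MyMissing returns MyMissing.
for op in (:+, :-, :*)
    @eval begin
        Base.$op(::MyMissing, ::Number) = mymissing
        Base.$op(::Number, ::MyMissing) = mymissing
        Base.$op(::MyMissing, ::MyMissing) = mymissing
    end
end

# Comparison with a number is unknown, so it propagates as well.
Base.:(==)(::MyMissing, ::Number) = mymissing
Base.:(==)(::Number, ::MyMissing) = mymissing

# Like missing, it cannot be stored in a Vector{Float64}: there is no
# convert(Float64, ::MyMissing) method, so setindex! throws a MethodError.
x = zeros(3)
stored = try
    x[1] = mymissing
    true
catch err
    err isa MethodError || rethrow()
    false
end
```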

OP means that not allowing automatic promotion of an array's element type via indexing is part of missing being “nothing special”. Allowing

setindex!(x::AbstractArray, m::Missing, i)

to promote the array type would be very unsafe and make lots of users unhappy, and it would require making missing very special in its implementation, such that it isn't implemented as a normal struct.

lawless-m · January 30, 2022, 5:49pm

Yes, I know what Missing represents, but we are talking about types, and in that regard it is just another type, just like NULL in SQL.

And just like NULL in SQL: if your column is an INT and you don’t have ALLOW NULLS, you cannot insert it; your column must be Union{INT, NULL}.

oheil · January 30, 2022, 5:53pm

Ok, understood.
In this sense, Missing is just a Type like any other.

StefanKarpinski · January 30, 2022, 6:13pm

You may be interested in this talk by the famous programming language designer, Sir Tony Hoare. He is the one who originally allowed object references to be null in ALGOL, a design which was then copied by C, C++, Java, C# and so on. He refers to this as his “billion dollar mistake”. Why? Because of this choice, the type systems in these languages can never ensure that a value is actually present—it might always be null. This causes no end of problems and crashes and makes it nearly impossible to write reliable code in these languages. If you’ve ever run a program that crashes with a segmentation fault or a null pointer exception, then you’ve been the victim of this problem. It also makes using all reference types slower than necessary because the compiler also needs to constantly check for things being null and do something different in that case. So everyone is a victim.

Does this issue sound familiar? It should: it’s the exact thing you’re asking for, but with null replaced by missing. You are asking to be allowed to store missing anywhere a Float64 is expected, and presumably the same for other types as well, like Int or String or any user-defined type. This is like Hoare’s billion dollar mistake but worse, because at least in C et al. a floating-point value is a primitive which cannot be null; only reference types (i.e. “objects”) can be null. What’s the problem? If Vector{Int}, for example, can contain missing values, then how can we express a vector of actual non-missing Int values? We would lose the ability to do so. That would force everyone in all situations to program defensively against missings, much as everyone is forced to program defensively against null pointers in the aforementioned languages. Instead, Julia has different types: Vector{Int} for a vector of actual non-missing integer values and Vector{Union{Int, Missing}} for a vector of possibly missing integer values.
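In code, the distinction looks like this: the element type tells both you and the compiler whether missing can occur, and values only need defensive handling where the type says so. A short sketch using Base functions (nonmissingtype, skipmissing, coalesce):

```julia
v = [1, 2, 3]                              # Vector{Int}: no missing can ever appear
w = Union{Int, Missing}[1, missing, 3]     # possibly-missing integers

Missing <: eltype(v)        # false: code using v needs no missing checks
Missing <: eltype(w)        # true
nonmissingtype(eltype(w))   # Int

# Missing propagates through reductions until it is handled explicitly.
total = sum(w)                         # missing
total_known = sum(skipmissing(w))      # 4

# Getting back to a plain Vector{Int} requires deciding what missing becomes.
filled = coalesce.(w, 0)               # Vector{Int}: [1, 0, 3]
```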

In short, avoiding repeating Hoare’s billion dollar mistake (and surely it’s been far more expensive than a mere billion dollars at this point) was very much intentional and not something we’re going to change.

lmiq · January 30, 2022, 6:17pm

Note that none of these options mutate the original array; they all create a new array. IMHO the simplest syntax is the comprehension, but they all do the same thing:

julia> A = zeros(3); B = A;

julia> A = replace!(x -> isnan(x) ? missing : x, Array{Union{Float64, Missing}}(A) );

julia> A === B
false

julia> A[1] = missing
missing

julia> B
3-element Vector{Float64}:
 0.0
 0.0
 0.0

Be careful with that: none of the options mutate your original array, which may cause confusion.

StefanKarpinski · January 30, 2022, 6:44pm

If you know you’re going to want to insert missings in a vector and you want to mutate it, it’s best to arrange for it to have an element-type Union{T, Missing} beforehand. Then inserting missing values by mutation will be possible.
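A sketch of that approach, with a small size standing in for the real data: allocate the array with a Union element type up front, fill it, and later replace NaN by missing in place with replace!. Because nothing is reallocated, every other name bound to the same array sees the change, unlike the copying approaches discussed above.

```julia
# Allocate with an element type that admits missing from the start.
A = Array{Union{Float64, Missing}}(undef, 4, 3, 2)   # small stand-in for 360×180×100
fill!(A, 1.0)
A[2, 2, 1] = NaN
A[4, 3, 2] = NaN

B = A                          # a second name for the same array

# In place: replace! compares with isequal, so NaN entries are matched.
replace!(A, NaN => missing)
```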

leon · January 31, 2022, 2:14am

Many thanks for the great discussion!

On a related topic, I notice many people use the value below (i.e., the largest possible number) as a missing value: 9969209968386869046778552952102584320.

How do I do that in Julia? Just replace my NaN with this value?

StefanKarpinski · January 31, 2022, 4:03am

Wouldn’t leaving the NaNs as is be better? You happen to be working with a type that has a value (2^53 of them even) that was meant for this. You can assign the largest finite float value but it won’t be treated in any way special unless you write code that checks for it. Also, I’m not aware of anyone using that value as a sentinel for floats. They sometimes use the smallest representable integer value as a sentinel for signed integer types but for floats they use NaNs, which already mostly behave the way one wants.
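For reference, the value quoted in the question is exactly 15 × 2^119 (about 9.97e36). That is well below floatmax(Float32) (about 3.4e38) and floatmax(Float64) (about 1.8e308), so it is not the largest representable number; it matches netCDF's default fill value for float and double variables (NC_FILL_FLOAT / NC_FILL_DOUBLE). If data arrive with such a sentinel, a single replace call turns it into NaN (keeping Float64, as suggested above) or into missing. A sketch with a hypothetical raw vector:

```julia
# Sentinel from the question: exactly 15 * 2^119, representable in Float32 and Float64.
fillvalue = ldexp(15.0, 119)

raw = [1.5, fillvalue, 2.5]        # hypothetical data read with the sentinel

as_nan = replace(raw, fillvalue => NaN)          # stays Vector{Float64}
as_missing = replace(raw, fillvalue => missing)  # Vector{Union{Missing, Float64}}
```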
