How to replace all values in an array with missing values?

I have an array A with the size of 360 x 180 x 100. The current missing value indicator is NaN. My goal is to replace all NaN values with missing values.

Below is my code:
A[isnan.(A)] .= missing;

Unfortunately, I got a ton of errors as below:

ERROR: LoadError: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/twiceprecision.jl:262
  convert(::Type{T}, ::AbstractChar) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/char.jl:185
  convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/multidimensional.jl:136
  ...
Stacktrace:
 [1] fill!(A::SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false}, x::Missing)
   @ Base ./multidimensional.jl:1062
 [2] copyto!
   @ ./broadcast.jl:921 [inlined]
 [3] materialize!
   @ ./broadcast.jl:871 [inlined]
 [4] materialize!(dest::SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(identity), Tuple{Base.RefValue{Missing}}})
   @ Base.Broadcast ./broadcast.jl:868

By default, an array of floats does not accept elements of type Missing. What you get is the same as this:

julia> x = zeros(3);

julia> x[1] = missing
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
...

The array's element type has to allow missing values, for example:

julia> x = zeros(Union{Missing,Float64},3)
3-element Vector{Union{Missing, Float64}}:
 0.0
 0.0
 0.0

julia> x[1] = missing
missing

julia> x
3-element Vector{Union{Missing, Float64}}:
  missing
 0.0
 0.0

I guess that has performance implications (bad ones), but that may or may not be relevant for your specific case.

In other words, you need to define a new array, for example with something like:

julia> x
3-element Vector{Float64}:
 NaN
   0.7031732110417592
   0.03867946256935528

julia> x_new = [ isnan(val) ? missing : val for val in x ]
3-element Vector{Union{Missing, Float64}}:
  missing
 0.7031732110417592
 0.03867946256935528
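Since the array in the question is three-dimensional, it may help to see that the same comprehension keeps the shape. This is only a sketch: the small 4 x 3 x 2 array and the positions of the NaNs are made up to stand in for the real 360 x 180 x 100 data.

```julia
# Hypothetical stand-in for the 360 x 180 x 100 array from the question
A = rand(4, 3, 2)
A[1, 1, 1] = NaN
A[2, 3, 2] = NaN

# The comprehension allocates a new array with a wider element type
A_new = [isnan(v) ? missing : v for v in A]

size(A_new) == size(A)    # the 3-D shape is preserved
eltype(A_new)             # Union{Missing, Float64}
count(ismissing, A_new)   # one missing per NaN in A
```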


Many thanks!

So if I read that array from somewhere else, I’m out of luck with this approach?

I really hope Julia could fix this fundamental issue.

I think there is no way to “fix” that, because you are trying to assign to the array a type of value that the array cannot hold. That is not “fixable” without reallocating the array, which is what the comprehension does. You can assign the same label to the new array, which leaves the previous one free to be garbage-collected, for example:

A = [ isnan(x) ? missing : x for x in A ]

This is not a fundamental issue, it is not considered broken, there will be no “fix”.

How did you read that array? A language that creates an array using NaN to represent a missing value exposes itself as broken.

Anyway, the various modules which read in data such as CSV represent missing values as missing. DataFrames respect missing.


Many thanks!

I still got the same error, but was able to modify it as below to make it work:

A = replace!(x -> isnan(x) ? missing : x, Array{Union{Float64, Missing}}(A) );

To the Julia designers: I still think it would be much nicer to let people do this by writing code like this: A[isnan.(A)] .= missing;

A = convert(Array{Union{Float64, Missing}}, A)

There was a typo in my answer, fixed now. Should work


You can do that if you create all of your arrays as Vector{Any}, but that will defeat the performance benefits that Julia offers, as well as the type safety that keeps you from shooting yourself in the foot.
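To illustrate (a sketch only, not a recommendation): with an untyped array, the masked assignment from the original post does go through, but the element type no longer tells the compiler anything. The three-element array here is made up for the example.

```julia
A = Any[1.0, NaN, 3.0]      # element type Any: every slot can hold anything
A[isnan.(A)] .= missing     # accepted, because Any includes Missing

eltype(A)                   # Any
isconcretetype(eltype(A))   # false, so code using A cannot be specialized on a float type
```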


There is the function allowmissing(x) in Missings.jl for this.

I would just use replace

julia> x = [1, 2, 3, NaN]
4-element Vector{Float64}:
   1.0
   2.0
   3.0
 NaN

julia> replace(x, NaN => missing)
4-element Vector{Union{Missing, Float64}}:
 1.0
 2.0
 3.0
  missing

Try to think of it another way: there is nothing special about missing, it is just a value of type Missing.

A = Float64[0, 1.0, NaN, 20.1]
A[isnan.(A)] .= "It's a NaN!"

Why would we want to allow people to do that?


There has been quite some discussion about this. What is special about missing is that it is not just a misinterpreted data point or some dirty input, but a missing observation in the statistical sense. (I didn’t search for the former discussions.)
There used to be a package “Missing”, if I remember right. The consensus was that this is important enough to be made part of Julia: not the package, but the “missing” concept.


That’s not quite what OP is saying. missing is certainly special from a statistical perspective and through its propagation, which is accomplished by overloading basic operators.

But it is implemented very transparently; you could make your own missing type very easily with just

struct MyMissing end

and defining +, -, ==, etc.
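As a rough sketch of that idea, here is a home-made missing type with propagation through + only. The names MyMissing and mymissing are illustrative, and this is not how Base implements Missing in full; it only shows that an ordinary struct plus a few methods is enough.

```julia
# A user-defined "missing" value: an ordinary singleton struct
struct MyMissing end
const mymissing = MyMissing()

# Propagation through +, written out method by method
Base.:+(::MyMissing, ::Number) = mymissing
Base.:+(::Number, ::MyMissing) = mymissing
Base.:+(::MyMissing, ::MyMissing) = mymissing

# The array has to opt in through its element type, exactly as with Missing
x = Union{Float64, MyMissing}[1.0, mymissing, 3.0]
y = x .+ 1          # the mymissing entry propagates

# A plain Float64 array rejects it, just like it rejects missing
z = zeros(3)
rejected = try
    z[1] = mymissing
    false
catch err
    err isa MethodError
end
```

The last part throws the same kind of MethodError as in the original post, since nothing defines how to convert a MyMissing to a Float64.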

OP means that not allowing automatic promotion of an array via indexing is part of missing being “nothing special”. Allowing

setindex!(x::AbstractArray, m::Missing, i)

to promote the array type would

  1. Be very unsafe and make lots of users unhappy
  2. Require making missing very special in its implementation such that it isn’t implemented as a normal struct.

Yes, I know what Missing represents but we are talking about Types, in that regard it is just another Type, just like NULL in SQL.

And just like NULL in SQL, if your column is an INT and you don’t have ALLOW NULLS, you cannot insert it, your column must be Union{INT, NULL}.


Ok, understood.
In this sense, Missing is just a Type like any other.


You may be interested in this talk by the famous programming language designer, Sir Tony Hoare. He is the one who originally allowed object references to be null in ALGOL, a design which was then copied by C, C++, Java, C# and so on. He refers to this as his “billion dollar mistake”. Why? Because of this choice, the type systems in these languages can never ensure that a value is actually present—it might always be null. This causes no end of problems and crashes and makes it nearly impossible to write reliable code in these languages. If you’ve ever run a program that crashes with a segmentation fault or a null pointer exception, then you’ve been the victim of this problem. It also makes using all reference types slower than necessary because the compiler also needs to constantly check for things being null and do something different in that case. So everyone is a victim.

Does this issue sound familiar? It should: it’s the exact thing you’re asking for but replacing null with missing. You are asking to be allowed to store missing anywhere a Float64 is expected, and presumably the same for other types as well, like Int or String or any user-defined type. This is like Sir Hoare’s billion dollar mistake but worse because at least in C et al. a floating-point value is a primitive which cannot be null; only reference types (i.e. “objects”) can be null. What’s the problem? If Vector{Int}, for example, can contain missing values, then how can we express a vector of actual non-missing Int values? We would lose the ability to do so. Which would force everyone in all situations to program defensively against missings, much as everyone is forced to program defensively against null pointers in the aforementioned languages. Instead, Julia has different types: Vector{Int} for a vector of actual non-missing integer values and Vector{Union{Int, Missing}} for a vector of possibly missing integer values.

In short, avoiding repeating Hoare’s billion dollar mistake (and surely it’s been far more expensive than a mere billion dollars at this point) was very much intentional and not something we’re going to change.
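A small sketch of what the two types buy you in practice; the vectors v and w are made-up examples:

```julia
v = [1, 2, 3]                             # Vector{Int}: no element can be missing
w = Union{Int, Missing}[1, missing, 3]    # opted in to possibly-missing values

Missing <: eltype(v)    # false: code taking v never needs to check for missing
Missing <: eltype(w)    # true

sum(v)                  # 6
sum(w)                  # missing, because missing propagates
sum(skipmissing(w))     # 4, skipping the missing entry explicitly
```

Only code that receives something like w has to decide what to do about missing values; code that receives v gets the guarantee from the type.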


Note that none of these options mutate the original array; they all create a new array. IMHO the simplest syntax is that of the comprehension, but they all do the same thing:

julia> A = zeros(3); B = A;

julia> A = replace!(x -> isnan(x) ? missing : x, Array{Union{Float64, Missing}}(A) );

julia> A === B
false

julia> A[1] = missing
missing

julia> B
3-element Vector{Float64}:
 0.0
 0.0
 0.0

Be careful with that. And all the other options do the same, so you are not mutating your original array in any case, which may cause confusion.

If you know you’re going to want to insert missings in a vector and you want to mutate it, it’s best to arrange for it to have an element-type Union{T, Missing} beforehand. Then inserting missing values by mutation will be possible.
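For instance, a minimal sketch of that approach, assuming you control how the array is created (the 2 x 3 size and the NaN position are made up):

```julia
# Allocate with an element type that already allows missing
A = zeros(Union{Missing, Float64}, 2, 3)
A[1, 2] = NaN

B = A                       # a second name for the same array

# In-place replacement; no new array is allocated
replace!(A, NaN => missing)

A === B                     # true: still the same array
ismissing(B[1, 2])          # true: the change is visible through B as well
```

Here replace! is used rather than the masked assignment, since it keeps working when the array already contains some missing entries.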


Many thanks for the great discussion!

On a related topic, I notice many people use the value below (i.e., the largest possible number) as the missing-value indicator: 9969209968386869046778552952102584320.

How do I do that in Julia? Just replace my NaN with this value?

Wouldn’t leaving the NaNs as is be better? You happen to be working with a type that has a value (2^53 of them even) that was meant for this. You can assign the largest finite float value but it won’t be treated in any way special unless you write code that checks for it. Also, I’m not aware of anyone using that value as a sentinel for floats. They sometimes use the smallest representable integer value as a sentinel for signed integer types but for floats they use NaNs, which already mostly behave the way one wants.
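A quick sketch of the difference, with made-up data: NaN propagates through arithmetic and can be detected with isnan, while a large finite sentinel is silently treated as a real number.

```julia
x = [1.0, NaN, 3.0]
isnan(sum(x))              # true: the NaN shows up in the result
sum(filter(!isnan, x))     # 4.0 once the NaNs are filtered out

# Using the largest finite Float64 as a sentinel instead
sentinel = floatmax(Float64)
y = [1.0, sentinel, 3.0]
isfinite(sum(y))           # true: the sentinel is summed like ordinary data

# It only works if every computation checks for it explicitly
sum(filter(!=(sentinel), y))   # 4.0
```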
