I have an array A with the size of 360 x 180 x 100. The current missing value indicator is NaN. My goal is to replace all NaN values with missing values.
Below is my code: A[isnan.(A)] .= missing;
Unfortunately, I got a ton of errors as below:
ERROR: LoadError: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
Closest candidates are:
convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/twiceprecision.jl:262
convert(::Type{T}, ::AbstractChar) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/char.jl:185
convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/multidimensional.jl:136
...
Stacktrace:
[1] fill!(A::SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false}, x::Missing)
@ Base ./multidimensional.jl:1062
[2] copyto!
@ ./broadcast.jl:921 [inlined]
[3] materialize!
@ ./broadcast.jl:871 [inlined]
[4] materialize!(dest::SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(identity), Tuple{Base.RefValue{Missing}}})
@ Base.Broadcast ./broadcast.jl:868
I guess that has performance implications (bad ones), but that may or may not be relevant for your specific case.
In other words, you need to define a new array, for example with something like:
julia> x
3-element Vector{Float64}:
NaN
0.7031732110417592
0.03867946256935528
julia> x_new = [ isnan(val) ? missing : val for val in x ]
3-element Vector{Union{Missing, Float64}}:
missing
0.7031732110417592
0.03867946256935528
I think there is no way to “fix” that, because you are trying to assign to the array a type of value that the array cannot hold. That is not “fixable” without reallocating the array, which is what the comprehension does there. (You can assign the same name to the new array, which leaves the previous one free to be garbage-collected.)
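The rebinding described above can be sketched on a small 3-D array standing in for the 360 x 180 x 100 one. The sizes, the values, and the extra name old are made up for illustration; old exists only to show that the original array is left untouched.

```julia
# Small illustrative stand-in for the 360 x 180 x 100 array
A = reshape([1.0, NaN, 3.0, 4.0, NaN, 6.0, 7.0, 8.0], 2, 2, 2)
old = A   # second reference, kept only to inspect the original afterwards

# The comprehension allocates a new array with a wider element type;
# rebinding the name A leaves the old array to the garbage collector
# once nothing else refers to it.
A = [isnan(v) ? missing : v for v in A]

println(eltype(A))            # element type of the new array
println(size(A))              # shape is preserved by the comprehension
println(count(ismissing, A))  # number of replaced entries
println(eltype(old))          # the original array still holds Float64
```

The comprehension keeps the shape of the input, so the same line should apply unchanged to the full-size array.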
You can do that if you create all of your arrays as Vector{Any}, but that will defeat the performance benefits that Julia offers, as well as the type safety that keeps you from shooting yourself in the foot.
There has been quite some discussion about this. What is special about missing is that it is not just a misinterpreted data point or some dirty input, but a missing observation in the statistical sense. (I didn’t search for the earlier discussions.)
There was a package called “Missings”, if I remember right. The consensus was that this is important enough to be part of Julia itself: not the package, but the “missing” concept.
That’s not quite what OP is saying. missing is certainly special from a statistical perspective and through its propagation, which is accomplished by overloading basic operators.
But it is implemented very transparently; you could make your own missing type very easily with just
struct MyMissing end
and defining +, -, ==, etc.
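As a rough sketch of that idea, the following defines a tiny propagating type. The names MyMissing and mymissing are hypothetical, only + and == are covered, and this is not a claim about how Base defines Missing in full.

```julia
# Hypothetical stand-in for Base.Missing, for illustration only
struct MyMissing end
const mymissing = MyMissing()

# Arithmetic involving mymissing yields mymissing
Base.:+(::MyMissing, ::Number) = mymissing
Base.:+(::Number, ::MyMissing) = mymissing
Base.:+(::MyMissing, ::MyMissing) = mymissing

# Comparison propagates as well
Base.:(==)(::MyMissing, ::Number) = mymissing
Base.:(==)(::Number, ::MyMissing) = mymissing
Base.:(==)(::MyMissing, ::MyMissing) = mymissing

# A vector whose element type admits the new value
v = Union{Float64, MyMissing}[1.0, mymissing]
result = v[1] + v[2]

# A plain Float64 vector rejects it, just as it rejects missing
w = [1.0, 2.0]
rejected = try
    w[1] = mymissing
    false
catch err
    err isa MethodError
end
```

Nothing here needed special treatment from the language: the array simply refuses a value that is not convertible to its element type.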
OP means that not allowing automatic promotion in an array via indexing is part of missing being “nothing special”. Allowing
setindex!(x::AbstractArray, m::Missing, i)
to promote the array type would:
Be very unsafe and make lots of users unhappy
Require making missing very special in its implementation such that it isn’t implemented as a normal struct.
You may be interested in this talk by the famous programming language designer, Sir Tony Hoare. He is the one who originally allowed object references to be null in ALGOL, a design which was then copied by C, C++, Java, C# and so on. He refers to this as his “billion dollar mistake”. Why? Because of this choice, the type systems in these languages can never ensure that a value is actually present—it might always be null. This causes no end of problems and crashes and makes it nearly impossible to write reliable code in these languages. If you’ve ever run a program that crashes with a segmentation fault or a null pointer exception, then you’ve been the victim of this problem. It also makes using all reference types slower than necessary because the compiler also needs to constantly check for things being null and do something different in that case. So everyone is a victim.
Does this issue sound familiar? It should: it’s the exact thing you’re asking for but replacing null with missing. You are asking to be allowed to store missing anywhere a Float64 is expected, and presumably the same for other types as well, like Int or String or any user-defined type. This is like Hoare’s billion dollar mistake but worse, because at least in C et al. a floating-point value is a primitive which cannot be null; only reference types (i.e. “objects”) can be null. What’s the problem? If Vector{Int}, for example, can contain missing values, then how can we express a vector of actual non-missing Int values? We would lose the ability to do so. That would force everyone in all situations to program defensively against missings, much as everyone is forced to program defensively against null pointers in the aforementioned languages. Instead, Julia has different types: Vector{Int} for a vector of actual non-missing integer values and Vector{Union{Int, Missing}} for a vector of possibly missing integer values.
In short, avoiding repeating Hoare’s billion dollar mistake (and surely it’s been far more expensive than a mere billion dollars at this point) was very much intentional and not something we’re going to change.
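The distinction between the two element types can be seen in a small sketch. The helper total and the variable names are hypothetical and exist only to show a signature that demands complete data.

```julia
strict = [1, 2, 3]                            # Vector{Int}: cannot hold missing
loose  = Union{Int, Missing}[1, missing, 3]   # may hold missing

# Hypothetical helper that only accepts vectors without missing values
total(v::Vector{Int}) = sum(v)

strict_total = total(strict)

# Calling total(loose) is a MethodError: loose is not a Vector{Int}
loose_rejected = try
    total(loose)
    false
catch err
    err isa MethodError
end

# Code that does accept possibly-missing data must decide what to do with it
propagated = sum(loose)               # missing propagates through the sum
skipped    = sum(skipmissing(loose))  # or skip the missing entries explicitly
```

Because the two types are distinct, code written against Vector{Int} never has to check for missing at all.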
Note that none of these options mutate the original array; they all create a new array. IMHO the simplest syntax is the comprehension, but they all do the same thing:
julia> A = zeros(3); B = A;
julia> A = replace!(x -> isnan(x) ? missing : x, Array{Union{Float64, Missing}}(A) );
julia> A === B
false
julia> A[1] = missing
missing
julia> B
3-element Vector{Float64}:
0.0
0.0
0.0
Be careful with that. All the other options behave the same way, so you are not mutating your original array in any case, which may cause confusion.
If you know you’re going to want to insert missings in a vector and you want to mutate it, it’s best to arrange for it to have an element-type Union{T, Missing} beforehand. Then inserting missing values by mutation will be possible.
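A minimal sketch of that approach, assuming a small 2 x 3 array with made-up contents. The name B is only a second reference used to confirm that the same array was mutated, and the mask is built with an explicit ismissing check so that it also works once the array already contains missing values.

```julia
# Allocate with the Union element type from the start
A = Array{Union{Float64, Missing}}(undef, 2, 3)
fill!(A, 0.0)
A[1, 2] = NaN
B = A   # second name for the same array

# Boolean mask of NaN positions; entries that are already missing map to false
mask = map(x -> !ismissing(x) && isnan(x), A)

# In-place assignment now succeeds because the element type admits missing
A[mask] .= missing

println(A === B)            # still the same array object
println(ismissing(B[1, 2])) # the change is visible through B
```

An existing Float64 array would still need one conversion to the Union element type before this kind of mutation becomes possible.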
Wouldn’t leaving the NaNs as is be better? You happen to be working with a type that has a value (2^53 of them even) that was meant for this. You can assign the largest finite float value but it won’t be treated in any way special unless you write code that checks for it. Also, I’m not aware of anyone using that value as a sentinel for floats. They sometimes use the smallest representable integer value as a sentinel for signed integer types but for floats they use NaNs, which already mostly behave the way one wants.
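For comparison, here is a short sketch of how NaN already behaves as a sentinel in a plain Float64 vector. The values are made up for illustration.

```julia
x = [NaN, 0.5, 0.25]

nan_sum   = sum(x)                   # NaN propagates through arithmetic
clean_sum = sum(filter(!isnan, x))   # drop NaN entries explicitly
self_eq   = NaN == NaN               # NaN never compares equal, even to itself
n_nan     = count(isnan, x)          # so detect it with isnan instead of ==

println((nan_sum, clean_sum, self_eq, n_nan))
```

The array keeps its Float64 element type throughout, so no reallocation is involved.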