Converting to an array that accepts 'missing' type

So I have a matrix with mostly 0/1/2 values, but missing values are usually coded as 5 for other programs. How can I set these values to missing? It seems simple, but I cannot find an answer in the Julia docs (Missing Values · The Julia Language). Searching Discourse, I’ve found very similar questions, but when I apply the answers they don’t work.

A = [ 0.0 1.0 2.0
       0.0 2.0 1.0
       1.0 5.0 1.0]

# set element 3, 2 to missing
A[3, 2] = missing     # throws error

# set all elements > 2.5 = missing (should only be 5's that represent missing)
A[A .> 2.5] = missing   # as one would do in R, but throws an error

Any help? I don’t understand this Array{Union{Missing, String}} notation I see in the docs or how to make a matrix accept missing values.

It seems allowmissing is what you’re after:

julia> A = allowmissing(A)
3×3 Matrix{Union{Missing, Float64}}:
 0.0  1.0  2.0
 0.0  2.0  1.0
 1.0  5.0  1.0

julia> A[3,2] = missing
missing

or since the above allocates anyway, you can go with:

julia> A = map(x -> x > 2.5 ? missing : x, A)
3×3 Matrix{Union{Missing, Float64}}:
 0.0  1.0       2.0
 0.0  2.0       1.0
 1.0   missing  1.0
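
If the sentinel is one exact value (5.0 here) rather than a threshold, Base's replace may be the shortest route. This is a minimal sketch using the matrix from the question; the name B is only for illustration:

```julia
A = [0.0 1.0 2.0
     0.0 2.0 1.0
     1.0 5.0 1.0]

# replace returns a copy and widens the element type so it can hold missing
B = replace(A, 5.0 => missing)

println(eltype(A))           # the original keeps its Float64 element type
println(eltype(B))           # the copy can hold Float64 or missing
println(ismissing(B[3, 2]))
```

Like the map version, this allocates a new matrix and leaves A untouched.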

Thank you so much, I’m not sure why this was so difficult to find. ChatGPT kept giving me weird things and I kept going to Discourse and didn’t find what I needed. This worked great. Thanks!

After training on this forum, the next version of ChatGPT will hopefully get it.

I hope so! It gives me a lot of wrong answers due to older versions of Julia.

Linking this related post.

We need to load Missings.jl to use allowmissing() (or DataFrames which seems to re-export it).
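
If you would rather not load a package, a Base-only sketch of the same idea is to convert to a matrix whose element type allows missing, then assign by index or by mask. The names mask and B are only for illustration:

```julia
A = [0.0 1.0 2.0
     0.0 2.0 1.0
     1.0 5.0 1.0]

# Compute the mask on the plain Float64 matrix first
mask = A .> 2.5

# Copy into a matrix whose element type also accepts missing
B = convert(Matrix{Union{Missing, Float64}}, A)

# Assignment by condition: note the dotted .= (a plain = throws here)
B[mask] .= missing

# Assignment by index now works as well
B[1, 1] = missing
```

The R-style A[A .> 2.5] = missing fails for two separate reasons: the matrix element type is Float64, and assigning a scalar to several positions needs the broadcasting form .= in Julia.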

Why is this better than the following?

 0.0  1.0       2.0
 0.0  2.0       1.0
 1.0  NaN  1.0

NaN is a Float64 so your array will be contiguous in memory but missing is not a Float64 so who (me :) ) knows how that array is stored.

The map(x -> x > 2.5 ? missing : x, A) approach is best when you want to set missing by condition, not by index. It is perfectly general and works for values other than missing (e.g. nothing), with no need to learn specialized functions like allowmissing.

Setting a value of potentially-different type by index is probably most straightforward with Accessors:

julia> using Accessors

julia> A_new = @set A[3, 2] = missing
3×3 Matrix{Union{Missing, Float64}}:
 0.0  1.0       2.0
 0.0  2.0       1.0
 1.0   missing  1.0

Again, perfectly general and works for any type.

Perhaps something like this:

map(x -> x.I == (3,2) ? missing : A[x], CartesianIndices(A))

How silly is it to do this:

(A = convert(Matrix{Union{Missing, eltype(A)}}, A))[3,2] = missing 

This is a tangent, but the quoted statement often occurs in other contexts: A function is supplied to transform a value, but it would be super convenient to use the index as a parameter to the transformation.

Something like:

mapind((I, x) -> ( I == (3,2) || x > 2.5 ? missing : x ), A)

Essentially,

mapind( f, A, ...) "==" map(f, enumerate(A), ...)

(Many of the uses of enumerate arise from exactly this case in other functions too.) But enumerate would need to yield multidimensional indices instead of the default linear indices (i.e. zip(CartesianIndices(A), A) instead of enumerate(A)).

The simplest solution I see is to use:

map( ((I,v),) -> ...., pairs(A) )

which is pretty neat. So maybe it is enough.

An MWE for this case is creating a matrix of complex values with norm equal to the distance from the matrix center and random phase:

N = 5
phase = rand(N,N)
map( ((I,v),) -> exp(im*v)*hypot((Tuple(I).-(N-N÷2))...), collect(pairs(phase)) )

but this isn’t so nice.
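
One possible alternative, as a sketch rather than a recommendation, is an array comprehension over CartesianIndices, which avoids collecting the pairs. The name c (the center index) is introduced here only for readability:

```julia
N = 5
phase = rand(N, N)
c = N - N ÷ 2   # center index, same expression as in the map version

# Iterate over the indices and look up the value; the result keeps the N×N shape
B = [exp(im * phase[I]) * hypot((Tuple(I) .- c)...) for I in CartesianIndices(phase)]

println(size(B))
println(abs(B[c, c]))   # magnitude at the center element
```

Whether this reads better than map over pairs is a matter of taste; it does the same computation.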

NaN is a Float64 so your array will be contiguous in memory but missing is not a Float64 so who (me :) ) knows how that array is stored.

According to the docs, arrays with missing values are still fast and contiguous in memory, since the missing information is stored alongside the main data rather than in place.

Using NaN is a bit of a hack, since that’s not what NaN is supposed to represent.
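
On the storage point, one rough way to check is to ask whether the union is an "isbits union", which is the case where Julia can store array elements inline. Note that Base.isbitsunion is an internal, unexported function, so treat this as a sketch that may change between Julia versions:

```julia
T = Union{Missing, Float64}

# Float64 is a plain bits type
println(isbitstype(Float64))

# Internal helper: reports whether the union can be stored inline in arrays
println(Base.isbitsunion(T))

# An ordinary vector with this element type, built from a typed literal
v = T[1.0, missing, 3.0]
println(typeof(v))
```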

missings are fast in the simplest cases, but in general they can have huge overhead compared to NaNs; see a simple example at Is there any reason to use NaN instead of missing? - #4 by aplavin.
If you know that you have floating-point data, NaNs are perfectly fine, and they are also convenient to work with in Julia.
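
For comparison, here is a small sketch of how the two conventions behave in a reduction. Both propagate through sum, and both need an explicit skip step; the variable names are only for illustration:

```julia
x = [1.0, missing, 3.0]   # Vector{Union{Missing, Float64}}
y = [1.0, NaN, 3.0]       # Vector{Float64}

total_x = sum(x)          # missing propagates through the sum
total_y = sum(y)          # NaN propagates as well

# Skipping the placeholder values explicitly
clean_x = sum(skipmissing(x))
clean_y = sum(filter(!isnan, y))

println((total_x, total_y, clean_x, clean_y))
```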

Imagine the data is to be transmitted to an external library written in C, Fortran, or whatever, such as a plotting library. It can’t be passed as is, because those libraries know nothing about missings. So the missings have to be replaced by a float, which (I think) implies either a copy or memory shuffling.
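
A sketch of that replacement step, assuming NaN is an acceptable placeholder for the external library: broadcasting coalesce produces a new plain Float64 matrix, so it is indeed a copy. The names B and C are only for illustration:

```julia
B = Union{Missing, Float64}[0.0 1.0 2.0
                            0.0 2.0 1.0
                            1.0 missing 1.0]

# coalesce returns its first non-missing argument, so missing becomes NaN
C = coalesce.(B, NaN)

println(typeof(C))
println(isnan(C[3, 2]))
```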