Converting to an array that accepts 'missing' type

austin-putz · February 6, 2023, 12:10am

So I have a matrix with mostly 0/1/2 values, but missing values are usually coded as 5 for other programs. How can I set these values to missing? It seems simple, but I cannot find an answer in the Julia docs (Missing Values · The Julia Language). Searching Discourse, I’ve found very similar things, but when I apply them they don’t work.

A = [ 0.0 1.0 2.0
       0.0 2.0 1.0
       1.0 5.0 1.0]

# set element 3, 2 to missing
A[3, 2] = missing     # throws MethodError: cannot convert Missing to Float64

# set all elements > 2.5 = missing (should only be 5's that represent missing)
A[A .> 2.5] = missing   # as one would do in R, but throws an error (Julia needs .= here, and a Float64 matrix can't hold missing anyway)

Any help? I don’t understand the Array{Union{Missing, String}} notation I see in the docs, or how to make a matrix accept missing values.

Dan · February 6, 2023, 12:16am

It seems allowmissing is what you’re after:

julia> using Missings   # allowmissing is provided by Missings.jl

julia> A = allowmissing(A)
3×3 Matrix{Union{Missing, Float64}}:
 0.0  1.0  2.0
 0.0  2.0  1.0
 1.0  5.0  1.0

julia> A[3,2] = missing
missing

Or, since the above allocates a new array anyway, you can do the replacement in one step:

julia> A = map(x -> x > 2.5 ? missing : x, A)
3×3 Matrix{Union{Missing, Float64}}:
 0.0  1.0       2.0
 0.0  2.0       1.0
 1.0   missing  1.0
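For completeness, the R-style line from the question also works once the array's element type admits missing and the assignment is broadcast with .= (Julia rejects A[mask] = scalar). A minimal Base-only sketch on the same example matrix; B and mask are just illustrative names:

```julia
A = [0.0 1.0 2.0
     0.0 2.0 1.0
     1.0 5.0 1.0]

# Copy into a matrix whose element type also admits `missing`
B = Matrix{Union{Missing, Float64}}(A)

# Build the mask before any missing is present, so it is a plain Bool mask
mask = B .> 2.5

# Broadcast the assignment; `B[mask] = missing` (no dot) would throw
B[mask] .= missing
```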

austin-putz · February 6, 2023, 12:20am

Thank you so much, I’m not sure why this was so difficult to find. ChatGPT kept giving me weird things and I kept going to Discourse and didn’t find what I needed. This worked great. Thanks!

Dan · February 6, 2023, 12:21am

After training on this forum, the next version of ChatGPT will hopefully get it.

austin-putz · February 6, 2023, 12:48am

I hope so! It gives me a lot of wrong answers due to older versions of Julia.

rafael.guerra · February 6, 2023, 12:58am

Linking this related post.

We need to load Missings.jl to use allowmissing() (or DataFrames.jl, which seems to re-export it).
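Without loading any package, Base can also widen the element type. A minimal sketch, assuming the missing code is exactly 5.0: replace picks its result's element type by promoting eltype(A) with the types of the replacement values, and convert makes an explicit widened copy that then accepts indexed assignment.

```julia
A = [0.0 1.0 2.0
     0.0 2.0 1.0
     1.0 5.0 1.0]

# Replace every 5.0 by missing; the result's eltype is promoted to Union{Missing, Float64}
B = replace(A, 5.0 => missing)

# Or widen first (this copies), then assign by index as usual
C = convert(Matrix{Union{Missing, Float64}}, A)
C[3, 2] = missing
```

Both leave the original A untouched.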

joa-quim · February 6, 2023, 1:14am

Why is this better than using NaN?

 0.0    1.0  2.0
 0.0    2.0  1.0
 1.0  NaN    1.0

NaN is a Float64, so your array will be contiguous in memory, but missing is not a Float64, so who knows (not me) how that array is stored.
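For comparison, the NaN route keeps a plain Matrix{Float64} and can be done in place. A minimal sketch, assuming 5.0 is the only sentinel value in the data:

```julia
A = [0.0 1.0 2.0
     0.0 2.0 1.0
     1.0 5.0 1.0]

# In place: the element type stays Float64, so no new array is allocated for A
A[A .== 5.0] .= NaN

# NaN never compares equal to itself, so count it with isnan rather than ==
nan_count = count(isnan, A)
```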

aplavin · February 6, 2023, 9:14pm

The map approach is best when you want to set missing by condition, not by index. It is perfectly general and works for values other than missing (e.g. nothing), with no need to learn specialized functions like allowmissing.
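To illustrate that point, here is the same condition-based map with nothing as the marker instead of missing; map widens the element type on its own. A minimal sketch on the example matrix from the question:

```julia
A = [0.0 1.0 2.0
     0.0 2.0 1.0
     1.0 5.0 1.0]

# Same pattern as before, marking with `nothing`; the result eltype is widened automatically
B = map(x -> x > 2.5 ? nothing : x, A)
```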

Setting a value of potentially-different type by index is probably most straightforward with Accessors:

julia> using Accessors

julia> A_new = @set A[3, 2] = missing
3×3 Matrix{Union{Missing, Float64}}:
 0.0  1.0       2.0
 0.0  2.0       1.0
 1.0   missing  1.0

Again, perfectly general and works for any type.

rafael.guerra · February 6, 2023, 10:05pm

Perhaps something like this:

# x is a CartesianIndex; x.I is its (row, col) tuple
map(x -> x.I == (3,2) ? missing : A[x], CartesianIndices(A))

rafael.guerra · February 7, 2023, 12:43am

How silly is it to do this:

(A = convert(Matrix{Union{Missing, eltype(A)}}, A))[3,2] = missing
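A rough Base-only analogue of that one-liner, and of the non-mutating @set above, that works for other marker values as well: compute a wider element type with promote_type, copy into it, then assign by index. setwiden is a hypothetical helper name, not an existing function; for unrelated types promote_type falls back to Any.

```julia
# Hypothetical helper: return a copy of A with A[I...] = v,
# widening the element type first so that v fits
function setwiden(A::AbstractArray, v, I...)
    T = promote_type(eltype(A), typeof(v))  # e.g. Union{Missing, Float64}
    B = similar(A, T)                       # same shape, wider element type
    copyto!(B, A)
    B[I...] = v
    return B
end

A = [0.0 1.0 2.0
     0.0 2.0 1.0
     1.0 5.0 1.0]

B = setwiden(A, missing, 3, 2)
```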

Dan · February 7, 2023, 2:09am

This is a tangent, but the pattern in the quoted post comes up in other contexts too: a function is supplied to transform a value, but it would be super convenient to also pass the index to the transformation.

Something like:

mapind((I, x) -> ( I == (3,2) || x > 2.5 ? missing : x ), A)

Essentially,

mapind( f, A, ...) "==" map(f, enumerate(A), ...)

(Many uses of enumerate in other functions come from exactly this case.) But here enumerate would need to yield multidimensional indices instead of the default linear indices (i.e. zip(CartesianIndices(A), A) instead of enumerate(A)).

The simplest solution I see is to use:

map( ((I,v),) -> ...., collect(pairs(A)) )

which is pretty neat. So maybe it is enough.
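A minimal sketch of the hypothetical mapind (it is not a Base function): giving map both CartesianIndices(A) and A passes the index and the value as two arguments and keeps the matrix shape, so no collect(pairs(A)) is needed. Tuple(I) turns the CartesianIndex into a plain tuple so it can be compared with (3, 2):

```julia
# Hypothetical helper: like map, but f receives (index, value)
mapind(f, A::AbstractArray) = map(f, CartesianIndices(A), A)

A = [0.0 1.0 2.0
     0.0 2.0 1.0
     1.0 5.0 1.0]

# Mark position (3, 2), or any value above 2.5, as missing
B = mapind((I, x) -> Tuple(I) == (3, 2) || x > 2.5 ? missing : x, A)
```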

A MWE for this case: creating a matrix of complex values whose magnitude is the distance from the matrix center and whose phase is random:

N = 5
phase = rand(N,N)
map( ((I,v),) -> exp(im*v)*hypot((Tuple(I).-(N-N÷2))...), collect(pairs(phase)) )

but this isn’t so nice.
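The same MWE, sketched with the two-argument map form (index and value as separate arguments) instead of collect(pairs(...)). As in the original, the phase comes from rand, so it lies in [0, 1) radians; center is just a helper variable:

```julia
N = 5
phase = rand(N, N)
center = N - N ÷ 2   # middle index (3 for N = 5)

# Magnitude = distance from the center, angle = the random phase
Z = map(CartesianIndices(phase), phase) do I, v
    exp(im * v) * hypot((Tuple(I) .- center)...)
end
```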

hexaeder · February 7, 2023, 8:40am

NaN is a Float64, so your array will be contiguous in memory, but missing is not a Float64, so who knows (not me) how that array is stored.

According to the docs, arrays with missing values are still fast and contiguous in memory, since the missing information is stored alongside the main data rather than in place.

Using NaN is a bit of a hack, since that’s not what NaN is supposed to represent.

aplavin · February 7, 2023, 9:08am

missings are fast in the simplest cases, but generally they can have huge overhead wrt NaNs — see a simple example at Is there any reason to use NaN instead of missing? - #4 by aplavin.
If you know that you have floating point data, nans are perfectly fine, and they are also convenient to work with in julia.
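To make the comparison concrete, a minimal sketch of the same reduction under both conventions: skipmissing is Base's iterator for skipping missing values, and for NaN a generator with an isnan filter does the same job.

```julia
v_missing = [0.0, 1.0, missing, 2.0]   # Vector{Union{Missing, Float64}}
v_nan     = [0.0, 1.0, NaN, 2.0]       # Vector{Float64}

# Without skipping, both propagate: sum(v_missing) is missing, sum(v_nan) is NaN
total_missing = sum(skipmissing(v_missing))
total_nan     = sum(x for x in v_nan if !isnan(x))
```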

joa-quim · February 7, 2023, 12:07pm

Imagine the data is to be passed to an external library written in C, Fortran, whatever, like a plotting lib for example. It can’t be passed as is, because those libs know nothing about missings. So the missings have to be replaced by a float. That (I think) implies either a copy or memory shuffling.
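A minimal sketch of that copy: coalesce.(B, NaN) allocates a new Matrix{Float64} with every missing replaced by NaN. Using NaN as the replacement is an assumption about what the external library expects.

```julia
B = Union{Missing, Float64}[0.0 1.0     2.0
                            0.0 2.0     1.0
                            1.0 missing 1.0]

# New allocation: every missing becomes NaN, element type becomes Float64
F = coalesce.(B, NaN)

# F is a plain dense Float64 matrix, so it can be passed to ccall as a Ptr{Cdouble} argument
```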

