So I have a matrix with mostly 0/1/2 values, but missing values are coded as 5, as is usual for other programs. How can I set these values to missing? It seems simple, but I cannot find an answer in the Julia docs (Missing Values · The Julia Language). Searching Discourse, I’ve found very similar questions, but when I apply the answers they don’t work.
A = [ 0.0 1.0 2.0
0.0 2.0 1.0
1.0 5.0 1.0]
# set element 3, 2 to missing
A[3, 2] = missing # throws error
# set all elements > 2.5 = missing (should only be 5's that represent missing)
A[A .> 2.5] = missing # as one would do in R, but throws an error
Any help? I don’t understand this Array{Union{Missing, String}} notation I see in the docs or how to make a matrix accept missing values.
Thank you so much, I’m not sure why this was so difficult to find. ChatGPT kept giving me weird things and I kept going to Discourse and didn’t find what I needed. This worked great. Thanks!
This is best when you want to set missing by condition, not by index. It is perfectly general and works for values other than missing (e.g. nothing), so there is no need to learn specialized functions like allowmissing.
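The snippet this reply refers to is not preserved here. As a minimal sketch of setting missing by condition, assuming a plain map over the matrix from the question (the 2.5 threshold comes from the question; B and C are illustrative names):

```julia
A = [0.0 1.0 2.0
     0.0 2.0 1.0
     1.0 5.0 1.0]

# Build a new array instead of mutating A. map widens the element type
# on its own when the function returns either a Float64 or missing.
B = map(x -> x > 2.5 ? missing : x, A)

# The same pattern with another sentinel value, here nothing.
C = map(x -> x > 2.5 ? nothing : x, A)
```

A itself is left untouched; B has element type Union{Missing, Float64}, which is what lets it hold both kinds of value.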
Setting a value of potentially-different type by index is probably most straightforward with Accessors:
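The Accessors snippet itself is not preserved here. As a dependency-free sketch of the same by-index idea, using only Base rather than Accessors (B and C are illustrative names): copy the data into a matrix whose element type also allows missing, then assign.

```julia
A = [0.0 1.0 2.0
     0.0 2.0 1.0
     1.0 5.0 1.0]

# A is a Matrix{Float64}, so A[3, 2] = missing fails: missing cannot be
# converted to Float64. Copy into a matrix whose eltype also allows Missing.
B = convert(Matrix{Union{Missing, Float64}}, A)
B[3, 2] = missing            # assignment by index now works

# The R-style mask also works on such a matrix. Note the broadcasting .=
# and that the mask is computed from A, which contains no missing values.
C = convert(Matrix{Union{Missing, Float64}}, A)
C[A .> 2.5] .= missing
```

This is also what the Array{Union{Missing, T}} notation in the docs means: an array whose elements may each be either a T or missing.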
This is a tangent, but the quoted statement often occurs in other contexts: A function is supplied to transform a value, but it would be super convenient to use the index as a parameter to the transformation.
Something like:
mapind((I, x) -> ( I == (3,2) || x > 2.5 ? missing : x ), A)
Essentially,
mapind( f, A, ...) "==" map(f, enumerate(A), ...)
Many of the uses of enumerate come from exactly this case in other functions too. But enumerate would need to yield multidimensional indices instead of the default linear indices (i.e. zip(CartesianIndices(A), A) instead of enumerate(A)).
The simplest solution I see is to use:
map( ((I,v),) -> ...., pairs(A) )
which is pretty neat. So maybe it is enough.
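A runnable sketch of the index-aware transformation for the missing-value case. It assumes a variant that passes CartesianIndices(A) and A to map as two collections, which does not depend on how map treats the dictionary-like object returned by pairs(A); B is an illustrative name.

```julia
A = [0.0 1.0 2.0
     0.0 2.0 1.0
     1.0 5.0 1.0]

# CartesianIndices(A) has the same shape as A, so map hands each
# index I together with its value x to the function and returns a
# matrix of the same shape.
B = map(CartesianIndices(A), A) do I, x
    (Tuple(I) == (3, 2) || x > 2.5) ? missing : x
end
```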
An MWE for this case is creating a matrix of complex values with norm relative to the distance from the matrix center and a random phase:
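The original code for this example is not preserved. One possible sketch, assuming a 5×5 matrix and taking the norm equal to the Euclidean distance from the center (the size and the names n, m, center and B are all illustrative):

```julia
n, m = 5, 5
center = ((n + 1) / 2, (m + 1) / 2)
A = zeros(ComplexF64, n, m)

# The value of each entry depends only on its index, so the element
# itself is ignored (bound to _).
B = map(CartesianIndices(A), A) do I, _
    r = hypot(I[1] - center[1], I[2] - center[2])  # distance from center
    r * cis(2π * rand())                           # random phase on the unit circle
end
```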
NaN is a Float64, so your array will be contiguous in memory, but missing is not a Float64, so who knows (not me) how that array is stored.
According to the docs, arrays with missing values are still fast and contiguous in memory, since the missing information is stored alongside the main data rather than in place.
Using NaN is a bit of a hack, since that's not what NaN is supposed to represent.
missings are fast in the simplest cases, but in general they can have huge overhead compared with NaNs; see a simple example at Is there any reason to use NaN instead of missing? - #4 by aplavin.
If you know that you have floating point data, NaNs are perfectly fine, and they are also convenient to work with in Julia.
Imagine the data is to be passed to an external library written in C, Fortran, or whatever, such as a plotting library. That can’t be done directly because those libraries know nothing about missings. So the missings have to be replaced by a float, which (I think) implies either a copy or memory shuffling.
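A small sketch of that replacement step, assuming NaN is an acceptable stand-in for the external library (B and C are illustrative names). It does allocate a new array, which matches the copy mentioned above.

```julia
B = Union{Missing, Float64}[0.0 1.0 2.0
                            0.0 2.0 1.0
                            1.0 missing 1.0]

# coalesce returns its first non-missing argument, so every missing
# becomes NaN. The result is a new, plain Matrix{Float64}.
C = coalesce.(B, NaN)
```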