Short:
I have written a small package (30 LOC) that introduces a number type that lets you treat “sentinel values” as missings in Julia: meggart/SentinelMissings.jl on GitHub (not yet registered)
Long:
I really like the new Julia solution to missing values, which optimizes Union{T,Missing}, but in some cases it has limitations. For example, the file formats I mostly deal with (NetCDF files or Zarr datasets) follow the convention of marking missing values with a sentinel value that is defined in the file’s attributes.
Since the element type no longer matches the C number types, one cannot simply pass a pointer to a Julia Array{Union{Missing,Float64}} to a C routine that expects a double* to write data into. In this case I really want to avoid creating a copy of the array, because the arrays in the file can be quite large and one might run out of memory when copying them.
Other limitations I came across are that an Array{Union{Missing,Float64}} cannot be Mmapped or passed to Blosc for compression, etc.
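To make the copy problem concrete, here is a small sketch using only Base Julia. The sentinel -9999.0 and the variable names are made up for illustration. It shows that the Union element type is not a plain bits type and that converting to it produces a second, independent buffer:

```julia
# Raw data as a C library might fill it, with -9999.0 as a made-up sentinel.
raw = [1.5, -9999.0, 3.0]

# Float64 has the plain C `double` layout; the Union element type does not.
isbitstype(Float64)                  # true
isbitstype(Union{Missing,Float64})   # false

# Converting to the Union element type allocates a second array ...
copied = convert(Vector{Union{Missing,Float64}}, raw)
copied[copied .== -9999.0] .= missing

# ... so the two buffers are independent from here on.
copied[1] = 100.0
raw[1]                               # still 1.5
```

For a file-sized array this means holding the data in memory twice, which is exactly what I want to avoid.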
To make dealing with this easier I have written this small package. I would like to ask whether there are already other attempts at implementing this functionality, whether you know of another package where it might fit, and whether you find this useful at all.
A typical workflow would be:
x = [1 2 3;
4 5 6;
-1 -1 10]
xs = as_sentinel(x,-1)
3×3 reinterpret(SentinelMissings.SentinelMissing{Int64,-1}, ::Array{Int64,2}):
1 2 3
4 5 6
missing missing 10
Note that this does not copy the array, but operating on the reinterpreted version behaves as if the values inside were missings, and it works quite well with Array{Union{T,Missing}} types, e.g.:
a = [8.0 4.0 missing]
xs .= xs .+ a
3×3 reinterpret(SentinelMissings.SentinelMissing{Int64,-1}, ::Array{Int64,2}):
9 6 missing
12 9 missing
missing missing missing
while the memory is still shared with x, which could be an Mmapped array or an Array you share with a C library.
x
3×3 Array{Int64,2}:
9 6 -1
12 9 -1
-1 -1 -1
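For anyone curious how a zero-copy view like this can work in principle, here is a stripped-down sketch. It is not the package source: the names Sentinel, decode and encode are hypothetical stand-ins, and it leaves out the Number subtyping, conversion and promotion rules that arithmetic and broadcasting would need. It only shows the core idea of an element type with the same memory layout as the raw number and the sentinel value as a type parameter:

```julia
# Hypothetical stand-in for the package's element type, NOT its actual source.
# T is the storage type, SV the sentinel value carried as a type parameter.
struct Sentinel{T,SV}
    x::T
end

# Reading: the sentinel decodes to `missing`, everything else to the raw value.
decode(s::Sentinel{T,SV}) where {T,SV} = s.x == SV ? missing : s.x

# Writing: `missing` is encoded as the sentinel value.
encode(::Type{Sentinel{T,SV}}, ::Missing) where {T,SV} = Sentinel{T,SV}(SV)
encode(::Type{Sentinel{T,SV}}, v) where {T,SV} = Sentinel{T,SV}(v)

x = [1 2 3; 4 5 6; -1 -1 10]

# Same size and layout as Int, so reinterpret gives a view, not a copy.
xs = reinterpret(Sentinel{Int,-1}, x)

decode(xs[3, 1])        # missing
decode(xs[1, 1])        # 1

# Writing through the view lands in the original buffer.
xs[1, 1] = encode(eltype(xs), missing)
x[1, 1]                 # -1
```

Because the wrapper is a bits type of the same size as the underlying number, the parent array stays a plain Array{Int64,2} that can still be handed to C or memory-mapped.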