Missing values in character or string

Leon6 · December 16, 2023, 12:12am

I’m trying to read values out of a NetCDF file. Unfortunately, the missing values seem to be mistakenly stored as characters or strings by the data creator.

Here is my code to read data out of the nc file:

ds1 = NCDataset("path/xxx.nc");
A_nc  = ds1["A"][:,:,:,:]
close(ds1);

size(A_nc) = (360, 180, 28, 192)
typeof(A_nc) = Array{Union{Missing, Float64}, 4}

Here is the error message:
┌ Warning: variable ‘A’ has a numeric type but the corresponding missing_value (-999.9) is a character or string. Comparing, e.g. an integer and a string (1 == “1”) will always evaluate to false. See the function NCDatasets.cfvariable how to manually override the missing_value attribute.
└ @ CommonDataModel ~/.julia/packages/CommonDataModel/pO4st/src/cfvariable.jl:122

What is a good solution to address this issue?

I tried the below:
DIC_nc[DIC_nc .== '-999.9'] .= NaN;
But I get this new error:
**ERROR:** LoadError: syntax: character literal contains multiple characters

It’s interesting that when I open the NetCDF file in Matlab, the class function reports the values as ‘double’ rather than as a character or string, as Julia claims.

mkitti · December 16, 2023, 1:25am

It seems like you just got a warning and no errors when obtaining A_nc.

In Julia, the missing values seem to have been replaced by missing. Use ismissing to locate them.

julia> A_nc = [0.1 0.2; 0.3 missing; 0.4 0.5]
3×2 Matrix{Union{Missing, Float64}}:
 0.1  0.2
 0.3   missing
 0.4  0.5

julia> A_nc[ismissing.(A_nc)] .= NaN
1-element view(reshape(::Matrix{Union{Missing, Float64}}, 6), [5]) with eltype Union{Missing, Float64}:
 NaN

julia> A_nc
3×2 Matrix{Union{Missing, Float64}}:
 0.1    0.2
 0.3  NaN
 0.4    0.5

The second error you encountered concerns the invalid syntax '-999.0'. In Julia, single quotes delimit a single character (a Char); a string of several characters must be written in double quotes.

julia> '-999.0'
ERROR: syntax: character literal contains multiple characters
Stacktrace:
 [1] top-level scope
   @ none:1

julia> "-999.0"
"-999.0"

julia> '8'
'8': ASCII/Unicode U+0038 (category Nd: Number, decimal digit)

The original warning is really an issue with the NetCDF file itself.

Leon6 · December 16, 2023, 6:57am

Many thanks for the reply.

Unfortunately, it is not working. Even though it shows
typeof(A_nc) = Array{Union{Missing, Float64}, 4},
when I use @show to display the values, I see a bunch of -999.9 values instead of missing.
As a result, the script below does nothing:
A_nc[ismissing.(A_nc)] .= NaN

The script below is indeed able to replace -999.9s in the array with NaNs:
A_nc[A_nc .== -999.9] .= NaN,
but the ‘typeof’ result remains the same:
typeof(A_nc) = Array{Union{Missing, Float64}, 4}

Leon6 · December 16, 2023, 7:04am

Below is the result of ‘@show’ for the array, after I replace -999.9 with NaNs:

A_nc[:, 50, 1, 1] = Union{Missing, Float64}[2059.642853474127, 2060.3117549314975, 2060.9158207445385, NaN, NaN, NaN, NaN, 2047.648519975586]
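
For reference, a minimal sketch of why typeof stays the same after the replacement: assigning NaN in place changes the stored values but not the array's declared element type. Once no missing entries remain, Base Julia can produce a copy with a plain Float64 element type. The small matrix below is made-up stand-in data, not the contents of the NetCDF file.

```julia
# Made-up stand-in for A_nc: the element type allows missing,
# but no entry is actually missing (only the -999.9 sentinel is present).
A = Union{Missing, Float64}[2059.6 -999.9; -999.9 2047.6]

# In-place replacement changes the values only; the container type is unchanged.
A[A .== -999.9] .= NaN
@show eltype(A)

# convert makes a new array whose element type is Float64.
# It throws a MethodError if any entry is actually missing.
B = convert(Array{Float64}, A)
@show eltype(B)

# coalesce replaces any remaining missing with NaN; broadcasting
# then narrows the element type of the result to Float64.
C = coalesce.(A, NaN)
@show eltype(C)
```

Both B and C are copies, so A itself keeps its Union{Missing, Float64} element type.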

Leon6 · December 16, 2023, 11:48am

I was able to find a solution online:

using MappedArrays
A_nc = of_eltype(Float64, A_nc);

Please see also: Feature request: convert between Array{Union{T, Missing}, N} and Array{T, N} without copying · Issue #26681 · JuliaLang/julia · GitHub

pdeffebach · December 16, 2023, 1:35pm

You might like my package MissingsAsFalse.jl which provides a convenient syntax for missing comparisons. See here.

Also consider isequal.(x, -999.9) instead of ==.
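
A short sketch of the difference, using a made-up vector x: with ==, a comparison against missing is itself missing, and a mask containing missing raises an error when used for indexing. isequal always returns a Bool, so its mask can be used directly.

```julia
# Made-up data: one real value, one missing, one sentinel.
x = [1.5, missing, -999.9]

# == propagates missing, so the second element of the result is missing.
eq_mask = x .== -999.9
@show eq_mask

# isequal never returns missing; the mask holds only true/false.
mask = isequal.(x, -999.9)
@show mask

# The Bool mask can be used to replace the sentinel in place.
x[mask] .= NaN
@show x
```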

Leon6 · December 17, 2023, 6:05am

Just found an easier way to solve this issue:
A_nc = nomissing(A_nc, NaN);

joa-quim · December 17, 2023, 11:59am

Using missings in float arrays is a bad idea IMO.

I think it would all have worked for you if you had used GMT.

G = gmtread("xxx.nc")

Leon6 · December 17, 2023, 2:39pm

Sadly, in my institution, we’re forced to use missing, citing reasons that NaN is not numerical and thus not supported by all programs.

joa-quim · December 17, 2023, 2:53pm

Sorry??? NaNs are numeric. And that’s probably what was in the original data. But of course, do as you need.

pdeffebach · December 17, 2023, 3:12pm

It’s the other way around… NaN will be supported everywhere with basically the same semantics. missing is Julia-specific and very useful, but maybe not in this case…

Leon6 · December 17, 2023, 4:24pm

Thank you, Joaquim and Pdeffebach.

It’s interesting to hear about this. How specifically are NaNs stored as numerical values?

joa-quim · December 17, 2023, 4:55pm

NaNs are bit patterns that by convention are recognized as 32- or 64-bit floating point numbers (there are several NaNs). The point is that in memory they take exactly the same space as any other number of their type (I mean Float32 or Float64). Julia missings (which I don’t know what they really are), on the other hand, are not floating point numbers, so when NaNs are replaced with missing a copy of the array has to be made, as all numbers are no longer storable contiguously in memory. I have no idea how Julia manages Union{Missing, Float}, but there are many posts in the forum mentioning how the presence of missings instead of NaNs makes processing way slower. That is why I said *Using missings in float arrays is a bad idea*.

mkitti · December 17, 2023, 4:56pm

Demonstration.

julia> bitstring(0.0)
"0000000000000000000000000000000000000000000000000000000000000000"

julia> bitstring(1.0)
"0011111111110000000000000000000000000000000000000000000000000000"

julia> bitstring(2.0)
"0100000000000000000000000000000000000000000000000000000000000000"

julia> bitstring(0.1)
"0011111110111001100110011001100110011001100110011001100110011010"

julia> bitstring(NaN)
"0111111111111000000000000000000000000000000000000000000000000000"

julia> bitstring(-NaN)
"1111111111111000000000000000000000000000000000000000000000000000"

Leon6 · December 17, 2023, 5:23pm

bitstring(missing)

ERROR: ArgumentError: Missing not a primitive type
Stacktrace:
[1] bitstring(x::Missing)
@ Base ./intfuncs.jl:842
[2] top-level scope
@ REPL[1]:1

mkitti · December 17, 2023, 5:24pm

Yes. That’s exactly the point.

pdeffebach · December 17, 2023, 5:26pm

missing and NaN have different meanings and different purposes. As missing is not a number (or a primitive type), bitstring is not defined for it.

missings are useful for “Don’t Know” survey responses, or when no value is applicable. Advice to “never” use missing is misplaced because often missing semantics are exactly what you want.

In your context, though, when working with a NetCDF data set, you definitely want to use NaN.
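
To make the difference in semantics concrete, here is a small sketch with two made-up vectors. Both NaN and missing propagate through arithmetic, but they are skipped with different tools, and NaN has to be tested with isnan because it is not equal to itself.

```julia
# Made-up data with the same gap marked in two different ways.
v_nan  = [1.0, NaN, 3.0]
v_miss = [1.0, missing, 3.0]

# Both markers propagate through a reduction.
s_nan  = sum(v_nan)     # a NaN
s_miss = sum(v_miss)    # missing

# Skipping them uses different tools.
clean_nan  = sum(filter(!isnan, v_nan))
clean_miss = sum(skipmissing(v_miss))

# NaN compares unequal to itself, so use isnan rather than ==.
self_equal = (NaN == NaN)

@show s_nan s_miss clean_nan clean_miss self_equal
@show eltype(v_nan) eltype(v_miss)
```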

Alexander-Barth · February 19, 2024, 9:15pm

Besides the nomissing function, there is now also the keyword argument maskingvalue = NaN, which can be set per dataset or per variable to use a different value to mark missing data.

In your case, this gives you an array of Float64s directly.
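
A hedged sketch of how this might look, assuming NCDatasets v0.14 or later is installed. It combines the maskingvalue keyword with the missing_value override that the warning points to in cfvariable. The file, the variable name "A" and the dimension names are made up for illustration. The made-up file has no missing_value attribute at all, so the override supplies the numeric value that the real file stores as a string.

```julia
using NCDatasets   # assumption: NCDatasets v0.14 or later

# Write a tiny made-up file so that the sketch is self-contained.
fname = tempname() * ".nc"
NCDataset(fname, "c") do ds
    defVar(ds, "A", [1.0 -999.9; 2.0 3.0], ("x", "y"))
end

ds = NCDataset(fname)

# Per-variable form: treat -999.9 as the missing value (as a number)
# and mark masked entries with NaN instead of missing.
v = cfvariable(ds, "A", missing_value = [-999.9], maskingvalue = NaN)
A = v[:, :]
close(ds)

@show eltype(A)
@show A
```

The per-dataset form would pass maskingvalue = NaN when opening the file instead, so that every variable read from it is masked with NaN.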

The warning:

┌ Warning: variable ‘A’ has a numeric type but the corresponding missing_value (-999.9) is a character or string. Comparing, e.g. an integer and a string (1 == “1”) will always evaluate to false. See the function NCDatasets.cfvariable how to manually override the missing_value attribute.

is triggered when a NetCDF file includes the string “-999.9” rather than the floating point number -999.9 for the missing_value attribute (as the warning says).
See this issue for context:

github.com/Alexander-Barth/NCDatasets.jl

missing_value is a string for numeric NetCDF array (getindex fail with v0.12.0)

opened 03:59PM - 11 Mar 22 UTC

closed 10:09PM - 15 Mar 22 UTC

gaelforget

**Describe the bug**
Indexing a Dataset variable by name, in a case that used to work until at least v0.11.18, now returns an error with v0.12.0. Could have to do with the introduction of `SymbolOrString` but unclear.

**To Reproduce**
```
run(`wget ftp://ftp.aoml.noaa.gov/pub/phod/lumpkin/hourly/v1.04/netcdf/gps/drifter_10050130.nc`)
ds=Dataset("drifter_10050130.nc")
haskey(ds,"longitude")
ds["longitude"]
```

**Expected behavior**
`ds["longitude"]` should work. The example quoted here has been part of the test suite for OceanRobots.jl for a while.

**Environment**
- operating system, CPU architecture, Julia version : macOS, linux, windows -- see https://github.com/gaelforget/OceanRobots.jl/actions/runs/1968836506
- NCDatasets version: v0.12.0

**Full output**
```
ds["longitude"]
ERROR: MethodError: no method matching Float32(::String)
Closest candidates are:
  (::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/char.jl:50
  (::Type{T})(::Base.TwicePrecision) where T<:Number at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/twiceprecision.jl:255
  (::Type{T})(::Complex) where T<:Real at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/complex.jl:44
  ...
Stacktrace:
 [1] _broadcast_getindex_evalf
   @ ./broadcast.jl:670 [inlined]
 [2] _broadcast_getindex
   @ ./broadcast.jl:643 [inlined]
 [3] getindex
   @ ./broadcast.jl:597 [inlined]
 [4] copy
   @ ./broadcast.jl:875 [inlined]
 [5] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, Type{Float32}, Tuple{Base.RefValue{String}}})
   @ Base.Broadcast ./broadcast.jl:860
 [6] getindex(ds::NCDataset{Nothing}, varname::String)
   @ NCDatasets ~/.julia/packages/NCDatasets/Iaww4/src/cfvariable.jl:373
 [7] top-level scope
   @ REPL[8]:1
```

It would be good to contact the author of the dataset. In my tests (in 2022), Python’s netCDF4 also failed to load such files.

rocco_sprmnt21 · February 20, 2024, 9:25am

I haven’t read the rest of the discussion (yet) and I don’t know the array, but I would say that the error lies in the attempt to “force” a string into one (only) character

DIC_nc[DIC_nc .== "-999.9"] .= NaN;

Might work (Not tested)
