I’m trying to read values out of a NetCDF file. Unfortunately, the missing values seem to be mistakenly stored as characters or strings by the data creator.
Here is the error message:
┌ Warning: variable 'A' has a numeric type but the corresponding missing_value (-999.9) is a character or string. Comparing, e.g. an integer and a string (1 == "1") will always evaluate to false. See the function NCDatasets.cfvariable how to manually override the missing_value attribute.
└ @ CommonDataModel ~/.julia/packages/CommonDataModel/pO4st/src/cfvariable.jl:122
What is a good solution to address this issue?
I tried the below: DIC_nc[DIC_nc .== '-999.9'] .= NaN;
But I get this new error: **ERROR:** LoadError: syntax: character literal contains multiple characters
It’s interesting that when I tried to open the NetCDF file in Matlab, the class function shows the value is ‘double’, instead of character or string as claimed by Julia.
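The second error you encountered concerns the invalid syntax '-999.9'. In Julia, single quotes delimit a single character (a Char); a string of several characters must be written with double quotes. Note also that the values in your array are numbers, so they should be compared against the number -999.9 rather than against a string.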
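A minimal sketch of that distinction in plain Julia. The small array A is made up for illustration and stands in for the data read from the file:

```julia
# Single quotes delimit exactly one character (a Char);
# double quotes delimit a String.
c = '9'
s = "-999.9"
@show typeof(c)
@show typeof(s)

# A number never equals its string representation.
@show -999.9 == "-999.9"

# Illustrative array (not the poster's data): compare against the
# numeric sentinel, then overwrite the matches with NaN.
A = [1.0, -999.9, 3.0]
A[A .== -999.9] .= NaN
@show A
```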
Unfortunately, it is not working. Despite the fact that it shows
typeof(A_nc) = Array{Union{Missing, Float64}, 4},
When I use ‘@show’ to display the values, I see a bunch of -999.9 instead of missing.
As a result, the script below does nothing: A_nc[ismissing.(A_nc)] .= NaN
The script below is indeed able to replace -999.9s in the array with NaNs: A_nc[A_nc .== -999.9] .= NaN,
but the ‘typeof’ result remains the same: typeof(A_nc) = Array{Union{Missing, Float64}, 4}
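The behavior described above can be reproduced with a small made-up array (an assumption for illustration, not the actual data): assigning NaN changes the contents but not the declared element type, and ismissing finds nothing because no element is actually missing. Converting element-wise is one way to narrow the type.

```julia
# Illustrative stand-in for A_nc: the element type allows `missing`,
# but every stored value is a Float64.
A = Union{Missing, Float64}[1.0, -999.9, 3.0]

# No element is `missing`, so this mask selects nothing.
@show count(ismissing, A)

# Replacing values changes the contents, not the element type.
A[A .== -999.9] .= NaN
@show eltype(A)

# Element-wise conversion gives a plain Float64 array.
# It throws an error if a `missing` is actually present.
B = Float64.(A)
@show eltype(B)

# coalesce replaces any `missing` by NaN before the type is narrowed.
C = coalesce.(Union{Missing, Float64}[1.0, missing, 3.0], NaN)
@show eltype(C)
```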
It’s the other way around… NaN will be supported everywhere with basically the same semantics. missing is Julia-specific and very useful, but maybe not in this case…
NaNs are bit patterns that by convention are recognized as 32- or 64-bit floating-point numbers (there are several NaNs), but the point is that in memory they take exactly the same space as any other number of their type (I mean Float32 or Float64). Julia missings (which I don’t know what they really are), on the other hand, are not floating-point numbers, so when NaNs are replaced with missing, a copy of the array has to be made, as the values can no longer all be stored contiguously in memory. I have no idea how Julia manages the Union{Missing, Float}, but there are many posts in the forum mentioning how the presence of missings instead of NaNs makes processing much slower. That is why I said *Using missings in float arrays is a bad idea*.
missing and NaN have different meanings and different purposes. As missing is not a number (or a primitive type), bitstring is not defined for it.
missings are useful for “Don’t Know” survey responses, or when no value is applicable. Advice to “never” use missing is misplaced because often missing semantics are exactly what you want.
In your context, though, when working with a NetCDF data set, you definitely want to use NaN.
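The difference between the two markers discussed above can be seen in a few lines of plain Julia. This is only an illustrative sketch; the variable name other_nan and the sample vectors are made up:

```julia
# NaN is an ordinary Float64 bit pattern, and more than one pattern is a NaN.
@show reinterpret(UInt64, NaN)
other_nan = reinterpret(Float64, 0x7ff8000000000001)
@show isnan(other_nan)

# missing is the single value of type Missing; it is not a number.
@show typeof(missing)
@show missing isa Number

# Comparisons behave differently for the two markers.
@show NaN == NaN
@show missing == missing

# Each marker has its own idiom for skipping the flagged entries.
@show sum(skipmissing([1.0, missing, 3.0]))
@show sum(filter(!isnan, [1.0, NaN, 3.0]))
```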
Besides the nomissing function, there is now also the keyword argument maskingvalue = NaN, which can be set per dataset or per variable to use a different value to mark missing data.
In your case, you can directly get an array of Float64s.
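A hedged sketch of the per-variable form, assuming a version of NCDatasets recent enough to support maskingvalue. The scratch file, the dimension name "x" and the three sample values are made up for illustration. The file is written without the faulty string attribute, so this only shows how missing_value can be declared by hand as a number through cfvariable (the function named in the warning) and combined with maskingvalue = NaN:

```julia
using NCDatasets

# Hypothetical scratch file standing in for the real dataset.
fname = tempname() * ".nc"

# Write a small variable in which -999.9 marks missing data.
NCDataset(fname, "c") do ds
    defVar(ds, "A", [1.0, -999.9, 3.0], ("x",))
end

ds = NCDataset(fname, "r")

# Declare the sentinel explicitly as a number and ask for NaN
# (rather than `missing`) wherever it occurs.
A = cfvariable(ds, "A", missing_value = -999.9, maskingvalue = NaN)[:]
close(ds)

@show eltype(A)
@show A
```

With this approach no Union{Missing, Float64} array is created in the first place, so no later conversion step is needed.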
The warning:
┌ Warning: variable 'A' has a numeric type but the corresponding missing_value (-999.9) is a character or string. Comparing, e.g. an integer and a string (1 == "1") will always evaluate to false. See the function NCDatasets.cfvariable how to manually override the missing_value attribute.
is triggered when a NetCDF file includes the string "-999.9" rather than the floating-point number -999.9 for the missing_value attribute (as the warning says).
See this issue for context:
It would be good to contact the author of the dataset. In my tests (in 2022), Python’s netCDF4 also failed to load such files.
I haven’t read the rest of the discussion (yet) and I don’t know the array, but I would say that the error lies in the attempt to “force” a string into one (only) character