Error from conversion of missing values to NaNs

A is a variable I read from a NetCDF file (32-bit). Its size is 360x180x30.
Here is my code to convert its missing values into NaNs:
A[ismissing.(A)] .= NaN;

Why does the line above normally work perfectly fine, but whenever A is composed of integers it causes errors like the one below?

ERROR: LoadError: InexactError: Int32(NaN)
Stacktrace:
[1] Int32
@ ./float.jl:879 [inlined]
[2] convert
@ ./number.jl:7 [inlined]
[3] convert
@ ./missing.jl:69 [inlined]
[4] fill!(A::SubArray{Union{Missing, Int32}, 1, Base.ReshapedArray{Union{Missing, Int32}, 1, Array{Union{Missing, Int32}, 4}, Tuple{}}, Tuple{Vector{Int64}}, false}, x::Float64)
@ Base ./multidimensional.jl:1084
[5] copyto!
@ ./broadcast.jl:934 [inlined]
[6] materialize!
@ ./broadcast.jl:884 [inlined]
[7] materialize!(dest::SubArray{Union{Missing, Int32}, 1, Base.ReshapedArray{Union{Missing, Int32}, 1, Array{Union{Missing, Int32}, 4}, Tuple{}}, Tuple{Vector{Int64}}, false}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(identity), Tuple{Float64}})
@ Base.Broadcast ./broadcast.jl:881

Because NaN is exclusively a floating-point value — you can’t store that in an integer array.

If you have a Vector{Int32}, then it is stored in memory as a sequence of consecutive 32-bit signed integer (Int32) values. Every possible bit pattern of an Int32 represents some integer — there is no bit pattern that represents a NaN without changing the type of the elements.
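A minimal sketch of that point, using a small made-up vector rather than the NetCDF data: converting NaN to Int32 throws the same InexactError as in the stack trace, while a floating-point copy of the array accepts NaN. The variable names are only for illustration.

```julia
# Converting NaN to an integer type has no valid result, so Julia throws.
err = try
    convert(Int32, NaN)
catch e
    e
end
println(err isa InexactError)

# Assigning into an Int32 array goes through the same conversion.
v = Int32[1, 2, 3]
assign_err = try
    v[2] = NaN
    nothing
catch e
    e
end
println(assign_err isa InexactError)

# A floating-point copy of the same data accepts NaN.
w = float(v)
w[2] = NaN
println(eltype(w), " ", isnan(w[2]))
```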

What you could do is

A = float(A)
A[ismissing.(A)] .= NaN

to ensure that A is a floating-point container. Or alternatively:

Anan = map(x -> ismissing(x) ? NaN : Float64(x), A)

which has the advantage that the element type of Anan will be concretely Float64 (an Array{Float64} with the same dimensions as A), not Union{Float64,Missing}, so the compiler knows it can no longer contain missing values and can optimize accordingly.
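To see the difference between the two approaches, here is a small sketch on a made-up 2x2 array (not the actual NetCDF variable) that compares the resulting element types:

```julia
# Toy stand-in for the data read from the file (values are invented).
A = reshape(Union{Missing, Int32}[1, missing, 3, 4], 2, 2)

# Approach 1: widen to a floating-point container, then overwrite the missings.
B = float(A)
B[ismissing.(B)] .= NaN
println(eltype(B))      # still a Union that allows missing

# Approach 2: build a new array whose elements are all Float64.
Anan = map(x -> ismissing(x) ? NaN : Float64(x), A)
println(eltype(Anan))   # concrete Float64
```

Both arrays hold the same values afterwards; only the declared element type differs, which is what matters for downstream performance.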

PS. Please quote your code: PSA: how to quote code with backticks


Thank you so much. This solution works for me.

So NaNs do have their limitations? The other day, we were discussing how good NaNs are when it comes to missing values.

And that day I also told you that NaNs are for floating-point values only. But why do you have missings when reading a netCDF file? netCDF files have no missing type and quite likely had NaNs to start with.

Grids of Int type normally have a flag value to indicate nodata. Int32 grids are also not common in geographical data.

Yes, people still use Int32 and missing values in NetCDF files.

Below is an example:
https://www.ncei.noaa.gov/thredds-ocean/catalog/ncei/woa/temperature/decav/1.00/catalog.html?dataset=ncei/woa/temperature/decav/1.00/woa18_decav_t00_01.nc

That is impossible. missing is a Julia-only concept. netCDF files cannot contain it.

As you can see here, most of the layers are 32-bit floats

julia> println(gdalinfo("woa18_decav_t00_01.nc"))
Driver: netCDF/Network Common Data Format
Files: woa18_decav_t00_01.nc
Size is 512, 512
Metadata:
  NC_GLOBAL#cdm_data_type=Grid
  NC_GLOBAL#comment=global climatology as part of the World Ocean Atlas project
  NC_GLOBAL#contributor_name=Ocean Climate Laboratory
  NC_GLOBAL#contributor_role=Calculation of climatologies
  NC_GLOBAL#Conventions=CF-1.6, ACDD-1.3
  NC_GLOBAL#creator_email=NCEI.info@noaa.gov
  NC_GLOBAL#creator_institution=National Centers for Environmental Information
  NC_GLOBAL#creator_name=Ocean Climate Laboratory
  NC_GLOBAL#creator_type=group
  NC_GLOBAL#creator_url=http://www.ncei.noaa.gov
  NC_GLOBAL#date_created=2019-07-28
  NC_GLOBAL#date_modified=2019-07-28
  NC_GLOBAL#geospatial_lat_max=90
  NC_GLOBAL#geospatial_lat_min=-90
  NC_GLOBAL#geospatial_lat_resolution=1.00 degrees
  NC_GLOBAL#geospatial_lat_units=degrees_north
  NC_GLOBAL#geospatial_lon_max=180
  NC_GLOBAL#geospatial_lon_min=-180
  NC_GLOBAL#geospatial_lon_resolution=1.00 degrees
  NC_GLOBAL#geospatial_lon_units=degrees_east
  NC_GLOBAL#geospatial_vertical_max=5500
  NC_GLOBAL#geospatial_vertical_min=0
  NC_GLOBAL#geospatial_vertical_positive=down
  NC_GLOBAL#geospatial_vertical_resolution=SPECIAL
  NC_GLOBAL#geospatial_vertical_units=m
  NC_GLOBAL#id=woa18_decav_t00_01.nc
  NC_GLOBAL#institution=National Centers for Environmental Information (NCEI)
  NC_GLOBAL#keywords=Oceans< Ocean Temperature > Water Temperature
  NC_GLOBAL#keywords_vocabulary=ISO 19115
  NC_GLOBAL#license=These data are openly available to the public. Please acknowledge the use of these data with the text given in the acknowledgment attribute.
  NC_GLOBAL#metadata_link=https://www.nodc.noaa.gov/OC5/woa18/
  NC_GLOBAL#naming_authority=gov.noaa.ncei
  NC_GLOBAL#nodc_template_version=NODC_NetCDF_Grid_Template_v2.0
  NC_GLOBAL#processing_level=processed
  NC_GLOBAL#project=World Ocean Atlas Project
  NC_GLOBAL#publisher_email=NCEI.info@noaa.gov
  NC_GLOBAL#publisher_institution=National Centers for Environmental Information
  NC_GLOBAL#publisher_name=National Centers for Environmental Information (NCEI)
  NC_GLOBAL#publisher_type=institution
  NC_GLOBAL#publisher_url=http://www.ncei.noaa.gov/
  NC_GLOBAL#references=Locarnini, R. A., A. V. Mishonov, O. K. Baranova, T. P. Boyer, M. M. Zweng, H. E. Garcia, J. R. Reagan, D. Seidov, K. W. Weathers, C. R. Paver, I. V. Smolyar, 2019: World Ocean Atlas 2018, Volume 1: Temperature.  A. V. Mishonov, Technical Ed., NOAA Atlas NESDIS 81
  NC_GLOBAL#sea_name=World-Wide Distribution
  NC_GLOBAL#standard_name_vocabulary=CF Standard Name Table v49
  NC_GLOBAL#summary=Climatological mean temperature for the global ocean from in situ profile data
  NC_GLOBAL#time_coverage_duration=P63Y
  NC_GLOBAL#time_coverage_end=2017-12-31
  NC_GLOBAL#time_coverage_resolution=P01Y
  NC_GLOBAL#time_coverage_start=1955-01-01
  NC_GLOBAL#title=World Ocean Atlas 2018 : sea_water_temperature Annual 1955-2017 1.00 degree
Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"woa18_decav_t00_01.nc":lat_bnds
  SUBDATASET_1_DESC=[180x2] lat_bnds (32-bit floating-point)
  SUBDATASET_2_NAME=NETCDF:"woa18_decav_t00_01.nc":lon_bnds
  SUBDATASET_2_DESC=[360x2] lon_bnds (32-bit floating-point)
  SUBDATASET_3_NAME=NETCDF:"woa18_decav_t00_01.nc":depth_bnds
  SUBDATASET_3_DESC=[102x2] depth_bnds (32-bit floating-point)
  SUBDATASET_4_NAME=NETCDF:"woa18_decav_t00_01.nc":climatology_bounds
  SUBDATASET_4_DESC=[1x2] climatology_bounds (32-bit floating-point)
  SUBDATASET_5_NAME=NETCDF:"woa18_decav_t00_01.nc":t_an
  SUBDATASET_5_DESC=[1x102x180x360] sea_water_temperature (32-bit floating-point)
  SUBDATASET_6_NAME=NETCDF:"woa18_decav_t00_01.nc":t_mn
  SUBDATASET_6_DESC=[1x102x180x360] sea_water_temperature (32-bit floating-point)
  SUBDATASET_7_NAME=NETCDF:"woa18_decav_t00_01.nc":t_dd
  SUBDATASET_7_DESC=[1x102x180x360] sea_water_temperature number_of_observations (32-bit integer)
  SUBDATASET_8_NAME=NETCDF:"woa18_decav_t00_01.nc":t_sd
  SUBDATASET_8_DESC=[1x102x180x360] t_sd (32-bit floating-point)
  SUBDATASET_9_NAME=NETCDF:"woa18_decav_t00_01.nc":t_se
  SUBDATASET_9_DESC=[1x102x180x360] sea_water_temperature standard_error (32-bit floating-point)
  SUBDATASET_10_NAME=NETCDF:"woa18_decav_t00_01.nc":t_oa
  SUBDATASET_10_DESC=[1x102x180x360] sea_water_temperature (32-bit floating-point)
  SUBDATASET_11_NAME=NETCDF:"woa18_decav_t00_01.nc":t_gp
  SUBDATASET_11_DESC=[1x102x180x360] t_gp (32-bit integer)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

The nodata value (called _FillValue in netCDF) is 9.96921e36


Linking this reference page for more information on the topic.


This is a good discussion, and we can all learn things about netCDF.
However - please be nice to each other. I know this is difficult but a few :grinning: help a lot.


Thank you. Below is one example of Int32 values in the NetCDF file above (Variable: t_gp). It seems that the FillValues for such integers are empty spaces?


[VAR_NAME] t_oa
[VAR_ID] 14
[ATTR] standard_name: “sea_water_temperature”
[ATTR] long_name: “statistical mean value minus the objectively analyzed mean value for sea_water_temperature.”
[ATTR] _FillValue: “9.969209968386869e+36”
[DIM] lon: 360
[DIM] lat: 180
[DIM] depth: 102


[VAR_NAME] t_gp
[VAR_ID] 15
[ATTR] long_name: “The number of grid-squares within the smallest radius of influence around each grid-square which contain a statistical mean for sea_water_temperature.”
[ATTR] _FillValue: " "
[DIM] lon: 360
[DIM] lat: 180
[DIM] depth: 102

Which package are you using to read the netCDF file? In NCDatasets we automatically replace all values equal to the _FillValue (and missing_value) attribute with missing. You can use ds["varname"].var if you do not want this transformation. More information is available here:

https://alexander-barth.github.io/NCDatasets.jl/stable/dataset/#CommonDataModel.cfvariable

https://alexander-barth.github.io/NCDatasets.jl/stable/variables/


Looking at the opendap link (OPeNDAP Dataset Query Form) the _FillValue attribute of the variable t_gp is -32767.
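For integer variables, the fill value itself can serve as the "no data" marker, so NaN is not needed at all. The sketch below illustrates the idea in plain Julia without any NetCDF package: the fill value -32767 is the one reported for t_gp above, while the data values and variable names are made up for illustration.

```julia
# Fill value reported for t_gp; the data values are invented.
fillvalue = Int32(-32767)
raw = Int32[5, -32767, 12, -32767]

# Conceptually what a reader does when it masks fill values: sentinel -> missing.
masked = [x == fillvalue ? missing : x for x in raw]
println(eltype(masked))

# Staying with integers: put the sentinel back instead of using NaN.
restored = coalesce.(masked, fillvalue)
println(restored == raw)

# Moving to floats: only now is NaN a legal replacement.
as_float = coalesce.(float(masked), NaN)
println(count(isnan, as_float))
```

Keeping the integer sentinel avoids the InexactError entirely; converting to floating point is only necessary when NaN semantics are actually wanted.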

In NCDatasets all this metadata is reported in the show function (often implicitly called).

To replace missing values with another value such as NaN, I often use nomissing (from NCDatasets).


Thanks for sharing. Basically something like the below:

a = nomissing(da)

Return the values of the array da of type Array{Union{T,Missing},N} (potentially containing missing values) as a regular Julia array a of the same element type and check that no missing values are present.

a = nomissing(da,value)

Return the values of the array da of type Array{Union{T,Missing},N} as a regular Julia array a by replacing all missing values with value.
