NCDatasets: copy the properties of a variable

Is there a facility to copy a netCDF variable including its axes (dimensions) and attributes at once?

A scenario:

ds_org = Dataset("original.nc", "r")
org = ds["orgvar"]
Dataset("new.nc", "c") do ds_new
  defVar(ds_new, "newvar", like = org) # newvar is just like orgvar
  new = ds_new["newvar"]
  new[:,:,:,:] = org ./ 100 # cm -> meters
  new.attrib["units"] = "meters" # cm -> meters
end
close(ds_org)

It would be nice if a code like the above would copy everything associated with the variable, including the associated coordinate variables, from the original netCDF file to the new.

(For a simple manipulation like the above, I would use NCO, but my real calculation is much more complicated.)

What’s the easiest way to copy the attributes and associated coordinate variables?

I’m not sure what’s the best way to do this. The variables and attributes are both not distinct from the dataset they come from, see for instance the type definitions here.

This functionality is currently not wrapped in a higher level interface, but perhaps nc_copy_var would make this easy?

This would be good to have, but a little more complicated than you describe here. It will need to recursively include other variables like spatial_ref and bounds variables linked to any of the coordinate variables.

As a alternative you can use Rasters.jl to get a Raster from a layer of a netcdf RasterStack and write it to a new netcdf file. This will make a copy of the orgvar variable and any attached coordinate and bounds variables:

write("new.nc", Raster("original.nc"; name=:orgvar))

You can change the metadata as well if you need to, though it could be easier.

Thank you, folks, for the ideas.

In the meanwhile, I wrote the program for my analysis and come to think that, as long as we assume the CF convention, it wouldn’t be so difficult to write a general variable exporter, although it would still be time-consuming to carefully read the CF convention and consider all possible cases.

I guess that nc_copy_var, if wrapped in a high-level interface, would simplify part of the code.

I was pleasantly surprised at Raster. It’s almost there. It would be perfect if NCDatasets had an “import Raster” feature:

orgvar = Raster("original.nc"; variable="orgvar") # I don't know if this works.
Dataset("new.nc", "c") do ds
  importRaster(ds, orgvar; name=:newvar, importValues=false)
  newvar = ds["newvar"]
  newvar[:, :, :, :] = big_calculation(orgvar)

For your information, my program (which isn’t general enough and which assumes that there are no “bounds”) looks like this:

function get_orgvar() # get the original variable, its axes, and its attributes
  ds = Dataset("original.nc", "r")
  orgvar = ds["orgvar"]
  axes = Vector{Any}(undef, 0)
  for nam in dimnames(psi)
    ax = ds[nam]
    push!(axes, (ax, nam))
  end
  return orgvar,  axes
end

orgvar, axes = get_orgvar()

Dataset("new.nc", "c") do ds # create the new variable
  dims = Array{String}(undef,0)
  for (ax,name) in axes
    len = size(ax,1)
    defDim(ds, name, len)
    defVar(ds, name, Float64, (name,), attrib = ax.attrib)
    ds[name][:] = ax[:]
    push!(dims, name)
  end
  let
    varname = "newvar"
    defVar(ds, varname, Float64, dims, attrib = orgvar.attrib)
    var = ds[varname]
    var.attrib["units"] = "meters"
    var[:,:,:,:] = big_calculation(orgvar)
  end

Not too bad, but it’s tedious.

Rasters.jl wraps NCDatasets, so you will never be able to import a Raster into it, as it would be a circular dependency.

The end goal of Rasters.jl is that you will do the majority of most raster workflows you would do in NCDatasets, ArchGDAL and other backends like GRIBB etc just in Rasters.jl, and it will take less code and be essentially the same for all backends. Mostly that is already the case. As far as I can tell everything you are doing is contained in these lines:

# Load your variable as a Raster and divide it by 100
newvar = Raster("original.nc"; name=:orgvar) ./ 100
# Change the name:
newvar = rebuild(newvar; name=:newvar)
# Update the units in the metadata dict
metadata(newvar)["units"] = "meters"
# Create a new netcdf file
write("new.nc", newvar)

Although I cant test it. It’s always better to link to a downloadable file we can all use so your code is a MWE (minimum working example).,

Edit: notice that attributes are called metadata and where you specify the variable we specify name. This is because none of this is NCDatasets.jl specific. Its just generic DimensionalData.jl syntax, which applies to all backends in Rastesrs.jl, and to other non spatial packages as well.

We also use Symbol for variable/raster names rather than strings.

1 Like

Here are different ways where NCDatasets can help you with this:

using NCDatasets

fname = download("https://www.unidata.ucar.edu/software/netcdf/examples/sresa1b_ncar_ccsm3-example.nc")

# option 1

ds_org = NCDataset(fname)
fname2 = "new.nc"
isfile(fname2) && rm(fname2)

org = ds["ua"]
Dataset(fname2, "c") do ds_new
    new = defVar(ds_new, "ua", org[:,:,:,:]/100,
                 dimnames(org),
                 attrib = org.attrib)

    new.attrib["units"] = "new units" # overwrite units
end


# option 2
fname3 = "new2.nc"
isfile(fname3) && rm(fname3)

Dataset(fname3, "c") do ds_new
    write(ds_new,ds_org,include=["ua"] # possibly other variables, there is also the option `exclude`
          )

    ds_new["ua"][:,:,:,:] = org/100 # overwrite variable
    ds_new["ua"].attrib["units"] = "new units" # overwrite units
end

close(ds_org)

Option 1 is similar to the code that you already have but it can be a bit shorter.
For some situations ncgen is also helpful.

1 Like

@Raf Thank you for your help! I’ve actually started to write a new program using Rasters as you describe and I’m almost there. Currently, I’ve got stuck at two problems which I haven’t been able to solve.

Problem 1: I create two new variables which share the same axes and most of attributes with the original:

orgvar = Raster("original.nc"; name=:orgvar)
new1 = similar(orgvar)
new2 = similar(orgvar)
new1[:, :, :, :] = somecalculation()
new2[:, :, :, :] = anothercalculation()
metadata(new1)["units"] = "m/s" # overwrite "units"
write("orgvar.nc", orgvar)
write("new1.nc", new1)
write("new2.nc", new2)

This program almost works: I get correct netCDF files with correct values. But, the problem is that the metadata are shared between the original and new variables so that if you change an attribute of one variable, the same attribute is changed for the other variables. So, how does one deep-copy the metadata? so that changes don’t propagate to the other variables?

Problem 2: How does one save multiple variables in one netCDF file? I’ve tried this

write("newdata.nc", orgvar, new1, new2)

but the resultant file is a MATLAB mat file (as reported by macOS’s file command) and not a netCDF file (ncdump doesn’t work on it).

By the way, is this Rasters.jl · Rasters.jl the only documentation of Rasters ?

This looks like a bug in DimensionalData.jl where the metadata is not deep copied in similar. For now, do rebuild(similar(orgvar), metadata=Dict()).

See: Deepcopy dims and metadata in `similar` · Issue #400 · rafaqz/DimensionalData.jl · GitHub

To save multiple variables to a single netcdf file, use a RasterStack, which is a collection of rasters sharing some or all dimensions. This should work:

stack = RasterStack(orgvar, new1, new2)
write("newdata.nc", stack)

(With the matlab file you are calling a write method outside of Rasters.jl by passing multiple arguments. Looks like I will need to define that for Raster so it cant happen, and throw an error)

For documentation there is also the DimensionalData.jl documentation, which covers a lot of these things at a lower level.

1 Like

Thanks! That appears to work but I don’t seem to be able to add metadata to the new Raster and save it. Below is a self-contained example. There you do see a new piece of metadata added to the new variable, but when you save the variable, the resultant netCDF file doesn’t include the new metadata.

using Downloads
using Rasters

const url = "https://www.unidata.ucar.edu/software/netcdf/examples/sresa1b_ncar_ccsm3-example.nc"
infile = Downloads.download(url, "tmp.nc")
#infile = "tmp.nc"

ua = Raster(infile; name = "ua")
v = rebuild(similar(ua[:,:,1,1]); name=:v, metadata=Dict())
metadata(v)["units"] = "someunits"
let attr = metadata(v)
  display(attr)
end
write("tmp2.nc", v)

Oh right sorry. Currently it needs to be flagged as netcdf metadata to actually get written, because there are all different types of metadata and I havent written the compatibility layer. Metadata needs a little work to be user friendly.

Use metadata=Metadata{Rasters.NCDfile}("a" => 1, "b" => 2)

Also might be good to start making github issues about things, theyre easier for me to track.

1 Like

Thank you for your help again!

Perhaps should I move to your github to continue this discussion?

Great! With that help, I’ve finally obtained a netCDF file that I wanted!

I’ve also found minor strange behaviors. See the following example.

Issue 1: The metadata is saved both as attributes of the variable and as global attributes.

Issue 2: If I used Strings as key for both attributes, I get this error:

ERROR: LoadError: MethodError: Cannot `convert` an object of type Symbol to an object of type String

In any case, I’ve found that Rasters is the best solution to my kind of use cases.

Self-contained example:

using DimensionalData: Metadata
using Downloads
using Rasters

const NC_meta = Metadata{Rasters.NCDfile}

const url = "https://www.unidata.ucar.edu/software/netcdf/examples/sresa1b_ncar_ccsm3-example.nc"
infile = Downloads.download(url, "tmp.nc")
#infile = "tmp.nc"

ua = Raster(infile; name = "ua")
#v = rebuild(similar(ua[:,:,1,1]); name=:v, metadata=NC_meta())
v = rebuild(similar(ua[:,:,1,1]); name=:v, metadata=NC_meta(:comment => "sample"))
metadata(v)[:units] = "someunits"
let attr = metadata(v)
  display(attr)
end
write("tmp2.nc", v)

That’s great.

What your’e bumping up against is a use case that I haven’t had (saving netcdf with custom metadata), and no-one else has made issues for. There are a lot of small things to do like this in Rasters.jl and more generally in the spatial ecosystem.

The best way to get them to happen is to make clear github issues defining how something should work, and pointing out how it doesn’t currently, or what is unsatisfactory about it. Include MWEs and suggested syntax after the change. If you can PR that’s even better, but a well written issue is a very valuable contribution.

1 Like