Read multiple .nc files along a new dimension

Dear all,

I have some NetCDF files containing temperature and precipitation on a 360 x 180 grid. The files have no time dimension; the time is only encoded in the name of each file (2012.01.nc, 2012.02.nc, … ). I want to open all files and read every variable along a new time dimension.

I already read this issue https://github.com/Alexander-Barth/NCDatasets.jl/issues/124 (the same problem), but it is currently not possible to set aggdim to a new dimension that does not exist in the files (xarray can do it).

In the Stack Overflow question “How to initialize multidimensional arrays for using Julia's ncread() on the same variable in different files?” I found a way to do it, but only one variable at a time.

Has anyone found a more general way to achieve this multi-file read?

Cheers,
Inna

Is it possible for you to edit the files? Then you could write a function that opens a netCDF called “2012.01.nc”, and use that name to create a new variable called “time” with the value 2012-01-01, and save the file. If you run such a function on all files then you should be able to use aggdim="time" afterwards.

Hi, thanks.
It is possible to edit the files, but temperature and precipitation have only Lon and Lat as dimensions. When I define the time variable and set aggdim="time", my two variables cannot be merged along time.

However, if I first define a new variable “aod” with dimensions Lon x Lat x Time, then I can merge. So the new task is: how can I append a new dimension to an existing variable whose dimensions are already defined? (I want to change the dimensions of temperature to Lon x Lat x Time without rewriting the whole NetCDF file.)

Ah ok. I assumed that it could use scalar-valued variables called “time”. I updated the NCDatasets issue, because supporting that would already be quite helpful.

So the new task is how can i append a new dimension to an already existed variable with already defined dimensions ?

I don’t think you can add a dimension in place. You could create a variable based on what you have, but with the extra dimension, and then just copy over the data.
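A minimal in-memory sketch of that copy step, using only Base Julia. The `slices` array stands in for the 2-D (Lon x Lat) data read from each file and is made up for illustration; writing the stacked result back to a NetCDF file is left out here.

```julia
# Stand-ins for the 2-D (lon x lat) arrays read from each monthly file (hypothetical data).
slices = [fill(Float32(k), 360, 180) for k in 1:3]

# Concatenate along a new third dimension, which plays the role of "time".
stacked = cat(slices...; dims=3)

size(stacked)  # (360, 180, 3)
```

Each `stacked[:, :, k]` is then the field from the k-th file, so the time index follows the order of the file list.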

You can use YAXArrays.jl. If all your files have the same dimensions and sizes, you can do the following:

using YAXArrays, NetCDF, Dates

"""
    getdate(x, reg = r"[0-9]{8}T[0-9]{6}", df = dateformat"yyyymmddTHHMMSS")

Return a `DateTime` parsed from the string `x`: the time stamp is located with the regex `reg`
and parsed according to the date format `df`.
"""
function getdate(x, reg = r"[0-9]{8}T[0-9]{6}", df = dateformat"yyyymmddTHHMMSS")
    m = match(reg, x).match   # errors if `reg` does not match `x`
    return DateTime(m, df)
end

filelist = readdir(pwd()) # List of the files in the current directory (assumes it holds only the NetCDF files).
timestamps = getdate.(filelist)

cubelist = Cube.(filelist)

timeaxis = RangeAxis("Time", timestamps)
cube = concatenatecubes(cubelist, timeaxis)

With this you would get a YAXArray with which you can then do your data analysis. Or you can use savecube(cube, "pathtosave.nc") to store the data in a single NetCDF file on disk.
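Note that the default regex in getdate expects stamps like 20120101T000000, so it would not match names such as 2012.01.nc. A small sketch of how the helper might be called for that naming scheme; the regex and date format passed here, and the example file names, are assumptions and not part of the original post:

```julia
using Dates

# Same helper as above, repeated so this snippet runs on its own.
function getdate(x, reg = r"[0-9]{8}T[0-9]{6}", df = dateformat"yyyymmddTHHMMSS")
    m = match(reg, x).match
    return DateTime(m, df)
end

# Hypothetical file names following the pattern from the question.
filelist = ["2012.01.nc", "2012.02.nc", "tes.2018.06.nc"]

# Assumed regex and date format for "yyyy.mm" stamps; the day defaults to the 1st.
reg = r"[0-9]{4}\.[0-9]{2}"
df = dateformat"yyyy.mm"
timestamps = [getdate(f, reg, df) for f in filelist]
```

The resulting `timestamps` vector can then be used for the time axis in place of `getdate.(filelist)`.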

Hi, this is an interesting approach with YAXArrays.jl, but when I run Cube.(filelist), I get the error:

ERROR: NetCDF file /home/inna/Desktop/tes.2018.06.nc does not have a variable named lon 

I found a way:

using NCDatasets, Glob
paths = sort(glob("home/inna/Desktop/tes*.nc", "/"))  # the pattern is relative to the "/" prefix
a = [Dataset(p) for p in paths]  # an array with one dataset per file
a[1]["tmp"][:, :, :]  # the variable "tmp" from the first dataset

Thanks for your suggestion.

Can you post the full error message?
You can also try to load the NetCDF file as a Dataset with YAXArrays:

ds = open_dataset("pathtofile")

It would be interesting to see also this result.

Late to the party, but RasterSeries in Rasters.jl is designed for this use-case. Use the regex from @Fliks, then:

using Rasters, Dates
series = RasterSeries(filelist, Ti(timestamps))
# Retrieve something
series[Ti=At(DateTime(2001, 1, 1))]

If you need to join the files into a single Raster variable with a time dimension, use Rasters.combine:

combined_raster = combine(series, Ti)
write("combined.nc", combined_raster)

You can do the same thing with a series of tif or whatever, and still write to netcdf.

Hi!

Is that solution supposed to work with an arbitrary dimension? Let’s say I have 2 files member0.nc and member1.nc that I want to combine along the mydim = Dim{:member}([0, 1]). I’m doing:

series = RasterSeries(files, mydim)
combined_raster = Rasters.combine(series, Dim{:member})

When doing so, I’m getting a strange

LoadError: BoundsError: attempt to access NTuple{7, Int64} at index [8]

Is it supposed to work and if so, would you have any idea what I’m doing wrong? Thanks!

Hi @tcarion, I’m not sure what is happening there, and I don’t really have enough information to reproduce it. Please open a GitHub issue for this on Rasters.jl, and include a link to the files and the full stack trace of the error, and I’ll help you resolve it.

If your MWE includes all of this as a working script (including downloading the file) I’ll be able to help more easily.

Hi @Raf, thank you, the issue is open:
https://github.com/rafaqz/Rasters.jl/issues/234
