Is there interest in having a Julian API for the NCDC's Climate Data Online (CDO)?

package
proposal

#1

For my work, I have used the National Oceanic and Atmospheric Administration (NOAA) weather and climate data extensively. I have automated several pipelines using their API in R. R’s ropensci/rnoaa interface is another alternative. I was wondering whether there is interest from the community in an interface in Julia. The package would implement an interface for two APIs: NCDC’s Climate Data Online (CDO) v2 and NCEI’s Historical Observing Metadata Repository. Before I go ahead and develop it, I would like to assess the community’s interest in it. It could also be a nice GSoC project (I could mentor it).


#2

Absolutely. I use ECMWF data via Python at the moment. I’ve been wanting to have a look at NCEI’s at some point.


#3

Note that there’s an incipient JuliaAO group that could act as an umbrella here (not that you need it).


#4

The development should be pretty straightforward given my experience with the API so far. I am aiming to do things a bit differently from rnoaa: rather than just providing a convenient way to send the requests, the package would actually process them. For example, if you want data for more than the allowed time coverage, it would automatically split the query into several requests and return the aggregated result. For queries that go over the limit of values for a single request, it would construct the various queries needed to get the full information and return the aggregated result. Other features would include convenient handling of missing data, finding nearby weather stations given some coordinates, etc.

Organizations are critical when development is done by several people (probably not the case here), but they also make projects easier to maintain and easier to discover. For convenience, I might migrate the project when it is ready to publish.
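A minimal sketch of the request-splitting idea described above, not the package's actual implementation. The function names `split_daterange` and `page_offsets` are hypothetical, and the one-year window and 1000-record page size are assumed defaults that should be checked against the CDO documentation for each dataset.

```julia
using Dates

# Split [start, stop] into consecutive, non-overlapping windows that each
# span at most `maxspan` (assumed default: one year).
function split_daterange(start::Date, stop::Date; maxspan::Dates.DatePeriod = Year(1))
    start <= stop || throw(ArgumentError("start must not be after stop"))
    windows = Tuple{Date,Date}[]
    lo = start
    while lo <= stop
        hi = min(lo + maxspan - Day(1), stop)  # last day covered by this window
        push!(windows, (lo, hi))
        lo = hi + Day(1)                       # next window starts the day after
    end
    return windows
end

# Record offsets (counting from 1) needed to page through `total` results
# when a single request returns at most `limit` records (assumed: 1000).
page_offsets(total::Integer, limit::Integer = 1000) = collect(1:limit:total)
```

Each window and offset would then become one request, and the responses would be concatenated before being returned to the user.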


#5

I am also currently downloading ECMWF data (ERA5 reanalysis) via Python but doing all calculations in Julia. I’m not familiar with what is available in the CDO but I intend to find out, so count me as interested.


#6

I started playing with it at NCEI.jl. Most of the features are implemented there, but the code is not optimized. I decided to wait until 0.7 for the new JSON packages (see LazyJSON.jl). Once the dependencies support Julia 0.7-, I will optimize the code, which should make it a lot faster. To look at the interface, or to report feature requests or comments, open an issue there. I will register it with the new Pkg3 when that work is done.


#7

Yes, I would be interested too.


#8

That would be awesome!

I could certainly use your work in my current ClimateTools.jl package which is a collection of commonly-used tools in Climate Science:

  • Reading netCDF files (with custom polygon --> avoids loading the whole netCDF files, which are usually huge!)
  • Calculating climate indicators
  • Quantile-quantile bias correction
  • Interpolation of climate models values on other lat-lon coordinates
  • Operations such as climate deltas, temporal and spatial subset
  • Mapping results
  • etc…

Still a WIP, but having the possibility to request data over the internet would be awesome. I’ll certainly keep an eye on it.


#9

I definitely would. I have a NOAAData.jl package (https://github.com/pazzo83/NOAAData.jl) that pulls GHCND and GSOM data (still a work in progress). Maybe that could be incorporated as well.


#10

I work with climate data a lot and am investigating the possibility of using Julia for this. I am looking for a package similar to pydata xarray. AxisArrays.jl is a step in the right direction, but it is missing tools that read data from different sources in different formats, like xarray.Dataset. I am not sure CDO is exactly what I need, but I will definitely watch this project.


#11

For what it’s worth, I’m using AxisArrays.jl in my package. It is very convenient for accessing spatial and temporal subsets as well as for merging with consistent dimensions.


#12

Hello. Your package is very good, and I am learning Julia by browsing your code, but I can’t use it because it is too task specific. For example, you hardcode data attributes into the ClimGrid structure, but my data sets have different attributes.

The advantage of xarray is that it does not focus on a specific task. It is a general-purpose, but still very useful, tool. Even within the climate field, tasks differ so much and data sets come in so many different formats that it is good to have a general-purpose tool like xarray, on top of which one can build an application specific to one’s task with little effort.

Sorry for criticism, don’t take it personally.


#13

No worries! The current implementation is indeed task specific right now (I work in the climate analysis field). I haven’t had much time to spend on this aspect, since trying to read different formats is really time consuming. The aim is to have something more generic in the end, though (which would probably lead to a single package for I/O and another one for analysis).

For instance, right now the nc2julia function is indeed developed against CMIP5 simulations.

If you have ideas about how it could be more generic, I’d be interested to hear them!

edit - or if you can point me towards the kind of file that you need to read, I can look it up.


#14

which would probably lead to a single package for I/O and another one for analysis

This would make sense imo.

If you have ideas about how it could be more generic, I’d be interested to hear them!

When I become more familiar with Julia, and decide that it is worth using, I might contribute. I was thinking about a generic data set reader which a user could easily overload, like

function open_NOAA_data(path)
    ds = open_generic_dataset(path)  # generic reader, independent of the data source
    # source-specific clean-up steps
    apply_landsea_mask(ds)
    flip_north_south(ds)
    shiftgrid(ds)
    # ....
    return ds
end

Then a user would do:

ds = open_NOAA_data(path)
list_variables(ds)
print_attributes(ds)
SST = read_data(ds, "SST", slice)

Not sure if this makes sense though.
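To make the idea above concrete, here is a small runnable sketch under stated assumptions. `GenericDataset` is a hypothetical in-memory stand-in for a real file-backed dataset, and none of these functions exist in a published package. The source-specific opener takes the generic opener as an argument, so the same pattern can wrap any reader.

```julia
# Hypothetical in-memory dataset; a real reader would wrap a file handle instead.
struct GenericDataset
    variables::Dict{String,Matrix{Float64}}  # each variable stored as lat × lon
    attributes::Dict{String,String}
end

list_variables(ds::GenericDataset) = sort(collect(keys(ds.variables)))

# Reverse the latitude dimension (rows) of every variable.
function flip_north_south!(ds::GenericDataset)
    for name in collect(keys(ds.variables))
        ds.variables[name] = reverse(ds.variables[name], dims = 1)
    end
    return ds
end

# Read a variable, optionally restricted to row and column ranges.
read_data(ds::GenericDataset, name::AbstractString, rows = Colon(), cols = Colon()) =
    ds.variables[name][rows, cols]

# Source-specific opener: call the generic opener, then apply the fixes
# this particular data source needs (only one fix shown here).
function open_NOAA_data(open_generic, path)
    ds = open_generic(path)
    flip_north_south!(ds)
    return ds
end

# Usage with a fake opener that ignores the file and returns fixed data.
fake_open(path) = GenericDataset(Dict("SST" => [1.0 2.0; 3.0 4.0]),
                                 Dict("source" => path))
ds = open_NOAA_data(fake_open, "demo.nc")
sst_first_row = read_data(ds, "SST", 1:1, :)
```

A user with a different data source would write their own small opener with whatever fixes that source needs, while the reading and listing functions stay shared.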


#15

Not sure you want to be using any climate data, it isn’t safe to handle that sort of material in this political climate.

But actually, good idea for a package (or two).


#16

Regardless of the political climate, our economy depends on climate studies too much to disregard them. Somebody will create the relevant packages anyway. The question is how to design them so that they suit a broader spectrum of tasks without sacrificing convenience.


#17

Hello, just so you know: I’ve been thinking about all those different data formats among climate simulations. For example, CORDEX (regional simulations) and CMIP5 (global models) do not share exactly the same netCDF attributes. For instance, regional simulations need a global driver at their boundaries.

Hence, I’m thinking of replacing the ClimGrid type with a set of different Julia structs, with methods defined on them. In the end this would most probably lead to a new package, but mostly to efficient structs for specific data.

This also means that a WeatherStation type would be defined, as well as something like a WeatherNetwork type comprised of WeatherStations. But on the weather station front, I’ll follow the work done by others who use such data more often than I do, like the packages listed in this thread (e.g. NCEI.jl, NOAAData.jl, etc.). Such data is very important for bias correction of climate simulations.

Anyway, if you’d like to have a specific data format in ClimateTools, just send me a link to a typical file so that I can play with it and see how it can be incorporated. Right now, development of ClimateTools depends only on my own needs, and I would very much like to add new features, climate indicators, etc…
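As a rough illustration of the WeatherStation / WeatherNetwork idea mentioned above, here is a minimal sketch, not the ClimateTools design. The field names, the `haversine_km` and `nearby` helpers, the station IDs and the approximate coordinates are all assumptions made for the example. It also shows one way to find stations near a given coordinate, using the haversine great-circle distance on a spherical Earth.

```julia
struct WeatherStation
    id::String
    lat::Float64  # degrees north
    lon::Float64  # degrees east
end

struct WeatherNetwork
    stations::Vector{WeatherStation}
end

# Great-circle distance in km between two (lat, lon) points in degrees,
# using the haversine formula and a mean Earth radius of 6371 km.
function haversine_km(lat1, lon1, lat2, lon2; radius = 6371.0)
    dlat = deg2rad(lat2 - lat1)
    dlon = deg2rad(lon2 - lon1)
    a = sin(dlat / 2)^2 + cosd(lat1) * cosd(lat2) * sin(dlon / 2)^2
    return 2 * radius * asin(min(1.0, sqrt(a)))
end

# Stations within `maxdist_km` of the given point, nearest first.
function nearby(net::WeatherNetwork, lat, lon; maxdist_km = 50.0)
    dists = [(haversine_km(lat, lon, s.lat, s.lon), s) for s in net.stations]
    hits = filter(d -> first(d) <= maxdist_km, dists)
    return [s for (_, s) in sort(hits, by = first)]
end

# Usage with made-up station IDs and approximate city coordinates.
net = WeatherNetwork([
    WeatherStation("QUE", 46.81, -71.21),
    WeatherStation("OTT", 45.42, -75.70),
    WeatherStation("MTL", 45.50, -73.57),
])
close_ids = [s.id for s in nearby(net, 45.5, -73.6; maxdist_km = 200.0)]
```

Keeping the distance function separate from the types means the same search could later be reused for gridded data.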


#18

Does anyone in this thread have tools for reading GRIB files? I currently have to resort to MATLAB and a package I found for it, but that doesn’t feel right… I’d rather get the data directly into Julia. Thanks!


#19

If you’re ok with using GDAL and its GRIB driver, you could try going through GDAL.jl or the higher-level ArchGDAL.jl.

http://www.gdal.org/frmt_grib.html


#20

I’ll check that out, many thanks!