Grid Interpolation of Data from Polar-Orbiting Satellites

Hi all,

I’ve been coding for a little over two years now, but it has mostly been to make plots and perform simple calculations. Most other times, I have colleagues who also code in Julia and have codes/functions/packages that simplified my coding experience. So I’m not that familiar with the technical jargon and I apologize beforehand if my explanation about my issue doesn’t make sense.

I’m trying to compile a database of hurricane-centered, passive microwave-based products from polar-orbiting satellites. Below is an example of one case, but I’m generalizing my code to apply to other cases. What I’ve done so far is converted the pixel location, originally in the form of latitude and longitude of the satellite pixel, to distance from the storm center. Below is an example of the surface precipitation field from Hurricane Matthew (2016) before and after I interpolated the distance.

Before I explain my issue, I have to first explain the data structure. The variables from the satellite product are given in terms of m x n dimension, where m corresponds to the cross track pixels and n corresponds to the along track scan. The cross-track pixels are perpendicular to the satellite trajectory as it traverses the earth, where the satellite scans along an arc. The along track scan is essentially how many cross-track scans are made, and is parallel to the trajectory of the satellite. All the variables from this dataset is given in this dimension, including the lat/lon (which I have converted to x and y having the same dimension).

So, if I want to plot the same data as above using PyPlot, but focusing on one or the other dimension, here’s what the code would look like:

# Remove missing/fill values
var[findin(var, -9999.0)] = NaN;

plt.subplot(1,2,1)
# Focus on cross-track pixels from index 20 to 22 for all along-track scan
plt.contourf(x[20:22, :], y[20:22, :], var[20:22, :], levels=(0:70))

plt.subplot(1,2,2)
# Focus on along-track scans from index 170:190 for all cross-track pixels
plt.contourf(x[:, 170:190], y[:, 170:190], var[:, 170:190], levels=(0:70))

Producing the plot:

The next step I would like to take is to interpolate the x and y values (again, distance relative to the storm center) onto a Cartesian grid from -800 km to 800 km along both x and y, with a constant grid spacing. I’ve looked at both the Julia Interpolations.jl package, and the Dierckx.jl package, but neither seem to be able to do what I want to do.

For the Interpolations package, to create an interpolation object, the input x and y values must each be a 1-d array, while the var values must have dimension [length(x), length(y)]. As you can see from the data structure that I am working with, that is not possible, since each index m and n in either x, y, or var is unique, because of the way the satellite scans.

I tried to use the Dierckx package, where a 2-d interpolation could be conducted if all x, y, and var have the same dimensions. So, I re-shaped my x, y, and var into 1-d arrays with length (m times n) and tried the Dierckx interpolation (copy and pasted from the examples given on the GitHub page). Below is what I have:

x_new = reshape(x, length(x))
y_new = reshape(y, length(y))
var_new = reshape(var, length(var))

spl = Spline2D(x_new, y_new, var_new; kx=3, ky=3, s=0.0)

But that crashed my Jupyter Notebook kernel (at least I think it did, 5 minutes went by without any indication that it finished running). I tried running the above code on a smaller subset of x, y, and var, and I got the error message, “The required storage space exceeds the available storage space: nxest or nyest too small, or s too small. Try increasing s.” and I’m not sure what that means. I increased s to 4000, but I still got the same answer. I have to admit that attempt was a desperate shot in the dark and I don’t really understand how the smoothing factor works.

On top of all this, the satellite data has missing values (-9999.0), and I’m not sure how the different interpolations package would handle that or NaNs. After interpolating the data onto a Cartesian grid with constant grid spacing, I would also like to fill the regions not covered by the satellite scan with fill values, so that I would have a 1600 x 1600 km grid with either a missing value or the actual value at each grid point.

Would anyone have an idea as to how I could go about doing this? I would greatly appreciate any help I can get.

1 Like

If all else fails, you might get some mileage out of a very blocky linear interpolation (written in C, so would need to be rewritten in julia or else compiled and called as C; colleagues may be helpful if the latter option is of interest):

#define LATS       0                         /* satellite latitude index */
#define LONS       1                         /* satellite longitudes index */
#define MISSING    -9e9                      /* generic missing value */
#define DELMISS    -8e9                      /* generic missing value comparison */

void interp(int param, int lats, int lons, float lat, float lon, float *val, float *latind, float *lonind)
{
    int a, b, c, d, quad;
    double ax, bx, cx, dx, gx, hx, px, ay, by, cy, dy, gy, hy, py;
    double vala, valb, valc, vald, delx, dely, sum, wgt;

    px = lon ; py = lat;                                                      /* first find the quadrilateral that */
    *val = *latind = *lonind = MISSING;                                       /* bounds the target lat,lon point then */
    for (a = 0; a < lats-1; a++)                                              /* bi-linearly interpolate, and as with */
      for (b = 0; b < lons-1; b++)                                            /* the grid indices (a,b), the desired */
        if (*latind < DELMISS) {                                              /* subgrid indices (dely,delx) increase */
          ax = data[LONS][a  ][b  ] ; ay = data[LATS][a  ][b  ];              /* downward and to the right (subgrid */
          bx = data[LONS][a+1][b  ] ; by = data[LATS][a+1][b  ];              /* indices should be in the range [0,1]) */
          cx = data[LONS][a  ][b+1] ; cy = data[LATS][a  ][b+1];
          dx = data[LONS][a+1][b+1] ; dy = data[LATS][a+1][b+1];
          if (ay > by && cy > dy && cx > ax && dx > bx) {                     /* (ignore poorly defined gridboxes) */
            vala = bx + (ax-bx) / (ay-by) * (py-by);
            valb = dx + (cx-dx) / (cy-dy) * (py-dy);
            valc = by + (dy-by) / (dx-bx) * (px-bx);
            vald = ay + (cy-ay) / (cx-ax) * (px-ax);
            if (px >= vala && px <= valb && py >= valc && py <= vald) {
              vala = (cx-ax) * (dy-by) - (cy-ay) * (dx-bx);
              valb = (py-ay) * (dx-bx) + (px-bx) * (cy-ay) - (px-ax) * (dy-by) - (py-by) * (cx-ax);
              valc = (px-ax) * (py-by) - (py-ay) * (px-bx);
              vald = valb * valb - 4.0 * vala * valc;
              delx = (-valb - pow(vald,0.5)) / 2.0 / vala;
              gx = ax + (cx-ax) * delx;
              gy = ay + (cy-ay) * delx;
              hx = bx + (dx-bx) * delx;
              hy = by + (dy-by) * delx;
              dely = (px-gx) / (hx-gx);
              if (delx < 0 || delx > 1 || dely < 0 || dely > 1) {
                fprintf(stderr,"ERROR : interpolating desired point outside quadrilateral\n");
                printf("(%lf,%lf) (%lf,%lf) (%lf,%lf) (%lf,%lf) bounds (%lf,%lf)\n",ay,ax,by,bx,cy,cx,dy,dx,py,px);
                printf("but relative indices (from upper left) are (%lf,%lf) (i.e. outside [0,1])\n",dely,delx);
                exit(-1);
              }
              *latind = (float)a + (float)dely;
              *lonind = (float)b + (float)delx;

              ax = data[param][a  ][b  ] ; ay = (1.0 - dely) * (1.0 - delx);  /* then take a linear average of valid */
              bx = data[param][a+1][b  ] ; by =        dely  * (1.0 - delx);  /* gridpoint parameter values (but set the */
              cx = data[param][a  ][b+1] ; cy = (1.0 - dely) *        delx ;  /* result to missing if the desired location */
              dx = data[param][a+1][b+1] ; dy =        dely  *        delx ;  /* is within the quadrant of a missing value) */
              if (ax > DELMISS && bx > DELMISS && cx > DELMISS && dx > DELMISS)
                *val = ax*ay + bx*by + cx*cy + dx*dy;
              else {
                sum = wgt = 0.0;
                if (ax > DELMISS) {sum += ax*ay ; wgt += ay;}
                if (bx > DELMISS) {sum += bx*by ; wgt += by;}
                if (cx > DELMISS) {sum += cx*cy ; wgt += cy;}
                if (dx > DELMISS) {sum += dx*dy ; wgt += dy;}
                if (wgt > 0) {
                  *val = sum / wgt;
                  if (delx < 0.5) {if (dely < 0.5) quad = 1 ; else quad = 2;}
                  else            {if (dely < 0.5) quad = 3 ; else quad = 4;}
                  if ((quad == 1 && ax < DELMISS) || (quad == 2 && bx < DELMISS) ||
                      (quad == 3 && cx < DELMISS) || (quad == 4 && dx < DELMISS))
                    *val = MISSING;
                }
              }
            }
          }
        }
}
2 Likes

@naufalwx I don’t have a solution for you yet, but hopefully it will exist in the future within GeoStats.jl. There you can find various interpolation methods that operate on general spatial domains (regular grids, point sets, unstructured meshes, …). Currently I have only implemented RegularGrid and PointSet domain types, you would have to define your own domain for the satellite data. I don’t have time at the moment to implement it myself, but pull requests are very welcome.

Feel free to reach out on our gitter channel if you need guidance on how to implement a custom domain type.

1 Like

I think you want essentially the same mapping that plot uses, and probably a filter for interpolation

function xscale(px, byx)
  return round(Int64, ((px .- byx[1]) ./ (byx[2]-byx[1]))*1599)+1;
  end;

xplotmap = zeros(1600, 1600)
xparam = [minimum(x), maximum(x)];
yparam = [minimum(y), maximum(y)];
##xplotmap[xscale(x[:], xparam), xscale(y[:], yparam)] = var[:]; 
for iter = 1:length(x[:]) 
  xplotmap[xscale(x[iter], xparam), xscale(y[iter], yparam)] = var[iter];
  end;
1 Like

Thank you everyone for your inputs and suggestions! My apologies for getting back to this so late. I was caught up with a bunch of other things. I’ll need some time to go through your suggestions thoroughly and to try them out. But I greatly appreciate your inputs!

I found a function in Scipy that works for me - griddata. Does Julia have a similar function that works with scattered data as this function does?

Here’s what I did, for the same data structure as above:


# Import Scipy
@pyimport scipy.interpolate as iplt

# Reshape x, y, and var into 1-D arrays. After this, the data structure is considered "scattered"
x_r = reshape(x, length(x))
y_r = reshape(y, length(y))
var_r = reshape(var_r, length(var_r))

# Remove fill values of -9999.0 and replace with NaN's
var_r[findin(var_r, -9999.0)] = NaN

# Generate values for the new grid
grid_x = zeros(801,801)
grid_y = zeros(801,801)

for i in 1:801

# Similar to meshgrid. The grid is from -800 to 800 km with 2 km grid spacing.
grid_x[i,:] = collect(-800:2:800)
grid_y[:,i] = collect(-800:2:800)

end

# Linearly interpolate data. "fill_value" here just puts a fill_value where there is no interpolation to be done
grid_var = iplt.griddata((x_r, y_r), var_r, (grid_x, grid_y), method="linear", fill_value=NaN)

The plots below show the data, un-interpolated and interpolated.

@naufalwx can you share the data so that I can try to reproduce the interpolation with GeoStats.jl? I am still confused about what you want to achieve. It looks like we can do it already with the current version of the project.

Please feel free to reach out in our gitter channel: JuliaEarth/GeoStats.jl - Gitter

Following the documentation of Scipy’s griddata, they first generate a tesselation for the given input knots and then interpolate for the target points. I think the building blocks to do this in Julia are there, you can make use of

to create the tesselation. The package also provides a locate function to tell you which triangle your target point is in. The only remaining step would then be the linear interpolation inside the triangle.

In a single step by GMT’s triangulate that is available via GMT.jl

I don’t think there is even a need to perform tesselation in this case. The grid he is plotting is apparently a masked regular grid? If that is the case, this could be literally solved in 4 lines of GeoStats.jl

Anyways, let’s wait for a reply, it may take a couple of days again…

Hi everyone,

Thank you for getting back to me so soon. Here is an example file: https://github.com/naufalwx/TC-Microwave-Database/blob/master/Matthew_test.nc

My initial plan was to try to do a bilinear interpolation. But, I don’t know how, given the data structure. So I looked at packages in other languages and figured out I could treat the data points as scattered data points, and use another interpolation method instead of bilinear, which brought me to scipy’s griddata.

@naufalwx can you explain how the data can be loaded in Julia? Tried FileIO but it failed.

@juliohm It’s a NetCDF file. So you should just be able to use ncread. The three variables on there are labeled “sfc_precip”, “x”, and “y” :slight_smile:

Sorry but this file has no valid data in neither of its 3 vars: x, y, z

Could you maybe share a small script to load the data? Help us help you :slight_smile:

Definitely :slight_smile: I am able to read-in the file in the same kernel that I wrote the file, but if I open a new kernel and try to read-in the same file, it failed. Give me just a moment to figure it out and I’ll send out another notification when I’ve resolved this issue. Sorry for the delay, but thank you for your continued interest in helping me! :slight_smile:

I’ve resolved the issue. The updated file should be on the GitHub link. To read-in the file:


using NetCDF;

x = ncread("Matthew_test.nc", "x");
y = ncread("Matthew_test.nc", "y");
precip = ncread("Matthew_test.nc", "sfc_precip");

# If you want to look at the variable attributes
ncinfo("Matthew_test.nc")

Just make sure that the file is in the right directory :slight_smile: All files should have the dimension of (221, 366). There are NaNs in “sfc_precip”.

Thanks, now I get the format. There is a feature request open for this NetCDF convention: https://github.com/juliohm/GeoStats.jl/issues/22 @Balinus was working on it, but I don’t know he moved forward with the implementation.

The issue will be addressed in future releases.

1 Like
using GMT

# Assume that the Matthew_test.nc is at current dir

# Load the x,y, z arrays from file
x = gmt("read -Tg Matthew_test.nc?x");
y = gmt("read -Tg Matthew_test.nc?y");
z = gmt("read -Tg Matthew_test.nc?sfc_precip");

# Check the range of the coordinates. Need to know x_min, x_max, y_min, y_max
# Those values are the 5th and 6th of
x.range
y.range

# Compute a grid. Don't know original point separation so from dimensions and x values
# esrtimated that they are about 4 km.
G = gmt("nearneighbor -R-1428/1215/-2729/2326 -I4 -S20", [x.z[:] y.z[:] z.z[:]]);

# Quick image
grdimage(G, shade="+ne0.8+a100", proj="X10c/0", frame="a", fmt="jpg", show=true)

3 Likes

GMT visualizations look amazing as usual.