I am using the wonderful GeoStats package and associated JuliaEarth tools. For my project I have to compute the variogram from a sample of data. The tricky thing is that the EmpiricalVariogram() function in Julia asks for maxlag (and nlags) to compute the variogram.
My question is: what units are the lags measured in? In the demo notebook, the data are simply in Cartesian coordinates, so the lag represents meters, feet, etc. However, in my case I have data in latitude/longitude (WGS84) coordinates, or perhaps UTM coordinates. With latitude and longitude the units are usually degrees, so would the lag be specified in degrees, like 0.02 or 0.1 degrees, with maxlag = 1.0? Or does the lag represent some other metric, like the number of bins apart two observations are?
Similarly, in UTM coordinates, which are usually in meters, does nlags mean the number of meters apart two observations are? Or is there a different unit that I am just not understanding?
Any assistance is appreciated. Thanks again for all the hard work on the Geospatial packages.
maxlag will have physical units of length: whatever units your data’s coordinates are expressed in. (This will “work” if your data are in lon/lat coordinates, but you should really project them to UTM or another projected coordinate system, since one degree of longitude represents a varying physical distance as you move from the equator to the pole.)
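To make the lon/lat caveat concrete, here is a small illustration (not GeoStats code) of how the east-west length of one degree of longitude shrinks with latitude. It assumes a spherical Earth with a mean radius of 6,371,008.8 m; real geodetic work would use the WGS84 ellipsoid, so treat the numbers as approximate.

```python
import math

# Assumption: spherical Earth with mean radius in meters (not the WGS84 ellipsoid).
EARTH_RADIUS_M = 6_371_008.8


def meters_per_degree_longitude(lat_deg: float) -> float:
    """Approximate east-west length (meters) of one degree of longitude at lat_deg."""
    meters_per_degree_at_equator = 2 * math.pi * EARTH_RADIUS_M / 360
    # A parallel's circumference scales with cos(latitude).
    return meters_per_degree_at_equator * math.cos(math.radians(lat_deg))


for lat in (0, 30, 60, 89):
    km = meters_per_degree_longitude(lat) / 1000
    print(f"lat {lat:2d}: {km:7.2f} km per degree of longitude")
```

Under this approximation a maxlag of 0.1 degrees spans about 11 km east-west at the equator but only about 5.6 km at 60° latitude, which is why a single lag in degrees is not a consistent physical distance.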
nlags does not have units: it’s just the number of bins the variogram point cloud gets averaged into (i.e., the number of points in the resulting empirical variogram plot).
@ElOceanografo ahh, that helps a lot. Thanks for the clarification; that is what I was missing. I have coordinates in both UTM and lat/long, but you are saying that GeoStats will compute distances in either coordinate system. So maxlag is in degrees for lat/long and in meters for UTM. Further, maxlag represents half of the maximum possible lag between two points. So if an image is 100x100 pixels, then the maximum lag between any two points is 100, and half of that is 50, right? I can configure maxlag, but otherwise it would be half of the total possible lag in whatever units the data are in.
I think I understand nlags now. So these are just the bins that represent the distance between groups of observations. I guess maxlag will chop the total distance into bins, and then nlags represents how many of those bins to use in computing the variogram.
Does that sound about right? Thanks again.
Well, for a 100 x 100 grid, the maximum lag is actually $\sqrt{100^2 + 100^2} \approx 141.4$ (Pythagoras), but yes, that’s the idea.
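A quick way to check the Pythagoras point is to brute-force the largest pairwise distance. This is a generic Python illustration, not part of GeoStats; it treats the grid as a continuous 100 x 100 domain (as in the reply), so only the four corners matter. With pixel centers at 0..99 the answer would instead be $99\sqrt{2}$.

```python
import math
from itertools import combinations


def max_lag(points):
    """Largest pairwise Euclidean distance among (x, y) points, by brute force."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))


# Corners of a 100 x 100 domain: the farthest-apart pair lies on a diagonal.
corners = [(0, 0), (100, 0), (0, 100), (100, 100)]
print(max_lag(corners))       # about 141.42, not the side length 100
print(math.hypot(100, 100))   # the same diagonal via Pythagoras
```

Brute force is quadratic in the number of points, which is fine for a sanity check but not for large datasets.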
If you have a spatial variable $z$ measured at $n$ locations, you have $n(n-1)/2$ pairs of data points to compare to each other. Each of these pairs is separated by a spatial distance or “lag” $h$, and the measurements of the spatial process at those two points differ by some value $\Delta = z_i - z_j$. If you make a scatterplot of $h$ vs. $|\Delta|^2$, with one point for each pair of measurements, you get what is often called the variogram cloud.
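The pair-by-pair scatterplot described above can be sketched in a few lines of Python. This is an illustration of the idea only, not how GeoStats builds it internally; the coordinates and values are hypothetical toy data, and each unordered pair is counted once.

```python
import math
from itertools import combinations


def variogram_cloud(coords, z):
    """Return one (h, squared difference) tuple per unordered pair of observations."""
    cloud = []
    for i, j in combinations(range(len(z)), 2):
        h = math.dist(coords[i], coords[j])  # spatial lag between the two locations
        delta = z[i] - z[j]                  # difference in the measured variable
        cloud.append((h, delta ** 2))
    return cloud


# Hypothetical toy data: 5 locations with one measurement each.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (2.0, 2.0)]
z = [1.0, 1.5, 2.0, 3.5, 2.5]

cloud = variogram_cloud(coords, z)
print(len(cloud))  # n(n-1)/2 = 10 pairs for n = 5
```

Plotting the first element of each tuple against the second gives the scatterplot; its size grows quadratically with $n$, which is one reason it is summarized by binning.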
Typically you divide the x-axis into a series of discrete intervals and take the average value of $|\Delta|^2$ within each of those bins to calculate the empirical variogram (the conventional semivariogram estimator uses half of that average). In EmpiricalVariogram() from GeoStats, maxlag sets the upper limit of the x-axis on this plot, and nlags determines how many bins to divide the interval $[0, \text{maxlag}]$ into.
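Here is a minimal Python sketch of how nlags and maxlag interact, assuming equal-width, half-open bins on $[0, \text{maxlag}]$ and the classical half-mean-squared-difference estimator. GeoStats’ actual binning and estimator details may differ, so treat this as a conceptual model rather than its implementation; the data are hypothetical.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, z, nlags, maxlag):
    """Split [0, maxlag] into nlags equal bins and average the pairs in each.

    Returns (bin center, semivariance, pair count) for each non-empty bin.
    Semivariance is half the mean squared difference (classical estimator).
    """
    width = maxlag / nlags                  # maxlag is in coordinate units
    sums = [0.0] * nlags
    counts = [0] * nlags
    for i, j in combinations(range(len(z)), 2):
        h = math.dist(coords[i], coords[j])
        if h > maxlag:
            continue                        # pairs farther apart than maxlag are ignored
        k = min(int(h / width), nlags - 1)  # h == maxlag goes into the last bin
        sums[k] += (z[i] - z[j]) ** 2
        counts[k] += 1
    return [((k + 0.5) * width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(nlags) if counts[k] > 0]


# Hypothetical toy data: 5 locations with one measurement each.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (2.0, 2.0)]
z = [1.0, 1.5, 2.0, 3.5, 2.5]

for center, gamma, n in empirical_semivariogram(coords, z, nlags=3, maxlag=3.0):
    print(f"lag ~{center:.1f}: gamma = {gamma:.4f} from {n} pairs")
```

Raising nlags with maxlag fixed gives narrower bins with fewer pairs each; raising maxlag with nlags fixed widens every bin and admits more distant pairs.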
@ElOceanografo Yep, okay, that makes sense. I really appreciate you sharing that information. Now I understand what those arguments mean. I will see if I can help update the package documentation based on your comments.