Announcement: GeoStats.jl

package
announcement
statistics

#1

Dear all,

I am happy to announce GeoStats.jl, a package with high-performance implementations of geostatistical algorithms for the Julia programming language. It is in its initial development, and currently only contains Kriging estimation methods.

There is lots to be done in order to get estimation/simulation algorithms working on arbitrary domains (regular grids, unstructured meshes, etc.). In particular, I am planning to work on a hierarchy of types (similar to that of VTK) for spatial domains as my next step when Julia v0.6+ is out. If you have suggestions on how to achieve this hierarchy or if you see some interplay between this package and other packages for spatial data, please reach out.

The long-term goal is to have a general framework in which one can formulate geostatistical problems on any spatial domain and solve them efficiently (possibly in parallel) on modern hardware.

Related packages are:

Contributions and suggestions are more than welcome.

Thanks,
-Júlio

P.S.: Big thanks to @cormullion for the amazing Luxor.jl package, which I used to create the GeoStats.jl logo with a few lines of Julia.


Is there a typealias for Arrays of dimension less than N?
#2

Nice work!

Right now I don’t plan to integrate kriging (I now get memories from my master thesis!) with my package, but other type of interpolations will certainly be useful! Such as splines.

ClimateTools.jl

edit - Although, there is Dierckx.jl for splines.


#3

Thanks @Balinus! Please let me know if you feel like using it at some point. Having these NetCFD formats integrated would be interesting too. I don’t have much background on this file format, but you seem to have experience with them.

Feel free to submit PRs :slight_smile:

Best,


#4

Hi @Balinus, I just want to highlight a difference here for future readers about Kriging and Spline interpolation as these have very different properties.

  • Kriging is about finding an estimator of minimum error. This is achieved by minimizing the variance of the estimator, which leads to a convex optimization problem. The solution is therefore provably optimal given this objective.

  • Splines are interpolations that are constructed to look “smooth”. By enforcing these smoothness constraints, we arrive at an optimization problem that is sometimes equivalent to Kriging.

If your goal is to make accurate estimations about a real (or physical) property, then Kriging is your best friend. If your goal is to interpolate for visualization purposes only, then Splines are a safe option. Nowadays, Interpolations.jl is the best package for splines in my opinion.

For a more detailed comparison, please check this paper.


#5

Totally agree.

My main problem with kriging is that it is quite difficult to automate. In my field, we sometimes needs to do interpolations for daily values over 140 years (so ~50 000 iterations). I once did it with Kriging and I would’nt recommend it for such purpose. Although, this is one of the best way to interpolate a physical field and would recommend it for smaller sample, or if estimated value at specific points is important. Kriging is very flexible and can be steered in the right direction for multiple purpose.


#6

@Balinus what exactly was the automation issue you had? Kriging interpolation has two steps as described in that paper: 1) definition of spatial structure and 2) estimation. Spline interpolation on the other hand only has step 2). With splines you completely ignore the true spatial structure and always use the same one, that is why it is “automated” :slight_smile:

If step 1) is your headache, there are automatic methods for fitting spatial structure via optimization. In this notebook, I illustrate the manual fitting process, which is usually more robust. However, future releases of GeoStats.jl will include an optimizer for finding the sill, range and nugget from the data.


#7

The problem were arising when trying interpolation of daily precipitation. The semi-variogram calculation was problematic in the absence of precipitation (or very small values).

I admit it was 10-12 years ago and I was at the master level. There is probably ways to circumvent this obstacle.


#8

I want to write down another important connection that is missing in the Machine Learning literature.

Gaussian processes (the method) and Simple Kriging are essentially two names for the same concept, and this distinction only exists due to historical reasons. Matheron and other important geostatisticians have generalized Gaussian processes to random fields with non-zero mean and for situations where the mean is unknown. GeoStats.jl includes Gaussian processes as a special case as well as other more practical Kriging variants, see the Gaussian processes example.