Screening Interest in a unified Vector/Raster Package, akin to R Terra

Great thread. I’ll add my thoughts here as well, since I have been stewing on this stuff for a while:

Let me start with what I think is great, and by thanking the few who have done so much in such a short period of time:

  1. JuliaGeo’s performance is truly impressive thanks to the hard work of a few. This is a unique benefit of the ecosystem, and one I cherish in my big-data workflows.
  2. JuliaGeo’s composability between packages is a real strength. The hard work that went into GeoInterface and GeoFormatTypes has made composability/interoperability unparalleled (no supporting evidence :wink: )
  3. I personally believe that JuliaGeo’s approach of adding packages like Lego blocks, so each user can build the toolset most relevant to their workflow, is the future. Building a monolithic package can silo users and stunt creativity. The future of geospatial analysis is not replicating the past; it’s supporting past functionality with an eye towards innovation. Julia (the language and the culture) excels at cross-discipline interoperability, and I believe that is a power worth embracing: I have a Raster and I can just use Images.jl on it, without either package knowing about the other (see the first sketch after this list). Keep that up and others will have trouble replicating the cross-discipline capabilities that are our superpower.
  4. DimensionalData.jl is a work of beauty and now underlies our two major raster packages, Rasters.jl and YAXArrays.jl. DimensionalData embodies the principles of literate programming, making our code easy to intuit (see the second sketch after this list).
  5. With the recent maturation of GeometryOps.jl we now have a solid foundation for improving vector operations.
  6. The welcoming, supportive and encouraging community is one that I am proud to be associated with.
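
To make the Lego-block point in item 3 concrete, here is a minimal sketch of handing a Raster straight to Images.jl. The file name is hypothetical, and it assumes a single-band raster with no missing values:

```julia
using Rasters, Images

rast = Raster("dem.tif")                  # hypothetical single-band GeoTIFF
img  = Gray.(rast ./ maximum(rast))       # broadcasting keeps the dimension metadata
blur = imfilter(img, Kernel.gaussian(3))  # Images.jl never needs to know about Rasters
```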
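
And a small sketch, with made-up data, of why DimensionalData-backed code reads so literately: selection is written in terms of coordinates and dates rather than integer indices.

```julia
using DimensionalData, Dates

A = rand(X(100:100:1000), Y(50:50:500), Ti(Date(2020, 1):Month(1):Date(2020, 12)))

# The intent is readable without the docs: nearest x coordinate, a y range, a time window.
A[X(Near(450)), Y(100 .. 300), Ti(Date(2020, 6) .. Date(2020, 9))]
```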

My thoughts on where we could make further gains (some of these thoughts simply echo what others have already said):

  1. GDAL (and Proj and GEOS) have their limitations, but damn… what a huge effort by very clever individuals, and one that changed the landscape of geospatial analysis. GDAL isn’t going anywhere, and new tooling and readers are constantly being added, so we should not try to swim upstream on this one. Nearly every geospatial package in our ecosystem should default to GDAL, Proj and GEOS. This alone would improve the user experience, since any geospatial dataset could be read and manipulated without users having to hunt for a bespoke package for each unique datatype. I find myself just using GeoDataFrames.jl to read all of my vector data because it abstracts away the file type, but that comes at the cost of missing out on some of the gains we could get from native Julia readers… to me the future is for GeoDataFrames.jl to adopt a Rasters-style approach where native Julia packages are used when they exist, otherwise defaulting to GDAL (see the reader sketch after this list). The user should not need any knowledge of which lower-level package is used. This is where JuliaGeo is failing right now and could use some attention. Maybe a wrapper around GDAL (GEOS/Proj) that dispatches to native packages when beneficial is a way this could be done without bespoke implementations in each package. An ideological push to replace GDAL would use up more resources than we have… a strategic effort to incrementally replace GDAL functionality where it yields the biggest gains is probably our best approach here.

  2. There are new technologies that are revolutionizing how we access and work with massive archives of geospatial data, which increasingly live in the cloud (Zarr/Kerchunk/Icechunk: GitHub - earth-mover/icechunk: Open-source, cloud-native transactional tensor storage engine). These technologies will help unlock knowledge that has remained trapped in the data due to our inability to command the full program of record when doing scientific analysis / ML / AI. The Julia community will need to track/shape/adopt these technologies to be able to efficiently interface with the wealth of geospatial data that’s out there. There has been great work with DiskArrays.jl / Zarr.jl / YAXArrays.jl in this domain, and very recently Kerchunk.jl… but as a community we want to make sure that we support these technologies seamlessly, so that our tooling does not become dated to an era when analysis was done eagerly on personal computers with local storage (see the Zarr sketch after this list). Given the newness and power of these technologies, early buy-in by JuliaGeo would provide motivation (cost < benefit) for new users to invest the time to become proficient in Julia and JuliaGeo.

  3. We should not “force” users to learn anything they don’t think they need to learn, by making the JuliaGeo tooling any more complex than the bare minimum needed to work with it efficiently and effectively. Simplicity, readability and efficiency are the goals worth pursuing: everything should be made as simple as possible, but not simpler. I’m a backwards (lazy) learner who is always exploring tooling and what it can do for my analysis/research goals. If it does something helpful, then I learn more… if it becomes too complex or onerous, I move on. I try to minimize my sunk cost. Probably not the best approach, but I suspect that others are lazy/exploratory like me and take a similar approach. JuliaGeo should strive to embody literate programming with a Julian flavor. Ideally one could read the code, without reading the documentation, and understand what is being done to read and manipulate the data (another kudos to DimensionalData for accomplishing this).

  4. I 100% agree that, now that we have GeometryOps and several vector format readers, the time is right to develop a vector version of Rasters.jl that can load data lazily and perform geometry operations without the package and type gymnastics that are currently needed… but I also realize that everyone is doing this on their weekends and holidays, so we are waiting for an enthusiastic contributor to take the lead, after which I’m sure the community will enthusiastically support the effort.

  5. [added late] JuliaGeo should strive to abstract away CRS wherever possible. Treating CRS in a way similar to how units are handled by Unitful.jl would go a long way toward reducing the burden on users, especially those who are geo-curious but not mapping experts (see the Unitful sketch after this list).

  6. Lastly, I’ve become humble enough to realize that no one person knows what’s best for the community, and giving room for competing packages to evolve separately, with support from the same or different communities, is the sign of a healthy ecosystem. Packages will compete, learn from one another, merge, and die… all in the name of progress.
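
A minimal sketch of the “one reader, any format” experience from item 1, plus a hypothetical version of the Rasters-style dispatch suggested there. The file names are made up, and `read_vector` is illustrative only, not an existing API:

```julia
using GeoDataFrames, GeoJSON, Shapefile

# Today: GeoDataFrames.jl hands everything to GDAL, so one call covers any format.
roads   = GeoDataFrames.read("roads.shp")
parcels = GeoDataFrames.read("parcels.gpkg")

# Hypothetical: prefer native Julia readers when they exist, fall back to GDAL otherwise.
function read_vector(path::AbstractString)
    ext = lowercase(splitext(path)[2])
    if ext == ".geojson"
        GeoJSON.read(read(path, String))   # native Julia reader
    elseif ext == ".shp"
        Shapefile.Table(path)              # native Julia reader
    else
        GeoDataFrames.read(path)           # GDAL (via ArchGDAL) for everything else
    end
end
```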
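
For item 2, a minimal sketch of the lazy, cloud-native pattern that DiskArrays.jl / Zarr.jl / YAXArrays.jl already enable. The bucket URL and the variable/axis names are hypothetical and depend on the dataset:

```julia
using YAXArrays, Zarr, DimensionalData

store = zopen("s3://example-bucket/climate.zarr", consolidated=true)  # metadata only
ds    = open_dataset(store)        # lazy dataset backed by DiskArrays.jl
tas   = ds.tas                     # still lazy; nothing has been downloaded yet

# Only the chunks covering this window are fetched when the result is materialized.
window = tas[lon = 0.0 .. 30.0, lat = 40.0 .. 60.0]
```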
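
And for item 5, the Unitful analogy in code. The first half is plain Unitful.jl usage; the commented-out second half is purely illustrative of what a CRS-aware equivalent might feel like, not an existing API:

```julia
using Unitful

d = 5.0u"km" + 300.0u"m"    # 5300.0 m; the unit bookkeeping is automatic
uconvert(u"mi", d)          # explicit conversion only when you ask for it

# Hypothetical CRS equivalent (illustrative only):
# pt = Point(445_000.0, 5_160_000.0, crs"EPSG:32633")
# reproject(pt, crs"EPSG:4326")   # CRS travels with the data; conversion on request
```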
