A native Julia package for satellite images?

I use satellite data from a few sources (NASA: Landsat, etc.; ESA: Sentinel-1/2, etc.), and I know that there are a few ways to handle this in Julia (like GeoArrays.jl and ArchGDAL.jl). But are there any thoughts on developing a pure Julia implementation for satellite image processing? Would it be worth connecting such a thing to Images.jl (GeoImages.jl?) to make use of that functionality? Or would it be better to build off of GeoArrays.jl? I anticipate being able to help out with such a thing, but only have limited experience developing packages and tend to write pretty hacky code.

Hi @mihalybaci, what kind of functionality do you miss in JuliaImages? What would be useful to have for satellite images specifically?

There was this talk on the ESDL.jl package at JuliaCon about Sentinel data.

I also started a repo RemoteSensing.jl a long time ago that never went anywhere other than getting formulas typed up.


I’ve only just begun to start trying to analyze satellite images in Julia, so I don’t have much to go by yet, but from Images I have found the “views” functions, type conversions, and RGB display/save most useful. That being said, I don’t have any attachment to a particular framework, I was asking mostly because I will probably start developing some tools soon that I would like to work within the larger JuliaGeo ecosystem. So before I go off on my own, I wanted to see what would be most useful for the community in terms of where to start and what to build on.

@Crghilardi - Yeah, I did see that talk (thank you, virtual JuliaCon!) and need to check it out more fully.

Would be great to have better satellite image processing tools. Note that satellite images would in most cases be represented as AbstractArrays. Hence Júlio mentioned JuliaImages, since images are also just arrays. The same goes for the packages in JuliaArrays, for instance. JuliaImages was built for microscopy, but works just as well for images that cover large swathes of the globe.

You are right that GeoArrays depends on GDAL, but that is just for input and output. A GeoArray is also just a pure Julia AbstractArray. Similarly, JuliaImages uses wrapped libraries like ImageMagick for file i/o. We can write more pure Julia i/o packages for raster data formats, but I’d personally focus on writing processing algorithms that work on AbstractArray where possible.
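As a rough illustration of what "algorithms that work on AbstractArray" could look like, here is a minimal sketch of a band index written against plain arrays. The function name `ndvi` and the sample values are made up for this example and do not come from any of the packages mentioned; it uses only Base.

```julia
# Normalized difference vegetation index, written against AbstractArray only.
# `ndvi` is an illustrative name, not a function from any package in this thread.
function ndvi(nir::AbstractArray{<:Real}, red::AbstractArray{<:Real})
    size(nir) == size(red) || throw(DimensionMismatch("bands must have the same size"))
    # Convert to floating point so integer bands (e.g. UInt16 digital numbers)
    # neither overflow on subtraction nor truncate on division.
    n = float.(nir)
    r = float.(red)
    return (n .- r) ./ (n .+ r)
end

# Made-up 2x2 bands standing in for real sensor data
nir = UInt16[400 500; 600 700]
red = UInt16[100 500; 200 0]
index = ndvi(nir, red)
```

Because the signature only asks for `AbstractArray{<:Real}`, the same function should accept views or any array wrapper that subtypes `AbstractArray`, although what type comes back depends on how that wrapper implements broadcasting. Pixels where both bands are zero come out as `NaN`.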

As to which particular AbstractArray to use, at JuliaCon there was also a Fancy Array Indexing BoF. Hopefully soon a blog post summarizing the available packages will come out from that.

Lastly, another satellite image processing package worth mentioning is https://github.com/JuliaClimate/ClimateSatellite.jl

Okay, thanks for the input! I’ll start playing around with GeoArrays for the time being. And check out JuliaClimate as well.

Check out GeoStats.jl as well @mihalybaci, it generalizes various operations to other kinds of spatial data configurations like point sets (rain gauges), unstructured grids, etc.

Oh yeah, I know about GeoStats :slight_smile: I started watching the videos, but still need to finish them off.

Actually, this can be generalized from satellite images to “remote sensing images”, as e.g. aerial and satellite images are often similar in their properties. I just started to play around with GeoArrays and wonder how this could be integrated e.g. with AxisKeys or NamedDims. (I think the two might be suited best and have the best syntax & least overhead.) Then one could for example store the extent as type information. Furthermore it would be possible to have other axes (e.g. time) as well as easy permutation of dimensions, without losing any of that info, which is currently unsupported by GeoArrays I think.

Yeah I agree. Rather than remote sensing / satellite I think the basis should be fully domain agnostic. And if we want we can sprinkle domain specific convenience functions on top in separate packages. This approach is also used for GeoData.jl.

But indeed I also like AxisKeys.jl + NamedDims.jl. GeoArrays.jl uses an affine map rather than keys, but in AxisKeys ranges could be used instead. I have xarray in the back of my head, that is also why I posted here: [ANN] FileTrees.jl -- easy everyday parallelism on trees of files. Because if we have nice composable packages for:

  • named dimensions (e.g. NamedDims.jl)
  • fancy array indexing (e.g. AxisKeys.jl)
  • chunked arrays (e.g. DiskArrays.jl)
  • and can process the chunks in parallel using Dagger.jl / FileTrees.jl / DataSets.jl (see FileTrees link above)

If we have that, we’d be essentially there. But we can take small steps, and not all components are always needed (such as lazy / parallel operations).
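To make the affine map versus keys/ranges point above concrete, here is a small sketch in plain Julia. All values and names (`x_origin`, `cell_center`, and so on) are hypothetical, and it assumes a north-up raster with no rotation terms, which is the case where the two representations carry the same information.

```julia
# Hypothetical geotransform values for a north-up raster (no rotation terms)
x_origin, y_origin = 500_000.0, 4_100_000.0   # outer corner of the top-left cell
dx, dy = 30.0, -30.0                           # cell size; dy is negative for north-up
ncols, nrows = 4, 3

# Affine form: map a (column, row) index to the centre of that cell
cell_center(col, row) = (x_origin + (col - 0.5) * dx, y_origin + (row - 0.5) * dy)

# Range form: one lookup vector per axis, the kind of thing a key-based
# array package could store alongside the data
xs = range(x_origin + dx / 2; step = dx, length = ncols)
ys = range(y_origin + dy / 2; step = dy, length = nrows)
```

With these numbers, `cell_center(c, r)` and `(xs[c], ys[r])` agree for every cell. Once the transform includes rotation or shear, a pair of independent ranges can no longer describe it, which is one reason to keep the affine form around.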

This is basically what GeoData.jl does. It handles spatial data with named dimensions and various kinds of index. It also handles projections and combinations of array/index forward/reverse order and dimension permutations fairly seamlessly, and other spatial data properties like the differences between intervals and points, selecting subsets correctly for intervals indexed at the start or center of cells, and tracking of dimension bounds.

It also has tools for handling multi-layered files like NetCDF and HDF5 as stacks, and simulating that using stacks of TIFFs so you can swap file types with the same code.

It’s currently in the process of registration, should be a day or two now.

Now that DiskArrays is getting rolled out, a lot of that is on the way to working in GeoData.jl. There’s also a way of using either affine-map-like indexing or a regular index built into DimensionalData.jl, without changing any syntax. It needs some work to work with affine mapped GDAL data. I just don’t ever use affine maps so it hasn’t happened yet.

FileTrees.jl could really help with some of the stack/series level operations happening over multiple files, that are currently single-threaded.

FWIW I have found myself in the situation as well where there is a variety of packages that implement named axes/axes with values and I did not know which one to build upon. Currently I am in the process of registering https://github.com/meggart/YAXArrayBase.jl which is supposed to serve as a common interface to different AxisArray-like packages. It currently supports AxisArrays, DimensionalData, AxisIndices, NamedDims and ArchGDAL and you can query dimension names and values using the same interface, as well as convert your data types between these backends.

This way you can write your code in a backend-agnostic way and don’t have to guess which of the packages mentioned above will be the one that everybody uses in the future.

BTW I just learned about AxisKeys.jl through this thread, so that will be the next package to support…

Since parallelism was mentioned, I am curious about parallelism in GDAL. Does anyone know whether or not GDAL uses much parallel code? Either in whole or in part (as in having only certain functions in parallel). I will try to do some digging today to find out more, but if anyone knows off-hand I’d be interested to know.

I don’t really have experience using parallelism in GDAL. There does seem to be at least some support, e.g. gdalwarp has -multi and -wo with NUM_THREADS. These options can also be set while using the C API.
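For reference, the two gdalwarp options above could be passed from Julia by building a command object, roughly as below. The file names and the target projection are placeholders, and the command is only constructed here, not executed.

```julia
# Hypothetical input and output paths
src = "input.tif"
dst = "warped.tif"

# -multi: lets I/O and computation run in separate threads
# -wo NUM_THREADS=ALL_CPUS: warp option that multithreads the computation itself
cmd = `gdalwarp -t_srs EPSG:4326 -multi -wo NUM_THREADS=ALL_CPUS $src $dst`

# run(cmd)  # uncomment once gdalwarp is on the PATH and the input file exists
```

Whether this gives any speedup would depend on the driver and the data, as with most things in GDAL.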

Unfortunately, thread safety of GDAL isn’t guaranteed, and is dependent on the driver (as most things are). It is often easier to limit yourself to a few drivers (e.g. geopackage, geojson, tiff, shapefiles, etc) – to my knowledge, that’s the approach taken by packages such as rasterio.

Great find, thanks. As a broader follow-up to my original post, would it make any sense to attempt a pure Julia version of GDAL (say, JDAL)? Or would that be a fool’s errand? I am wondering if there would be enough to gain to justify the amount of work it would take, specifically in areas like: ease of use, integration with Julia types/code style/etc., parallelization, GPU support, or distributed computing.

Good question. I doubt that implementing a GDAL replacement in Julia is worth the effort. I can only report from my subjective point of view, but I’m only using it for input/output. It probably makes no sense to rewrite this boring i/o code.