I would like to port to Julia some code that we have that works on MODIS satellite imagery. The source files are in HDF4 format, and are correctly recognized as MODIS datasets by the system gdalinfo. However, if I try loading them in Julia using ArchGDAL I get the error
AG.read("MOD03.A2019309.1035.061.NRT.hdf")
ERROR: GDALError (CE_Failure, code 4):
`MOD03.A2019309.1035.061.NRT.hdf' not recognized as being in a supported file format.
My understanding is that this is coming directly from the GDAL library used by ArchGDAL, which is apparently compiled without HDF4 support (indeed, find(x -> startswith(x, "HDF"), AG.listdrivers()) only lists HDF5).
So, assuming I would like to have ArchGDAL come with HDF4 support, what would be my best course of action? Would it be possible to have my ArchGDAL be built against my system gdal so that the supported systems match?
The GDAL build now supports HDF4! Since this PR the GDAL_jll.jl starting from v302.1000.200 supports it. I just verified I can open a MODIS file with ArchGDAL.jl.
Thanks for pushing this @visr. Would you maybe share your code? I tried to access MODIS data, I can open the file with ArchGDAL,
const AG = ArchGDAL
filename = "MYD11C1.A2005352.006.2015269073039.hdf"
ds = AG.read(filename)
GDAL Dataset (Driver: HDF4/Hierarchical Data Format Release 4)
File(s):
MYD11C1.A2005352.006.2015269073039.hdf
but it does not seem to contain any raster layers.
AG.nlayer(ds), AG.nraster(ds)
(0, 0)
When opening the same file using PyNio it can see several layers of raster data. Can you confirm that you can actually extract data from your modis file?
Ok I investigated a bit. The problem is that the different bands are stored in subdatasets (or groups) of the HDF4 file, so one can not open them globally. In python the following works:
from osgeo import gdal
hdf_ds = gdal.Open(filename, gdal.GA_ReadOnly)
hdf_ds.GetSubDatasets()
This returns a list of all datasets in the file with a strange path, e.g. ('HDF4_EOS:EOS_GRID:"MYD11C1.A2003001.006.2015180232737.hdf":MODIS_CMG_3MIN_LST:LST_Day_CMG', '[3600x7200] LST_Day_CMG MODIS_CMG_3MIN_LST (16-bit unsigned integer)')
so I copy-pasted the dataset path into julia and ArchGDAL could happily open it and find the respective bands:
import ArchGDAL as AG
AG.read("HDF4_EOS:EOS_GRID:\"MYD11C1.A2003001.006.2015180232737.hdf\":MODIS_CMG_3MIN_LST:LST_Day_CMG")
You can do the same in ArchGDAL, although it’s a bit more convoluted (which is why I wrote auxiliary functions to handle this).
You can fetch the sub-dataset metadata with
ArchGDAL.metadata(ds, domain="SUBDATASETS")
This will give you a vector of strings SUBDATASET_x_NAME=that weird path and SUBDATASET_x_DESC=the description. With a bit of string processing you can get the path from the ...NAME=path.
Already tried this but at least for my dataset this is not working, AG.metadata(ds, domain="SUBDATASETS") returns String[]. No idea what I am doing wrong.
So when you prepare a PR for ArchGDAL it would be great if the convenience function would programmatically check metadata domains and also query DERIVED_SUBDATASETS if necessary.
Actually this is not an issue anymore. I got a bit confused and requested subdatasets of a subdataset, which returned this metadata domain. So for my use cases, listing only SUBDATASETS would be sufficient and a big step forward.