Loading MODIS HDF4 files with ArchGDAL

I would like to port to Julia some code that we have that works on MODIS satellite imagery. The source files are in HDF4 format, and are correctly recognized as MODIS datasets by the system gdalinfo. However, if I try loading them in Julia using ArchGDAL I get the error

AG.read("MOD03.A2019309.1035.061.NRT.hdf")
ERROR: GDALError (CE_Failure, code 4):
        `MOD03.A2019309.1035.061.NRT.hdf' not recognized as being in a supported file format.

My understanding is that this is coming directly from the GDAL library used by ArchGDAL, which is apparently compiled without HDF4 support (indeed, find(x -> startswith(x, "HDF"), AG.listdrivers()) only lists HDF5).

So, assuming I would like to have ArchGDAL come with HDF4 support, what would be my best course of action? Would it be possible to have my ArchGDAL be built against my system gdal so that the supported systems match?

Alternatively, since I’ve found that the feature has already been requested for GDAL.jl (which, IIUC, ArchGDAL depends on) Compile with HDF4 driver · Issue #84 · JuliaGeo/GDAL.jl · GitHub and a PR has been drafted a few months ago
[GDAL] Add HDF4 support to GDAL by asinghvi17 · Pull Request #9377 · JuliaPackaging/Yggdrasil · GitHub
what would be the best approach to work with a locally patched GDAL, pending an upstream integration of that same patch?

You may have a try with RemoteS. It also uses GDAL under the hood.

Edit: Oh, I read too quickly the HDF4 part. My local GDAL build has HDF4 support by I guess I never tried it with the Julia artifacts.

Yeah, the HDF4 thing is turning out to be surprisingly complicated.

There’s ongoing work here [GDAL] Bump v3.10.2 by jeremiahpslewis · Pull Request #10534 · JuliaPackaging/Yggdrasil · GitHub and apparently it has recently emerged that there were odd build interactions between HDF4, NetCDF and GDAl, see also HDF4: Build without NetCDF support by eschnett · Pull Request #10712 · JuliaPackaging/Yggdrasil · GitHub

So progress is slow, but is happening. I guess it’s just a matter of patience on my side :sunglasses:

The GDAL build now supports HDF4! Since this PR the GDAL_jll.jl starting from v302.1000.200 supports it. I just verified I can open a MODIS file with ArchGDAL.jl.

5 Likes

I just got an email from the GitHub thread and was just setting up to test, and then saw this message. I’m quite happy :sunglasses:

Thanks for pushing this @visr. Would you maybe share your code? I tried to access MODIS data, I can open the file with ArchGDAL,

const AG = ArchGDAL
filename = "MYD11C1.A2005352.006.2015269073039.hdf"
ds = AG.read(filename)
GDAL Dataset (Driver: HDF4/Hierarchical Data Format Release 4)
File(s): 
 MYD11C1.A2005352.006.2015269073039.hdf

but it does not seem to contain any raster layers.

AG.nlayer(ds), AG.nraster(ds)
(0, 0)

When opening the same file using PyNio it can see several layers of raster data. Can you confirm that you can actually extract data from your modis file?

Oh I didn’t try to access the data. Just tried your example with my MODIS file and I got the same…

1 Like

Ok I investigated a bit. The problem is that the different bands are stored in subdatasets (or groups) of the HDF4 file, so one can not open them globally. In python the following works:

from osgeo import gdal
hdf_ds = gdal.Open(filename, gdal.GA_ReadOnly)
hdf_ds.GetSubDatasets()

This returns a list of all datasets in the file with a strange path, e.g. ('HDF4_EOS:EOS_GRID:"MYD11C1.A2003001.006.2015180232737.hdf":MODIS_CMG_3MIN_LST:LST_Day_CMG', '[3600x7200] LST_Day_CMG MODIS_CMG_3MIN_LST (16-bit unsigned integer)')

so I copy-pasted the dataset path into julia and ArchGDAL could happily open it and find the respective bands:

import ArchGDAL as AG
AG.read("HDF4_EOS:EOS_GRID:\"MYD11C1.A2003001.006.2015180232737.hdf\":MODIS_CMG_3MIN_LST:LST_Day_CMG")

worked well and contained a data layer. The big question is now how we make a getsubdatasets function for ArchGDAL so we can automatize the process. The python definition seems to be here gdal/swig/include/python/gdal_python.i at 40758a7f433807eb3a263bbf4a213696dbb56026 · OSGeo/gdal · GitHub . I played a bit with ArchGDALs getmetadata to extract the information but was not lucky so far. Maybe someone can help?

1 Like

You can do the same in ArchGDAL, although it’s a bit more convoluted (which is why I wrote auxiliary functions to handle this).

You can fetch the sub-dataset metadata with

ArchGDAL.metadata(ds, domain="SUBDATASETS")

This will give you a vector of strings SUBDATASET_x_NAME=that weird path and SUBDATASET_x_DESC=the description. With a bit of string processing you can get the path from the ...NAME=path.

If I find the time to do it, I’ll see if something based on my idea can be merged into ArchGDAL. See also Any friendly way to access subdatasets? (SENTINEL2 driver) · Issue #463 · yeesian/ArchGDAL.jl · GitHub

With GMT, I do

using GMT
println(gdalinfo("C:\\v\\S2A_MSIL1C_20151129T111422_N0204_R137_T29SPB_20151129T111510/MTD_MSIL1C.xml"))
Driver: SENTINEL2/Sentinel 2
Files: C:\v\S2A_MSIL1C_20151129T111422_N0204_R137_T29SPB_20151129T111510/MTD_MSIL1C.xml
Size is 512, 512
Metadata:
  CLOUD_COVERAGE_ASSESSMENT=0.0
  DATATAKE_1_DATATAKE_SENSING_START=2015-11-29T11:14:22.031Z
  DATATAKE_1_DATATAKE_TYPE=INS-NOBS
  DATATAKE_1_ID=GS2A_20151129T111422_002279_N02.04
  DATATAKE_1_SENSING_ORBIT_DIRECTION=DESCENDING
  DATATAKE_1_SENSING_ORBIT_NUMBER=137
  DATATAKE_1_SPACECRAFT_NAME=Sentinel-2A
  DEGRADED_ANC_DATA_PERCENTAGE=0
  DEGRADED_MSI_DATA_PERCENTAGE=0
  FOOTPRINT=POLYGON((-7.871745925328026 37.289664369690094, -7.8579147835293 37.33802175206354, -7.815566478350036 37.48551052935225, -7.773513596570973 37.63306466636661, -7.731082734995071 37.78040091942396, -7.68886930502554 37.92777644136925, -7.685484708197319 37.93944169476536, -6.613057842445833 37.92337349546603, -6.644267630540426 36.93451075320804, -7.876819685159111 36.952569122489834, -7.871745925328026 37.289664369690094))
  FORMAT_CORRECTNESS_FLAG=PASSED
  GENERAL_QUALITY_FLAG=PASSED
  GENERATION_TIME=2015-11-29T11:15:10.000000Z
  GEOMETRIC_QUALITY_FLAG=PASSED
  PREVIEW_GEO_INFO=Not applicable
  PREVIEW_IMAGE_URL=Not applicable
  PROCESSING_BASELINE=02.04
  PROCESSING_LEVEL=Level-1C
  PRODUCT_START_TIME=2015-11-29T11:14:22.031Z
  PRODUCT_STOP_TIME=2015-11-29T11:14:22.031Z
  PRODUCT_TYPE=S2MSI1C
  PRODUCT_URI=S2A_MSIL1C_20151129T111422_N0204_R137_T29SPB_20151129T111510.SAFE
  QUANTIFICATION_VALUE=10000
  RADIOMETRIC_QUALITY_FLAG=PASSED
  REFERENCE_BAND=B1
  REFLECTANCE_CONVERSION_U=1.02676089964936
  SENSOR_QUALITY_FLAG=PASSED
  SPECIAL_VALUE_NODATA=0
  SPECIAL_VALUE_SATURATED=65535
Subdatasets:
  SUBDATASET_1_NAME=SENTINEL2_L1C:C:\v\S2A_MSIL1C_20151129T111422_N0204_R137_T29SPB_20151129T111510/MTD_MSIL1C.xml:10m:EPSG_32629
  SUBDATASET_1_DESC=Bands B2, B3, B4, B8 with 10m resolution, UTM 29N
  SUBDATASET_2_NAME=SENTINEL2_L1C:C:\v\S2A_MSIL1C_20151129T111422_N0204_R137_T29SPB_20151129T111510/MTD_MSIL1C.xml:20m:EPSG_32629
  SUBDATASET_2_DESC=Bands B5, B6, B7, B8A, B11, B12 with 20m resolution, UTM 29N
  SUBDATASET_3_NAME=SENTINEL2_L1C:C:\v\S2A_MSIL1C_20151129T111422_N0204_R137_T29SPB_20151129T111510/MTD_MSIL1C.xml:60m:EPSG_32629
  SUBDATASET_3_DESC=Bands B1, B9, B10 with 60m resolution, UTM 29N
  SUBDATASET_4_NAME=SENTINEL2_L1C:C:\v\S2A_MSIL1C_20151129T111422_N0204_R137_T29SPB_20151129T111510/MTD_MSIL1C.xml:TCI:EPSG_32629
  SUBDATASET_4_DESC=True color image, UTM 29N
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

and than read the subdataset with

I = gdalread("SENTINEL2_L1C:C:\\v\\S2A_MSIL1C_20151129T111422_N0204_R137_T29SPB_20151129T111510/MTD_MSIL1C.xml:TCI:EPSG_32629")
A GMTimage object with 3 bands of type UInt8
        B4, central wavelength 665 nm
        B3, central wavelength 560 nm
        B2, central wavelength 490 nm
Pixel node registration used
x_min: 600000.0 x_max :709800.0 x_inc :10.0     n_columns :10980
y_min: 4.0902e6 y_max :4.2e6    y_inc :10.0     n_rows :10980
z_min: 0.0      z_max :255.0
Mem layout:     TRBa
PROJ: +proj=utm +zone=29 +datum=WGS84 +units=m +no_defs

Already tried this but at least for my dataset this is not working, AG.metadata(ds, domain="SUBDATASETS") returns String[]. No idea what I am doing wrong.

Ok, got it. For my file the metadata domains were named differently

AG.metadatadomainlist(ds)
2-element Vector{String}:
 ""
 "DERIVED_SUBDATASETS"

So when you prepare a PR for ArchGDAL it would be great if the convenience function would programmatically check metadata domains and also query DERIVED_SUBDATASETS if necessary.

@fabiangans thanks for the heads up. Does the metadata in that domain still present itself as

SUBDATASET_1_NAME=blahblah
SUBDATASET_1_DESC=blahblah

or does it have different key names?

Actually this is not an issue anymore. I got a bit confused and requested subdatasets of a subdataset, which returned this metadata domain. So for my use cases, listing only SUBDATASETS would be sufficient and a big step forward.