As NASA’s preferred file format a lot of remote sensing data is now delivered in HDF5 format. However, as there are multiple ways of storing geospatial information within HDF5 reading the data with GDAL often requires some effort. Luckily with the h5py and GDAL Python libraries converting to another GDAL format is easy.
The instructions here are for reading Synthetic Aperture Radar (SAR) data from the AirMOSS mission, a polarimetric P-band system. More details about the AirMOSS mission and links to download data are available from airmoss.jpl.nasa.gov (registration required to download data). However, the general process should be similar for most HDF5 formats.
- Import file
- Get some information about data
- Extract data
- Get geospatial information
- Create new GDAL file
- Copy data to new file
import h5py inHDF5File = "indata.h5" # Read in data indata = h5py.File(inHDF5File)
# List keys for key in indata.keys(): print(key) # List items for item in indata.items(): print(item) # List attributes for att in indata.attrs: print(att)
hhData = indata['HHHH'] vvData = indata['VVVV'] hvData = indata['HVHV']
For AirMOSS data the geospatial information are stored as attributes in the HDF5 file and are read using ‘indata.attrs’
# Set image size inXSize = hhData.shape[1] inYSize = hhData.shape[0] # Get coordinates minLon = indata.attrs['northwest longitude'] maxLat = indata.attrs['northwest latitude'] # Get pixel resolution pixelX = indata.attrs['longitude spacing'] pixelY = indata.attrs['latitude spacing']
You can select any format GDAL can write to as the output format, I’ve chosen to stick with HDF5 and go with the KEA format (which provides an implementation of the full GDAL specification).
# Set Geospatial Information geoTransform = [] for i in range(6): geoTransform.append(0.0) geoTransform[0] = minLon # top left x geoTransform[1] = pixelX geoTransform[2] = 0 geoTransform[3] = maxLat # top left y geoTransform[4] = 0 geoTransform[5] = pixelY srs = osr.SpatialReference() srs.SetWellKnownGeogCS("WGS84") # Creat output image numBands = 3 gdalDriver = getGDALFormatFromExt(outFileName) driver = gdal.GetDriverByName(gdalDriver) metadata = driver.GetMetadata() newDataset = driver.Create(outFileName, inXSize, inYSize, numBands, gdal.GDT_Float32) newDataset.SetGeoTransform(geoTransform) newDataset.SetProjection(srs.ExportToWkt())
newDataset.GetRasterBand(1).WriteArray(hhData) newDataset.GetRasterBand(1).SetDescription('HH') # Close dataset newDataset = None
The data should now be readable in anything which can read the output format you selected (e.g., TuiView.)
The full script can be downloaded from https://github.com/MiXIL/AirMOSS-GDAL-Utilities/, there are some differences in the final script, a loop is used to read and write each band, rather than loading all the data to memory and the no data value of -9999 is changed to 0.