The latest version of GDAL (2.1) has a driver to read Sentinel 2 data (see http://gdal.org/frmt_sentinel2.html). Like HDF files, Sentinel 2 scenes are read as subdatasets. Running ‘gdalinfo’ on the zipped folder, or on the .xml file contained within the .SAFE directory, will display all the subdatasets as well as all the metadata (so quite a lot of information!).
As with HDF5, you can pass a subdataset name to gdalinfo to get more information about it, or to gdal_translate to extract it as a separate dataset.
The script will get a list of all subdatasets using GDAL:
from osgeo import gdal
dataset = gdal.Open('S2/S2.xml', gdal.GA_ReadOnly)
subdatasets = dataset.GetSubDatasets()
dataset = None
The ones to be extracted are for the 10, 20 and 60 m resolution band groups for each UTM zone (if the file crosses multiple zones).
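As a rough illustration of this selection step, the sketch below filters the (name, description) tuples returned by GetSubDatasets() down to the 10, 20 and 60 m groups. It assumes subdataset names follow the pattern described in the GDAL driver documentation (for example 'SENTINEL2_L1C:S2.xml:10m:EPSG_32632'); the function name 'select_band_groups' is hypothetical and is not necessarily how the script itself does it.

```python
# Resolution groups to keep; other subdatasets (e.g. previews) are skipped.
RESOLUTION_GROUPS = ("10m", "20m", "60m")


def select_band_groups(subdatasets):
    """Return the names of subdatasets belonging to a resolution band group.

    `subdatasets` is a list of (name, description) tuples, as returned by
    GDAL's GetSubDatasets(). The name layout is an assumption:
    <driver prefix>:<path to xml>:<group>:<EPSG_code>
    """
    selected = []
    for name, _description in subdatasets:
        # Split from the right so ':' characters inside the path are ignored.
        fields = name.rsplit(":", 2)
        if len(fields) == 3 and fields[1] in RESOLUTION_GROUPS:
            selected.append(name)
    return selected
```

A scene crossing two UTM zones would yield one entry per resolution group and zone, so up to six names.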
For each subdataset it will give an output name, replacing the EPSG code with the UTM zone and ‘:’ with ‘_’.
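One way the renaming could work is sketched below. It assumes the subdataset name ends in ':<group>:EPSG_<code>' and that the code is a WGS 84 / UTM one (326xx for northern zones, 327xx for southern zones). The helper names and the exact output pattern are illustrative assumptions, so the names produced by the script itself may differ.

```python
import os


def utm_zone_from_epsg(epsg_field):
    """Convert a field such as 'EPSG_32632' to a label such as 'UTM32N'."""
    code = int(epsg_field.split("_")[1])
    if 32601 <= code <= 32660:
        return "UTM{}N".format(code - 32600)
    if 32701 <= code <= 32760:
        return "UTM{}S".format(code - 32700)
    raise ValueError("Not a WGS 84 / UTM EPSG code: {}".format(code))


def output_name(subdataset_name, extension="kea"):
    """Build an output file name from a Sentinel 2 subdataset name."""
    # Split from the right so ':' characters inside the path are ignored.
    prefix, group, epsg_field = subdataset_name.rsplit(":", 2)
    # Drop the driver prefix, then keep only the xml file's base name.
    xml_path = prefix.split(":", 1)[1]
    base = os.path.splitext(os.path.basename(xml_path))[0]
    zone = utm_zone_from_epsg(epsg_field)
    # Join the parts with '_' in place of the ':' separators.
    return "{}_{}_{}.{}".format(base, group, zone, extension)
```

The extension is passed in separately because it depends on the output format chosen.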
The gdal_translate command, called using subprocess, is then used to create a new file for each subdataset. By default the output is in KEA format.
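Calling gdal_translate through subprocess might look something like the sketch below. The function names are hypothetical; only the 'gdal_translate -of FORMAT input output' form is taken from the GDAL command line tools. Building the command as a list avoids problems with spaces or ':' characters in subdataset names.

```python
import subprocess


def build_translate_command(subdataset_name, out_file, out_format="KEA"):
    """Return the gdal_translate command as a list of arguments."""
    return ["gdal_translate", "-of", out_format, subdataset_name, out_file]


def extract_subdataset(subdataset_name, out_file, out_format="KEA"):
    """Print and run gdal_translate for a single subdataset.

    Requires gdal_translate to be available on the PATH.
    """
    command = build_translate_command(subdataset_name, out_file, out_format)
    print(" ".join(command))
    # check_call raises CalledProcessError if gdal_translate fails.
    subprocess.check_call(command)
```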
To run the script, first install GDAL 2.1. The conda-forge channel has recent builds, which you can install using conda:
conda create -n gdal2 -c conda-forge gdal
source activate gdal2
(If you are on Windows leave out ‘source’)
To extract all subdatasets from a zipped Sentinel 2 scene to the current directory you can then use:
extract_s2_data.py -o . \
    S2A_OPER_PRD_MSIL1C_PDMC_20151201T144038_R010_V20151130T142545_20151130T142545.zip
The gdal_translate command used is printed to the screen.
The default output format is KEA; you can change it using the ‘--of’ flag. For example, to convert an unzipped scene to GeoTIFF:
extract_s2_data.py -o . --of GTiff \
    S2A_OPER_PRD_MSIL1C_PDMC_20151201T144038_R010_V20151130T142545_20151130T142545.SAFE
To get the file extension for all supported drivers, along with some creation options, the script optionally uses the ‘get_gdal_drivers’ module from arsf_dem_scripts. You can just download this file and copy it into the same directory that ‘extract_s2_data.py’ has been saved to. On Linux or OS X you can run:
# OS X
curl https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py > get_gdal_drivers.py
# Linux
wget https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py