Tag Archives: Python

Convert Sentinel-2 Data Using GDAL

The latest version of GDAL (2.1) has a driver to read Sentinel 2 data (see http://gdal.org/frmt_sentinel2.html). Like HDF files they are read as subdatasets. Running ‘gdalinfo’ on the zipped folder or the .xml file contained within the .SAFE directory will display all the subdatasets, as well as all the metadata (so quite a lot of information!).

As with HDF5 you can pass the subdataset names to gdalinfo to get more information or gdal_translate to extract them as a separate dataset.

To make it easier to extract all the subdatasets I wrote a script (extract_s2_data.py) which can be downloaded from the https://bitbucket.org/petebunting/rsgis_scripts repository.

The script will get a list of all subdatasets using GDAL:

from osgeo import gdal
dataset = gdal.Open('S2/S2.xml', gdal.GA_ReadOnly)
subdatasets = dataset.GetSubDatasets()
dataset = None

The ones to be extracted are for the 10, 20 and 60 m resolution band groups for each UTM zone (if the file crosses multiple zones).

For each subdataset it will give an output name, replacing the EPSG code with the UTM zone and ‘:’ with ‘_’.

Then the gdal_translate command is used to create a new file for each. By default the output is KEA format, called using subprocess.

To run the script first install GDAL 2.1, the conda-forge channel has recent builds, to install them using conda:

conda create -n gdal2 -c conda-forge gdal
source activate gdal2

(If you are on Windows leave out ‘source’)

To extract all subdatasets from a zipped Sentinel 2 scene to the current directory you can then use:

extract_s2_data.py -o . \

The gdal_translate command used is printed to the screen.

The default output format is KEA, you can change using the ‘–of’ flag. For example to convert an unzipped scene to GeoTiff:

extract_s2_data.py -o . --of GTiff \

To get the extension for all supported drivers, and some creation options the ‘get_gdal_drivers’ module from arsf_dem_scripts is optionally used. You can just download this file and copy into the same directory ‘extract_s2_data.py’ has been saved to. For Linux or OS X you can run:

# OS X
curl https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py > get_gdal_drivers.py

# Linux
wget https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py

Create a CSV with the coordinates from geotagged photos

Recently I took a lot of photos on my phone during fieldwork (survey for the NERC-ARF LiDAR and hyperspectral calibration flights) and wanted to extract the GPS coordinates from each photo so I could load them into QGIS using the add delimited text dialogue. The aim wasn’t to have precise locations for each photo (GPS positions for each of the points surveyed were recorded separately) but to give a quick idea of the locations we’d visited before the main GPS data were processed.

I while ago (2012!) I wrote a script for this task. The script (CreateJPEGKMZ) requires pillow and the imagemagick command line tools.

It code isn’t particularly tidy (despite my recent updates) but it basically pulls out the GPS location from the EXIF tags and writes this to a CSV file. To create a KMZ a thumbnail image is created and a corresponding KML file. Both the thumbnail and KML are then zipped together to generate the KMZ which can be opened in GoogleEarth.

To use the script to write a CSV file:

CreateJPEGKMZ.py --outcsv gps_points.csv input_jpeg_files

To also create KMZ files for each photo:

CreateJPEGKMZ.py --outcsv gps_points.csv \
                 --outkmz output_kmz_files 

This was also a good lesson in the benefits of having scripts in version control and publicly available.

Working with hyperspectral data in ENVI BIL format using Python

For working with ENVI files I normally use GDAL as code can then be applied to different formats. However, there are a couple of limitations with GDAL when working with hyperspectral data in ENVI format:

  1. GDAL doesn’t copy every item from the header file to a new header file if they don’t fit in with the GDAL data model. Examples are FWHM and comments. Sometimes extra attributes are copied to the aux.xml file GDAL creates, these files aren’t read by ENVI or other programs based on IDL (e.g., ATCOR).
  2. For data stored Band Interleaved by Line (BIL) rather than Band Sequential (BSQ) reading and writing a band at a time is inefficient as it is necessary to keep jumping around the file

To overcome these issues NERC-ARF-DAN use their own Python functions for reading / writing header files and loading BIL files a line at a time. These functions have been tidied up and released through the NERC-ARF Tools repository on GitHub (https://github.com/pmlrsg/arsf_tools). The functions depend on NumPy.

To install them it is recommended to use the following steps:

  1. Download miniconda from http://conda.pydata.org/miniconda.html#miniconda and follow the instructions to install.
  2. Open a ‘Command Prompt’ / ‘Terminal’ window and install numpy by typing:
    conda install numpy
  3. Download ‘arsf_tools’ from GitHub (https://github.com/pmlrsg/arsf_tools/archive/master.zip)
  4. Unzip and within a ‘Command Prompt’ or ‘Terminal’ window navigate to the the folder using (for example):
    cd Downloads\arsf_tools-master
  5. Install the tools and library by typing:
    python setup.py install

Note, if you are using Linux you can install the arsf_binary reader from https://github.com/arsf/arsf_binaryreader which is written in C++. The ‘arsf_envi_reader’ module will import this if available as it is faster than the standard NumPy BIL reader.

If you are a UK based researcher with access to the JASMIN system the library is already installed and can be loaded using:

module load contrib/arsf/arsf_tools

An simple example of reading each line of a file, adding 1 to every band and writing back out again is:

from arsf_envi_reader import numpy_bin_reader
from arsf_envi_reader import envi_header

# Open file for output
out_file = open("out_file.bil", "w")

# Open input file
in_data = numpy_bin_reader.BilReader("in_file.bil")

for line in in_data:
out_line = line + 1

# Copy header

# Close files
in_data = None

A more advanced example is applying a Savitzky-Golay filter to each pixel. As the filter requires every band for each pixel it is efficient to work with BIL files.

For developing algorithms using spatial data, in particular multiple files it is recommended to convert the files to another band-sequential format using the ‘gdal_translate’ so they can be read using programs which use GDAL, for example RIOS or RSGISLib.

To convert files from ENVI BIL to ENVI BSQ and copy all header attributes the ‘envi_header’ module can be used after converting the interleave with GDAL. For example:

import os
import subprocess
from arsf_envi_reader import envi_header

# Set input and output image (could get with argparse)
input_image = "input.bil"
output_image = "output_bsq.bsq"

# Get interleave from file extension
output_interleave = os.path.splitext(output_image)[-1].lstrip(".")

# 1. Convert interleave
print("Converting interleave to {}".format(output_interleave.upper()))
gdal_translate_cmd = ["gdal_translate",
                      "-of", "ENVI",
                      "-co", "INTERLEAVE={}".format(output_interleave.upper())]
gdal_translate_cmd.extend([input_image, output_image])

# 2. Copy header (GDAL doesn't copy all items)
# Find header files
input_header = envi_header.find_hdr_file(input_image)
output_header = envi_header.find_hdr_file(output_image)

# Read input header to dictionary
input_header_dict = envi_header.read_hdr_file(input_header)

# Change interleave
input_header_dict["interleave"] = output_interleave.upper()

# Write header (replace one generated by GDAL)
envi_header.write_envi_header(output_header, input_header_dict)

Add all scripts within a repository to $PATH using envmaster

I have a couple of general scripts repositories for myself and shared with colleagues. These are for scripts which are useful but don’t fit into existing projects or justify having their own repository. An example is rsgis_scripts on bitbucket. The scripts are split into different directories which aren’t available on the main path. To make them available when needed I use EnvMaster (described in a previous post). I have an envmaster module which will search the repository for folders containing executables and add them to $PATH. It will also add folders containing ‘__init__.py’ to $PYTHONPATH.

import os
import glob
REPO_PATH = "/home/dan/Documents/Development/rsgis_scripts"
# Walk through directory
for d_name, sd_name, f_list in os.walk(REPO_PATH):
    # Ignore hidden directories
    if not d_name.startswith(".") and not ".git" in d_name and not ".hg" in d_name:
        file_list = glob.glob(os.path.join(d_name,"*"))
        # Check if a directory contains executable files
        if True in [(os.path.isfile(f) and os.access(f, os.X_OK)) for f in file_list]:
        # Check for Python libraries
        if len(glob.glob(os.path.join(d_name,"*","__init__.py"))) > 0:

The script is saved as ‘rsgis_scripts’ in ‘$ENVMASTERPATH’.
To load the module and prepend all the folders to ‘$PATH’ use:

envmaster load rsgis_scripts

They will remain on $PATH until you close the terminal or unload the module using:

envmaster unload rsgis_scripts

This script is just an example, it would be easy to modify for different use cases. If you have multiple repositories a module file can be created for each one.

Convert EASE-2 grid cell to latitude and longitude using Python

The EASE-2 grid is used for a number of NASA datasets including SMAP. It is described in the following paper:

Brodzik, M. J., Billingsley, B., Haran, T., Raup, B., & Savoie, M. H. (2012). EASE-Grid 2.0: Incremental but Significant Improvements for Earth-Gridded Data Sets. ISPRS International Journal of Geo-Information, 1(3), 32–45. http://doi.org/10.3390/ijgi1010032

Files with the centre coordinate of each cell for EASE-2 grids at different resolutions are available from https://nsidc.org/data/ease/tools.html as well as tools for conversion.

To read these files into Python the following steps can be used:

  1. Download the relevant gridloc file.

    The FTP link for the grid location files is: ftp://sidads.colorado.edu/pub/tools/easegrid2/

    For this example I’ve chosen the 36km cylindrical EASE-2 grid (gridloc.EASE2_M36km.tgz)

  2. Un-tar using:
    tar -xvf gridloc.EASE2_M36km.tgz
  3. Read the files into Python:
    import numpy
    # Read binary files and reshape to correct size
    # The number of rows and columns are in the file name
    lats = numpy.fromfile('EASE2_M36km.lats.964x406x1.double', 
    lons = numpy.fromfile('EASE2_M36km.lons.964x406x1.double', 
    # Extract latitude and longitude
    # for a given row and column 
    grid_row = 46
    grid_column = 470
    lat_val = lats[grid_row, grid_column]
    lon_val = lons[grid_row, grid_column]

Calculations on large Raster Attribute Tables using ratapplier

The RIOS Library (http://rioshome.org) provides two methods of manipulating columns within a Raster Attribute Table (RAT):

  1. The ‘rat’ module can read an entire column to memory.
    from osgeo import gdal
    from rios import rat
    # Open RAT dataset
    rat_dataset = gdal.Open("clumps.kea", gdal.GA_Update)
    # Get columns with average red and NIR for each object
    red = rat.readColumn(rat_dataset, "RedAvg")
    nir = rat.readColumn(rat_dataset, "NIR1Avg")
    ndvi = (nir - red) / (nir + red)
    # Write out column
    rat.writeColumn(rat_dataset, "NDVI", ndvi)
    # Close RAT dataset
    rat_dataset = None
  2. The newer ‘ratapplier’ module is modelled after the ‘applier’ module for images and allows a function to be applied to chunks of rows, making it particularly useful for a RAT which is too large to load to memory.
    from rios import ratapplier
    def _ratapplier_calc_ndvi(info, inputs, outputs):
        Calculate NDVI from RAT.
        Called by ratapplier
        # Get columns with average red and NIR for each object
        # within block
        red = getattr(inputs.inrat, "RedAvg")
        nir = getattr(inputs.inrat, "NIR1Avg")
        # Calculate NDVI
        ndvi = (nir - red) / (nir + red)
        # Save to 'NDVI' column (will create if doesn't exist)
        setattr(outputs.outrat,"NDVI", ndvi)
    if __name__ == "__main__":
        # Set up rat applier for input / output
        in_rats = ratapplier.RatAssociations()
        out_rats = ratapplier.RatAssociations()
        # Pass in clumps file
        # Same file is used for input and output to write
        # to existing RAT
        in_rats.inrat = ratapplier.RatHandle("clumps.kea")
        out_rats.outrat = ratapplier.RatHandle("clumps.kea")
        # Apply function to all rows in chunks
        ratapplier.apply(_ratapplier_calc_ndvi, in_rats, out_rats)

Although using ratapplier looks slightly more complicated at first writing scripts to use it rather than the ‘rat’ interface means they will scale much better to larger datasets.

Both these examples are available to download from https://bitbucket.org/snippets/danclewley/7E666

Clip and Reproject an image to a master file using RSGISLib and GDAL

Nathan Thomas

Through my research I handle a plethora of raster datasets, each with a variety of resolutions and projections that I have to clip, reproject and resample to a set of consistent parameters, derived from a ‘master’ dataset. This would usually require multiple processing steps and a series of inputs, such as the extents of the image and the input projection (possibly a WKT file). The greater the number of inputs that are required the greater the chance that exists of an error occurring.

Here is an alternative method using a simple combination of RSGISLib and GDAL, which requires very little user input. This method relieves the user from having to retrieve the parameters of the input dataset from the image file, if these parameters are unknown. This is particularly useful if a user wishes to clip a dataset at a series of different study sites, each with a different spatial extent and possibly even different projection. This method is aimed at being simple to implement and would suit a beginner in Python looking to get multiple input datasets to a set of standard parameters, without requiring the knowledge to open an image and retrieve the required extent, resolution and projection.

Create Copy Image

The first step implements a command in RSGISLib to create a blank copy of a master dataset, populated with 0 values. This creates a blank image of a single value that has the parameters of the input file such as projection information, extent and resolution. The number of bands can also be specified so that a 7-band image can be created with the parameters from a single band image input. The RSGISLib command is as follows:

import rsgislib

inMask = 'inMaskImage.kea'
outName = 'outImage.kea'
numberOfBands = 1
GDALformat = 'KEA'
GDALtype = rsgislib.TYPE_32FLOAT

rsgislib.imageutils.createCopyImage(inMask, outName, \
 numberOfBands, 0, GDALformat, GDALtype)

GDAL Reproject Image

The second step implements a GDAL command to open a second input dataset (the one to clip) as read only and effectively populate the blank image created in step one with the values from the second input dataset. The output of this is a clipped image with the parameters of the master input file. The process is as follows:

from osgeo import gdal

ClipFile = 'largeRaster.kea'
outName = 'outImage.kea'
interpolationMethod = gdal.GRA_CubicSpline

inFile = gdal.Open(ClipFile, gdal.GA_ReadOnly)
outFile = gdal.Open(outName, gdal.GA_Update)

gdal.ReprojectImage(inFile, outFile, None, None, \


The script is simple to implement and requires a number of inputs:

  • ‘-m’ Input Mask – The input file with the desired parameters
  • ‘-I’ Input Image – The image you wish to clip
  • ‘-o’ Output Image – The output image
  • ‘-n’ Number of Bands – The number of bands that the clipped scene requires (ie, to clip an RGB dataset the number of bands would be 3)
  • ‘-of’ GDAL format – The format of the output image (i.e. KEA)

An example of the implementation may look like:

python CutReproject.py –m N00E103_HH.kea \
        –i GlobalSRTMmosaic.kea \
        –o N00E103_SRTM.kea –n 1 –of KEA

CutReproject.py is available to download from bitbucket.org/nathanmthomas/bucket-of-rs-and-gis-scripts/