Tag Archives: GDAL

Find scenes from an archive which contain a point

This is a quick way of locating which scenes, from an archive of data, contain a point you are interested in.

First make a list of all available scenes

ls data/S1/2017/??/??/*/*vv*img > all_scenes.txt

Then use gdalbuildvrt to make a VRT file containing all scenes as a separate band (assumes scenes only have a single band).

gdalbuildvrt -input_file_list all_scenes.txt \
             -separate -o s1_all_scenes.vrt

Use gdallocation info with ‘-lifonly’ flag to the scene the point we are interested in (1.5 N, 103 E) is within and redirect to a text file.

gdallocationinfo -lifonly -wgs84 s1_all_scenes.vrt \
                 103 1.5 > selected_scenes.txt

This will show if within the bounding box of the scene. To find scenes which have data can use gdallocationinfo again but with the ‘-valonly’ flag and save the values to a text file.

gdallocationinfo -valonly -wgs84 s1_all_scenes.vrt \
                  103 1.5 > selected_scenes_values.txt

Can then subset the original list of files with the values file using Python:

import numpy
# Read in data to numpy arrays
scenes = numpy.genfromtxt("selected_scenes.txt", dtype=str)
values = numpy.genfromtxt("selected_scenes_values.txt")

# Select only scenes where the value is not 0
scenes_data = scenes[values != 0]

# Save to text file
numpy.savetxt("selected_scenes_data.txt", scenes_data, fmt="%s")

If you have a very large archive of data and need to find which scenes intersect a point often, having a spatial database with the scene outlines would be a better approach. However, if it isn’t something you do often this quick approach using only the GDAL utilities and a bit of Python is worth knowing.

Convert Sentinel-2 Data Using GDAL

The latest version of GDAL (2.1) has a driver to read Sentinel 2 data (see http://gdal.org/frmt_sentinel2.html). Like HDF files they are read as subdatasets. Running ‘gdalinfo’ on the zipped folder or the .xml file contained within the .SAFE directory will display all the subdatasets, as well as all the metadata (so quite a lot of information!).

As with HDF5 you can pass the subdataset names to gdalinfo to get more information or gdal_translate to extract them as a separate dataset.

To make it easier to extract all the subdatasets I wrote a script (extract_s2_data.py) which can be downloaded from the https://bitbucket.org/petebunting/rsgis_scripts repository.

The script will get a list of all subdatasets using GDAL:

from osgeo import gdal
dataset = gdal.Open('S2/S2.xml', gdal.GA_ReadOnly)
subdatasets = dataset.GetSubDatasets()
dataset = None

The ones to be extracted are for the 10, 20 and 60 m resolution band groups for each UTM zone (if the file crosses multiple zones).

For each subdataset it will give an output name, replacing the EPSG code with the UTM zone and ‘:’ with ‘_’.

Then the gdal_translate command is used to create a new file for each. By default the output is KEA format, called using subprocess.

To run the script first install GDAL 2.1, the conda-forge channel has recent builds, to install them using conda:

conda create -n gdal2 -c conda-forge gdal
source activate gdal2

(If you are on Windows leave out ‘source’)

To extract all subdatasets from a zipped Sentinel 2 scene to the current directory you can then use:

extract_s2_data.py -o . \

The gdal_translate command used is printed to the screen.

The default output format is KEA, you can change using the ‘–of’ flag. For example to convert an unzipped scene to GeoTiff:

extract_s2_data.py -o . --of GTiff \

To get the extension for all supported drivers, and some creation options the ‘get_gdal_drivers’ module from arsf_dem_scripts is optionally used. You can just download this file and copy into the same directory ‘extract_s2_data.py’ has been saved to. For Linux or OS X you can run:

# OS X
curl https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py > get_gdal_drivers.py

# Linux
wget https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py

Useful GDAL commands for working with hyperspectral data

The Geospatial Data Abstraction Library (GDAL) contains a number of utilities for working with remote sensing data. The following are some commands which are particularly useful when working with airborne hyperspectral data.

Create subset and change format

To create a band and spatial subset of a larger hyperspectral dataset the following command can be used:

gdal_translate -of GTiff -co COMPRESS=LZW \
 -projwin 481500 766000 482500 765000 \
 -b 190 -b 40 -b 15  \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
  • -of: Output Format – we are using GeoTiff file. For all available options see http://www.gdal.org/formats_list.html
  • -co: Creation Option – specifies the creation options, these are specific to the format. For GeoTiff we use compression to reduce the file size. For ENVI we can use the creation options to specify the Interleave (BSQ or BIL)
  • -projwin: A rectangle delineating an area to be extracted, in the format min X min Y max X max Y
  • -b: A given band to extract
  • final two arguments: input and output filenames

Create mosaic of multiple flightlines

Often a mosaic of all individual flightlines is required. Due to the large sizes of the mapped files this can often be time consuming. One option is to first make a ‘Virtial Raster’ using the ‘gdalbuildvrt’ command:

gdalbuildvrt -srcnodata 0 outputs/f2810_mapped_mosaic.vrt \
 outputs/f281013b_utm_wgs84N50_mapped_3band.bil \
  • -srcnodata: No data value of input lines
  • first argument: output VRT file
  • final arguments: list of input files

Rather than creating a new file a small .vrt text file is created which contains links to the individual lines.

Many programs which use GDAL (e.g., TuiView) can open this file directly. To make opening the file easier we can do some optimisation. First pre-calculate statistics using:

gdalinfo -stats outputs/f2810_mapped_mosaic.vrt
  • -stats: Calculate statistics and save to .aux.xml file
  • final argument: file name

The gdalinfo command can also be used to print information about a raster. Note for the .vrt file it shows the individual files which are included.

After creating statistics generate overviews (lower resolution copies at different resolutions) using gdaladdo:

gdaladdo -r average outputs/f2810_mapped_mosaic.vrt 2 4 8 16
  • -r average: Use average for resampling overviews
  • first argument: file name
  • numbers: overview levels to build (2x, 4x, 8x and 16x)

We can also make a real raster using gdal_translate (as shown previously).

Extract pixel values for a given location

gdallocationinfo -valonly -geoloc \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 481801 765289
  • -valonly: Only print values to the screen
  • -geoloc: Interprerate x and y values as coordinates rather than pixel and line
  • first argument: file name
  • x y: Easting and Northing of pixel to extract statistics from

This will print the values to the screen. To save as a text file we can redirect the output using ‘>’

gdallocationinfo -valonly -geoloc \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 481801 765289 > outputs/extracted_values.txt

This text file can be opened in other programs for further analysis.

This post is based on the hyperspectral practical session from the NERC Airborne Survey and Research Facility workshop held at Plymouth Marine Laboratory, March 2016. For more information about the ARSF Data Analysis Node (ARSF-DAN) and details about future training opportunities follow @ARSF_DAN on twitter. Hyperspectral data acquired by ARSF are available to download from NEODC.

Assign a projection to an image from a WKT file, Proj4 string or another image

Sometimes, during the course of data processing even through the spatial information is correct there can be problems with the projection information, either it it lost, not stored in a format the software can understand or is stored slightly differently to other datasets with the same projection. In these situations, the easiest solution is to just copy the correct projection across. Within RSGISLib there are some functions for this within imageutils. For example if you want to copy the projection from an existing image:

from rsgislib import imageutils
inputImage = 'image_without_projection.kea'
refImage = 'image_with_projection.kea'
imageutils.copyProjFromImage(inputImage, refImage)

You can also assign a projection using a Well Known Text (WKT) file (in this example ‘UTM50S.wkt’, stored in the same directory):

from rsgislib import imageutils
inputImage = 'image_without_projection.kea'
imageutils.assignProj(inputImage, wktFile='UTM50S.wkt')

Note: this syntax requires version of RSGISLib 2.3 or later

If you don’t have RSGISLib installed there is a script to assign a projection within PML’s ARSF Tools repository on GitHub which just requires the GDAL Python bindings.

Usage is simple, to assign a projection from a WKT file:

python assign_projection.py --wkt 'UTM50S.wkt' image_without_projection.kea

If you have a lot of files which need the projection assigning you can use:

python assign_projection.py --wkt 'UTM50S' *.kea

This will work under both Unix-like and Windows operating systems.

As well as a WKT file you can also use Proj4 strings, for example:

python assign_projection.py \
 --proj4 '+proj=utm +zone=55 +south +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs' \

A list of WKT files and Proj4 strings are available from spatialreference.org. You can also get the WKT string from an existing dataset using:

gdalinfo image_with_projection.kea

Or a Proj4 string using:

gdalinfo -proj4 image_with_projection.kea

Open zipped raster files in GDAL

A neat trick you can do with GDAL is open zipped files without unzipping them first. The syntax is described on http://trac.osgeo.org/gdal/wiki/UserDocs/ReadInZip

The first step is to find out the contents of the file (if you don’t already know it), you can do this using the following command:

unzip -l f088013b_mapped_osng_bil.zip

You can then run GDAL commands using the following syntax:

gdalinfo /vsizip/f088013b_mapped_osng_bil.zip/f088013b_mapped_osng.bil

The latest version of TuiView (1.1.3) will automatically search inside the zip file for known raster types and pass the correct path to GDAL so you can view zipped files using:

tuiview f088013b_mapped_osng_bil.zip

You can also use for tar.gz files. Again, get a list of the file contents using:

tar -tf LC80430332013291LGN00.tar.gz

Get file information using:

gdalinfo /vsitar/LC80430332013291LGN00.tar.gz/LC80430332013291LGN00_B1.TIF

Note /vsitar/ is used instead of /vsizip/.

Although this is handy a better option is storing data in a format which supports lossless compression. The KEA image format [1] is one example of a GDAL compatible raster format which uses zlib compression and is capable of storing very large datasets.

You can convert a tar.gz’ipped file to the KEA format using:

gdal_translate -of KEA /vsitar/LC80430332013291LGN00.tar.gz/LC80430332013291LGN00_B1.TIF \

[1] Bunting, P and Gillingham, S. 2013. The KEA image file format, Computers & Geosciences, Volume 57. Pages 54-58, ISSN 0098-3004, http://dx.doi.org/10.1016/j.cageo.2013.03.025.

Clip and Reproject an image to a master file using RSGISLib and GDAL

Nathan Thomas

Through my research I handle a plethora of raster datasets, each with a variety of resolutions and projections that I have to clip, reproject and resample to a set of consistent parameters, derived from a ‘master’ dataset. This would usually require multiple processing steps and a series of inputs, such as the extents of the image and the input projection (possibly a WKT file). The greater the number of inputs that are required the greater the chance that exists of an error occurring.

Here is an alternative method using a simple combination of RSGISLib and GDAL, which requires very little user input. This method relieves the user from having to retrieve the parameters of the input dataset from the image file, if these parameters are unknown. This is particularly useful if a user wishes to clip a dataset at a series of different study sites, each with a different spatial extent and possibly even different projection. This method is aimed at being simple to implement and would suit a beginner in Python looking to get multiple input datasets to a set of standard parameters, without requiring the knowledge to open an image and retrieve the required extent, resolution and projection.

Create Copy Image

The first step implements a command in RSGISLib to create a blank copy of a master dataset, populated with 0 values. This creates a blank image of a single value that has the parameters of the input file such as projection information, extent and resolution. The number of bands can also be specified so that a 7-band image can be created with the parameters from a single band image input. The RSGISLib command is as follows:

import rsgislib

inMask = 'inMaskImage.kea'
outName = 'outImage.kea'
numberOfBands = 1
GDALformat = 'KEA'
GDALtype = rsgislib.TYPE_32FLOAT

rsgislib.imageutils.createCopyImage(inMask, outName, \
 numberOfBands, 0, GDALformat, GDALtype)

GDAL Reproject Image

The second step implements a GDAL command to open a second input dataset (the one to clip) as read only and effectively populate the blank image created in step one with the values from the second input dataset. The output of this is a clipped image with the parameters of the master input file. The process is as follows:

from osgeo import gdal

ClipFile = 'largeRaster.kea'
outName = 'outImage.kea'
interpolationMethod = gdal.GRA_CubicSpline

inFile = gdal.Open(ClipFile, gdal.GA_ReadOnly)
outFile = gdal.Open(outName, gdal.GA_Update)

gdal.ReprojectImage(inFile, outFile, None, None, \


The script is simple to implement and requires a number of inputs:

  • ‘-m’ Input Mask – The input file with the desired parameters
  • ‘-I’ Input Image – The image you wish to clip
  • ‘-o’ Output Image – The output image
  • ‘-n’ Number of Bands – The number of bands that the clipped scene requires (ie, to clip an RGB dataset the number of bands would be 3)
  • ‘-of’ GDAL format – The format of the output image (i.e. KEA)

An example of the implementation may look like:

python CutReproject.py –m N00E103_HH.kea \
        –i GlobalSRTMmosaic.kea \
        –o N00E103_SRTM.kea –n 1 –of KEA

CutReproject.py is available to download from bitbucket.org/nathanmthomas/bucket-of-rs-and-gis-scripts/

Converting HDF5 format AirMOSS data to a GDAL image

As NASA’s preferred file format a lot of remote sensing data is now delivered in HDF5 format. However, as there are multiple ways of storing geospatial information within HDF5 reading the data with GDAL often requires some effort. Luckily with the h5py and GDAL Python libraries converting to another GDAL format is easy.

The instructions here are for reading Synthetic Aperture Radar (SAR) data from the AirMOSS mission, a polarimetric P-band system. More details about the AirMOSS mission and links to download data are available from airmoss.jpl.nasa.gov (registration required to download data). However, the general process should be similar for most HDF5 formats.

  1. Import file
  2. import h5py
    inHDF5File = "indata.h5"
    # Read in data
    indata = h5py.File(inHDF5File)
  3. Get some information about data
  4. # List keys
    for key in indata.keys():
    # List items
    for item in indata.items():
    # List attributes
    for att in indata.attrs:
  5. Extract data
  6. hhData = indata['HHHH']
    vvData = indata['VVVV']
    hvData = indata['HVHV']
  7. Get geospatial information
  8. For AirMOSS data the geospatial information are stored as attributes in the HDF5 file and are read using ‘indata.attrs’

    # Set image size
    inXSize = hhData.shape[1]
    inYSize = hhData.shape[0]
    # Get coordinates
    minLon = indata.attrs['northwest longitude'] 
    maxLat = indata.attrs['northwest latitude']
    # Get pixel resolution
    pixelX = indata.attrs['longitude spacing']
    pixelY = indata.attrs['latitude spacing']
  9. Create new GDAL file
  10. You can select any format GDAL can write to as the output format, I’ve chosen to stick with HDF5 and go with the KEA format (which provides an implementation of the full GDAL specification).

    # Set Geospatial Information
    geoTransform = []
    for i in range(6):
    geoTransform[0] = minLon # top left x 
    geoTransform[1] = pixelX
    geoTransform[2] = 0
    geoTransform[3] = maxLat # top left y 
    geoTransform[4] = 0
    geoTransform[5] = pixelY
    srs = osr.SpatialReference()
    # Creat output image
    numBands = 3
    gdalDriver = getGDALFormatFromExt(outFileName)
    driver = gdal.GetDriverByName(gdalDriver)
    metadata = driver.GetMetadata()
    newDataset = driver.Create(outFileName, inXSize, inYSize, numBands, gdal.GDT_Float32)
  11. Copy data to new file
  12. newDataset.GetRasterBand(1).WriteArray(hhData)
    # Close dataset
    newDataset = None

    The data should now be readable in anything which can read the output format you selected (e.g., TuiView.)

    The full script can be downloaded from https://github.com/MiXIL/AirMOSS-GDAL-Utilities/, there are some differences in the final script, a loop is used to read and write each band, rather than loading all the data to memory and the no data value of -9999 is changed to 0.

Set output file type and options in RIOS

RIOS allows you to set the default file type and creation options using environmental variables, rather than having to set them each time, meaning these are set based on the system rather than the script and you only have to worry about them once. The following are used to set the file type:


And creation options:


If you are using EnvMaster (which is awesome, see earlier post) you can set this as part of the EnvMaster file using:

module.setVar('GTiff', 'RIOS_DFLT_DRIVER')

However, sometimes it is necessary to override these default variables, this can be done using the ApplierControls class:

controls = applier.ApplierControls()


Recently I wanted to create a script that would set the output file type and creation options based on the extension of the output file the user provided. To accomplish this I wrote following function to set the file type and creation options based on the extension of the output file.

def getOutDriver(outFileName):

    """ Set output driver type and creation options
        based on file extension.

        Returns driver name and list of creation options
        as dictionary.

    outControls = {}

    gdalFormat = 'ENVI'
    gdalCOOptions = []
    calcStats = False

    extension = os.path.splitext(outFileName)[-1]
    if extension == '.kea':
        gdalFormat = 'KEA'
        calcStats = True
    elif extension == '.tif':
        gdalFormat = 'GTiff'
        gdalCOOptions = ['COMPRESS=DEFLATE']
        calcStats = True
    elif extension == '.img':
        gdalFormat = 'HFA'
        gdalCOOptions = ['COMPRESSED=YES']
        calcStats = True
    elif extension == '.pix':
        gdalFormat = 'PCIDSK'
        gdalCOOptions = ['COMPRESSION=RLE']
        calcStats = True

    outControls['gdalFormat'] = gdalFormat
    outControls['gdalCOOptions'] = gdalCOOptions
    outControls['calcStats'] = gdalCOOptions

    return outControls

Which is called using:

# Set format for output image
outControls = getOutDriver('outImageFile.kea')

# Set options

This will also calculate stats and overviews for formats which support them.

For a full description of environmental variables used by RIOS see the manual

The full script I created this function for (for topographically correcting SAR data) can be downloaded from here.

Obtaining JAXA PALSAR Data and Forest / Non-Forest Maps

JAXA have recently released their global forest / non-forest map at 50 m resolution and the Advanced Land Orbiting Satellite (ALOS) Phased Array L-Band SAR (PALSAR) data from which they were derived. This is really exciting because SAR data provides a different view of the world than optical data, which we’re more used to viewing. A particularly interesting feature of L-band SAR for mapping vegetation is the ability to ‘see’ through clouds and the canopy layer of vegetation. A good introduction to SAR data, in the context of vegetation mapping, is provided in the following paper:

Rosenqvist, A., Finlayson, C. M., Lowry, J and Taylor, D., 2007. The potential of long- wavelength satellite-borne radar to support implementation of the Ramsar Wetlands Convention. Aquatic Conservation: Marine and Freshwater Ecosystems. 17:229–244.

Downloading data

You can download data from:


You need to sign up for an account but this is a quick and straightforward process.

You can download data in 1 x 1 degree tiles or batches of 5 x 5 degrees. Data are in ENVI format, and can be read with GDAL, or programs that use GDAL (e.g., QGIS). If you don’t already have a viewer, you can download TuiView to open them with. ArcMap can read them (as it uses GDAL) but it won’t recognise it if you go through the ‘Add Data’ dialogue. However, you can just drag the files (larger files without the ‘.hdr’ extension) from windows explorer to the ‘Table of Contents’.

Mosaic data

To mosaic all files in a 5 x 5 degree batch (or any number of files), you can use a combination of GNU Parallel to untar and gdalbuildvrt. Assuming we want to mosaic the HH- and HV-polarisation PALSAR data the following commands can be used:

# Untar file
tar -xf N60W105_07_MOS.tar.gz

# Change into directory
cd N60W105

# Untar all files, in parallel using GNU Parallel
ls *.gz | parallel tar xf

# Create a list of HH and HV files
ls *_HH > hhfilelist.txt
ls *_HV > hvfilelist.txt

# Build VRT
gdalbuildvrt -input_file_list hhfilelist.txt N60W105_HH.vrt
gdalbuildvrt -input_file_list hvfilelist.txt N60W105_HV.vrt

This will create virtual rasters, which are text files containing references to the data. You can convert to real rasters (KEA, geotiff etc.,) using gdal_translate:

gdal_translate -of GTiff 60W105_HH.vrt 60W105_HH.tif
gdal_translate -of GTiff 60W105_HV.vrt 60W105_HV.tif

Calibrate data

The data are supplied as digital numbers, to calibrate and convert to dB, the following equation is used [1]:

10 \times \log10 (DN^2) - 83.0

You can run this calibration in RSGISLib using the following Python script:

 # Import RSGISLib
 import rsgislib
 from rsgislib import out image

# Set input and output image
 inimage = 'N60W105_HH.vrt'
 outimage = 'N60W105_HH_db.kea'

# Run image maths
 imagecalc.imageMath(inimage, outimage, '10*log10(b1^2) - 83.0,
         'KEA', rsgislib.TYPE_32FLOAT)

# Calculate stats and pyramids (for fast display)

Alternatively you could grab the calPALSAR_dB.py script to perform the same calculation using RIOS (which will run under windows, see earlier post for more details)

SAR data takes a while to get your head into but once you do it provides a wealth of information.

Update – data are now available at 25 m resolution from the same place

[1] Shimada, M. and Ohtaki, T. 2010. Generating large-scale high-quality SAR mosaic datasets: Application to PALSAR data for global monitoring. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 3(4):637–656.

Co-register images using RSGISLib

When using data from different sensors there are sometimes slight issues with co-registration. If you know one image is well registered (the reference image) it is possible to register the other (the floating image) to it. Within RSGISLib there are two options for automatically generating tie points between two images for this task, which are a simplified versions of the algorithm proposed in:

Bunting, P.J., Labrosse, F. & Lucas, R.M., 2010. A multi-resolution area-based technique for automatic multi-modal image registration. Image and Vision Computing, 28(8), pp.1203–1219. http://dx.doi.org/10.1016/j.imavis.2009.12.005

The following XML script utilises the basic algorithm in RSGISLib which assumes all tie points are independent of each other. The search (search) can be reduced if the images are already closely registered to reduce run time. If data are very different (e.g., optical and SAR) the correlation threshold (threshold) may need to be reduced. The tie points generated are added to a GDAL file, and gdalwarp is used to warp the image, in this case using the thin plate splines transform.

<?xml version="1.0" encoding="UTF-8" ?>
Co-register two images
<rsgis:commands xmlns:rsgis="http://www.rsgislib.org/xml/">
<!– Generate GCPs –>
<rsgis:command algor="registration" option="basic"
outputType="rsgis_img2map" metric="correlation"
pixelgap="50" window="100" search="10" threshold="0.6"
stddevRef="2" stddevFloat="2" subpixelresolution="4" />
<!– Add GCPs to GDAL –>
<rsgis:command algor="registration" option="gcp2gdal"
datatype="UInt16" format="KEA" />
<!– Warp using GDAL –>
<rsgis:command algor="commandline" option="execute"
command="gdalwarp -of KEA -ot Byte -tr 100 100 -tps float_image_gcps.kea float_image_warp.kea" />
<rsgis:command algor="commandline" option="execute"
command="rm float_image_gcps.kea" />

view raw
hosted with ❤ by GitHub

To run the script use:

rsgisexe -x warpImage.xml 

More details on the parameters are available in the RSGISLib documentation (pdf).