Tag Archives: GDAL

Find scenes from an archive which contain a point

This is a quick way of locating which scenes, from an archive of data, contain a point you are interested in.

First make a list of all available scenes

ls data/S1/2017/??/??/*/*vv*img > all_scenes.txt

Then use gdalbuildvrt to make a VRT file containing all scenes as a separate band (assumes scenes only have a single band).

gdalbuildvrt -input_file_list all_scenes.txt \
             -separate -o s1_all_scenes.vrt

Use gdallocation info with ‘-lifonly’ flag to the scene the point we are interested in (1.5 N, 103 E) is within and redirect to a text file.

gdallocationinfo -lifonly -wgs84 s1_all_scenes.vrt \
                 103 1.5 > selected_scenes.txt

This will show if within the bounding box of the scene. To find scenes which have data can use gdallocationinfo again but with the ‘-valonly’ flag and save the values to a text file.

gdallocationinfo -valonly -wgs84 s1_all_scenes.vrt \
                  103 1.5 > selected_scenes_values.txt

Can then subset the original list of files with the values file using Python:

import numpy
# Read in data to numpy arrays
scenes = numpy.genfromtxt("selected_scenes.txt", dtype=str)
values = numpy.genfromtxt("selected_scenes_values.txt")

# Select only scenes where the value is not 0
scenes_data = scenes[values != 0]

# Save to text file
numpy.savetxt("selected_scenes_data.txt", scenes_data, fmt="%s")

If you have a very large archive of data and need to find which scenes intersect a point often, having a spatial database with the scene outlines would be a better approach. However, if it isn’t something you do often this quick approach using only the GDAL utilities and a bit of Python is worth knowing.

Convert Sentinel-2 Data Using GDAL

The latest version of GDAL (2.1) has a driver to read Sentinel 2 data (see http://gdal.org/frmt_sentinel2.html). Like HDF files they are read as subdatasets. Running ‘gdalinfo’ on the zipped folder or the .xml file contained within the .SAFE directory will display all the subdatasets, as well as all the metadata (so quite a lot of information!).

As with HDF5 you can pass the subdataset names to gdalinfo to get more information or gdal_translate to extract them as a separate dataset.

To make it easier to extract all the subdatasets I wrote a script (extract_s2_data.py) which can be downloaded from the https://bitbucket.org/petebunting/rsgis_scripts repository.

The script will get a list of all subdatasets using GDAL:

from osgeo import gdal
dataset = gdal.Open('S2/S2.xml', gdal.GA_ReadOnly)
subdatasets = dataset.GetSubDatasets()
dataset = None

The ones to be extracted are for the 10, 20 and 60 m resolution band groups for each UTM zone (if the file crosses multiple zones).

For each subdataset it will give an output name, replacing the EPSG code with the UTM zone and ‘:’ with ‘_’.

Then the gdal_translate command is used to create a new file for each. By default the output is KEA format, called using subprocess.

To run the script first install GDAL 2.1, the conda-forge channel has recent builds, to install them using conda:

conda create -n gdal2 -c conda-forge gdal
source activate gdal2

(If you are on Windows leave out ‘source’)

To extract all subdatasets from a zipped Sentinel 2 scene to the current directory you can then use:

extract_s2_data.py -o . \

The gdal_translate command used is printed to the screen.

The default output format is KEA, you can change using the ‘–of’ flag. For example to convert an unzipped scene to GeoTiff:

extract_s2_data.py -o . --of GTiff \

To get the extension for all supported drivers, and some creation options the ‘get_gdal_drivers’ module from arsf_dem_scripts is optionally used. You can just download this file and copy into the same directory ‘extract_s2_data.py’ has been saved to. For Linux or OS X you can run:

# OS X
curl https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py > get_gdal_drivers.py

# Linux
wget https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py

Useful GDAL commands for working with hyperspectral data

The Geospatial Data Abstraction Library (GDAL) contains a number of utilities for working with remote sensing data. The following are some commands which are particularly useful when working with airborne hyperspectral data.

Create subset and change format

To create a band and spatial subset of a larger hyperspectral dataset the following command can be used:

gdal_translate -of GTiff -co COMPRESS=LZW \
 -projwin 481500 766000 482500 765000 \
 -b 190 -b 40 -b 15  \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
  • -of: Output Format – we are using GeoTiff file. For all available options see http://www.gdal.org/formats_list.html
  • -co: Creation Option – specifies the creation options, these are specific to the format. For GeoTiff we use compression to reduce the file size. For ENVI we can use the creation options to specify the Interleave (BSQ or BIL)
  • -projwin: A rectangle delineating an area to be extracted, in the format min X min Y max X max Y
  • -b: A given band to extract
  • final two arguments: input and output filenames

Create mosaic of multiple flightlines

Often a mosaic of all individual flightlines is required. Due to the large sizes of the mapped files this can often be time consuming. One option is to first make a ‘Virtial Raster’ using the ‘gdalbuildvrt’ command:

gdalbuildvrt -srcnodata 0 outputs/f2810_mapped_mosaic.vrt \
 outputs/f281013b_utm_wgs84N50_mapped_3band.bil \
  • -srcnodata: No data value of input lines
  • first argument: output VRT file
  • final arguments: list of input files

Rather than creating a new file a small .vrt text file is created which contains links to the individual lines.

Many programs which use GDAL (e.g., TuiView) can open this file directly. To make opening the file easier we can do some optimisation. First pre-calculate statistics using:

gdalinfo -stats outputs/f2810_mapped_mosaic.vrt
  • -stats: Calculate statistics and save to .aux.xml file
  • final argument: file name

The gdalinfo command can also be used to print information about a raster. Note for the .vrt file it shows the individual files which are included.

After creating statistics generate overviews (lower resolution copies at different resolutions) using gdaladdo:

gdaladdo -r average outputs/f2810_mapped_mosaic.vrt 2 4 8 16
  • -r average: Use average for resampling overviews
  • first argument: file name
  • numbers: overview levels to build (2x, 4x, 8x and 16x)

We can also make a real raster using gdal_translate (as shown previously).

Extract pixel values for a given location

gdallocationinfo -valonly -geoloc \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 481801 765289
  • -valonly: Only print values to the screen
  • -geoloc: Interprerate x and y values as coordinates rather than pixel and line
  • first argument: file name
  • x y: Easting and Northing of pixel to extract statistics from

This will print the values to the screen. To save as a text file we can redirect the output using ‘>’

gdallocationinfo -valonly -geoloc \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 481801 765289 > outputs/extracted_values.txt

This text file can be opened in other programs for further analysis.

This post is based on the hyperspectral practical session from the NERC Airborne Survey and Research Facility workshop held at Plymouth Marine Laboratory, March 2016. For more information about the ARSF Data Analysis Node (ARSF-DAN) and details about future training opportunities follow @ARSF_DAN on twitter. Hyperspectral data acquired by ARSF are available to download from NEODC.

Assign a projection to an image from a WKT file, Proj4 string or another image

Sometimes, during the course of data processing even through the spatial information is correct there can be problems with the projection information, either it it lost, not stored in a format the software can understand or is stored slightly differently to other datasets with the same projection. In these situations, the easiest solution is to just copy the correct projection across. Within RSGISLib there are some functions for this within imageutils. For example if you want to copy the projection from an existing image:

from rsgislib import imageutils
inputImage = 'image_without_projection.kea'
refImage = 'image_with_projection.kea'
imageutils.copyProjFromImage(inputImage, refImage)

You can also assign a projection using a Well Known Text (WKT) file (in this example ‘UTM50S.wkt’, stored in the same directory):

from rsgislib import imageutils
inputImage = 'image_without_projection.kea'
imageutils.assignProj(inputImage, wktFile='UTM50S.wkt')

Note: this syntax requires version of RSGISLib 2.3 or later

If you don’t have RSGISLib installed there is a script to assign a projection within PML’s ARSF Tools repository on GitHub which just requires the GDAL Python bindings.

Usage is simple, to assign a projection from a WKT file:

python assign_projection.py --wkt 'UTM50S.wkt' image_without_projection.kea

If you have a lot of files which need the projection assigning you can use:

python assign_projection.py --wkt 'UTM50S' *.kea

This will work under both Unix-like and Windows operating systems.

As well as a WKT file you can also use Proj4 strings, for example:

python assign_projection.py \
 --proj4 '+proj=utm +zone=55 +south +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs' \

A list of WKT files and Proj4 strings are available from spatialreference.org. You can also get the WKT string from an existing dataset using:

gdalinfo image_with_projection.kea

Or a Proj4 string using:

gdalinfo -proj4 image_with_projection.kea

Open zipped raster files in GDAL

A neat trick you can do with GDAL is open zipped files without unzipping them first. The syntax is described on http://trac.osgeo.org/gdal/wiki/UserDocs/ReadInZip

The first step is to find out the contents of the file (if you don’t already know it), you can do this using the following command:

unzip -l f088013b_mapped_osng_bil.zip

You can then run GDAL commands using the following syntax:

gdalinfo /vsizip/f088013b_mapped_osng_bil.zip/f088013b_mapped_osng.bil

The latest version of TuiView (1.1.3) will automatically search inside the zip file for known raster types and pass the correct path to GDAL so you can view zipped files using:

tuiview f088013b_mapped_osng_bil.zip

You can also use for tar.gz files. Again, get a list of the file contents using:

tar -tf LC80430332013291LGN00.tar.gz

Get file information using:

gdalinfo /vsitar/LC80430332013291LGN00.tar.gz/LC80430332013291LGN00_B1.TIF

Note /vsitar/ is used instead of /vsizip/.

Although this is handy a better option is storing data in a format which supports lossless compression. The KEA image format [1] is one example of a GDAL compatible raster format which uses zlib compression and is capable of storing very large datasets.

You can convert a tar.gz’ipped file to the KEA format using:

gdal_translate -of KEA /vsitar/LC80430332013291LGN00.tar.gz/LC80430332013291LGN00_B1.TIF \

[1] Bunting, P and Gillingham, S. 2013. The KEA image file format, Computers & Geosciences, Volume 57. Pages 54-58, ISSN 0098-3004, http://dx.doi.org/10.1016/j.cageo.2013.03.025.

Clip and Reproject an image to a master file using RSGISLib and GDAL

Nathan Thomas

Through my research I handle a plethora of raster datasets, each with a variety of resolutions and projections that I have to clip, reproject and resample to a set of consistent parameters, derived from a ‘master’ dataset. This would usually require multiple processing steps and a series of inputs, such as the extents of the image and the input projection (possibly a WKT file). The greater the number of inputs that are required the greater the chance that exists of an error occurring.

Here is an alternative method using a simple combination of RSGISLib and GDAL, which requires very little user input. This method relieves the user from having to retrieve the parameters of the input dataset from the image file, if these parameters are unknown. This is particularly useful if a user wishes to clip a dataset at a series of different study sites, each with a different spatial extent and possibly even different projection. This method is aimed at being simple to implement and would suit a beginner in Python looking to get multiple input datasets to a set of standard parameters, without requiring the knowledge to open an image and retrieve the required extent, resolution and projection.

Create Copy Image

The first step implements a command in RSGISLib to create a blank copy of a master dataset, populated with 0 values. This creates a blank image of a single value that has the parameters of the input file such as projection information, extent and resolution. The number of bands can also be specified so that a 7-band image can be created with the parameters from a single band image input. The RSGISLib command is as follows:

import rsgislib

inMask = 'inMaskImage.kea'
outName = 'outImage.kea'
numberOfBands = 1
GDALformat = 'KEA'
GDALtype = rsgislib.TYPE_32FLOAT

rsgislib.imageutils.createCopyImage(inMask, outName, \
 numberOfBands, 0, GDALformat, GDALtype)

GDAL Reproject Image

The second step implements a GDAL command to open a second input dataset (the one to clip) as read only and effectively populate the blank image created in step one with the values from the second input dataset. The output of this is a clipped image with the parameters of the master input file. The process is as follows:

from osgeo import gdal

ClipFile = 'largeRaster.kea'
outName = 'outImage.kea'
interpolationMethod = gdal.GRA_CubicSpline

inFile = gdal.Open(ClipFile, gdal.GA_ReadOnly)
outFile = gdal.Open(outName, gdal.GA_Update)

gdal.ReprojectImage(inFile, outFile, None, None, \


The script is simple to implement and requires a number of inputs:

  • ‘-m’ Input Mask – The input file with the desired parameters
  • ‘-I’ Input Image – The image you wish to clip
  • ‘-o’ Output Image – The output image
  • ‘-n’ Number of Bands – The number of bands that the clipped scene requires (ie, to clip an RGB dataset the number of bands would be 3)
  • ‘-of’ GDAL format – The format of the output image (i.e. KEA)

An example of the implementation may look like:

python CutReproject.py –m N00E103_HH.kea \
        –i GlobalSRTMmosaic.kea \
        –o N00E103_SRTM.kea –n 1 –of KEA

CutReproject.py is available to download from bitbucket.org/nathanmthomas/bucket-of-rs-and-gis-scripts/

Converting HDF5 format AirMOSS data to a GDAL image

As NASA’s preferred file format a lot of remote sensing data is now delivered in HDF5 format. However, as there are multiple ways of storing geospatial information within HDF5 reading the data with GDAL often requires some effort. Luckily with the h5py and GDAL Python libraries converting to another GDAL format is easy.

The instructions here are for reading Synthetic Aperture Radar (SAR) data from the AirMOSS mission, a polarimetric P-band system. More details about the AirMOSS mission and links to download data are available from airmoss.jpl.nasa.gov (registration required to download data). However, the general process should be similar for most HDF5 formats.

  1. Import file
  2. import h5py
    inHDF5File = "indata.h5"
    # Read in data
    indata = h5py.File(inHDF5File)
  3. Get some information about data
  4. # List keys
    for key in indata.keys():
    # List items
    for item in indata.items():
    # List attributes
    for att in indata.attrs:
  5. Extract data
  6. hhData = indata['HHHH']
    vvData = indata['VVVV']
    hvData = indata['HVHV']
  7. Get geospatial information
  8. For AirMOSS data the geospatial information are stored as attributes in the HDF5 file and are read using ‘indata.attrs’

    # Set image size
    inXSize = hhData.shape[1]
    inYSize = hhData.shape[0]
    # Get coordinates
    minLon = indata.attrs['northwest longitude'] 
    maxLat = indata.attrs['northwest latitude']
    # Get pixel resolution
    pixelX = indata.attrs['longitude spacing']
    pixelY = indata.attrs['latitude spacing']
  9. Create new GDAL file
  10. You can select any format GDAL can write to as the output format, I’ve chosen to stick with HDF5 and go with the KEA format (which provides an implementation of the full GDAL specification).

    # Set Geospatial Information
    geoTransform = []
    for i in range(6):
    geoTransform[0] = minLon # top left x 
    geoTransform[1] = pixelX
    geoTransform[2] = 0
    geoTransform[3] = maxLat # top left y 
    geoTransform[4] = 0
    geoTransform[5] = pixelY
    srs = osr.SpatialReference()
    # Creat output image
    numBands = 3
    gdalDriver = getGDALFormatFromExt(outFileName)
    driver = gdal.GetDriverByName(gdalDriver)
    metadata = driver.GetMetadata()
    newDataset = driver.Create(outFileName, inXSize, inYSize, numBands, gdal.GDT_Float32)
  11. Copy data to new file
  12. newDataset.GetRasterBand(1).WriteArray(hhData)
    # Close dataset
    newDataset = None

    The data should now be readable in anything which can read the output format you selected (e.g., TuiView.)

    The full script can be downloaded from https://github.com/MiXIL/AirMOSS-GDAL-Utilities/, there are some differences in the final script, a loop is used to read and write each band, rather than loading all the data to memory and the no data value of -9999 is changed to 0.