Category Archives: General Remote Sensing

Find scenes from an archive which contain a point

This is a quick way of locating which scenes, from an archive of data, contain a point you are interested in.

First make a list of all available scenes

ls data/S1/2017/??/??/*/*vv*img > all_scenes.txt

Then use gdalbuildvrt to make a VRT file containing each scene as a separate band (this assumes each scene has only a single band).

gdalbuildvrt -input_file_list all_scenes.txt \
             -separate -o s1_all_scenes.vrt

Use gdallocationinfo with the ‘-lifonly’ flag to find which scenes the point we are interested in (1.5 N, 103 E) falls within, and redirect the output to a text file.

gdallocationinfo -lifonly -wgs84 s1_all_scenes.vrt \
                 103 1.5 > selected_scenes.txt

This will show if the point is within the bounding box of each scene. To find the scenes which actually have data at the point, gdallocationinfo can be used again, but with the ‘-valonly’ flag, saving the values to a text file.

gdallocationinfo -valonly -wgs84 s1_all_scenes.vrt \
                  103 1.5 > selected_scenes_values.txt

The original list of files can then be subset using the values file in Python:

import numpy
# Read in data to numpy arrays
scenes = numpy.genfromtxt("selected_scenes.txt", dtype=str)
values = numpy.genfromtxt("selected_scenes_values.txt")

# Select only scenes where the value is not 0 (assumes both files have the same number of entries)
scenes_data = scenes[values != 0]

# Save to text file
numpy.savetxt("selected_scenes_data.txt", scenes_data, fmt="%s")

If you have a very large archive of data and often need to find which scenes intersect a point, having a spatial database with the scene outlines would be a better approach. However, if it isn’t something you do often, this quick approach using only the GDAL utilities and a bit of Python is worth knowing.
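
For reference, a minimal sketch of the database approach using the OGR Python bindings, assuming a GeoPackage ‘scene_outlines.gpkg’ containing a polygon layer of scene footprints with a ‘filename’ attribute (both hypothetical):

from osgeo import ogr

# Open the (hypothetical) footprints database and layer
data_source = ogr.Open("scene_outlines.gpkg")
layer = data_source.GetLayer("scene_outlines")

# Build a point geometry (1.5 N, 103 E) and use it as a spatial filter
point = ogr.Geometry(ogr.wkbPoint)
point.AddPoint(103, 1.5)
layer.SetSpatialFilter(point)

# Print the filename stored with each intersecting footprint
for feature in layer:
    print(feature.GetField("filename"))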

Convert Sentinel-2 Data Using GDAL

The latest version of GDAL (2.1) has a driver to read Sentinel-2 data (see http://gdal.org/frmt_sentinel2.html). Like HDF files, they are read as subdatasets. Running ‘gdalinfo’ on the zipped folder or the .xml file contained within the .SAFE directory will display all the subdatasets, as well as all the metadata (so quite a lot of information!).

As with HDF5, you can pass the subdataset names to gdalinfo to get more information, or to gdal_translate to extract them as separate datasets.

To make it easier to extract all the subdatasets I wrote a script (extract_s2_data.py) which can be downloaded from the https://bitbucket.org/petebunting/rsgis_scripts repository.

The script will get a list of all subdatasets using GDAL:

from osgeo import gdal
dataset = gdal.Open('S2/S2.xml', gdal.GA_ReadOnly)
subdatasets = dataset.GetSubDatasets()
dataset = None

The ones to be extracted are for the 10, 20 and 60 m resolution band groups for each UTM zone (if the file crosses multiple zones).

For each subdataset it will generate an output name, replacing the EPSG code with the UTM zone and ‘:’ with ‘_’.

Then the gdal_translate command, called using subprocess, is used to create a new file for each subdataset; by default the output is in KEA format.
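
The core loop of the script is along these lines (a simplified sketch rather than the full script; the UTM zone renaming is omitted here):

import subprocess
from osgeo import gdal

# Get the list of subdatasets from the main metadata file
dataset = gdal.Open('S2/S2.xml', gdal.GA_ReadOnly)
subdatasets = dataset.GetSubDatasets()
dataset = None

for subdataset_name, _ in subdatasets:
    # Build an output name from the subdataset name (the full script
    # also replaces the EPSG code with the UTM zone)
    out_name = subdataset_name.replace(':', '_').replace('/', '_') + '.kea'
    # Extract the subdataset to a new file using gdal_translate
    subprocess.call(['gdal_translate', '-of', 'KEA', subdataset_name, out_name])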

To run the script first install GDAL 2.1. The conda-forge channel has recent builds; to install them using conda:

conda create -n gdal2 -c conda-forge gdal
source activate gdal2

(If you are on Windows leave out ‘source’)

To extract all subdatasets from a zipped Sentinel-2 scene to the current directory you can then use:

extract_s2_data.py -o . \
S2A_OPER_PRD_MSIL1C_PDMC_20151201T144038_R010_V20151130T142545_20151130T142545.zip

The gdal_translate command used is printed to the screen.

The default output format is KEA; you can change this using the ‘--of’ flag. For example, to convert an unzipped scene to GeoTiff:

extract_s2_data.py -o . --of GTiff \
S2A_OPER_PRD_MSIL1C_PDMC_20151201T144038_R010_V20151130T142545_20151130T142545.SAFE

To get the extension for all supported drivers, and some creation options, the ‘get_gdal_drivers’ module from arsf_dem_scripts is optionally used. You can just download this file and copy it into the same directory ‘extract_s2_data.py’ has been saved to. For Linux or OS X you can run:

# OS X
curl https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py > get_gdal_drivers.py

# Linux
wget https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py

Useful GDAL commands for working with hyperspectral data

The Geospatial Data Abstraction Library (GDAL) contains a number of utilities for working with remote sensing data. The following are some commands which are particularly useful when working with airborne hyperspectral data.

Create subset and change format

To create a band and spatial subset of a larger hyperspectral dataset the following command can be used:

gdal_translate -of GTiff -co COMPRESS=LZW \
 -projwin 481500 766000 482500 765000 \
 -b 190 -b 40 -b 15  \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 outputs/f281013b_utm_wgs84N50_mapped_subsection_3band.tif
  • -of: Output Format – we are using the GeoTiff format. For all available options see http://www.gdal.org/formats_list.html
  • -co: Creation Option – specifies the creation options, these are specific to the format. For GeoTiff we use compression to reduce the file size. For ENVI we can use the creation options to specify the Interleave (BSQ or BIL)
  • -projwin: A rectangle delineating the area to be extracted, in the format: upper-left X, upper-left Y, lower-right X, lower-right Y (i.e., min X, max Y, max X, min Y for a north-up image)
  • -b: A given band to extract
  • final two arguments: input and output filenames
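
From GDAL 2.1 onwards the same operation is also available through the Python bindings as gdal.Translate; a minimal sketch of the command above:

from osgeo import gdal

gdal.Translate('outputs/f281013b_utm_wgs84N50_mapped_subsection_3band.tif',
               'outputs/f281013b_utm_wgs84N50_mapped_subsection.bil',
               format='GTiff',
               creationOptions=['COMPRESS=LZW'],
               projWin=[481500, 766000, 482500, 765000],
               bandList=[190, 40, 15])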

Create mosaic of multiple flightlines

Often a mosaic of all individual flightlines is required. Due to the large size of the mapped files this can be time consuming. One option is to first make a ‘Virtual Raster’ using the ‘gdalbuildvrt’ command:

gdalbuildvrt -srcnodata 0 outputs/f2810_mapped_mosaic.vrt \
 outputs/f281013b_utm_wgs84N50_mapped_3band.bil \
 outputs/f281023b_utm_wgs84N50_mapped_3band.bil
  • -srcnodata: No data value of input lines
  • first argument: output VRT file
  • final arguments: list of input files

Rather than creating a new file, a small .vrt text file is created which contains links to the individual lines.
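
If you are scripting the process, the Python bindings (GDAL 2.1+) expose the same functionality as gdal.BuildVRT; a minimal sketch:

from osgeo import gdal

gdal.BuildVRT('outputs/f2810_mapped_mosaic.vrt',
              ['outputs/f281013b_utm_wgs84N50_mapped_3band.bil',
               'outputs/f281023b_utm_wgs84N50_mapped_3band.bil'],
              srcNodata=0)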

Many programs which use GDAL (e.g., TuiView) can open this file directly. To make opening the file easier we can do some optimisation. First pre-calculate statistics using:

gdalinfo -stats outputs/f2810_mapped_mosaic.vrt
  • -stats: Calculate statistics and save to .aux.xml file
  • final argument: file name

The gdalinfo command can also be used to print information about a raster. Note that for the .vrt file it shows the individual files which are included.

After creating statistics, generate overviews (lower resolution copies of the data) using gdaladdo:

gdaladdo -r average outputs/f2810_mapped_mosaic.vrt 2 4 8 16
  • -r average: Use average for resampling overviews
  • first argument: file name
  • numbers: overview levels to build (2x, 4x, 8x and 16x)

We can also make a real raster using gdal_translate (as shown previously).
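
Both the statistics and overview steps can also be scripted through the GDAL Python bindings; a minimal sketch:

from osgeo import gdal

dataset = gdal.Open('outputs/f2810_mapped_mosaic.vrt', gdal.GA_ReadOnly)

# Calculate exact statistics for each band (False = don't approximate);
# these are saved to the .aux.xml file as the dataset is opened read-only
for band_num in range(1, dataset.RasterCount + 1):
    dataset.GetRasterBand(band_num).ComputeStatistics(False)

# Build averaged overviews at 2x, 4x, 8x and 16x
dataset.BuildOverviews('AVERAGE', [2, 4, 8, 16])
dataset = None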

Extract pixel values for a given location

gdallocationinfo -valonly -geoloc \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 481801 765289
  • -valonly: Only print values to the screen
  • -geoloc: Interpret x and y values as coordinates rather than pixel and line
  • first argument: file name
  • x y: Easting and Northing of pixel to extract statistics from

This will print the values to the screen. To save them to a text file we can redirect the output using ‘>’:

gdallocationinfo -valonly -geoloc \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 481801 765289 > outputs/extracted_values.txt

This text file can be opened in other programs for further analysis.
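
For example, as the file contains one value per band in band order, a quick spectral profile can be plotted in Python (a minimal sketch assuming matplotlib is installed; band numbers are used for the x-axis as wavelengths aren’t stored in the text file):

import numpy
import matplotlib.pyplot as plt

# Read the extracted values (one per band)
values = numpy.genfromtxt('outputs/extracted_values.txt')

plt.plot(numpy.arange(1, values.size + 1), values)
plt.xlabel('Band number')
plt.ylabel('Pixel value')
plt.show()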

This post is based on the hyperspectral practical session from the NERC Airborne Survey and Research Facility workshop held at Plymouth Marine Laboratory, March 2016. For more information about the ARSF Data Analysis Node (ARSF-DAN) and details about future training opportunities follow @ARSF_DAN on Twitter. Hyperspectral data acquired by ARSF are available to download from NEODC.

Copy information between ENVI header files

The header files (.hdr) of ENVI files are text files which contain the size of the file (pixels and lines), geospatial information, and other metadata such as wavelengths and data units.

Not all programs are aware of all the items in the header file and some may be lost during processing. ARSF-DAN have made a script available which can copy attributes from one header to another. The script can be downloaded from https://github.com/pmlrsg/arsf_tools and is run using:

copy_header_info.py f221013_mapped_osng.bil.hdr \
                      f221013_nadir_mapped_osng.hdr \
                      -k "map info" "projection info" "wavelength"

To show the available fields the following script (also from the arsf_tools repo) can be used:

get_info_from_header.py f221013_mapped_osng.bil.hdr
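
As the headers are just ‘key = value’ text, individual fields can also be pulled out with a few lines of Python; a minimal sketch (note that values enclosed in braces can span multiple lines, which this sketch doesn’t handle):

def read_envi_header(header_file):
    """Read single-line 'key = value' pairs from an ENVI header."""
    header = {}
    with open(header_file) as f:
        for line in f:
            if '=' in line:
                key, _, value = line.partition('=')
                header[key.strip()] = value.strip()
    return header

print(read_envi_header('f221013_mapped_osng.bil.hdr')['map info'])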

Create a GIF from a series of images

Creating a video is a cool way to visualise a time series of remote sensing data / outputs. Rather than creating a full video you can also use a GIF. One advantage of GIFs is that you can embed them in PowerPoint presentations and they will play in presentation mode (I’ve found them more reliable than videos in other formats).

You can use the ImageMagick convert command to create a GIF:

convert -delay 20 -loop 0 in_files/*.png out_animation.gif

However, depending on how your files are named, the numbers might not sort as expected, leading to jumps in the animation. This problem was addressed in the following post on the ‘remotelysensible’ blog:

https://remotelysensible.wordpress.com/2014/09/18/alphanumeric-string-sorting-in-python/

This can be used to sort the files in Python then call the convert command using subprocess:

#!/usr/bin/env python

import argparse
import re
import subprocess

re_natural = re.compile('[0-9]+|[^0-9]+')

def natural_key(s):
   """
   Natural sorting function from http://stackoverflow.com/questions/12184015/in-python-how-can-i-naturally-sort-a-list-of-alphanumeric-strings-such-that-alp
   """
   return [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in re_natural.findall(s)] + [s]

if __name__ == '__main__':
   parser = argparse.ArgumentParser(description='Create an animated gif from a series of images')
   parser.add_argument("images", nargs='+',type=str, help="Input images")
   parser.add_argument('-o', '--outgif',
                       metavar ='Output gif',
                       help ='Output name for gif',
                       required=True)
   args=parser.parse_args()

   sorted_png_files = sorted(args.images, key=natural_key)

   convert_cmd = ['convert','-delay','20','-loop','0']
   convert_cmd.extend(sorted_png_files)
   convert_cmd.extend([args.outgif])

   subprocess.call(convert_cmd)

The script is run using:

python create_gif.py -o out_animation.gif in_files/*.png 

As well as creating GIFs of a time series, you can use the SPD 3D Points Viewer to export images of a point cloud as it rotates and create a GIF from these, similar to this video of airborne LiDAR data over Lake Vyrnwy on YouTube: https://www.youtube.com/watch?v=P_9998KJib4

The script can be downloaded from GitHub. Note the delay is hardcoded at 20/100ths of a second, but this can be changed if needed, as shown below.
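
For example, exposing the delay as a command line option only needs a couple of changes to the script (the ‘--delay’ flag here is hypothetical, not part of the script as downloaded):

# add alongside the other arguments:
parser.add_argument('-d', '--delay',
                    default='20',
                    help='Delay between frames in 1/100ths of a second')

# then use it when building the convert command:
convert_cmd = ['convert', '-delay', args.delay, '-loop', '0']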

Batch processing atmospheric correction using ARCSI

Previous posts have covered atmospheric correction using the Atmospheric and Radiometric Correction of Satellite Imagery (ARCSI) software. When a large number of scenes require processing, commands to automate the steps are desirable and greatly simplify the process.

ARCSI provides a number of commands which are useful for batch processing:

  • arcsisortlandsat.py sorts the Landsat ‘tar.gz’ files into individual directories for each sensor (i.e., LS 5 TM) and also builds a standard directory structure for processing.
  • arcsiextractdata.py extracts the contents of ‘tar’ and ‘tar.gz’ archives, with each archive being extracted into an individual directory.
  • arcsibuildcmdslist.py builds the ‘arcsi.py’ commands for each scene, creating a shell script.

The files used for this tutorial are shown below, but others could be downloaded from EarthExplorer and used instead.

LC80720882013174LGN00.tar.gz	LT52040232011310KIS00.tar.gz
LC82040242013139LGN01.tar.gz	LT52040242011118KIS00.tar.gz
LT52040232011118KIS00.tar.gz	LT52050232011109KIS00.tar.gz

Sort Landsat Scenes

The first command to execute is ‘arcsisortlandsat.py’, which needs to be executed from within the directory containing the files:

arcsisortlandsat.py -i . -o .

This will generate the following directory structure and move the files into the appropriate directories.

> ls *
LS5:
Inputs	Outputs	RAW	tmp

LS8:
Inputs	Outputs	RAW	tmp

> ls */RAW
LS5/RAW:
LT52040232011118KIS00.tar.gz	LT52040242011118KIS00.tar.gz
LT52040232011310KIS00.tar.gz	LT52050232011109KIS00.tar.gz

LS8/RAW:
LC80720882013174LGN00.tar.gz	LC82040242013139LGN01.tar.gz

Extract Data

To extract the data from the ‘tar.gz’ files the ‘arcsiextractdata.py’ command is used as shown below:

arcsiextractdata.py -i ./LS5/RAW/ -o ./LS5/Inputs/

arcsiextractdata.py -i ./LS8/RAW/ -o ./LS8/Inputs/

Once the files are extracted the directory structure will look like the following:

> ls */RAW
LS5/RAW:
LT52040232011118KIS00.tar.gz	LT52040242011118KIS00.tar.gz
LT52040232011310KIS00.tar.gz	LT52050232011109KIS00.tar.gz

LS8/RAW:
LC80720882013174LGN00.tar.gz	LC82040242013139LGN01.tar.gz

> ls */Inputs
LS5/Inputs:
LT52040232011118KIS00	LT52040242011118KIS00
LT52040232011310KIS00	LT52050232011109KIS00

LS8/Inputs:
LC80720882013174LGN00	LC82040242013139LGN01

Build ARCSI Commands

To build the ‘arcsi.py’ commands, one for each input file, the following commands are used. Notice that these are very similar to the individual commands that you previously executed, but now provide inputs to the ‘arcsibuildcmdslist.py’ command, which selects a number of input files and generates a single shell script output.

arcsibuildcmdslist.py -s ls5tm -f KEA --stats -p RAD DOSAOTSGL SREF \
          --outpath ./LS5/Outputs --aeroimg ../WorldAerosolParams.kea \
          --atmosimg ../WorldAtmosphereParams.kea --dem ../UKSRTM_90m.kea \
          --tmpath ./LS5/tmp --minaot 0.05 --maxaot 0.6 --simpledos \
          -i ./LS5/Inputs -e MTL.txt -o LS5ARCSICmds.sh

arcsibuildcmdslist.py -s ls8 -f KEA --stats -p RAD DOSAOTSGL SREF \
        --outpath ./LS8/Outputs --aeroimg ../WorldAerosolParams.kea \
        --atmosimg ../WorldAtmosphereParams.kea --dem ../UKSRTM_90m.kea \
        --tmpath ./LS8/tmp --minaot 0.05 --maxaot 0.6 --simpledos \
        -i ./LS8/Inputs -e MTL.txt -o LS8ARCSICmds.sh

Following the execution of these commands two files will have been created: ‘LS5ARCSICmds.sh’ and ‘LS8ARCSICmds.sh’. These contain the ‘arcsi.py’ commands to be executed. Open the files and take a look; you will notice that all the file paths have been converted to absolute paths, which means the files can be executed from anywhere on the system as long as the input files are not moved.

Execute ARCSI Commands

To execute the ‘arcsi.py’ commands the easiest method is to run each file in turn using the following commands:

sh LS5ARCSICmds.sh

sh LS8ARCSICmds.sh

This will run each of the commands sequentially. However, most computers now have multiple processing cores, and to take advantage of these cores we can use the GNU Parallel command line tool, which allows the processing to complete much more quickly and efficiently.

cat LS5ARCSICmds.sh LS8ARCSICmds.sh | parallel -j 2 

The switch ‘-j 2’ specifies the number of processing cores which can be used for the processing. If no switches are provided then all available cores will be used. As some parts of the process involve reading / writing large files, depending on the speed of your hard drive and the number of cores available you might find limiting the number of cores is needed. Please note that nothing will be printed to the console until all processing has completed. You can check the commands are running using the top / htop command.

Once you have completed your processing you should clean up your system to remove any files you don’t need for later processing steps. In most cases you will only need to keep the ‘tar.gz’ files (so you can reprocess the RAW data at a later date if required) and the SREF product. It is recommended that you also retain the scripts you used for processing and a record of the commands, a) for reference if you need to rerun the data, and b) as documentation for the datasets so you know what parameters and options were used.
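
As an illustration only (check the paths carefully before deleting anything), the temporary files from the directory structure above could be cleared with a few lines of Python:

import glob
import os
import shutil

# Remove everything under the per-sensor tmp directories
for path in glob.glob('LS*/tmp/*'):
    if os.path.isdir(path):
        shutil.rmtree(path)
    else:
        os.remove(path)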

This post was adapted from notes for a module taught as part of the Masters course in GIS and Remote Sensing at Aberystwyth University, you can read more about the program here.

Apply the same stretch to multiple files in TuiView

There are some cases where it is necessary to play around with the stretch used to display data in TuiView (edit -> stretch). For example, if the scene contains very bright or dark areas (e.g., clouds/shadows) these can skew the statistics and make the image display too light or dark. One option is to use a local stretch (round house icon), which will only use the values from the current display to calculate statistics. You can also manually set the minimum and maximum values.

Once a good stretch has been obtained this can be saved to the image for formats which support it (e.g., KEA) using the ‘Save Stretch and Lookup Table’ button (disk icon). If multiple files need to be opened with the same stretch applied you can pass in a file with a saved stretch when opening TuiView e.g.,

tuiview --stretchfromgdal image_with_saved_stretch.kea *kea

This will open all files matching the pattern ‘*kea’, applying the stretch saved in ‘image_with_saved_stretch.kea’.

You can also save a stretch to a text file using the ‘Export Stretch and Lookup Table to Text File’ button (disk with ABC icon); this can be passed in when opening TuiView using:

tuiview --stretchfromtext casi.stretch *bil

This will open all files matching the pattern ‘*bil’, applying the stretch saved in the text file ‘casi.stretch’.