Useful GDAL commands for working with hyperspectral data

The Geospatial Data Abstraction Library (GDAL) contains a number of utilities for working with remote sensing data. The following are some commands which are particularly useful when working with airborne hyperspectral data.

Create subset and change format

To create a band and spatial subset of a larger hyperspectral dataset the following command can be used:

gdal_translate -of GTiff -co COMPRESS=LZW \
 -projwin 481500 766000 482500 765000 \
 -b 190 -b 40 -b 15  \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 outputs/f281013b_utm_wgs84N50_mapped_subsection_3band.tif
  • -of: Output format – here we are using GeoTIFF. For all available options see http://www.gdal.org/formats_list.html
  • -co: Creation option – specifies format-specific creation options. For GeoTIFF we use compression to reduce the file size. For ENVI the creation options can be used to specify the interleave (BSQ or BIL)
  • -projwin: A rectangle delineating the area to be extracted, given as upper left X, upper left Y, lower right X, lower right Y (i.e., min X, max Y, max X, min Y)
  • -b: A band to extract; the flag is repeated for each band
  • final two arguments: input and output filenames
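
The same subset can also be created from Python using the gdal.Translate function (a minimal sketch, assuming the GDAL Python bindings, version 2.1 or later, are installed; file names match the command above):

from osgeo import gdal

gdal.Translate("outputs/f281013b_utm_wgs84N50_mapped_subsection_3band.tif",
               "outputs/f281013b_utm_wgs84N50_mapped_subsection.bil",
               format="GTiff",
               creationOptions=["COMPRESS=LZW"],
               projWin=[481500, 766000, 482500, 765000],  # ulx, uly, lrx, lry
               bandList=[190, 40, 15])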

Create mosaic of multiple flightlines

Often a mosaic of all individual flightlines is required. Due to the large size of the mapped files this can often be time consuming. One option is to first make a ‘Virtual Raster’ using the ‘gdalbuildvrt’ command:

gdalbuildvrt -srcnodata 0 outputs/f2810_mapped_mosaic.vrt \
 outputs/f281013b_utm_wgs84N50_mapped_3band.bil \
 outputs/f281023b_utm_wgs84N50_mapped_3band.bil
  • -srcnodata: No data value of input lines
  • first argument: output VRT file
  • final arguments: list of input files

Rather than creating a new raster, a small .vrt text file is created which contains links to the individual lines.
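
The same virtual mosaic can also be built from Python with gdal.BuildVRT (a minimal sketch, assuming the GDAL Python bindings are available; file names match the command above):

from osgeo import gdal

vrt = gdal.BuildVRT("outputs/f2810_mapped_mosaic.vrt",
                    ["outputs/f281013b_utm_wgs84N50_mapped_3band.bil",
                     "outputs/f281023b_utm_wgs84N50_mapped_3band.bil"],
                    srcNodata=0)
vrt = None  # close the dataset so the .vrt file is written to disk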

Many programs which use GDAL (e.g., TuiView) can open this file directly. To make opening the file easier we can do some optimisation. First pre-calculate statistics using:

gdalinfo -stats outputs/f2810_mapped_mosaic.vrt
  • -stats: Calculate statistics and save to .aux.xml file
  • final argument: file name

The gdalinfo command can also be used to print information about a raster. Note that for the .vrt file it lists the individual files which are included.
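
Statistics can also be calculated from Python (a minimal sketch using the GDAL bindings; as with gdalinfo -stats, the statistics should be saved to a .aux.xml file when the dataset is closed):

from osgeo import gdal

ds = gdal.Open("outputs/f2810_mapped_mosaic.vrt")
for band_num in range(1, ds.RasterCount + 1):
    # Returns [min, max, mean, standard deviation] for each band
    stats = ds.GetRasterBand(band_num).ComputeStatistics(False)
    print(band_num, stats)
ds = None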

After creating statistics, generate overviews (reduced-resolution copies at a number of zoom levels) using gdaladdo:

gdaladdo -r average outputs/f2810_mapped_mosaic.vrt 2 4 8 16
  • -r average: Use average for resampling overviews
  • first argument: file name
  • numbers: overview levels to build (2x, 4x, 8x and 16x)
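
Overviews can also be generated from Python using the BuildOverviews method (a minimal sketch; for a .vrt file opened read-only the overviews are written to an external .ovr file):

from osgeo import gdal

ds = gdal.Open("outputs/f2810_mapped_mosaic.vrt")
ds.BuildOverviews("AVERAGE", [2, 4, 8, 16])
ds = None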

If a real (non-virtual) raster is needed, the mosaic can be converted using gdal_translate (as shown previously).

Extract pixel values for a given location

gdallocationinfo -valonly -geoloc \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 481801 765289
  • -valonly: Only print values to the screen
  • -geoloc: Interpret the x and y values as map coordinates rather than pixel and line
  • first argument: file name
  • x y: Easting and northing of the pixel to extract values from

This will print the values to the screen. To save them to a text file we can redirect the output using ‘>’:

gdallocationinfo -valonly -geoloc \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 481801 765289 > outputs/extracted_values.txt

This text file can be opened in other programs for further analysis.
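
The same extraction can be scripted with the GDAL Python bindings by converting the easting and northing to a pixel and line using the image geotransform (a minimal sketch, assuming a north-up image):

from osgeo import gdal

ds = gdal.Open("outputs/f281013b_utm_wgs84N50_mapped_subsection.bil")
gt = ds.GetGeoTransform()

easting, northing = 481801, 765289
col = int((easting - gt[0]) / gt[1])
row = int((northing - gt[3]) / gt[5])

# Read a single pixel from every band and print the values
values = ds.ReadAsArray(col, row, 1, 1).flatten()
print(values)
ds = None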

This post is based on the hyperspectral practical session from the NERC Airborne Survey and Research Facility workshop held at Plymouth Marine Laboratory, March 2016. For more information about the ARSF Data Analysis Node (ARSF-DAN) and details about future training opportunities follow @ARSF_DAN on twitter. Hyperspectral data acquired by ARSF are available to download from NEODC.

Add all scripts within a repository to $PATH using envmaster

I have a couple of general scripts repositories for myself and shared with colleagues. These are for scripts which are useful but don’t fit into existing projects or justify having their own repository. An example is rsgis_scripts on bitbucket. The scripts are split into different directories which aren’t available on the main path. To make them available when needed I use EnvMaster (described in a previous post). I have an envmaster module which will search the repository for folders containing executables and add them to $PATH. It will also add folders containing Python packages (subdirectories with an ‘__init__.py’) to $PYTHONPATH.

#%EnvMaster1.0
import os
import glob

REPO_PATH = "/home/dan/Documents/Development/rsgis_scripts"

# Walk through the repository
for d_name, sd_name, f_list in os.walk(REPO_PATH):
    # Skip hidden directories (e.g., .git / .hg) anywhere in the path
    if not any(part.startswith(".") for part in d_name.split(os.sep)):
        file_list = glob.glob(os.path.join(d_name, "*"))
        # If the directory contains executable files add it to $PATH
        if any(os.path.isfile(f) and os.access(f, os.X_OK) for f in file_list):
            module.setBin(d_name)
        # If it contains Python packages (subdirectories with an
        # '__init__.py') add it to $PYTHONPATH
        if len(glob.glob(os.path.join(d_name, "*", "__init__.py"))) > 0:
            module.setPython(d_name)

The script is saved as ‘rsgis_scripts’ in ‘$ENVMASTERPATH’.
To load the module and prepend all the folders to ‘$PATH’ use:

envmaster load rsgis_scripts

They will remain on $PATH until you close the terminal or unload the module using:

envmaster unload rsgis_scripts

This script is just an example; it would be easy to modify for different use cases. If you have multiple repositories, a module file can be created for each one.

Attribute LAS points using hyperspectral data

A good way of visualising LiDAR point clouds is to attribute them with colour information.
LAS files using point format 2 and above have the ability to store RGB colour information. These can be viewed with programs such as plas.io and CloudCompare.

To add RGB colour information to a LAS file ARSF-DAN have made a Python script available from: https://github.com/pmlrsg/arsf_tools

The script uses a combination of laspy (to read and write the LAS file) and GDAL (to read the raster) and is capable of working with images with more than three bands, such as multispectral or hyperspectral files, without creating a three-band copy of the image first.
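
As a rough, independent sketch of the general approach (not the ARSF script itself), the point coordinates can be converted to pixel/line indices using the raster geotransform and the raster values copied into the RGB fields. File names here are hypothetical; the laspy 2.x API is used (the JASMIN module mentioned below provides laspy 1.3, which has a different File-based interface); the LAS file is assumed to already use a point format that stores RGB; no bounds checking or value scaling is done:

import laspy
import numpy
from osgeo import gdal

las = laspy.read("in_points.las")
ds = gdal.Open("in_raster_rgb.tif")
gt = ds.GetGeoTransform()

# Convert point coordinates to pixel / line indices (north-up image assumed)
x = numpy.array(las.x)
y = numpy.array(las.y)
cols = ((x - gt[0]) / gt[1]).astype(int)
rows = ((y - gt[3]) / gt[5]).astype(int)

# Sample bands 1-3 of the raster at each point and store as red, green, blue
for band_num, colour in enumerate(["red", "green", "blue"], start=1):
    band_data = ds.GetRasterBand(band_num).ReadAsArray()
    setattr(las, colour, band_data[rows, cols].astype(numpy.uint16))

las.write("out_points_coloured.las")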

For this post I’ll demonstrate running the script on the JASMIN/CEMS system. In a nutshell, it is a Linux computing facility with lots of fast storage and direct access to many datasets, including the ARSF archive of airborne LiDAR and hyperspectral data (apply for access to ARSF data here). Researchers in the UK who are part of the NERC or National Centre for Earth Observation (NCEO) scientific communities can apply for access. If you don’t have access to JASMIN you can run the script on your own machine; just ignore the JASMIN-specific steps.

  1. Apply for access and logon to JASMIN
    If you are using JASMIN the first stage is to get an account and log on by following the steps in this guide: http://jasmin.ac.uk/workflow/
  2. Set up scripts
    Once you have logged onto one of the shared VMs (e.g., cems-sci2.cems.rl.ac.uk) you need to load laspy, which isn’t installed by default. You can do this using:

    module load contrib/arsf/laspy/1.3.0
    

    Then clone the repository containing the ‘colour_las_file.py’ script using:

    git clone https://github.com/pmlrsg/arsf_tools.git
    

    Check the script runs OK by typing:

    python2.7 arsf_tools/colour_las_file.py -h
    

    Note, you need to specify Python 2.7 as the default system Python is 2.6 but most of the Python libraries (e.g., GDAL) are built against 2.7. If you are not running the scripts on JASMIN, just using ‘python’ should be fine.

  3. Colour using ARSF hyperspectral data
    For this example, data from Mont Blanc flown as part of a EUFAR campaign in 2015 on Julian day 175a (24/06/2015) will be used. This data can be downloaded from NEODC (direct link).
    LIDAR_PATH=/neodc/arsf/2015/EUFAR15_28/EUFAR15_28-2015_175a_Mont_Blanc/LiDAR/flightlines/las1.2
    HS_PATH=/neodc/arsf/2015/EUFAR15_28/EUFAR15_28-2015_175a_Mont_Blanc/hyperspectral/flightlines/mapped
    
    python2.7 arsf_tools/colour_las_file.py \
              --image /vsizip/$HS_PATH/f175a183b_mapped_utm32n_bil.zip/f175a183b_mapped_utm32n.bil \
              --red 38 \
              --green 25 \
              --blue 12 \
              --scale \
             $LIDAR_PATH/LDR-EUFAR15_28-2015-175a-18.LAS \
             LDR-EUFAR15_28-2015-175a-18_col.LAS
    

    This will attribute the points using bands 38, 25 and 12 (true colour) and will scale the pixel values from 16 bit to 8 bit (0 – 255) using a standard deviation stretch (a rough sketch of such a stretch is shown after this list). To read the hyperspectral data without unzipping it first, a GDAL virtual file system is used, as described in a previous post.

  4. Compress files with LASzip and download
    To reduce the size of the files before downloading you can use LASzip. On JASMIN LAStools can be loaded using:

    module load contrib/arsf/lastools/20150925
    

    To compress the LAS file and drop points flagged as noise the following command can be used:

    las2las -drop_class 7 -i LDR-EUFAR15_28-2015-175a-18_col.LAS -o LDR-EUFAR15_28-2015-175a-18_col.laz
    

    A similar command can be used to decompress files.

    See the following guide on transferring data off JASMIN: http://jasmin.ac.uk/how-to-use-jasmin/data-transfer/

  5. View using plas.io
    Open http://plas.io/ in a modern browser and click ‘Browse’ to load in the coloured LAS file.

    [Screenshot: coloured point cloud displayed in plas.io]

    The nice thing about using plas.io and JASMIN is that no specialist software needs to be installed on your local machine, just an SSH client and a web browser.

  6. View using CloudCompare
    If you want to view files locally you can use CloudCompare, which is available for Windows, OS X and Linux. To open the coloured LAS/LAZ file in the CloudCompare viewer just drag it into the viewer window.

    [Screenshot: coloured point cloud displayed in CloudCompare]

  7. Additional – colour using an existing three-band image
    If you already have a three-band image with pixel values 0 – 255, a simpler command can be used:

    colour_las_file.py --image in_raster_file_rgb.tif \
                  in_las_file.las out_las_file.las
    
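The standard deviation stretch mentioned in step 3 works by clipping the data to a range around the mean and rescaling to 0 – 255. A rough numpy sketch of the idea (not the ARSF script’s implementation; the number of standard deviations and the no-data value are assumptions):

import numpy

def stddev_stretch(data, num_stddev=2, ignore_value=0):
    """Scale an array to 0 - 255 using a standard deviation stretch."""
    valid = data[data != ignore_value]
    min_val = valid.mean() - num_stddev * valid.std()
    max_val = valid.mean() + num_stddev * valid.std()
    stretched = (data.astype(numpy.float32) - min_val) / (max_val - min_val)
    return numpy.clip(stretched * 255, 0, 255).astype(numpy.uint8)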

Convert EASE-2 grid cell to latitude and longitude using Python

The EASE-2 grid is used for a number of NASA datasets including SMAP. It is described in the following paper:

Brodzik, M. J., Billingsley, B., Haran, T., Raup, B., & Savoie, M. H. (2012). EASE-Grid 2.0: Incremental but Significant Improvements for Earth-Gridded Data Sets. ISPRS International Journal of Geo-Information, 1(3), 32–45. http://doi.org/10.3390/ijgi1010032

Files with the centre coordinate of each cell for EASE-2 grids at different resolutions are available from https://nsidc.org/data/ease/tools.html as well as tools for conversion.

To read these files into Python the following steps can be used:

  1. Download the relevant gridloc file.

    The FTP link for the grid location files is: ftp://sidads.colorado.edu/pub/tools/easegrid2/

    For this example I’ve chosen the 36km cylindrical EASE-2 grid (gridloc.EASE2_M36km.tgz)

  2. Un-tar using:
    tar -xvf gridloc.EASE2_M36km.tgz
    
  3. Read the files into Python:
    import numpy
    
    # Read binary files and reshape to correct size
    # The number of rows and columns are in the file name
    lats = numpy.fromfile('EASE2_M36km.lats.964x406x1.double', 
                          dtype=numpy.float64).reshape((406,964))
    lons = numpy.fromfile('EASE2_M36km.lons.964x406x1.double', 
                          dtype=numpy.float64).reshape((406,964))
    
    # Extract latitude and longitude
    # for a given row and column 
    grid_row = 46
    grid_column = 470
    
    lat_val = lats[grid_row, grid_column]
    lon_val = lons[grid_row, grid_column]
    
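The lats and lons arrays read above can also be used the other way round, to find the grid cell whose centre is closest to a given location (a rough brute-force sketch; the target coordinates are hypothetical and the simple nearest-centre search ignores the varying ground distance of a degree of longitude):

# Find the grid cell whose centre is closest to a target location
target_lat, target_lon = 51.5, -4.0
dist_sq = (lats - target_lat)**2 + (lons - target_lon)**2
grid_row, grid_column = numpy.unravel_index(numpy.argmin(dist_sq), lats.shape)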

Calculations on large Raster Attribute Tables using ratapplier

The RIOS Library (http://rioshome.org) provides two methods of manipulating columns within a Raster Attribute Table (RAT):

  1. The ‘rat’ module can read an entire column into memory.
    from osgeo import gdal
    from rios import rat
    
    # Open RAT dataset
    rat_dataset = gdal.Open("clumps.kea", gdal.GA_Update)
    
    # Get columns with average red and NIR for each object
    red = rat.readColumn(rat_dataset, "RedAvg")
    nir = rat.readColumn(rat_dataset, "NIR1Avg")
    
    ndvi = (nir - red) / (nir + red)
    
    # Write out column
    rat.writeColumn(rat_dataset, "NDVI", ndvi)
    
    # Close RAT dataset
    rat_dataset = None
    
  2. The newer ‘ratapplier’ module is modelled after the ‘applier’ module for images and allows a function to be applied to chunks of rows, making it particularly useful for a RAT which is too large to load into memory.
    from rios import ratapplier
    
    def _ratapplier_calc_ndvi(info, inputs, outputs):
        """
        Calculate NDVI from RAT.
    
        Called by ratapplier
        """
        # Get columns with average red and NIR for each object
        # within block
        red = getattr(inputs.inrat, "RedAvg")
        nir = getattr(inputs.inrat, "NIR1Avg")
    
        # Calculate NDVI
        ndvi = (nir - red) / (nir + red)
    
        # Save to 'NDVI' column (will create if doesn't exist)
        setattr(outputs.outrat,"NDVI", ndvi)
    
    if __name__ == "__main__":
    
        # Set up rat applier for input / output
        in_rats = ratapplier.RatAssociations()
        out_rats = ratapplier.RatAssociations()
    
        # Pass in clumps file
        # Same file is used for input and output to write
        # to existing RAT
        in_rats.inrat = ratapplier.RatHandle("clumps.kea")
        out_rats.outrat = ratapplier.RatHandle("clumps.kea")
    
        # Apply function to all rows in chunks
        ratapplier.apply(_ratapplier_calc_ndvi, in_rats, out_rats)
    

Although using ratapplier looks slightly more complicated at first, writing scripts to use it rather than the ‘rat’ interface means they will scale much better to larger datasets.

Both these examples are available to download from https://bitbucket.org/snippets/danclewley/7E666

Bash one-liner to untar downloaded Landsat data into separate directories

To make it easier to read I’ve split the one-liner below across separate lines for this post. You could remove the ‘;\’ at the end of each line to create a multi-line bash script.

When you download Landsat data it comes as a .tar.gz archive (or .tar.bz if you download from Google). To uncompress the files into a separate folder for each scene the following series of bash commands can be used:

for scene in `ls *tar.gz | sed 's/.tar.gz//'`;\
   do echo "Uncompressing ${scene}";\
   mkdir ${scene};\
   mv ${scene}.tar.gz ${scene};\
   cd ${scene};\
   tar -xvf ${scene}.tar.gz;\
   cd -;\
done

This will list all the files matching ‘*.tar.gz’, remove the extension (using sed), print the scene name (echo), make a directory (mkdir), move the .tar.gz archive into it (mv), change into the directory (cd), untar the archive (tar -xvf) and then change back to the original directory (cd -).
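
An equivalent approach in Python using only the standard library (a minimal sketch; it extracts each archive into a new directory and then moves the archive into it, matching the end result of the bash version above):

import glob
import os
import tarfile

for archive in glob.glob("*.tar.gz"):
    scene = archive.replace(".tar.gz", "")
    print("Uncompressing {}".format(scene))
    os.mkdir(scene)
    with tarfile.open(archive) as tar:
        tar.extractall(scene)
    os.rename(archive, os.path.join(scene, archive))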

You can also use the ‘arcsiextractdata.py’ tool, as detailed in an earlier post.

Copy information between ENVI header files

The header files (.hdr) of ENVI files are text files which contain the size of the file (pixels and lines), geospatial information and other metadata such as wavelengths, data units, etc.
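
Because the .hdr file is plain text, key fields can be read into Python with a few lines (a minimal sketch; values enclosed in braces, such as ‘map info’ or ‘wavelength’, may span several lines and are kept here as single strings):

def read_envi_header(hdr_file):
    """Read an ENVI header (.hdr) file into a dictionary."""
    header = {}
    key = None
    with open(hdr_file) as f:
        for line in f:
            line = line.strip()
            if key is None and "=" in line:
                key, value = [item.strip() for item in line.split("=", 1)]
                header[key] = value
                # Values in braces may continue on the following lines
                if not (value.startswith("{") and not value.endswith("}")):
                    key = None
            elif key is not None:
                header[key] += " " + line
                if line.endswith("}"):
                    key = None
    return header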

Not all programs are aware of all the items in the header file and some items may be lost during processing. ARSF-DAN have made a script available which can copy attributes from one header to another. The script can be downloaded from https://github.com/pmlrsg/arsf_tools and is run using:

copy_header_info.py f221013_mapped_osng.bil.hdr \
                      f221013_nadir_mapped_osng.hdr \
                      -k "map info" "projection info" "wavelength"

To show the available fields the following script (also from the arsf_tools repo) can be used:

get_info_from_header.py f221013_mapped_osng.bil.hdr