Working with hyperspectral data in ENVI BIL format using Python

For working with ENVI files I normally use GDAL, as code written against it can then be applied to other formats. However, there are a couple of limitations with GDAL when working with hyperspectral data in ENVI format:

  1. GDAL doesn’t copy items from the header file to a new header file if they don’t fit the GDAL data model; examples are FWHM and comments. Sometimes extra attributes are copied to the .aux.xml file GDAL creates, but these files aren’t read by ENVI or other programs based on IDL (e.g., ATCOR).
  2. For data stored Band Interleaved by Line (BIL) rather than Band Sequential (BSQ), reading and writing a band at a time is inefficient, as it is necessary to keep jumping around the file.

To overcome these issues NERC-ARF-DAN use their own Python functions for reading / writing header files and loading BIL files a line at a time. These functions have been tidied up and released through the NERC-ARF Tools repository on GitHub (https://github.com/pmlrsg/arsf_tools). The functions depend on NumPy.

To install them it is recommended to use the following steps:

  1. Download miniconda from http://conda.pydata.org/miniconda.html#miniconda and follow the instructions to install.
  2. Open a ‘Command Prompt’ / ‘Terminal’ window and install numpy by typing:
    conda install numpy
    
  3. Download ‘arsf_tools’ from GitHub (https://github.com/pmlrsg/arsf_tools/archive/master.zip)
  4. Unzip and within a ‘Command Prompt’ or ‘Terminal’ window navigate to the folder using (for example):
    cd Downloads\arsf_tools-master
    
  5. Install the tools and library by typing:
    python setup.py install
    

Note, if you are using Linux you can install the ARSF binary reader from https://github.com/arsf/arsf_binaryreader, which is written in C++. The ‘arsf_envi_reader’ module will import this if it is available, as it is faster than the standard NumPy BIL reader.

If you are a UK-based researcher with access to the JASMIN system, the library is already installed and can be loaded using:

module load contrib/arsf/arsf_tools

A simple example of reading each line of a file, adding 1 to every band and writing it back out again is:

from arsf_envi_reader import numpy_bin_reader
from arsf_envi_reader import envi_header

# Open file for output (in binary mode)
out_file = open("out_file.bil", "wb")

# Open input file
in_data = numpy_bin_reader.BilReader("in_file.bil")

for line in in_data:
    out_line = line + 1
    out_line.tofile(out_file)

# Copy header
envi_header.write_envi_header("out_file.bil.hdr",
                              in_data.get_hdr_dict())

# Close files
out_file.close()
in_data = None

A more advanced example is applying a Savitzky-Golay filter to each pixel. As the filter requires every band for each pixel, it is efficient to work with BIL files.
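
As a rough illustration, the sketch below smooths the spectrum of every pixel with ‘scipy.signal.savgol_filter’ while reading the file one line at a time. It assumes SciPy is installed, that each line returned by ‘BilReader’ is a (bands, samples) NumPy array, and uses an arbitrary window length and polynomial order:

from arsf_envi_reader import numpy_bin_reader
from arsf_envi_reader import envi_header
from scipy.signal import savgol_filter

# Open file for output (in binary mode)
out_file = open("smoothed_file.bil", "wb")

# Open input file
in_data = numpy_bin_reader.BilReader("in_file.bil")

for line in in_data:
    # Smooth along the band axis (assumed to be axis 0) with an
    # arbitrary window of 7 bands and a 2nd order polynomial
    smoothed = savgol_filter(line, window_length=7, polyorder=2, axis=0)
    # Convert back to the original data type before writing
    smoothed.astype(line.dtype).tofile(out_file)

# Copy header
envi_header.write_envi_header("smoothed_file.bil.hdr",
                              in_data.get_hdr_dict())

# Close files
out_file.close()
in_data = None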

For developing algorithms using spatial data, in particular with multiple files, it is recommended to convert the files to a band-sequential format using ‘gdal_translate’ so they can be read by programs which use GDAL, for example RIOS or RSGISLib.

To convert files from ENVI BIL to ENVI BSQ and copy all header attributes the ‘envi_header’ module can be used after converting the interleave with GDAL. For example:

import os
import subprocess
from arsf_envi_reader import envi_header

# Set input and output image (could get with argparse)
input_image = "input.bil"
output_image = "output_bsq.bsq"

# Get interleave from file extension
output_interleave = os.path.splitext(output_image)[-1].lstrip(".")

# 1. Convert interleave
print("Converting interleave to {}".format(output_interleave.upper()))
gdal_translate_cmd = ["gdal_translate",
                      "-of", "ENVI",
                      "-co", "INTERLEAVE={}".format(output_interleave.upper())]
gdal_translate_cmd.extend([input_image, output_image])
subprocess.call(gdal_translate_cmd)

# 2. Copy header (GDAL doesn't copy all items)
# Find header files
input_header = envi_header.find_hdr_file(input_image)
output_header = envi_header.find_hdr_file(output_image)

# Read input header to dictionary
input_header_dict = envi_header.read_hdr_file(input_header)

# Change interleave
input_header_dict["interleave"] = output_interleave.upper()

# Write header (replace one generated by GDAL)
envi_header.write_envi_header(output_header, input_header_dict)

Useful GDAL commands for working with hyperspectral data

The Geospatial Data Abstraction Library (GDAL) contains a number of utilities for working with remote sensing data. The following are some commands which are particularly useful when working with airborne hyperspectral data.

Create subset and change format

To create a band and spatial subset of a larger hyperspectral dataset the following command can be used:

gdal_translate -of GTiff -co COMPRESS=LZW \
 -projwin 481500 766000 482500 765000 \
 -b 190 -b 40 -b 15  \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 outputs/f281013b_utm_wgs84N50_mapped_subsection_3band.tif
  • -of: Output Format – we are using the GeoTIFF format. For all available options see http://www.gdal.org/formats_list.html
  • -co: Creation Option – specifies the creation options, which are specific to the format. For GeoTIFF we use compression to reduce the file size; for ENVI we can use the creation options to specify the interleave (BSQ or BIL)
  • -projwin: A rectangle delineating the area to be extracted, in the format upper-left X, upper-left Y, lower-right X, lower-right Y (i.e., min X, max Y, max X, min Y)
  • -b: A given band to extract
  • final two arguments: input and output filenames
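
If you prefer to stay in Python, the GDAL bindings provide a ‘gdal.Translate’ function (GDAL 2.1 or later) which accepts equivalent options. A minimal sketch using the same parameters as the command above is:

from osgeo import gdal

# Band and spatial subset, equivalent to the gdal_translate command above
gdal.Translate("outputs/f281013b_utm_wgs84N50_mapped_subsection_3band.tif",
               "outputs/f281013b_utm_wgs84N50_mapped_subsection.bil",
               format="GTiff",
               creationOptions=["COMPRESS=LZW"],
               projWin=[481500, 766000, 482500, 765000],
               bandList=[190, 40, 15])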

Create mosaic of multiple flightlines

Often a mosaic of all individual flightlines is required. Due to the large size of the mapped files this can often be time consuming. One option is to first make a ‘Virtual Raster’ using the ‘gdalbuildvrt’ command:

gdalbuildvrt -srcnodata 0 outputs/f2810_mapped_mosaic.vrt \
 outputs/f281013b_utm_wgs84N50_mapped_3band.bil \
 outputs/f281023b_utm_wgs84N50_mapped_3band.bil
  • -srcnodata: No data value of input lines
  • first argument: output VRT file
  • final arguments: list of input files

Rather than creating a new raster, this creates a small .vrt text file which contains links to the individual lines.

Many programs which use GDAL (e.g., TuiView) can open this file directly. To make opening the file easier we can do some optimisation. First pre-calculate statistics using:

gdalinfo -stats outputs/f2810_mapped_mosaic.vrt
  • -stats: Calculate statistics and save to .aux.xml file
  • final argument: file name

The gdalinfo command can also be used to print information about a raster. Note for the .vrt file it shows the individual files which are included.

After creating statistics, generate overviews (reduced-resolution copies at a series of levels) using gdaladdo:

gdaladdo -r average outputs/f2810_mapped_mosaic.vrt 2 4 8 16
  • -r average: Use average for resampling overviews
  • first argument: file name
  • numbers: overview levels to build (2x, 4x, 8x and 16x)

We can also make a real raster using gdal_translate (as shown previously).

Extract pixel values for a given location

gdallocationinfo -valonly -geoloc \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 481801 765289
  • -valonly: Only print values to the screen
  • -geoloc: Interpret x and y values as coordinates rather than pixel and line
  • first argument: file name
  • x y: Easting and Northing of pixel to extract statistics from

This will print the values to the screen. To save them to a text file we can redirect the output using ‘>’:

gdallocationinfo -valonly -geoloc \
 outputs/f281013b_utm_wgs84N50_mapped_subsection.bil \
 481801 765289 > outputs/extracted_values.txt

This text file can be opened in other programs for further analysis.

This post is based on the hyperspectral practical session from the NERC Airborne Survey and Research Facility workshop held at Plymouth Marine Laboratory, March 2016. For more information about the ARSF Data Analysis Node (ARSF-DAN) and details about future training opportunities follow @ARSF_DAN on Twitter. Hyperspectral data acquired by ARSF are available to download from NEODC.

Add all scripts within a repository to $PATH using envmaster

I have a couple of general script repositories, some just for myself and some shared with colleagues. These are for scripts which are useful but don’t fit into existing projects or justify having their own repository. An example is rsgis_scripts on Bitbucket. The scripts are split into different directories, which aren’t all on my default path. To make them available when needed I use EnvMaster (described in a previous post). I have an EnvMaster module which searches the repository for folders containing executables and adds them to $PATH. It also adds directories containing Python packages (folders with an ‘__init__.py’) to $PYTHONPATH.

#%EnvMaster1.0
import os
import glob
 
REPO_PATH = "/home/dan/Documents/Development/rsgis_scripts"
 
# Walk through directory
for d_name, sd_name, f_list in os.walk(REPO_PATH):
    # Ignore hidden directories (e.g., .git and .hg) anywhere in the path
    if "/." not in d_name:
        file_list = glob.glob(os.path.join(d_name, "*"))
        # Check if the directory contains executable files
        if any(os.path.isfile(f) and os.access(f, os.X_OK) for f in file_list):
            module.setBin(d_name)
        # Check for Python libraries
        if len(glob.glob(os.path.join(d_name,"*","__init__.py"))) > 0:
            module.setPython(d_name)

The script is saved as ‘rsgis_scripts’ in ‘$ENVMASTERPATH’.
To load the module and prepend all the folders to ‘$PATH’ use:

envmaster load rsgis_scripts

They will remain on $PATH until you close the terminal or unload the module using:

envmaster unload rsgis_scripts

This script is just an example; it would be easy to modify for different use cases. If you have multiple repositories, a module file can be created for each one.

Attribute LAS points using hyperspectral data

A good way of visualising LiDAR point clouds is to attribute them with colour information.
LAS files of point type 2 and above have the ability to store RGB colour information. These can be viewed with programs such as plas.io and CloudCompare.

To add RGB colour information to a LAS file ARSF-DAN have made a Python script available from: https://github.com/pmlrsg/arsf_tools

The script uses a combination of laspy (to read and write the LAS file) and GDAL (to read the raster), and it can work with images with more than three bands, such as multispectral or hyperspectral files, without creating a three-band copy of the image first.
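
In essence, for each point the easting and northing are converted to a pixel row and column using the raster’s geotransform and the chosen bands are read at that location. Below is a minimal GDAL-only sketch of that lookup; the file name, band number and coordinates are placeholder values, it assumes a north-up image with no rotation, and the real script handles reading the point coordinates from (and writing colours back to) the LAS file with laspy:

from osgeo import gdal

# Placeholder inputs - the real script gets the coordinates from the LAS file
raster_file = "mapped_hyperspectral.bil"
band_number = 38
points = [(481801.0, 765289.0), (481850.0, 765300.0)]

dataset = gdal.Open(raster_file)
geotransform = dataset.GetGeoTransform()
band = dataset.GetRasterBand(band_number)

for easting, northing in points:
    # Convert map coordinates to pixel indices using the geotransform
    col = int((easting - geotransform[0]) / geotransform[1])
    row = int((northing - geotransform[3]) / geotransform[5])
    # Read a single pixel for this band
    value = band.ReadAsArray(col, row, 1, 1)[0, 0]
    print("{} {} {}".format(easting, northing, value))

dataset = None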

For this post I’ll demonstrate running the script on the JASMIN/CEMS system. In a nutshell it is a Linux computing facility with lots of fast storage and direct access to many datasets, including the ARSF archive of airborne LiDAR and hyperspectral data (apply for access to ARSF data here). Researchers in the UK who are part of the NERC or National Centre for Earth Observation (NCEO) scientific communities can apply for access. If you don’t have access to JASMIN you can run the script on your own machine; just ignore the JASMIN-specific steps.

  1. Apply for access and logon to JASMIN
    If you are using JASMIN the first stage is to get an account and log on by following the steps in this guide: http://jasmin.ac.uk/workflow/
  2. Set up scripts
    Once you have logged onto one of the shared VMs (e.g., cems-sci2.cems.rl.ac.uk) you need to load laspy, which isn’t installed by default. You can do this using:

    module load contrib/arsf/laspy/1.3.0
    

    Then checkout the repository containing the ‘colour_las_file.py’ script using:

    git clone https://github.com/pmlrsg/arsf_tools.git
    

    Check the script runs OK by typing:

    python2.7 arsf_tools/colour_las_file.py -h
    

    Note, you need to specify Python 2.7 as the default system Python is 2.6 but most of the Python libraries (e.g., GDAL) are built against 2.7. If you are not running the scripts on JASMIN just using ‘python’ should be fine.

  3. Colour using ARSF hyperspectral data
    For this example, data from Mont Blanc flown as part of a EUFAR campaign in 2015 on Julian day 175a (24/06/2015) will be used. This data can be downloaded from NEODC (direct link).
    LIDAR_PATH=/neodc/arsf/2015/EUFAR15_28/EUFAR15_28-2015_175a_Mont_Blanc/LiDAR/flightlines/las1.2
    HS_PATH=/neodc/arsf/2015/EUFAR15_28/EUFAR15_28-2015_175a_Mont_Blanc/hyperspectral/flightlines/mapped
    
    python2.7 arsf_tools/colour_las_file.py \
              --image /vsizip/$HS_PATH/f175a183b_mapped_utm32n_bil.zip/f175a183b_mapped_utm32n.bil \
              --red 38 \
              --green 25 \
              --blue 12 \
              --scale \
             $LIDAR_PATH/LDR-EUFAR15_28-2015-175a-18.LAS \
             LDR-EUFAR15_28-2015-175a-18_col.LAS
    

    This will attribute the points using bands 38, 25 and 12 (true colour) and will scale the pixel values from 16 bit to 8 bit (0 – 255) using a standard deviation stretch (a minimal sketch of this kind of stretch is given after this list). To read the hyperspectral data without unzipping it, a GDAL virtual file system is used, as described in a previous post.

  4. LASzip files and Download
    To reduce the size of the files before downloading you can use LASzip. On JASMIN LAStools can be loaded using:

    module load contrib/arsf/lastools/20150925
    

    To compress the LAS file and drop points flagged as noise the following command can be used:

    las2las -drop_class 7 -i LDR-EUFAR15_28-2015-175a-18_col.LAS -o LDR-EUFAR15_28-2015-175a-18_col.laz
    

    A similar command can be used to decompress files.

    See the following guide on transferring data off JASMIN: http://jasmin.ac.uk/how-to-use-jasmin/data-transfer/

  5. View using plas.io
    Open http://plas.io/ in a modern browser and click ‘Browse’ to load in the coloured LAS file.
    (Screenshot: coloured LAS file displayed in plas.io)
    The nice thing about using plas.io and JASMIN is that no specialist software needs to be installed on your local machine, just an SSH client and a web browser.

  6. View using CloudCompare
    If you want to view files locally you can use CloudCompare, which is available for Windows, OS X and Linux. To open the coloured LAS/LAZ file in the CloudCompare viewer just drag it into the viewer window.

    (Screenshot: coloured LAS file displayed in CloudCompare)

  7. Additional – Colour using an existing three-band image
    If you already have a three-band image with pixel values 0 – 255, a simpler command can be used:

    colour_las_file.py --image in_raster_file_rgb.tif \
                  in_las_file.las out_las_file.las
    

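As an aside, the standard deviation stretch mentioned in step 3 can be sketched in a few lines of NumPy. This is only an illustration of the general technique (clip to the mean plus/minus a chosen number of standard deviations, then rescale to 0 – 255); the exact implementation used by ‘colour_las_file.py’ may differ:

import numpy

def stddev_stretch(band_data, num_stddev=2.0):
    """
    Rescale an array to 0 - 255 by clipping it to the mean
    +/- num_stddev standard deviations.
    """
    min_val = band_data.mean() - num_stddev * band_data.std()
    max_val = band_data.mean() + num_stddev * band_data.std()
    stretched = (band_data.astype(numpy.float64) - min_val) / (max_val - min_val)
    return (numpy.clip(stretched, 0, 1) * 255).astype(numpy.uint8)
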
Convert EASE-2 grid cell to latitude and longitude using Python

The EASE-2 grid is used for a number of NASA datasets including SMAP. It is described in the following paper:

Brodzik, M. J., Billingsley, B., Haran, T., Raup, B., & Savoie, M. H. (2012). EASE-Grid 2.0: Incremental but Significant Improvements for Earth-Gridded Data Sets. ISPRS International Journal of Geo-Information, 1(3), 32–45. http://doi.org/10.3390/ijgi1010032

Files with the centre coordinate of each cell for EASE-2 grids at different resolutions are available from https://nsidc.org/data/ease/tools.html as well as tools for conversion.

To read these files into Python the following steps can be used:

  1. Download the relevant gridloc file.

    The FTP link for the grid location files is: ftp://sidads.colorado.edu/pub/tools/easegrid2/

    For this example I’ve chosen the 36km cylindrical EASE-2 grid (gridloc.EASE2_M36km.tgz)

  2. Un-tar using:
    tar -xvf gridloc.EASE2_M36km.tgz
    
  3. Read the files into Python:
    import numpy
    
    # Read binary files and reshape to correct size
    # The number of rows and columns are in the file name
    lats = numpy.fromfile('EASE2_M36km.lats.964x406x1.double', 
                          dtype=numpy.float64).reshape((406,964))
    lons = numpy.fromfile('EASE2_M36km.lons.964x406x1.double', 
                          dtype=numpy.float64).reshape((406,964))
    
    # Extract latitude and longitude
    # for a given row and column 
    grid_row = 46
    grid_column = 470
    
    lat_val = lats[grid_row, grid_column]
    lon_val = lons[grid_row, grid_column]
    

Calculations on large Raster Attribute Tables using ratapplier

The RIOS Library (http://rioshome.org) provides two methods of manipulating columns within a Raster Attribute Table (RAT):

  1. The ‘rat’ module can read an entire column to memory.
    from osgeo import gdal
    from rios import rat
    
    # Open RAT dataset
    rat_dataset = gdal.Open("clumps.kea", gdal.GA_Update)
    
    # Get columns with average red and NIR for each object
    red = rat.readColumn(rat_dataset, "RedAvg")
    nir = rat.readColumn(rat_dataset, "NIR1Avg")
    
    ndvi = (nir - red) / (nir + red)
    
    # Write out column
    rat.writeColumn(rat_dataset, "NDVI", ndvi)
    
    # Close RAT dataset
    rat_dataset = None
    
  2. The newer ‘ratapplier’ module is modelled after the ‘applier’ module for images and allows a function to be applied to chunks of rows, making it particularly useful for a RAT which is too large to load to memory.
    from rios import ratapplier
    
    def _ratapplier_calc_ndvi(info, inputs, outputs):
        """
        Calculate NDVI from RAT.
    
        Called by ratapplier
        """
        # Get columns with average red and NIR for each object
        # within block
        red = getattr(inputs.inrat, "RedAvg")
        nir = getattr(inputs.inrat, "NIR1Avg")
    
        # Calculate NDVI
        ndvi = (nir - red) / (nir + red)
    
        # Save to 'NDVI' column (will create if doesn't exist)
        setattr(outputs.outrat,"NDVI", ndvi)
    
    if __name__ == "__main__":
    
        # Set up rat applier for input / output
        in_rats = ratapplier.RatAssociations()
        out_rats = ratapplier.RatAssociations()
    
        # Pass in clumps file
        # Same file is used for input and output to write
        # to existing RAT
        in_rats.inrat = ratapplier.RatHandle("clumps.kea")
        out_rats.outrat = ratapplier.RatHandle("clumps.kea")
    
        # Apply function to all rows in chunks
        ratapplier.apply(_ratapplier_calc_ndvi, in_rats, out_rats)
    

Although using ratapplier looks slightly more complicated at first, writing scripts to use it rather than the ‘rat’ interface means they will scale much better to larger datasets.

Both these examples are available to download from https://bitbucket.org/snippets/danclewley/7E666

Bash one-liner to untar downloaded Landsat data into separate directories

When you download Landsat data it comes as a .tar.gz archive (or .tar.bz if you download from Google). To uncompress the files into a separate folder for each scene the following series of bash commands can be used. To make it easier to read I’ve split the command over separate lines for this post; you could remove the ‘;\’ at the end of each line and save it as a multi-line bash script.

for scene in `ls *tar.gz | sed 's/.tar.gz//'`;\
   do echo "Uncompressing ${scene}";\
   mkdir ${scene};\
   mv ${scene}.tar.gz ${scene};\
   cd ${scene};\
   tar -xvf ${scene}.tar.gz;\
   cd -;\
done

This will list all the files matching ‘*.tar.gz’, remove the extension (using sed), print the scene name (echo), make a directory (mkdir), move the .tar.gz archive into it (mv), untar (tar -xvf) and then change back to the original directory (cd -).
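
If you would rather do the same from Python, a rough equivalent using only the standard library is sketched below; it mirrors the bash version by making a directory per scene, moving the archive into it and extracting it there:

import glob
import os
import shutil
import tarfile

for archive in glob.glob("*.tar.gz"):
    scene = archive.replace(".tar.gz", "")
    print("Uncompressing {}".format(scene))
    os.mkdir(scene)
    # Move the archive into the new directory and extract it there
    shutil.move(archive, scene)
    with tarfile.open(os.path.join(scene, archive)) as tar:
        tar.extractall(scene)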

You can also use the ‘arcsiextractdata.py’ tool, as detailed in an earlier post.