Author Archives: danclewley

About danclewley

Remote Sensing Analyst at Plymouth Marine Laboratory.

A script to find and download GEDI passes

The Global Ecosystem Dynamics Investigation (GEDI) is a spaceborne lidar instrument mounted on the International Space Station (ISS). The GEDI instrument is a geodetic-class lidar with 3 lasers that produce 8 parallel tracks. Each laser illuminates a 25 m footprint on the ground and fires at a rate of 242 times per second. Footprints are separated by 60 m in the along-track direction, and the 8 tracks are spaced 600 m apart in the across-track direction. GEDI’s precise measurements of forest canopy height, canopy vertical structure, and surface elevation will be critical to characterizing and understanding carbon and water cycling processes to further our knowledge of the world we live in (see http://www.gedi.umd.edu).

GEDI is mounted on the ISS and therefore its orbit is dictated by that of the ISS. This means it cannot make regular repeat acquisitions in the way that Landsat or Sentinel satellites do. While this enables wider coverage, data acquisition is not consistent over time. In addition, GEDI can point its lasers, so the area imaged on the ground is not necessarily at nadir beneath the sensor.

To date, GEDI data is available via two means:

Earthdata (https://search.earthdata.nasa.gov/search) provides the data but does not currently provide sufficient visualization so that users can see whether the data intersects their ROI.

Alternatively, the NASA LP DAAC provide the data in list/ftp format (https://e4ftl01.cr.usgs.gov/GEDI/). This allows easy access to all the data but requires that the user knows the orbit that they require. Since February 2020, this has been supplemented by a web tool called GEDI Finder (https://lpdaacsvc.cr.usgs.gov/services/gedifinder). Users pass in bounding box dimensions alongside the GEDI product and version they require and it returns a subset of the list of data that intersects their bounding box. While this is a simple method for accessing data, it requires that a user either manually downloads each one or passes each name to software (wget, curl) to pull the data.

SearchPullGEDI.py was written to make the best of these existing tools and enable the automation of the whole search and download process. It relies heavily on the GEDI Finder tool to search for the data that intersects a bounding box and is, in its simplest form, a wrapper for this tool. It also allows an optional date range to be specified if users are only interested in data collected during a specific time period. SearchPullGEDI.py requires a number of command line options that include GEDI product, version, bounding box, output path and Earthdata login credentials. An overview and example of SearchPullGEDI.py is:

python SearchPullGEDI.py -p GEDI02_B -v 001 \
-bb 0.3714633 9.277247 -0.08164874 10.00922 \
-d 2019-01-01 2019-04-30 \
-o /Users/Me/Data/Gedi \
-u MyEarthDataUsername -pw MyEarthDataPassword

whereby:

-p is the GEDI product (e.g., GEDI02_B)

-v is the GEDI version (e.g., 001)

-bb is the bounding box given as UpperLeftLat (maxY), UpperLeftLon (minX), LowerRightLat (minY) and LowerRightLon (maxX), as in the example above. These should be passed to the terminal separated by spaces.

-d is the date range specified in the format YYYY-MM-DD, with the start and end dates separated by a space

-o is the local path where you want to download the data

-u is your EarthData login username (Note an EarthData account is required)

-pw is your EarthData login password

Running:

python SearchPullGEDI.py -h

will also provide this information.

SearchPullGEDI.py contains 3 functions.

  • The first constructs the query from the command line options, passes it to the existing GEDI Finder tool and pulls the list of GEDI files shown on the resulting HTML page.
  • The second function parses the search results pulled from the GEDI Finder URL and constructs a list where each element is a separate GEDI H5 file.
  • The final function iterates over the list of H5 files and pulls each one using wget software.
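The second step, together with the optional date filter, can be sketched as below. This is a minimal illustration and not the code from SearchPullGEDI.py: the function names are hypothetical, and it assumes that the search results contain full URLs ending in .h5 and that each file name carries the acquisition time as YYYYDDDHHMMSS (year, then day of year), for example GEDI02_B_2019108002011_O01959_T03909_02_001_01.h5.

```python
import re
from datetime import date, datetime

# Assumption: results contain full URLs ending in ".h5"
H5_URL_PATTERN = re.compile(r"""https?://[^\s"'<>,]+?\.h5""")
# Assumption: file names contain _YYYYDDDHHMMSS_ (year + day of year + time)
ACQ_DATE_PATTERN = re.compile(r"_(\d{7})\d{6}_")


def parse_h5_urls(results_text):
    """Return each unique .h5 URL found in the results text, in order."""
    urls = []
    for url in H5_URL_PATTERN.findall(results_text):
        if url not in urls:
            urls.append(url)
    return urls


def acquisition_date(url):
    """Extract the acquisition date from the file name part of a URL."""
    file_name = url.rsplit("/", 1)[-1]
    match = ACQ_DATE_PATTERN.search(file_name)
    if match is None:
        raise ValueError("No acquisition date found in {}".format(file_name))
    return datetime.strptime(match.group(1), "%Y%j").date()


def filter_by_date(urls, start, end):
    """Keep only URLs acquired between start and end (inclusive)."""
    return [url for url in urls if start <= acquisition_date(url) <= end]
```

Each URL in the filtered list could then be handed to wget in turn, which is what the final function of the script is described as doing.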

SearchPullGEDI.py is available here: https://bitbucket.org/nathanmthomas/bucket-of-rs-and-gis-scripts/src/master/SearchPullGEDI.py

SearchPullGEDI.py requires wget to be installed. On Linux this can be installed through the package manager if it is not already available. On macOS you can install it from conda-forge using:

conda create -n gedi -c conda-forge python wget

The command will print the number of files found and then start downloading them. Each file is approximately 1 GB.

Author: Nathan Thomas (@DrNASApants)

Nathan is a UMD Earth System Science Interdisciplinary Center (ESSIC) PostDoc positioned at the NASA Goddard Space Flight Center. Nathan’s research is focused primarily around land cover mapping and characterizing the above ground structure of vegetation, particularly mangrove forests. Through this he uses python to pull, preprocess, calibrate, analyze and display remote sensing info. Some of this code is distributed through his bitbucket: https://bitbucket.org/nathanmthomas/bucket-of-rs-and-gis-scripts/src/master/

Converting NEODAAS Mercator Projection netCDF files to GeoTiffs for use in QGIS / ArcMap

NetCDF files are a common format for distributing Earth Observation data and allow the ability to store a number of variables alongside metadata. However, using netCDF files in a GIS is not always as easy as it could be.

The NERC Earth Observation Data Acquisition and Analysis Service (NEODAAS) routinely produce products such as Chlorophyll from EO data and store them as netCDF files. For the UK they use a Mercator projection, with the latitude and longitude of each pixel stored in separate arrays within the netCDF file. Unfortunately QGIS and ArcMap are often unable to read this information, so the data are not placed in the correct location, making them difficult to use with other datasets.

To read data into the correct location I wrote a script which converts the latitude and longitude values in the netCDF file into tie points and then uses these to warp a GeoTiff into the correct location. It exports a single variable at a time.
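The tie point idea can be sketched as follows. This is an illustration only and not the script itself: the function name and the `step` parameter are hypothetical, and it assumes latitude and longitude are available as 2D grids indexed as [line][pixel] (if the file stores them as 1D vectors, the indexing would need adjusting).

```python
def make_tie_points(lats, lons, step=50):
    """Sample (pixel, line, lon, lat) tie points from 2D lat/lon grids.

    lats and lons are 2D sequences of the same shape, indexed [line][pixel].
    Every `step`-th row and column is used, plus the last row and column.
    """
    n_lines = len(lats)
    n_pixels = len(lats[0])
    # Always include the final row and column so the image edges are pinned
    lines = sorted(set(range(0, n_lines, step)) | {n_lines - 1})
    pixels = sorted(set(range(0, n_pixels, step)) | {n_pixels - 1})

    tie_points = []
    for line in lines:
        for pixel in pixels:
            # Adding 0.5 places the tie point at the pixel centre
            tie_points.append(
                (pixel + 0.5, line + 0.5, lons[line][pixel], lats[line][pixel])
            )
    return tie_points
```

With the GDAL Python bindings, each tuple could then be turned into a ground control point attached to an intermediate image, which is warped to produce the final GeoTiff. Using a subset of pixels rather than every pixel keeps the number of tie points manageable.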

To use for creating a GeoTiff from Chlorophyll data:

python reproject_neodaas_netcdf.py \
   sentinel3a_olci_all_products_L3-median_uk_7d_20180720_20180726.nc \
   sentinel3a_olci_all_products_L3-median_uk_7d_20180720_20180726_chl.tif \
   --variable CHL_OC4ME

The script is below. It requires the GDAL and netCDF Python libraries.

Note that although the files should line up with other datasets, you are warping the data, which may introduce some errors. For best results, if you have a research project funded by NERC or are eligible for a NERC research grant or training award, you can contact NEODAAS to discuss specific processing requirements.

Working with full waveform LiDAR data in SPDLib (part 2)

The first post of this series was written quite a while ago now. Apologies that it has taken so long for a follow-up. Since the first post was written there have been two exciting developments:

  1. The methods described for generating full waveform metrics have been used to perform the LiDAR analysis for a paper led by Chloe Brown, University of Nottingham. Brown, C.; Boyd, D.S.; Sjögersten, S.; Clewley, D.; Evers, S.L.; Aplin, P. Tropical Peatland Vegetation Structure and Biomass: Optimal Exploitation of Airborne Laser Scanning. Remote Sens. 2018, 10, 671. https://doi.org/10.3390/rs10050671
  2. SPDLib is now available on Windows, macOS and Linux through conda-forge (as is RSGISLib). See part one for updated install instructions.

At the end of part one we had imported the LAS 1.3 file into SPDLib and decomposed the waveforms. This next section will cover ground classification and metrics generation.

  1. Spatially index data

    In part one we had been working with an SPD file without a spatial index (UPD file). However, for subsequent processing steps a spatial index is needed so a spatially indexed file is generated using the spdtranslate command.

    As the gridding can use a lot of RAM we are going to process in tiles and then stitch them together. We create a temporary directory to store the tiles using:

    mkdir spd_tmp
    

    Then run the translate command:

    spdtranslate --if SPD --of SPD \
                 -x LAST_RETURN \
                 -b 1 \
                 --temppath spd_tmp \
                 -i LDR-FW-RG13_06-2014-303-05_subset_decomp.spd \
                 -o LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded.spd
    

    The temporary directory can be removed after processing has completed:

    rm -fr spd_tmp
    
  2. Classify ground returns and populate height

    Many metrics use the height above ground rather than absolute elevation, so this must be defined. To derive heights from LiDAR data it is first necessary to determine the ground elevation so heights can be calculated above this. Within SPDLib the ground classification is achieved using a combination of two classification algorithms: a Progressive Morphology Filter (PMF; [1]) followed by the Multi-Scale Curvature algorithm (MCC; [2]). Both these algorithms use only the discrete points rather than the waveform information.

    Apply a Progressive Morphology Filter using the following command:

    spdpmfgrd --grd 1 -i LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded.spd \
              -o LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_grd.spd
    

    Then apply the Multi-Scale Curvature algorithm to the output file using:

    spdmccgrd --class 3 --initcurvetol 1 \
              -i LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_grd.spd \
              -o LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_mcc_grd.spd
    
  3. Attribute with height

    The final step of the SPD processing is to attribute each pulse with heights above ground level. An interpolation is used for ground points, similar to generating a Digital Terrain Model (DTM), but rather than using a regular grid the ground height is calculated for the position of each point.

    spddefheight --interp --in NATURAL_NEIGHBOR_CGAL \
                 -i LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_mcc_grd.spd \
                 -o LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_mcc_grd_defheight.spd
    
  4. Calculate metrics

    After all the pre-processing steps to convert the LAS 1.3 file into a gridded SPD file with defined heights, it is possible to generate a number of metrics from the waveform data. The command to calculate metrics within SPDLib (`spdmetrics`) takes an XML file in which the metrics are defined. A large number of metrics are available, along with operators (addition, subtraction, etc.) that allow existing metrics to be combined into new ones. The full list of metrics is available in the SPDMetrics.xml file, distributed with the source of SPDLib. Most metrics have an option to specify the minimum number of returns (`minNumReturns`): setting this to 0 will use the waveform information to calculate the metric, while setting it to 1 (the default) or above will use the discrete data. In this way full waveform and discrete metrics can be created at the same time. For this exercise we will be calculating Height of Median Energy (HOME) and waveform distance (WD); a detailed description of these metrics is given in [3].

    First, create a file containing these metrics. Create a text file called ‘spd_metrics.xml’ and paste the text below into it:

    
    <!-- SPDLib Metrics file -->
    
    <spdlib:metrics
    xmlns:spdlib="http://www.spdlib.org/xml/">
    
    <!-- HOME -->
    
    <spdlib:metric metric="home" field="HOME"/>
    
    <!-- WD -->
    
    <spdlib:metric metric="maxheight" field="WD" minNumReturns="0"/>
    
    </spdlib:metrics>
    

    If this doesn’t display, try copying from https://gist.github.com/danclewley/4eefda2200e7593f1e5e2aaa6bae2c03

    To calculate the metrics and produce an image as an output, run:

    spdmetrics --image -o LDR-FW-RG13_06-2014-303-05_subset_metrics.bsq \
               -f ENVI \
               -i LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_mcc_grd_defheight.spd \
               -m spd_metrics.xml
    

    Once the command has finished, open the metrics image using:

    tuiview LDR-FW-RG13_06-2014-303-05_subset_metrics.bsq
    

More metrics can be added to the ‘spd_metrics.xml’ file as needed. It is also possible to define new metrics using the operator tags.
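For intuition about what the HOME metric represents, the sketch below finds the height at which the cumulative waveform energy, summed upwards from the lowest sample, first reaches half of the total. It is a simplified illustration and not SPDLib's implementation: the function name is hypothetical, and the waveform is assumed to be a list of sample heights above ground with a matching list of energies.

```python
def height_of_median_energy(heights, energies):
    """Return the height at which half of the total energy lies at or below.

    heights: sample heights above ground (any order)
    energies: received energy for each sample (same length as heights)
    """
    if not heights or len(heights) != len(energies):
        raise ValueError("heights and energies must be non-empty and equal length")
    total = sum(energies)
    if total <= 0:
        raise ValueError("total energy must be positive")

    half = total / 2.0
    cumulative = 0.0
    # Accumulate energy from the lowest sample upwards
    for height, energy in sorted(zip(heights, energies)):
        cumulative += energy
        if cumulative >= half:
            return height
```

A waveform dominated by a strong ground return gives a HOME close to zero, while one with most of its energy returned from the upper canopy gives a HOME close to the canopy height.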

This post was derived from the LiDAR practical given as part of the NERC-ARF workshop held at BAS, Cambridge in March 2018. If you have any questions about working with NERC-ARF data contact the NERC-ARF Data Analysis Node (NERC-ARF-DAN) see https://nerc-arf-dan.pml.ac.uk/ or follow us on twitter: @NERC_ARF_DAN.

[1] Keqi Zhang, Shu-Ching Chen, Whitman, D., Mei-Ling Shyu, Jianhua Yan, & Zhang, C. (2003). A progressive morphological filter for removing nonground measurements from airborne LIDAR data. IEEE Transactions on Geoscience and Remote Sensing, 41(4), 872–882. http://doi.org/10.1109/TGRS.2003.810682
[2] Evans, J.S., & Hudak, A.T. (2007). A multiscale curvature algorithm for classifying discrete return lidar in forested environments. IEEE Transactions on Geoscience and Remote Sensing, 45(4), 1029–1038.
[3] Cao, L., Coops, N., Hermosilla, T., Innes, J., Dai, J., & She, G. (2014). Using Small-Footprint Discrete and Full-Waveform Airborne LiDAR Metrics to Estimate Total Biomass and Biomass Components in Subtropical Forests. Remote Sensing, 6(8), 7110–7135. http://doi.org/10.3390/rs6087110

Find scenes from an archive which contain a point

This is a quick way of locating which scenes, from an archive of data, contain a point you are interested in.

First make a list of all available scenes:

ls data/S1/2017/??/??/*/*vv*img > all_scenes.txt

Then use gdalbuildvrt to make a VRT file with each scene as a separate band (this assumes each scene only has a single band).

gdalbuildvrt -input_file_list all_scenes.txt \
             -separate -o s1_all_scenes.vrt

Use gdallocationinfo with the ‘-lifonly’ flag to list the scenes that the point we are interested in (1.5 N, 103 E) falls within, and redirect the output to a text file.

gdallocationinfo -lifonly -wgs84 s1_all_scenes.vrt \
                 103 1.5 > selected_scenes.txt

This only shows whether the point is within the bounding box of each scene. To find the scenes which have data at the point, you can use gdallocationinfo again but with the ‘-valonly’ flag and save the values to a text file.

gdallocationinfo -valonly -wgs84 s1_all_scenes.vrt \
                  103 1.5 > selected_scenes_values.txt

You can then subset the original list of files with the values file using Python:

import numpy
# Read in data to numpy arrays
scenes = numpy.genfromtxt("selected_scenes.txt", dtype=str)
values = numpy.genfromtxt("selected_scenes_values.txt")

# Select only scenes where the value is not 0
scenes_data = scenes[values != 0]

# Save to text file
numpy.savetxt("selected_scenes_data.txt", scenes_data, fmt="%s")

If you have a very large archive of data and need to find which scenes intersect a point often, having a spatial database with the scene outlines would be a better approach. However, if it isn’t something you do often this quick approach using only the GDAL utilities and a bit of Python is worth knowing.

Introduction to RSGISLib Training Course

A course introducing RSGISLib was recently given in Japan by Pete Bunting. Material from the course is available to download from the following links:

The course covers pre-processing of PALSAR SAR data followed by a demonstration of pixel-based and object-based classification. All the required data and scripts are included in the package.

If you have any questions on the course please use the RSGISLib support Google group: https://groups.google.com/forum/#!forum/rsgislib-support

Convert LaTeX to Word

A big problem with writing in LaTeX is collaborating with colleagues who don’t use it. One option is to generate a Word .docx version and use the comments and track changes features in Word / LibreOffice. This does require manually copying the changes back to LaTeX so isn’t quite as nice as using latexdiff (see earlier post) but is slightly easier than adding comments to a PDF.

The best program I’ve found for converting LaTeX to Word is the open source (GPL) command line tool, pandoc (http://pandoc.org/).

Basic usage is quite straightforward:

pandoc latex_document.tex -o latex_document_word_version.docx

The conversion isn’t perfect, figures and tables can get a bit mangled, but it does a good job with the text.

Pandoc can convert between many different formats, including from markdown and reStructuredText (commonly used for software documentation) so it is worth having installed.

Convert Sentinel-2 Data Using GDAL

The latest version of GDAL (2.1) has a driver to read Sentinel 2 data (see http://gdal.org/frmt_sentinel2.html). Like HDF files they are read as subdatasets. Running ‘gdalinfo’ on the zipped folder or the .xml file contained within the .SAFE directory will display all the subdatasets, as well as all the metadata (so quite a lot of information!).

As with HDF5 you can pass the subdataset names to gdalinfo to get more information or gdal_translate to extract them as a separate dataset.

To make it easier to extract all the subdatasets I wrote a script (extract_s2_data.py) which can be downloaded from the https://bitbucket.org/petebunting/rsgis_scripts repository.

The script will get a list of all subdatasets using GDAL:

from osgeo import gdal
dataset = gdal.Open('S2/S2.xml', gdal.GA_ReadOnly)
subdatasets = dataset.GetSubDatasets()
dataset = None

The ones to be extracted are for the 10, 20 and 60 m resolution band groups for each UTM zone (if the file crosses multiple zones).

For each subdataset it will give an output name, replacing the EPSG code with the UTM zone and ‘:’ with ‘_’.

Then the gdal_translate command is used to create a new file for each. By default the output is KEA format, called using subprocess.
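The naming and conversion steps can be sketched as below. This is a minimal illustration and not the code from extract_s2_data.py: the function names are hypothetical, it assumes subdataset names of the form SENTINEL2_L1C:filename.xml:10m:EPSG_32632, and dropping the driver prefix and the .xml extension from the output name is a choice made for this sketch.

```python
import os
import re

# EPSG 326zz is UTM zone zz north, EPSG 327zz is UTM zone zz south
EPSG_PATTERN = re.compile(r"EPSG_32([67])(\d{2})")


def epsg_to_utm(match):
    """Convert a matched EPSG code into a short UTM zone label."""
    hemisphere = "N" if match.group(1) == "6" else "S"
    return "UTM{}{}".format(int(match.group(2)), hemisphere)


def make_output_name(subdataset_name, extension=".kea"):
    """Build an output file name from a Sentinel-2 subdataset name."""
    parts = subdataset_name.split(":")
    # The last two parts are the resolution group and the EPSG code;
    # everything between the driver prefix and those is the file path
    file_path = ":".join(parts[1:-2])
    stem = os.path.splitext(os.path.basename(file_path))[0]
    name = "_".join([stem] + parts[-2:])
    return EPSG_PATTERN.sub(epsg_to_utm, name) + extension


def build_translate_command(subdataset_name, out_dir, out_format="KEA"):
    """Return the gdal_translate call as a list suitable for subprocess."""
    out_file = os.path.join(out_dir, make_output_name(subdataset_name))
    return ["gdal_translate", "-of", out_format, subdataset_name, out_file]
```

The returned list could be passed to `subprocess.check_call` to run the conversion for each subdataset.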

To run the script, first install GDAL 2.1. The conda-forge channel has recent builds, which can be installed using conda:

conda create -n gdal2 -c conda-forge gdal
source activate gdal2

(If you are on Windows leave out ‘source’)

To extract all subdatasets from a zipped Sentinel 2 scene to the current directory you can then use:

extract_s2_data.py -o . \
S2A_OPER_PRD_MSIL1C_PDMC_20151201T144038_R010_V20151130T142545_20151130T142545.zip

The gdal_translate command used is printed to the screen.

The default output format is KEA; you can change this using the ‘--of’ flag. For example, to convert an unzipped scene to GeoTiff:

extract_s2_data.py -o . --of GTiff \
S2A_OPER_PRD_MSIL1C_PDMC_20151201T144038_R010_V20151130T142545_20151130T142545.SAFE

To get the extension for all supported drivers, and some creation options, the ‘get_gdal_drivers’ module from arsf_dem_scripts is optionally used. You can download this file and copy it into the same directory that ‘extract_s2_data.py’ has been saved to. For Linux or OS X you can run:

# OS X
curl https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py > get_gdal_drivers.py

# Linux
wget https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py