A script to find and download GEDI passes

The Global Ecosystem Dynamics Investigation (GEDI) is a spaceborne lidar instrument mounted on the International Space Station (ISS). The GEDI instrument is a geodetic-class lidar with 3 lasers that produce 8 parallel tracks. Each laser illuminates a 25 m footprint on the ground and fires 242 times per second. Footprints are separated by 60 m in the along-track direction, and the 8 tracks are 600 m apart in the across-track direction. GEDI’s precise measurements of forest canopy height, canopy vertical structure, and surface elevation will be critical to characterizing and understanding carbon and water cycling processes to further our knowledge of the world we live in: http://www.gedi.umd.edu.

GEDI is mounted on the ISS and therefore its orbit is dictated by that of the ISS. As a result, GEDI does not acquire data over the same locations on a regular repeat cycle in the way that Landsat or Sentinel satellites do. While this enables wide coverage, acquisitions over any given area are irregular. In addition, GEDI can point its lasers, so the area illuminated on the ground is not necessarily directly beneath (at nadir) the sensor.

To date, GEDI data is available via two means:

Earthdata (https://search.earthdata.nasa.gov/search) provides the data but does not currently provide sufficient visualization for users to see whether the data intersect their region of interest (ROI).

Alternatively, the NASA LP DAAC provides the data in list/ftp format (https://e4ftl01.cr.usgs.gov/GEDI/). This allows easy access to all the data but requires that the user knows which orbits they need. Since February 2020, this has been supplemented by a web tool called GEDI Finder (https://lpdaacsvc.cr.usgs.gov/services/gedifinder). Users pass in bounding box coordinates alongside the GEDI product and version they require, and it returns the subset of files that intersect their bounding box. While this is a simple method for finding data, the user must still either download each file manually or pass each file name to software (e.g., wget, curl) to pull the data.

SearchPullGEDI.py was written to make the best of these existing tools and enable the automation of the whole search and download process. It relies heavily on the GEDI Finder tool to search for the data that intersects a bounding box and is, in its simplest form, a wrapper for this tool. It also allows an optional date range to be specified if users are only interested in data collected during a specific time period. SearchPullGEDI.py requires a number of command line options that include GEDI product, version, bounding box, output path and Earthdata login credentials. An example of running SearchPullGEDI.py is:

python SearchPullGEDI.py -p GEDI02_B -v 001 \
-bb 0.3714633 9.277247 -0.08164874 10.00922 \
-d 2019-01-01 2019-04-30 \
-o /Users/Me/Data/Gedi \
-u MyEarthDataUsername -pw MyEarthDataPassword

whereby:

-p is the GEDI product (e.g., GEDI02_B)

-v is the GEDI version (e.g., 001)

-bb is the bounding box given as UpperLeftLat (maxY), UpperLeftLon (minX), LowerRightLat (minY) and LowerRightLon (maxX). These should be passed on the command line separated by spaces.

-d is the optional date range, given in the format YYYY-MM-DD, with the start and end dates separated by a space

-o is the local path where you want to download the data

-u is your Earthdata login username (note: an Earthdata account is required)

-pw is your Earthdata login password

Running:

python SearchPullGEDI.py -h

will also provide this information.

SearchPullGEDI.py contains three functions:

  • The first constructs a GEDI Finder query from the command line options, passes it to the GEDI Finder tool and retrieves the list of GEDI files returned on the HTML page.
  • The second function parses the search results pulled from the GEDI Finder URL and constructs a list where each element is a separate GEDI H5 file.
  • The final function iterates over the list of H5 files and pulls each one using wget software.
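The first two functions could be sketched roughly as below. This is a minimal illustration, not the code in SearchPullGEDI.py. The GEDI Finder parameter names (product, version, bbox, output), the bracketed bbox format and the file naming convention used to read the acquisition date (<product>_<YYYYDDDHHMMSS>_..., i.e. year plus day of year) are assumptions. filter_by_date is a hypothetical helper showing how the optional -d range could be applied.

```python
import re
from datetime import date, timedelta

# Assumed GEDI Finder endpoint (from the text above) and query parameters
GEDI_FINDER_URL = "https://lpdaacsvc.cr.usgs.gov/services/gedifinder"


def build_query_url(product, version, bbox):
    """Build a GEDI Finder query URL.

    bbox is (ul_lat, ul_lon, lr_lat, lr_lon), the same order as -bb.
    """
    bbox_str = ",".join(str(v) for v in bbox)
    return "{}?product={}&version={}&bbox=[{}]&output=html".format(
        GEDI_FINDER_URL, product, version, bbox_str)


def parse_h5_links(html_text):
    """Return each unique .h5 URL found in the returned page, in order."""
    links = re.findall(r"https?://[^\s\"'<>]+?\.h5", html_text)
    return list(dict.fromkeys(links))


def acquisition_date(h5_name):
    """Read the date from an assumed <product>_<YYYYDDDHHMMSS>_... file name."""
    stamp = re.search(r"_(\d{4})(\d{3})\d{6}_", h5_name)
    if stamp is None:
        raise ValueError("No timestamp found in " + h5_name)
    year, day_of_year = int(stamp.group(1)), int(stamp.group(2))
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


def filter_by_date(links, start, end):
    """Hypothetical helper: keep files acquired between start and end (inclusive)."""
    return [link for link in links if start <= acquisition_date(link) <= end]


# Example using the bounding box from the command above
url = build_query_url("GEDI02_B", "001",
                      (0.3714633, 9.277247, -0.08164874, 10.00922))
print(url)
```

With the -d range from the example, filter_by_date(parse_h5_links(page), date(2019, 1, 1), date(2019, 4, 30)) would keep only the files acquired between January and April 2019.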

SearchPullGEDI.py is available here: https://bitbucket.org/nathanmthomas/bucket-of-rs-and-gis-scripts/src/master/SearchPullGEDI.py

SearchPullGEDI.py requires that wget is installed. On Linux, wget can be installed through the package manager if it is not already available; on macOS you can install it from conda-forge using:

conda create -n gedi -c conda-forge python wget

The command will print the number of files found and then start downloading them. Each file is approximately 1 GB.
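The final, download step can be sketched as below. This is an illustrative sketch rather than the code in SearchPullGEDI.py. The wget options shown (--user, --password, and -P for the output directory) are standard wget flags. Whether Earthdata downloads also need cookie handling (e.g. --load-cookies, --save-cookies, --keep-session-cookies) is an assumption you should check. The dry_run flag is a hypothetical addition so the commands can be inspected without downloading ~1 GB files.

```python
import os
import subprocess


def build_wget_command(url, out_dir, username, password):
    """Build a wget command for one GEDI H5 file.

    Assumption: Earthdata redirects may additionally require cookie options
    such as --load-cookies / --save-cookies / --keep-session-cookies.
    """
    return ["wget", "--user", username, "--password", password,
            "-P", out_dir, url]


def download_all(urls, out_dir, username, password, dry_run=False):
    """Print the number of files found, then download each one with wget."""
    print("Found {} files".format(len(urls)))
    commands = []
    for url in urls:
        command = build_wget_command(url, out_dir, username, password)
        commands.append(command)
        if not dry_run:
            os.makedirs(out_dir, exist_ok=True)
            # check=True stops on the first failed download
            subprocess.run(command, check=True)
    return commands


# Dry run: build the commands without downloading anything
commands = download_all(
    ["https://e4ftl01.cr.usgs.gov/GEDI/GEDI02_B.001/2019.04.18/"
     "GEDI02_B_2019108002011_O01959_T03909_02_001_01.h5"],
    "/Users/Me/Data/Gedi", "MyEarthDataUsername", "MyEarthDataPassword",
    dry_run=True)
```

Passing the password on the command line makes it visible to other users of the machine (e.g. via ps). wget can instead read credentials from a ~/.netrc file.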

Author: Nathan Thomas (@DrNASApants)

Nathan is a UMD Earth System Science Interdisciplinary Center (ESSIC) postdoc based at the NASA Goddard Space Flight Center. Nathan’s research focuses primarily on land cover mapping and characterizing the above ground structure of vegetation, particularly mangrove forests. Through this he uses Python to pull, preprocess, calibrate, analyze and display remote sensing data. Some of this code is distributed through his Bitbucket: https://bitbucket.org/nathanmthomas/bucket-of-rs-and-gis-scripts/src/master/

Converting NEODAAS Mercator Projection netCDF files to GeoTiffs for use in QGIS / ArcMap

NetCDF files are a common format for distributing Earth Observation data and can store a number of variables alongside metadata. However, using netCDF files in a GIS is not always as easy as it could be.

The NERC Earth Observation Data Acquisition and Analysis Service (NEODAAS) routinely produces products such as chlorophyll from EO data and stores them as netCDF files. For the UK they use a Mercator projection, with the latitude and longitude of each pixel stored in separate arrays within the netCDF file. Unfortunately, QGIS and ArcMap are often unable to interpret this information, so the data are not placed in the correct location, which makes them difficult to use with other datasets.

To read data into the correct location I wrote a script which converts the latitude and longitude values in the netCDF file into tie points and then uses these to warp a GeoTiff into the correct location. It exports a single variable at a time.
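The reason tie points are needed can be illustrated with a short calculation. In a Mercator grid the rows are evenly spaced in projected y but not in latitude. As a result, the latitude array cannot be described by the single origin and pixel size that a simple WGS84 GeoTiff geotransform uses. The sketch below assumes spherical Mercator with the WGS84 semi-major axis and a hypothetical 49° N to 61° N extent. The exact Mercator definition used in the NEODAAS files may differ.

```python
import math

# Assumption: spherical Mercator using the WGS84 semi-major axis (metres)
EARTH_RADIUS = 6378137.0


def mercator_y(lat_deg):
    """Projected Mercator y (m) for a latitude in degrees."""
    return EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))


def latitude_from_mercator_y(y):
    """Inverse of mercator_y."""
    return math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS)) - math.pi / 2)


# Five rows evenly spaced in Mercator y over a hypothetical 49N-61N extent
y_start, y_end = mercator_y(49.0), mercator_y(61.0)
n_rows = 5
row_lats = [latitude_from_mercator_y(y_start + i * (y_end - y_start) / (n_rows - 1))
            for i in range(n_rows)]
lat_steps = [b - a for a, b in zip(row_lats, row_lats[1:])]

for lat, step in zip(row_lats, lat_steps):
    print("row at {:.3f} N, next row {:.3f} degrees further north".format(lat, step))
```

The latitude step shrinks towards the north. This is why the script converts the per-pixel coordinates to GCPs and warps the data rather than assigning a single pixel size.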

To create a GeoTiff from the chlorophyll data:

python reproject_neodaas_netcdf.py \
   sentinel3a_olci_all_products_L3-median_uk_7d_20180720_20180726.nc \
   sentinel3a_olci_all_products_L3-median_uk_7d_20180720_20180726_chl.tif \
   --variable CHL_OC4ME

The script is below. It requires the GDAL and netCDF Python libraries.

#!/usr/bin/env python
"""
Create a single layer tif in WGS84 projection from
NEODAAS (https://www.neodaas.ac.uk/) netCDF files in mercator projection.
Converts the location array to GCPs and uses these to warp the image with
GDAL.
Dan Clewley (dac@pml.ac.uk).
2018-11-12
"""
import argparse
import os
import shutil
import tempfile

import netCDF4
from osgeo import gdal

# The geolocation grid in the netCDF file contains a location for
# each pixel. This is often more than GDAL will handle as GCPs so set
# a resampling factor to reduce.
GEOLOCATION_GRID_RESAMPLING = 50


def create_single_var_tiff(input_netcdf, output_tif, variable):
    """
    Export netCDF as a single variable GeoTiff
    """
    # Create single variable tif from input netCDF
    in_ds = gdal.Open('NETCDF:"{}":{}'.format(input_netcdf, variable), gdal.GA_ReadOnly)
    # Set the format explicitly so GDAL doesn't guess it from the file extension
    gdal.Translate(output_tif, in_ds, format="GTiff")
    in_ds = None


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Reproject NetCDF")
    parser.add_argument("innetcdf", nargs=1, type=str,
                        help="Input netCDF file")
    parser.add_argument("outfile", nargs=1, type=str,
                        help="Output file (.tif)")
    parser.add_argument("--variable", required=True,
                        help="Variable from netCDF to extract (e.g., CHL_OC4ME)")
    args = parser.parse_args()

    # Make temporary directory
    temp_dir = tempfile.mkdtemp(prefix="netcdf_convert_")
    # Give the intermediate file a .tif extension (the input name ends in .nc)
    temp_tif = os.path.join(
        temp_dir,
        os.path.splitext(os.path.basename(args.innetcdf[0]))[0] + ".tif")

    # Export a single layer as a geotiff
    create_single_var_tiff(args.innetcdf[0], temp_tif, args.variable)

    # Open new geotiff to add GCPs to
    data = gdal.Open(temp_tif, gdal.GA_Update)
    # Open input netCDF and read the longitude and latitude arrays
    data_nc = netCDF4.Dataset(args.innetcdf[0])
    longitudes = data_nc.variables['longitude'][:]
    latitudes = data_nc.variables['latitude'][:]

    gcp_list = []
    # Go through longitude and latitude arrays and convert to GCPs
    # (gdal.GCP arguments: x, y, z, pixel, line)
    for x in range(1, longitudes.size, GEOLOCATION_GRID_RESAMPLING):
        for y in range(1, latitudes.size, GEOLOCATION_GRID_RESAMPLING):
            gcp_list.append(gdal.GCP(float(longitudes[x]),
                                     float(latitudes[y]), 0,
                                     x, y))
    # Add GCPs to tif. The GCP projection is set to None so it is ignored
    data.SetGCPs(gcp_list, None)
    # Warp file based on GCPs
    gdal.Warp(args.outfile[0], data, dstSRS='EPSG:4326', format='GTiff')

    # Close files
    data = None
    data_nc.close()
    # Remove temporary directory
    shutil.rmtree(temp_dir)

Note that although the files should line up with other datasets, warping the data may introduce some errors. For best results, if you have a research project funded by NERC or are eligible for a NERC research grant or training award, you can contact NEODAAS to discuss specific processing requirements.

Working with full waveform LiDAR data in SPDLib (part 2)

The first post of this series was written quite a while ago now; apologies that it has taken so long to follow up. Since the first post was written there have been two exciting developments:

  1. The methods described for generating full waveform metrics have been used to perform the LiDAR analysis for a paper led by Chloe Brown, University of Nottingham: Brown, C.; Boyd, D.S.; Sjögersten, S.; Clewley, D.; Evers, S.L.; Aplin, P. Tropical Peatland Vegetation Structure and Biomass: Optimal Exploitation of Airborne Laser Scanning. Remote Sens. 2018, 10, 671. https://doi.org/10.3390/rs10050671
  2. SPDLib is now available on Windows, macOS and Linux through conda-forge (as is RSGISLib). See part one for updated install instructions.

At the end of part one we had imported the LAS 1.3 file into SPDLib and decomposed the waveforms. This next section will cover ground classification and metrics generation.

  1. Spatially index data

    In part one we had been working with an SPD file without a spatial index (UPD file). However, for subsequent processing steps a spatial index is needed so a spatially indexed file is generated using the spdtranslate command.

    As the gridding can use a lot of RAM we are going to process in tiles and then stitch them together. We create a temporary directory to store the tiles using:

    mkdir spd_tmp
    

    Then run the translate command:

    spdtranslate --if SPD --of SPD \
                 -x LAST_RETURN \
                 -b 1 \
                 --temppath spd_tmp \
                 -i LDR-FW-RG13_06-2014-303-05_subset_decomp.spd \
                 -o LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded.spd
    

    The temporary directory can be removed after processing has completed:

    rm -fr spd_tmp
    
  2. Classify ground returns and populate height

    Many metrics use height above ground rather than absolute elevation, so the ground elevation must first be determined and heights calculated relative to it. Within SPDLib, ground classification is performed using a combination of two algorithms: a Progressive Morphology Filter (PMF; [1]) followed by the Multi-Scale Curvature algorithm (MCC; [2]). Both of these algorithms use only the discrete points rather than the waveform information.

    Apply a Progressive Morphology Filter using the following command:

    spdpmfgrd --grd 1 -i LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded.spd \
              -o LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_grd.spd
    

    Then apply the Multi-Scale Curvature algorithm to the output file using:

    spdmccgrd --class 3 --initcurvetol 1 \
              -i LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_grd.spd \
              -o LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_mcc_grd.spd
    
  3. Attribute with height

    The final step of the SPD processing is to attribute each pulse with heights above ground level. The ground returns are interpolated, similar to generating a Digital Terrain Model (DTM), but rather than using a regular grid the ground height is calculated at the position of each point.

    spddefheight --interp --in NATURAL_NEIGHBOR_CGAL \
                 -i LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_mcc_grd.spd \
                 -o LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_mcc_grd_defheight.spd
    
  4. Calculate metrics

    After all the pre-processing steps to convert the LAS 1.3 file into a gridded SPD format file with a defined height, it is possible to generate a number of metrics from the waveform data. The command to calculate metrics within SPDLib (`spdmetrics`) takes an XML file in which the metrics are defined. There are a large number of metrics available, and operators (addition, subtraction, etc.) allow existing metrics to be combined to implement new metrics. The full list of metrics is available in the SPDMetrics.xml file, distributed with the source of SPDLib. Most metrics have an option to specify the minimum number of returns (`minNumReturns`): setting this to 0 will use the waveform information to calculate the metric, while setting it to 1 (default) or above will use the discrete data. In this way full waveform and discrete metrics can be created at the same time. For this exercise we will be calculating Height of Median Energy (HOME) and waveform distance (WD); a detailed description of these metrics is given in [3].

    First, create a file containing these metrics. Create a text file called ‘spd_metrics.xml’ and paste the text below into it:

    
    <!-- SPDLib Metrics file -->
    
    <spdlib:metrics
    xmlns:spdlib="http://www.spdlib.org/xml/">
    
    <!-- HOME -->
    
    <spdlib:metric metric="home" field="HOME"/>
    
    <!-- WD -->
    
    <spdlib:metric metric="maxheight" field="WD" minNumReturns="0"/>
    
    </spdlib:metrics>
    

    The file is also available from https://gist.github.com/danclewley/4eefda2200e7593f1e5e2aaa6bae2c03

    To calculate the metrics and produce an image as output, run:

    spdmetrics --image -o LDR-FW-RG13_06-2014-303-05_subset_metrics.bsq \
               -f ENVI \
               -i LDR-FW-RG13_06-2014-303-05_subset_decomp_gridded_pmf_mcc_grd_defheight.spd \
               -m spd_metrics.xml
    

    Once the command has finished, open the metrics image using:

    tuiview LDR-FW-RG13_06-2014-303-05_subset_metrics.bsq
    

More metrics can be added to the ‘spd_metrics.xml’ file as needed; it is also possible to define new metrics using the operator tags.
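If many metrics are needed, the XML file can also be generated from Python. The sketch below uses the standard library ElementTree module to build the same two metrics as above. As a hypothetical example, it also adds a discrete-return version of maxheight (minNumReturns="1") under an arbitrary field name, MaxH_Disc. Check metric names against SPDMetrics.xml before relying on them.

```python
import xml.etree.ElementTree as ET

SPDLIB_NS = "http://www.spdlib.org/xml/"
# Write elements with the spdlib: prefix used in the example file
ET.register_namespace("spdlib", SPDLIB_NS)

METRICS = [
    {"metric": "home", "field": "HOME"},
    # minNumReturns=0 uses the waveform information
    {"metric": "maxheight", "field": "WD", "minNumReturns": 0},
    # Hypothetical extra field: maximum height from the discrete returns
    {"metric": "maxheight", "field": "MaxH_Disc", "minNumReturns": 1},
]


def build_metrics_tree(metrics):
    """Build an <spdlib:metrics> tree with one <spdlib:metric> per entry."""
    root = ET.Element("{%s}metrics" % SPDLIB_NS)
    for attributes in metrics:
        ET.SubElement(root, "{%s}metric" % SPDLIB_NS,
                      {key: str(value) for key, value in attributes.items()})
    return ET.ElementTree(root)


tree = build_metrics_tree(METRICS)
print(ET.tostring(tree.getroot(), encoding="unicode"))
```

To write the file used by spdmetrics, call tree.write("spd_metrics.xml", encoding="utf-8", xml_declaration=True).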

This post was derived from the LiDAR practical given as part of the NERC-ARF workshop held at BAS, Cambridge in March 2018. If you have any questions about working with NERC-ARF data contact the NERC-ARF Data Analysis Node (NERC-ARF-DAN) see https://nerc-arf-dan.pml.ac.uk/ or follow us on twitter: @NERC_ARF_DAN.

[1] Zhang, K., Chen, S.-C., Whitman, D., Shyu, M.-L., Yan, J., & Zhang, C. (2003). A progressive morphological filter for removing nonground measurements from airborne LIDAR data. IEEE Transactions on Geoscience and Remote Sensing, 41(4), 872–882. http://doi.org/10.1109/TGRS.2003.810682
[2] Evans, J.S., & Hudak, A.T. (2007). A multiscale curvature algorithm for classifying discrete return lidar in forested environments. IEEE Transactions on Geoscience and Remote Sensing, 45(4), 1029–1038.
[3] Cao, L., Coops, N., Hermosilla, T., Innes, J., Dai, J., & She, G. (2014). Using Small-Footprint Discrete and Full-Waveform Airborne LiDAR Metrics to Estimate Total Biomass and Biomass Components in Subtropical Forests. Remote Sensing, 6(8), 7110–7135. http://doi.org/10.3390/rs6087110

Find scenes from an archive which contain a point

This is a quick way of locating which scenes, from an archive of data, contain a point you are interested in.

First make a list of all available scenes

ls data/S1/2017/??/??/*/*vv*img > all_scenes.txt

Then use gdalbuildvrt to make a VRT file containing all scenes as a separate band (assumes scenes only have a single band).

gdalbuildvrt -input_file_list all_scenes.txt \
             -separate -o s1_all_scenes.vrt

Use gdallocationinfo with the ‘-lifonly’ flag to list the scenes whose extent contains the point we are interested in (1.5° N, 103° E), and redirect the output to a text file.

gdallocationinfo -lifonly -wgs84 s1_all_scenes.vrt \
                 103 1.5 > selected_scenes.txt

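This only shows whether the point is within the bounding box of each scene. To find the scenes which have data at the point, use gdallocationinfo again with the ‘-valonly’ flag. This prints the pixel value for every band of the VRT, i.e., one value per scene in all_scenes.txt. Save the values to a text file.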

gdallocationinfo -valonly -wgs84 s1_all_scenes.vrt \
                  103 1.5 > selected_scenes_values.txt

The original list of files can then be subset using the values file in Python:

import numpy
# Read in data to numpy arrays. The values file has one line per band of
# the VRT, in the same order as the original list in all_scenes.txt
scenes = numpy.genfromtxt("all_scenes.txt", dtype=str)
values = numpy.genfromtxt("selected_scenes_values.txt")

# Select only scenes where the value is not 0 (i.e., scenes with data at the point)
scenes_data = scenes[values != 0]

# Save to text file
numpy.savetxt("selected_scenes_data.txt", scenes_data, fmt="%s")

If you have a very large archive of data and need to find which scenes intersect a point often, having a spatial database with the scene outlines would be a better approach. However, if it isn’t something you do often this quick approach using only the GDAL utilities and a bit of Python is worth knowing.
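As a rough illustration of the footprint-based alternative, the sketch below keeps a bounding box for each scene and tests the point against them in pure Python. The boxes would be computed once, for example from gdalinfo output. The scene names and extents are made up. A real spatial database would store full outlines and use a spatial index rather than a linear scan.

```python
def scenes_containing(point, footprints):
    """Return the names of scenes whose bounding box contains point.

    point: (x, y), e.g. (longitude, latitude)
    footprints: dict of scene name -> (min_x, min_y, max_x, max_y),
    in the same coordinate system as point.
    """
    x, y = point
    return sorted(name for name, (min_x, min_y, max_x, max_y) in footprints.items()
                  if min_x <= x <= max_x and min_y <= y <= max_y)


# Hypothetical footprints in longitude / latitude
footprints = {
    "scene_a_vv.img": (102.0, 0.5, 104.5, 2.5),
    "scene_b_vv.img": (103.5, 1.0, 106.0, 3.0),
    "scene_c_vv.img": (101.0, 1.2, 103.2, 3.1),
}
print(scenes_containing((103.0, 1.5), footprints))
```

Like the ‘-lifonly’ step, this only checks extents. A value check is still needed to confirm that a scene actually has data at the point.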

Introduction to RSGISLib Training Course

A course introducing RSGISLib was recently given in Japan by Pete Bunting. Material from the course is available to download from the following links:

The course covers pre-processing of PALSAR SAR data followed by a demonstration of pixel-based and object-based classification. All the required data and scripts are included in the package.

If you have any questions on the course please use the RSGISLib support Google group: https://groups.google.com/forum/#!forum/rsgislib-support

Convert LaTeX to Word

A big problem with writing in LaTeX is collaborating with colleagues who don’t use it. One option is to generate a Word .docx version and use the comments and track changes features in Word / LibreOffice. This does require manually copying the changes back to LaTeX so isn’t quite as nice as using latexdiff (see earlier post) but is slightly easier than adding comments to a PDF.

The best program I’ve found for converting LaTeX to Word is the open source (GPL) command line tool, pandoc (http://pandoc.org/).

Basic usage is quite straightforward:

pandoc latex_document.tex -o latex_document_word_version.docx

The conversion isn’t perfect (figures and tables can get a bit mangled), but it does a good job with the text.

Pandoc can convert between many different formats, including from markdown and reStructuredText (commonly used for software documentation) so it is worth having installed.

Convert Sentinel-2 Data Using GDAL

The latest version of GDAL (2.1) has a driver to read Sentinel-2 data (see http://gdal.org/frmt_sentinel2.html). As with HDF files, the data are read as subdatasets. Running ‘gdalinfo’ on the zipped folder or the .xml file contained within the .SAFE directory will display all the subdatasets, as well as all the metadata (so quite a lot of information!).

As with HDF5 you can pass the subdataset names to gdalinfo to get more information or gdal_translate to extract them as a separate dataset.

To make it easier to extract all the subdatasets I wrote a script (extract_s2_data.py) which can be downloaded from the https://bitbucket.org/petebunting/rsgis_scripts repository.

The script will get a list of all subdatasets using GDAL:

from osgeo import gdal
dataset = gdal.Open('S2/S2.xml', gdal.GA_ReadOnly)
subdatasets = dataset.GetSubDatasets()
dataset = None

The ones to be extracted are for the 10, 20 and 60 m resolution band groups for each UTM zone (if the file crosses multiple zones).

For each subdataset, an output name is created by replacing the EPSG code with the UTM zone and ‘:’ with ‘_’.

gdal_translate is then called using subprocess to create a new file for each subdataset. By default the output is KEA format.
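The naming and command-building steps might look something like the sketch below. It is not the code in extract_s2_data.py. It assumes subdataset names of the form SENTINEL2_L1C:<file>.xml:10m:EPSG_32632 (as listed by gdalinfo), and that only WGS84 / UTM EPSG codes (326xx north, 327xx south) occur. The output naming pattern is hypothetical.

```python
import os

# Band groups to extract (10, 20 and 60 m)
RESOLUTIONS = ("10m", "20m", "60m")


def utm_zone_from_epsg(epsg_code):
    """Convert a WGS84 / UTM EPSG code to a zone name, e.g. 32632 -> 'UTM32N'."""
    if 32601 <= epsg_code <= 32660:
        return "UTM{}N".format(epsg_code - 32600)
    if 32701 <= epsg_code <= 32760:
        return "UTM{}S".format(epsg_code - 32700)
    raise ValueError("Not a WGS84 / UTM EPSG code: {}".format(epsg_code))


def output_name(subdataset_name, extension=".kea"):
    """Hypothetical output name: <xml base>_<resolution>_<UTM zone><extension>."""
    prefix, resolution, epsg = subdataset_name.rsplit(":", 2)
    xml_path = prefix.split(":", 1)[1]  # drop the SENTINEL2_L1C driver prefix
    base = os.path.splitext(os.path.basename(xml_path))[0]
    zone = utm_zone_from_epsg(int(epsg.replace("EPSG_", "")))
    return "_".join([base, resolution, zone]) + extension


def translate_commands(subdataset_names, out_dir=".", out_format="KEA", extension=".kea"):
    """Build one gdal_translate command per 10/20/60 m subdataset."""
    commands = []
    for name in subdataset_names:
        resolution = name.rsplit(":", 2)[1]
        if resolution not in RESOLUTIONS:
            continue  # e.g. skip preview subdatasets
        out_file = os.path.join(out_dir, output_name(name, extension))
        commands.append(["gdal_translate", "-of", out_format, name, out_file])
    return commands


# Hypothetical subdataset names, as returned in the first element of each
# (name, description) tuple from dataset.GetSubDatasets()
example_names = [
    "SENTINEL2_L1C:S2A_OPER_MTD_SAFL1C_PDMC_20150818T101440_R022_V20150813T102406_20150813T102406.xml:10m:EPSG_32632",
    "SENTINEL2_L1C:S2A_OPER_MTD_SAFL1C_PDMC_20150818T101440_R022_V20150813T102406_20150813T102406.xml:PREVIEW:EPSG_32632",
]
commands = translate_commands(example_names)
for command in commands:
    print(" ".join(command))
```

Each command list could then be passed to subprocess.run.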

To run the script, first install GDAL 2.1 or later. The conda-forge channel has recent builds, which can be installed using conda:

conda create -n gdal2 -c conda-forge gdal
source activate gdal2

(If you are on Windows leave out ‘source’)

To extract all subdatasets from a zipped Sentinel 2 scene to the current directory you can then use:

extract_s2_data.py -o . \
S2A_OPER_PRD_MSIL1C_PDMC_20151201T144038_R010_V20151130T142545_20151130T142545.zip

The gdal_translate command used is printed to the screen.

The default output format is KEA; you can change this using the ‘--of’ flag. For example, to convert an unzipped scene to GeoTiff:

extract_s2_data.py -o . --of GTiff \
S2A_OPER_PRD_MSIL1C_PDMC_20151201T144038_R010_V20151130T142545_20151130T142545.SAFE

To get the file extension and some creation options for all supported drivers, the script optionally uses the ‘get_gdal_drivers’ module from arsf_dem_scripts. You can download this file and copy it into the same directory that ‘extract_s2_data.py’ has been saved to. On Linux or OS X you can run:

# OS X
curl https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py > get_gdal_drivers.py

# Linux
wget https://raw.githubusercontent.com/pmlrsg/arsf_dem_scripts/master/arsf_dem/get_gdal_drivers.py

Working with full waveform LiDAR data in SPDLib (part 1)

The SPD file format was designed around storing LiDAR pulses with digitised waveforms and associated points. The most recent version (3.3) can import waveform data from LAS files using LASlib, which is part of LAStools. Binaries are available for Linux, macOS and Windows and can be installed through conda:

conda create -n spdlib -c conda-forge spdlib tuiview
. activate spdlib
conda update -c conda-forge --all

For this example LAS 1.3 files acquired by the NERC Airborne Research Facility (NERC-ARF, previously ARSF) over Borneo using a Leica ALS50-II instrument with full waveform digitiser will be used.

Once you have registered with CEDA and applied for access to the ARSF archive these data can be downloaded from: http://browse.ceda.ac.uk/browse/neodc/arsf/2014/RG13_06/RG13_06-2014_303_Maliau_Basin/LiDAR/flightlines/fw_laser/las1.3

You can also follow through with any of the other NERC-ARF datasets or other waveform LAS files you have.

  1. Convert LAS to SPD format
    First you need a WKT file to define the projection. This step is optional but is more reliable than reading from a LAS file. For the example the projection is UTM50N, you can download a WKT file using:

    wget https://bitbucket.org/petebunting/rsgis_scripts/raw/c8cf94528cdb58b753029df3bc631a2509740ad1/WKT/UTM_WGS84/UTM_WGS84_Z50_N.wkt
    

    Then convert to an unsorted SPD file (UPD).

    spdtranslate --if LAS --of UPD \
                 -x LAST_RETURN \
                 --input_proj UTM_WGS84_Z50_N.wkt \
                 -i LDR-FW-RG13_06-2014-303-05.LAS \
                 -o LDR-FW-RG13_06-2014-303-05.spd
    
  2. Subset SPD file (optional)

    As full waveform processing is quite intensive, it is recommended to subset the data for the purpose of running through this tutorial. You can do this using the spdsubset command.

    spdsubset --xmin 494400 --ymin 524800 \
              --xmax 494800 --ymax 525000 \
              -i LDR-FW-RG13_06-2014-303-05.spd \
              -o LDR-FW-RG13_06-2014-303-05_subset.spd
    
  3. Decompose waveform

    One of the limitations of discrete systems is that only a limited number of ‘points’ (normally 2–4) are recorded for each pulse, and the rest of the information is lost. As full waveform data record the entire waveform, it is possible to extract more returns after the data are acquired. A common approach to this is ‘Gaussian decomposition’, which involves fitting Gaussian distributions to the peaks. Within SPDLib this is available as the ‘spddecomp’ command.

    spddecomp --all --noise --threshold 25 \
              -i  LDR-FW-RG13_06-2014-303-05_subset.spd \
              -o  LDR-FW-RG13_06-2014-303-05_subset_decomp.spd
    

    Even on the subset this will take around 5 minutes to run. If you decide to decompose the full dataset afterwards, expect it to take an hour or so.

  4. Export returns to LAS file
    The final step for this part of the tutorial is to export the returns to a LAS file, using the spdtranslate command.

    spdtranslate --if SPD --of LAS \
                 -i LDR-FW-RG13_06-2014-303-05_subset_decomp.spd \
                 -o LDR-FW-RG13_06-2014-303-05_subset_decomp.las
    

Classifying ground returns and calculating LiDAR metrics using SPDLib are covered in part two.
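To give an idea of what Gaussian decomposition involves, the sketch below finds peaks above a threshold in a single synthetic waveform. It then fits a Gaussian to each peak using the three samples around it, via a closed-form fit of a parabola to the logarithm of the samples. This is a simplified stand-in chosen for illustration, not the algorithm implemented in spddecomp. It assumes well-separated peaks and positive sample values.

```python
import math


def gaussian(t, amplitude, centre, sigma):
    return amplitude * math.exp(-((t - centre) ** 2) / (2 * sigma ** 2))


def find_peaks(waveform, threshold):
    """Indices of local maxima whose value exceeds threshold."""
    return [i for i in range(1, len(waveform) - 1)
            if waveform[i] > threshold
            and waveform[i] >= waveform[i - 1]
            and waveform[i] > waveform[i + 1]]


def fit_peak(waveform, i):
    """Fit a Gaussian through samples i-1, i, i+1.

    ln(A * exp(-(t - mu)^2 / (2 sigma^2))) is a parabola in t, so the
    vertex of the parabola through the three log samples gives mu, sigma and A.
    """
    y0, y1, y2 = (math.log(max(waveform[j], 1e-12)) for j in (i - 1, i, i + 1))
    curvature = y0 - 2 * y1 + y2          # negative at a peak
    offset = 0.5 * (y0 - y2) / curvature  # vertex position relative to i
    sigma = math.sqrt(-1.0 / curvature)
    amplitude = math.exp(y1 - 0.25 * (y0 - y2) * offset)
    return amplitude, i + offset, sigma


def decompose(waveform, threshold):
    """Return (amplitude, centre, sigma) for each peak above threshold."""
    return [fit_peak(waveform, i) for i in find_peaks(waveform, threshold)]


# Synthetic waveform containing two returns, sampled once per time bin
waveform = [gaussian(t, 100, 20.0, 2.0) + gaussian(t, 60, 45.3, 1.5) for t in range(64)]
peaks = decompose(waveform, threshold=25)
for amplitude, centre, sigma in peaks:
    print("amplitude {:.1f}, centre {:.2f}, sigma {:.2f}".format(amplitude, centre, sigma))
```

Each fitted Gaussian corresponds to one extracted return. Real waveforms have noise and overlapping peaks, which is why dedicated tools fit all peaks jointly and apply noise thresholds.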

If you have further questions about using SPDLib please contact the mailing list (details available from https://lists.sourceforge.net/lists/listinfo/spdlib-develop). If you have any questions about working with NERC-ARF data, contact the NERC-ARF Data Analysis Node (NERC-ARF-DAN; see https://nerc-arf-dan.pml.ac.uk/) or follow us on twitter: @NERC_ARF_DAN.

Create a CSV with the coordinates from geotagged photos

Recently I took a lot of photos on my phone during fieldwork (survey for the NERC-ARF LiDAR and hyperspectral calibration flights) and wanted to extract the GPS coordinates from each photo so I could load them into QGIS using the add delimited text dialogue. The aim wasn’t to have precise locations for each photo (GPS positions for each of the points surveyed were recorded separately) but to give a quick idea of the locations we’d visited before the main GPS data were processed.

A while ago (2012!) I wrote a script for this task. The script (CreateJPEGKMZ) requires pillow and the imagemagick command line tools.

The code isn’t particularly tidy (despite my recent updates), but it basically pulls out the GPS location from the EXIF tags and writes this to a CSV file. To create a KMZ, a thumbnail image and a corresponding KML file are created. Both are then zipped together to generate the KMZ, which can be opened in Google Earth.
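The core of the CSV step is converting the EXIF GPS tags into decimal degrees. These tags store latitude and longitude as degrees, minutes and seconds plus an N/S or E/W reference. The sketch below is not the code from CreateJPEGKMZ. It assumes the tag values have already been read (e.g. with pillow) as either (numerator, denominator) tuples or numbers. The column names and photo values are illustrative.

```python
import csv
import io


def to_float(value):
    """EXIF rationals may be (numerator, denominator) tuples or numbers."""
    if isinstance(value, tuple):
        return float(value[0]) / float(value[1])
    return float(value)


def dms_to_decimal(dms, ref):
    """Convert EXIF (degrees, minutes, seconds) to signed decimal degrees."""
    degrees, minutes, seconds = (to_float(v) for v in dms)
    decimal = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative
    return -decimal if ref in ("S", "W") else decimal


def write_gps_csv(rows, out_file):
    """Write (photo name, latitude, longitude) rows with a header line."""
    writer = csv.writer(out_file)
    writer.writerow(["photo", "latitude", "longitude"])
    for name, lat, lon in rows:
        writer.writerow([name, "{:.6f}".format(lat), "{:.6f}".format(lon)])


# Hypothetical EXIF values for one photo
lat = dms_to_decimal(((50, 1), (22, 1), (30, 1)), "N")
lon = dms_to_decimal((4, 7, 30), "W")
buffer = io.StringIO()
write_gps_csv([("IMG_0001.jpg", lat, lon)], buffer)
print(buffer.getvalue())
```

In QGIS the longitude and latitude columns can then be chosen as the X and Y fields in the add delimited text dialogue, with WGS84 (EPSG:4326) as the CRS.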

To use the script to write a CSV file:

CreateJPEGKMZ.py --outcsv gps_points.csv input_jpeg_files

To also create KMZ files for each photo:

CreateJPEGKMZ.py --outcsv gps_points.csv \
                 --outkmz output_kmz_files \
                 input_jpeg_files

This was also a good lesson in the benefits of having scripts in version control and publicly available.

Scalable image segmentation using RSGISLib

For application to very large remote sensing datasets, an approach to “scalable image segmentation” using RSGISLib was presented in [1]. In the paper a 30 m spatial resolution satellite mosaic of Australia was segmented by splitting it into tiles, processing each tile on a separate node of an HPC system, merging the tiles and then performing a second segmentation to remove artefacts at tile boundaries.
For a full description of the approach see section 5.3 of the open access paper (available to view online at http://www.mdpi.com/2072-4292/6/7/6111/htm).
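The first stage of the tiled approach, splitting the image into tiles, can be illustrated with a short sketch. It computes the pixel window of each tile, with edge tiles clipped to the image size. This covers only the splitting step, not the merging and boundary re-segmentation described in the paper. It is also not how performTiledSegmentation is implemented internally. The 2000 by 2000 pixel default simply mirrors TILE_WIDTH and TILE_HEIGHT in the script.

```python
def tile_windows(width, height, tile_width=2000, tile_height=2000):
    """Return (x_offset, y_offset, x_size, y_size) for each tile, row by row.

    Tiles on the right and bottom edges are clipped to the image size.
    """
    windows = []
    for y_offset in range(0, height, tile_height):
        for x_offset in range(0, width, tile_width):
            windows.append((x_offset, y_offset,
                            min(tile_width, width - x_offset),
                            min(tile_height, height - y_offset)))
    return windows


# A hypothetical 4500 x 3000 pixel image gives 3 columns by 2 rows of tiles
for window in tile_windows(4500, 3000):
    print(window)
```

Each window could then be read with a GDAL band's ReadAsArray(xoff, yoff, xsize, ysize) and segmented separately before the results are merged.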

A version of this approach, designed to be used on a single machine, is now available in RSGISLib through the performTiledSegmentation function. The function is called in a similar way to the existing runShepherdSegmentation function.

For instructions on installing the latest version of RSGISLib see this post, skipping to section 7 if you already have Linux or OS X installed.

The code below presents an example of applying either the tiled or standard segmentation through the use of a command line flag:

#!/usr/bin/env python
"""
Apply segmentation to image
"""

from __future__ import print_function
import argparse
import shutil
import sys
import tempfile
from rsgislib.segmentation import tiledsegsingle
from rsgislib.segmentation import segutils

# Set values for clustering
NUM_CLUSTERS = 60
MIN_PIXELS = 100
DIST_THRESHOLD = 100

# Set values for tiles segmentation
TILE_WIDTH = 2000
TILE_HEIGHT = 2000

parser = argparse.ArgumentParser()
parser.add_argument("inputimage", nargs=1,
                    type=str, help="Input Image")
parser.add_argument("outputclumps", nargs=1,
                    type=str, help="Output clumps")
parser.add_argument("--tiled",
                    default=False,
                    action="store_true",
                    help="Use tiled segmentation")
args = parser.parse_args()

# Make temp directory for intermediate files
temp_dir = tempfile.mkdtemp(prefix='rsgislib_seg_')

if args.tiled:
    # If requested run tiled segmentation
    tiledsegsingle.performTiledSegmentation(args.inputimage[0],
                                            args.outputclumps[0],
                                            tmpDIR=temp_dir,
                                            tileWidth=TILE_WIDTH,
                                            tileHeight=TILE_HEIGHT,
                                            validDataThreshold=0.3,
                                            numClusters=NUM_CLUSTERS,
                                            minPxls=MIN_PIXELS,
                                            distThres=DIST_THRESHOLD,
                                            sampling=100, kmMaxIter=200)
else:
    # Otherwise run the standard segmentation
    segutils.runShepherdSegmentation(args.inputimage[0],
                                     args.outputclumps[0],
                                     tmpath=temp_dir,
                                     numClusters=NUM_CLUSTERS,
                                     minPxls=MIN_PIXELS,
                                     distThres=DIST_THRESHOLD,
                                     sampling=100, kmMaxIter=200)
shutil.rmtree(temp_dir)

Note, script edited for post – full version available from GitHub.

[1] Clewley, D.; Bunting, P.; Shepherd, J.; Gillingham, S.; Flood, N.; Dymond, J.; Lucas, R.; Armston, J.; Moghaddam, M. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing 2014, 6, 6111-6135. http://www.mdpi.com/2072-4292/6/7/6111