Category Archives: RSGISLib

Scalable image segmentation using RSGISLib

To for application to very large remote sensing datasets, an approach to “Scalable image segmentation” presented in [1] using RSGISLib. In the paper a 30 m spatial resolution satellite mosaic of Australia was segmented by splitting into tiles, processing each tile on a separate node of a HPC, merging and then performing a second segmentation to remove artefacts at tile boundaries.
For a full description of the approach see section 5.3 of the open access paper (available to view online at http://www.mdpi.com/2072-4292/6/7/6111/htm).

A version of this approach, designed to be used on a single machine, is now available in RSGISLib through the performTiledSegmentation function. The function is called in a similar way to the existing runShepherdSegmentation function.

For instructions on installing the latest version of RSGISLib see this post, skipping to section 7 if you already have Linux or OS X installed.

The code below presents an example of applying either the tiled or standard segmentation through the use of a command line flag:

#!/usr/bin/env python
"""
Apply segmentation to image
"""

from __future__ import print_function
import argparse
import shutil
import sys
import tempfile
from rsgislib.segmentation import tiledsegsingle
from rsgislib.segmentation import segutils

# Set values for clustering
NUM_CLUSTERS = 60
MIN_PIXELS = 100
DIST_THRESHOLD = 100

# Set values for tiles segmentation
TILE_WIDTH = 2000
TILE_HEIGHT = 2000

parser = argparse.ArgumentParser()
parser.add_argument("inputimage", nargs=1,
                    type=str, help="Input Image")
parser.add_argument("outputclumps", nargs=1,
                    type=str, help="Output clumps")
parser.add_argument("--tiled",
                    default=False,
                    action="store_true",
                    help="Use tiled segmentation")
args = parser.parse_args()

# Make temp directory for intermediate files
temp_dir = tempfile.mkdtemp(prefix='rsgislib_seg_')

if args.tiled:
    # If requested run tiled segmentation
     tiledsegsingle.performTiledSegmentation(args.inputimage[0],
                                             args.outputclumps[0],
                                             tmpDIR=temp_dir,
                                             tileWidth=TILE_WIDTH,
                                             tileHeight=TILE_HEIGHT,
                                             validDataThreshold=0.3,
                                             numClusters=NUM_CLUSTERS,
                                             minPxls=MIN_PIXELS,
                                             distThres=DIST_THRESHOLD,
                                             sampling=100, kmMaxIter=200)
else:
    # If not run standard
     segutils.runShepherdSegmentation(args.inputimage[0],
                                      args.outputclumps[0],
                                      tmpath=temp_dir,
                                      numClusters=NUM_CLUSTERS,
                                      minPxls=MIN_PIXELS,
                                      distThres=DIST_THRESHOLD,
                                      sampling=100, kmMaxIter=200)
shutil.rmtree(temp_dir)

Note, script edited for post – full version available from GitHub.

[1] Clewley, D.; Bunting, P.; Shepherd, J.; Gillingham, S.; Flood, N.; Dymond, J.; Lucas, R.; Armston, J.; Moghaddam, M. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing 2014, 6, 6111-6135. http://www.mdpi.com/2072-4292/6/7/6111

Copy a Shapefile to a Raster Attribute Table

The object based image analysis features in RSGISLib are based around storing object attributes as a Raster Attribute Table (RAT), as described in the following paper:

Clewley, D.; Bunting, P.; Shepherd, J.; Gillingham, S.; Flood, N.; Dymond, J.; Lucas, R.; Armston, J.; Moghaddam, M. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing 2014, 6, 6111-6135. (open access)

However, data to be used as part of the analysis are often stored as attributes of a shapefile. A RAT can be created from a shapefile using the gdal_rasterize command and the copyShapefile2RAT function in RSGISLib using the following steps:

  1. Rasterise vector
    This can done directly from the command line or by calling gdal_rasterize from a Python script using subprocess.

    import subprocess 
    
    input_vector = 'brig_re09_binary.shp'
    rasterised_vector = 'brig_re09_binary_raster.kea'
        
    rasterise_cmd = ['gdal_rasterize','-of','KEA',
                                      '-a','BGL',
                                      '-tr','30','30',
                                      input_vector,
                                      rasterised_vector]
    print('Running: ', ' '.join(rasterise_cmd))
    subprocess.call(rasterise_cmd)
    
    

    The above code sets the output raster pixel size to 30 x 30 m and uses the ‘BGL’ column from the shapefile (which is an integer) for the pixel values.

    Note this image is only used to get the extent and pixel size of the final RAT image. If you have another image you wish to match the extent and pixel size to this can be used instead of rasterising the vector, in this case skip this step and use the existing image instead of ‘rasterised_vector’.

  2. Create RAT and copy shapefile attributes
    from rsgislib import vectorutils
    import rsgislib
     
    input_vector = 'brig_re09_binary.shp'
    rasterised_vector = 'brig_re09_binary_raster.kea'
    output_rat_image = 'brig_re09_attributes.kea'
        
    vectorutils.copyShapefile2RAT(input_vector, rasterised_vector,output_rat_image)
    
    

    This will copy all the attributes from the input shapefile to a RAT. Note the output format is always KEA as it has support for large attribute tables and stores them with compression.

To view the attributes open the file in TuiView. The output RAT can then be used as any other RAT, e.g., attributing objects with image statistics or using as part of a classification. For more details see other posts on object based image analysis.

Create Colour Composites for ALOS PALSAR Tiles

A common way to visualise dual-polarisation SAR data is to display a colour composite of HH (Red), HV (Green) and HH/HV(Blue).

Python Functions

Such composites can be created in RSGISLib using the following steps:

  1. Create the ratio image using the bandMath.
    import rsgislib
    from rsgislib import imagecalc
    
    bandDefns = [imagecalc.BandDefn('hh', 'in_hh_file.kea', 1),
                 imagecalc.BandDefn('hv', 'in_hv_file.kea', 1)]
    imagecalc.bandMath('out_hhhv_ratio.kea', 'hh/hv', 'KEA', \
                      rsgislib.TYPE_32FLOAT, bandDefns) 
    
  2. Create the stack using stackImageBands
    import rsgislib
    from rsgislib import imageutils
    
    bands_list = ['in_hh_file.kea', 'in_hv_file.kea', 'out_hhhv_ratio.kea']
    band_names = ['HH','HV', 'HH/HV']
    gdaltype = rsgislib.TYPE_32FLOAT
    imageutils.stackImageBands(bands_list, band_names, 'out_hhhvratio_stack.kea', None, 
                                        0, 'KEA', rsgislib.TYPE_32FLOAT) 
    
  3. Stretch using stretchImage
    import rsgislib
    from rsgislib import imageutils
    
    imageutils.stretchImage('out_hhhvratio_stack.kea, 'out_hhhvratio_stack_scale.kea', \
           False, '', True, False, 'GTiff', 
           rsgislib.TYPE_8INT, imageutils.STRETCH_LINEARSTDDEV, 2)
    

    This will create a tiff with values between 0 – 255 (8 bit), suitable for display in GIS packages or graphics programs such as Photoshop / GIMP.

Script for ALOS PALSAR tiles

For the 1×1 degree 25 m resolution ALOS PALSAR tiles, which can be freely downloaded from http://www.eorc.jaxa.jp/ALOS/en/palsar_fnf/fnf_index.htm for 2007 – 2010. A script (create_palsar_tiles_rgb.py) is available from RSGIS Scripts on Bitbucket. The script will also untar files, and can be run on downloaded data using:

python create_palsar_tiles_rgb.py —unzip downloaded_palsar_files

The script will untar each tile into a separate directory, create temp files for the ratio image and stack and save a tiff in with the original date files in the following structure:

N63W165_10_MOS
     |-----N63W165_10_MOS.tar.gz - Original tar.gz
     |-----N63W165_10_MOS_composite.tif - Composite generated
     |-----N63W165_10_date       - Acquisition date of each pixel
     |-----N63W165_10_date.hdr  
     |-----N63W165_10_linci      - Local incidence angle for each pixel
     |-----N63W165_10_linci.hdr 
     |-----N63W165_10_mask       - Data / no-data mask
     |-----N63W165_10_mask.hdr 
     |-----N63W165_10_sl_HH      - HH data (DN)
     |-----N63W165_10_s1_HH.hdr 
     |-----N63W165_10_sl_HV      - HV data (DN)
     |-----N63W165_10_s1_HH.hdr 

The resulting tiff will be something like the example below (Alaska; 63N165W)

N63W165_10_MOS_composite_resize
Data Copyright JAXA

To apply the same stretch to multiple files a text file can be used with the minimum and maximum values for each band. For example:

#stddev
#band,img_min,img_max,out_min,out_max
1,48.4992,9286,0,255
2,396,4823.89,0,255
3,0.0181456,0.449104,0,255

This stretch file is specified using:

python create_palsar_tiles_rgb.py --unzip --stretchtxt stats.txt downloaded_palsar_files

It is also possible to create a stack using the coefficient of variance as the third band using:

# Coefficient of variance HH
python create_palsar_tiles_rgb.py --unzip --CoVHH downloaded_palsar_files 
# Coefficient of variance HV
python create_palsar_tiles_rgb.py --unzip --CoVHV downloaded_palsar_files

Object-based classification using Random Forests

In our recent paper on an open source system for object based image classification [1] we mentioned linking with scikit-learn [2] to apply different classification algorithms. This post presents an example using Random Forests to give an idea of all the steps required. Random Forests is an ensamble learning algorithm which utilises many decision trees, each of which ‘vote’ for the final class. To reduce correlation between trees only a subset of predictors (data layers) and training samples are used for each class. Random forests is increasing in popularity within remote sensing, an example of usage is the pixel-based classification of Alaska by Whitcombe et al 2009 [3]

Taking [3] as an example this post demonstrates how the classification can be applied at the object level.

  1. Install software
  2. As detailed in previous posts the software required can be installed under Linux/OS X through conda using

    conda install -c osgeo \
      rsgislib rios tuiview scikit-learn
    
  3. Define training data
  4. Random Forests works best with a large number of training samples. For this example we’re using the National Wetlands Inventory (NWI) which are available as an ESRI Shapefile. For each class an integer code (WETCLASS) has been assigned and stored within the attribute table. The polygons are converted to a raster, where the pixel ID is the class using:

    gdal_rasterize -of KEA -ot Byte -a WETCLASS \
         -tr 100 100 nwi_shp.shp nwi_raster.kea
    
  5. Stack bands
  6. To stack the bands (assumed to have the same projection and resolution) the stack bands command, within RSGISLib, is used.

    #/usr/bin/env python
    import rsgislib
    from rsgislib import imageutils
    
    # Segmentation files
    palsarList = ['palsar_hh.kea',
                'also_hv.kea']
    
    # All files
    datalayersList = palsarList + \
                     ['palsar_hh_tex.kea',
                      'palsar_hv_tex.kea',
                      'elevation.kea',
                      'slope.kea']
    
    # Band names
    bandNamesList = ['hh','hv',
                     'hh_tex','hv_tex'
                     'elevation', 'slope']
    
    # Out file type and format
    gdalformat = 'KEA'
    dataType = rsgislib.TYPE_32FLOAT
    
    # Create stack for segmentation
    outputImage = 'palsar_hhhv.kea'
    imageutils.stackImageBands(palsarList, None, \
         outputImage, None, 0, gdalformat, dataType)
    
    # Create stack of all data layers
    outputDataStack = 'classification_stack.kea'
    imageutils.stackImageBands(datalayersList, bandNamesList, \
         outputDataStack, None, 0, gdalformat, dataType)
    

    Two stacks are created, one for the classification, which contains all data layers, and a second for segmentation which contains only SAR data. As the training data raster is categorical it is kept as a separate layer, because a separate function is required to attribute the segments.

  7. Image segmentation
  8. The following code is used for segmentation (for more detail see earlier post)

    #/usr/bin/env python
    from rsgislib.segmentation import segutils
     
    inputImage = 'palsar_hhhv.kea'
    clumpsFile = 'palsar_hhhv_clumps_elim_final.kea'
    meanImage = 'palsar_hhhv_clumps_elim_final_mean.kea'
    
    # Run segmentation
    segutils.runShepherdSegmentation(inputImage, clumpsFile,
                        meanImage, numClusters=100, minPxls=100)
    

    The method is similar to that described in an earlier post, with the addition of a step to include categorical data.

  9. Attribute segments
  10. #!/usr/bin/env python
    
    from rsgislib import rastergis
    from rsgislib.rastergis import ratutils
    from rsgislib import imageutils
     
    dataStack = 'classification_stack.kea'
    classFile = 'nwi_raster.kea'
    clumpsFile = 'palsar_hhhv_clumps_elim_final.kea'
     
    # Attribute segments with data
    ratutils.populateImageStats(dataStack, clumpsFile,
                        calcMean=True)
    
    # Convert training data to RAT
    codeStats = list()
    codeStats.append(rastergis.BandAttStats(band=1, minField='wetCode'))
    rastergis.populateRATWithStats(classFile, classFile, codeStats)
    
    # Attribute segments with class
    rastergis.strClassMajority(clumpsFile, classFile, \
                  'wetCode', 'wetCode', False)
    
  11. Perform classification
  12. Much of the classification code is sorting the data from the RAT so it can be input into scikit-learn.

    #!/usr/bin/env python
    from rios import rat
    import osgeo.gdal as gdal
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    
    # Open RAT
    inRatFile = 'palsar_hhhv_clumps_elim_final.kea'
    ratDataset = gdal.Open(inRatFile, gdal.GA_Update)
    
    # Set column names
    x_col_names = ['hh','hv',
                   'hh_tex','hv_tex'
                   'elevation', 'slope']
    
    y_col_name = 'wetCode'
    
    # Set up list to hold data
    X = []
    
    # Read in data from each column
    for colName in x_col_names:
        X.append(rat.readColumn(ratDataset, colName))
    
    # Read in training data
    y = rat.readColumn(ratDataset, y_col_name) 
    # Set NA values to 0
    y = np.where(y == b'NA',0,y)
    y = y.astype(np.int16)
    
    X.append(y)
    
    X = np.array(X)
    X = X.transpose()
    
    # Remove rows with 0 (NA) for wetCode
    X_train = X[X[:,-1] != 0]
    
    # Remove non-finite values
    X_train = X_train[np.isfinite(X_train).all(axis=1)]
    
    # Split into variables (X) and class (y)
    y_train = X_train[:,-1]
    X_train = X_train[:,0:-1]
    
    # Train Random Forests
    clf = RandomForestClassifier(n_estimators=500, max_features=3, \
          oob_score=True, n_jobs=6, verbose=2)
    
    clf.fit(X_train, y_train)
    
    # Set NaN values to 0
    X = np.where(np.isfinite(X),X,0)
    
    # Apply classification
    predictClass = clf.predict(X[:,0:-1])
    
    # Write out data to RAT
    rat.writeColumn(ratDataset, 'predictClass', predictClass)
    ratDataset = None
    

There are other algorithms in scikit-learn which can also be applied instead of Random Forests once the data is in the correct format. The big advantage of this system is the entire process can be applied within a single Python script so multiple algorithms / parameters can be easily tested and the performance evaluated.

References

[1] Clewley, D.; Bunting, P.; Shepherd, J.; Gillingham, S.; Flood, N.; Dymond, J.; Lucas, R.; Armston, J.; Moghaddam, M. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing 2014, 6, 6111-6135.
[2] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[3] Whitcomb, J., Moghaddam, M., McDonald, K., Podest, E., Kellndorfer, J., Wetlands Map of Alaska Using L-Band Radar Satellite Imagery, Canadian Journal of Remote Sensing, 2009, Vol. 35, pp. 54-72

Add a colour table in RSGISLib

For thematic rasters (e.g., classification) it is useful to save a colour scheme with the data for visualisation. This can be accomplished using a colour table saved as fields in an attribute table. There is a function in RSGISLib which will add colours to an existing RAT. To use the function on rasters which don’t already have an attribute table (i.e., the class is stored as the pixel value) one must be created and the pixel values copied to a column.

To accomplish this in RSGISLib the following is used:

import collections
from rsgislib import rastergis

classification='brigalow_regrowth_classification.kea'

# Add histogram (automatically adds attribute table)
rastergis.populateStats(classification, addclrtab=False, \
        calcpyramids=False, ignorezero=False)

# Add pixel values to attribute table
bandStats = []
bandStats.append(rastergis.BandAttStats(band=1, maxField='Class'))

rastergis.populateRATWithStats(classification, \
                                classification, bandStats)

# Add colour table
classcolours = {}
colourCat = collections.namedtuple('ColourCat', \
                        ['red', 'green', 'blue', 'alpha'])
classcolours[0] = colourCat(red=0, green=0, blue=0, alpha=0)
classcolours[1] = colourCat(red=255, green=0, blue=0, alpha=255)
classcolours[2] = colourCat(red=0, green=0, blue=255, alpha=255)
classcolours[3] = colourCat(red=0, green=200, blue=0, alpha=255)
classcolours[4] = colourCat(red=0, green=100, blue=0, alpha=255)
rastergis.colourClasses(classification, 'Class', classcolours)

# Add pyramids (for fast display)
rastergis.populateStats(classification, addclrtab=False, \
          calcpyramids=True, ignorezero=False)

If you don’t have an existing colour scheme you can add a random colour to each class by running ‘rastergis.populateStats’ and setting ‘addclrtab’ to True.

Note: if you add a colour table to a KEA file it will be recognised in TuiView and ArcMap (using the KEA driver) but not QGIS. Colour tables in ERDAS Imagine format work fine in QGIS.

Image segmentation & attribution utilities in RSGISLib

Included with RSGISLib are two command line tools to segment an image, and attribute each segment:

# Segmentation
rsgislibsegmentation.py --input jers1palsar_stack.kea \
--output jers1palsar_stack_clumps_elim_final.kea \
--outmeanimg jers1palsar_stack_clumps_elim_final_mean.kea \
-tmpath $PWD --numclusters 100 --minpxls 100

# Attribute segments
rsgislibattributerat.py --inimage jers1palsar_stack.kea \
--inclumps jers1palsar_stack_clumps_elim_final.kea \
--mean

To populate the image statistics, band names are used (where available), these can be set using the ‘setbandnames.py’ script from RSGIS Scripts.

These command line tools use the Python utility functions ‘runShepherdSegmentation‘ from segutils and ‘populateImageStats‘ from ratutils. These utility functions can be called directly from Python:

from rsgislib.rastergis import ratutils
from rsgislib.segmentation import segutils
from rsgislib import imageutils

inputImage = 'jers1palsar_stack.kea'
clumpsFile = 'jers1palsar_stack_clumps_elim_final.kea'
meanImage = 'jers1palsar_stack_clumps_elim_final_mean.kea'

# Set band names
bandNames = ['98_summer','98_winter','07_HH','07_HV']
imageutils.setBandNames(inputImage, bandNames)

# Run segmentation
segutils.runShepherdSegmentation(inputImage, clumpsFile,
                    meanImage, numClusters=100, minPxls=100)

# Attribute segments
ratutils.populateImageStats(inputImage, clumpsFile,
                    calcMean=True)

Note the latest version of RSGISLib (2.1.752) is required for this. Older versions don’t include the ‘setBandNames’ function and require all parameters to be set in the utility functions.

Multi-core processing with RSGISLib using tiles

Within RSGISLib there are functions within imageutils for tiling and mosaicking images. These can be combined to split large datasets up for processing using multiple cores or nodes on a HPC. The Python bindings for the createTiles function was written so it returns a list of all tiles created, this allows the list of tiles to be passed to a separate function for processing. Combining with the multiprocessing module in Python provides a simple way of processing large datasets on multiple cores with the createImageMosaic function called at the end to join the tiles back together.

The example below shows how to:

  1. Split an image into temporary tiles (in a directory created using the tempfile module).
  2. Run an RSGISLib function on each tile, in this case imageMath.
  3. Re-mosaic the tiles.
  4. Remove the temp files created.
# Import RSGISLib
import rsgislib
from rsgislib import imageutils
from rsgislib import imagecalc

# Import multiprocessing
from multiprocessing import Pool
# Import tempfile
import tempfile
# Import os
import os

outFormat = 'KEA'
outType = rsgislib.TYPE_32INT

def addOne(inImage):

    outputImage = inImage.replace('.kea','add1.kea')
    expression = 'b1+1'
    imagecalc.imageMath(inImage, outputImage,
                    expression, outFormat, outType)

    return outputImage

if __name__ == '__main__':

    inputImage = 'N06W053_PALSAR_08_HH_utm.kea'
    outImage = 'N06W053_PALSAR_08_HH_utm_addOne.kea'

    # Create temporary directory
    tempDIR = tempfile.mkdtemp(dir='.')
    outBase = os.path.join(tempDIR, 'tile')

    # Create Tiles
    width = 1000
    height = width
    overlap = 5 # Set a 5 pixel overlap between tiles
    offsettiling = 0
    ext='kea'
    temptiles = imageutils.createTiles(inputImage, outBase, width,
             height, overlap, offsettiling, outFormat, outType, ext)

    # Run process on tiles
    pool = Pool()
    temptilesP = pool.map(addOne, temptiles)

    # Mosaic tiles
    backgroundVal = 0.
    skipVal = 0.
    skipBand = 1
    overlapBehaviour = 0

    imageutils.createImageMosaic(temptilesP, outImage, backgroundVal,
               skipVal, skipBand, overlapBehaviour, outFormat, outType)

    # Remove temp tiles and DIR
    removeList = temptiles + temptilesP
    for tile in removeList:
        os.remove(tile)

    os.removedirs(tempDIR)

As all the functions try to write to stdout at the same time it looks a little messy (I still need to figure out a nice way to improve this). Also, although I didn’t run any tests the overhead of creating and mosaicking the tiles will likely make this image maths function (which is not particularly CPU intensive) take longer than running on the entire image in one go. However, it shows the potential for combining the RSGISLib Python bindings with the multiprocessing module.