Tag Archives: RSGISLib

Mosaic Environment Agency DTM/DSM tiles using RSGISLib

The Environment Agency have just made their high resolution LiDAR-derived Digital Surface Model (DSM) and Digital Terrain Model (DTM) data available under the Open Government Licence through http://environment.data.gov.uk/ds/survey

The files are downloaded as zipped archives containing ASCII format files. To create a mosaic from them I have created a script using:

  1. The archive reader functionality from TuiView, which can get a list of files within a zip archive and convert them to paths which can be read with GDAL using a virtual filesytem as described in an earlier post
  2. The ‘createImageMosaic’ function in RSGISLib to create the mosaic, changing the no data value from -9999 to 0
  3. The ‘assignProj’ function in RSGISLib to assign the correct projection to the mosaic
  4. The ‘popImageStats’ function in RSGISLib to calculate stats and overviews for fast display in TuiView and other programs

The script mosaic_ea_lidar.py is available to download and is run using:

python mosaic_ea_lidar.py -o sx36_dsm_mosaic.kea \
           ~/Downloads/LIDAR-DSM-1M-SX36.zip

Multiple zip files can be passed in at once to be added to the same mosaic. The output format is set based on the output file extension.

A similar script using using GRASS through the ARSF DEM Scripts is also available to download from here. Usage is the same.

Copy a Shapefile to a Raster Attribute Table

The object based image analysis features in RSGISLib are based around storing object attributes as a Raster Attribute Table (RAT), as described in the following paper:

Clewley, D.; Bunting, P.; Shepherd, J.; Gillingham, S.; Flood, N.; Dymond, J.; Lucas, R.; Armston, J.; Moghaddam, M. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing 2014, 6, 6111-6135. (open access)

However, data to be used as part of the analysis are often stored as attributes of a shapefile. A RAT can be created from a shapefile using the gdal_rasterize command and the copyShapefile2RAT function in RSGISLib using the following steps:

  1. Rasterise vector
    This can done directly from the command line or by calling gdal_rasterize from a Python script using subprocess.

    import subprocess 
    
    input_vector = 'brig_re09_binary.shp'
    rasterised_vector = 'brig_re09_binary_raster.kea'
        
    rasterise_cmd = ['gdal_rasterize','-of','KEA',
                                      '-a','BGL',
                                      '-tr','30','30',
                                      input_vector,
                                      rasterised_vector]
    print('Running: ', ' '.join(rasterise_cmd))
    subprocess.call(rasterise_cmd)
    
    

    The above code sets the output raster pixel size to 30 x 30 m and uses the ‘BGL’ column from the shapefile (which is an integer) for the pixel values.

    Note this image is only used to get the extent and pixel size of the final RAT image. If you have another image you wish to match the extent and pixel size to this can be used instead of rasterising the vector, in this case skip this step and use the existing image instead of ‘rasterised_vector’.

  2. Create RAT and copy shapefile attributes
    from rsgislib import vectorutils
    import rsgislib
     
    input_vector = 'brig_re09_binary.shp'
    rasterised_vector = 'brig_re09_binary_raster.kea'
    output_rat_image = 'brig_re09_attributes.kea'
        
    vectorutils.copyShapefile2RAT(input_vector, rasterised_vector,output_rat_image)
    
    

    This will copy all the attributes from the input shapefile to a RAT. Note the output format is always KEA as it has support for large attribute tables and stores them with compression.

To view the attributes open the file in TuiView. The output RAT can then be used as any other RAT, e.g., attributing objects with image statistics or using as part of a classification. For more details see other posts on object based image analysis.

Create Colour Composites for ALOS PALSAR Tiles

A common way to visualise dual-polarisation SAR data is to display a colour composite of HH (Red), HV (Green) and HH/HV(Blue).

Python Functions

Such composites can be created in RSGISLib using the following steps:

  1. Create the ratio image using the bandMath.
    import rsgislib
    from rsgislib import imagecalc
    
    bandDefns = [imagecalc.BandDefn('hh', 'in_hh_file.kea', 1),
                 imagecalc.BandDefn('hv', 'in_hv_file.kea', 1)]
    imagecalc.bandMath('out_hhhv_ratio.kea', 'hh/hv', 'KEA', \
                      rsgislib.TYPE_32FLOAT, bandDefns) 
    
  2. Create the stack using stackImageBands
    import rsgislib
    from rsgislib import imageutils
    
    bands_list = ['in_hh_file.kea', 'in_hv_file.kea', 'out_hhhv_ratio.kea']
    band_names = ['HH','HV', 'HH/HV']
    gdaltype = rsgislib.TYPE_32FLOAT
    imageutils.stackImageBands(bands_list, band_names, 'out_hhhvratio_stack.kea', None, 
                                        0, 'KEA', rsgislib.TYPE_32FLOAT) 
    
  3. Stretch using stretchImage
    import rsgislib
    from rsgislib import imageutils
    
    imageutils.stretchImage('out_hhhvratio_stack.kea, 'out_hhhvratio_stack_scale.kea', \
           False, '', True, False, 'GTiff', 
           rsgislib.TYPE_8INT, imageutils.STRETCH_LINEARSTDDEV, 2)
    

    This will create a tiff with values between 0 – 255 (8 bit), suitable for display in GIS packages or graphics programs such as Photoshop / GIMP.

Script for ALOS PALSAR tiles

For the 1×1 degree 25 m resolution ALOS PALSAR tiles, which can be freely downloaded from http://www.eorc.jaxa.jp/ALOS/en/palsar_fnf/fnf_index.htm for 2007 – 2010. A script (create_palsar_tiles_rgb.py) is available from RSGIS Scripts on Bitbucket. The script will also untar files, and can be run on downloaded data using:

python create_palsar_tiles_rgb.py —unzip downloaded_palsar_files

The script will untar each tile into a separate directory, create temp files for the ratio image and stack and save a tiff in with the original date files in the following structure:

N63W165_10_MOS
     |-----N63W165_10_MOS.tar.gz - Original tar.gz
     |-----N63W165_10_MOS_composite.tif - Composite generated
     |-----N63W165_10_date       - Acquisition date of each pixel
     |-----N63W165_10_date.hdr  
     |-----N63W165_10_linci      - Local incidence angle for each pixel
     |-----N63W165_10_linci.hdr 
     |-----N63W165_10_mask       - Data / no-data mask
     |-----N63W165_10_mask.hdr 
     |-----N63W165_10_sl_HH      - HH data (DN)
     |-----N63W165_10_s1_HH.hdr 
     |-----N63W165_10_sl_HV      - HV data (DN)
     |-----N63W165_10_s1_HH.hdr 

The resulting tiff will be something like the example below (Alaska; 63N165W)

N63W165_10_MOS_composite_resize
Data Copyright JAXA

To apply the same stretch to multiple files a text file can be used with the minimum and maximum values for each band. For example:

#stddev
#band,img_min,img_max,out_min,out_max
1,48.4992,9286,0,255
2,396,4823.89,0,255
3,0.0181456,0.449104,0,255

This stretch file is specified using:

python create_palsar_tiles_rgb.py --unzip --stretchtxt stats.txt downloaded_palsar_files

It is also possible to create a stack using the coefficient of variance as the third band using:

# Coefficient of variance HH
python create_palsar_tiles_rgb.py --unzip --CoVHH downloaded_palsar_files 
# Coefficient of variance HV
python create_palsar_tiles_rgb.py --unzip --CoVHV downloaded_palsar_files

Assign a projection to an image from a WKT file, Proj4 string or another image

Sometimes, during the course of data processing even through the spatial information is correct there can be problems with the projection information, either it it lost, not stored in a format the software can understand or is stored slightly differently to other datasets with the same projection. In these situations, the easiest solution is to just copy the correct projection across. Within RSGISLib there are some functions for this within imageutils. For example if you want to copy the projection from an existing image:

from rsgislib import imageutils
inputImage = 'image_without_projection.kea'
refImage = 'image_with_projection.kea'
imageutils.copyProjFromImage(inputImage, refImage)

You can also assign a projection using a Well Known Text (WKT) file (in this example ‘UTM50S.wkt’, stored in the same directory):

from rsgislib import imageutils
inputImage = 'image_without_projection.kea'
imageutils.assignProj(inputImage, wktFile='UTM50S.wkt')

Note: this syntax requires version of RSGISLib 2.3 or later

If you don’t have RSGISLib installed there is a script to assign a projection within PML’s ARSF Tools repository on GitHub which just requires the GDAL Python bindings.

Usage is simple, to assign a projection from a WKT file:

python assign_projection.py --wkt 'UTM50S.wkt' image_without_projection.kea

If you have a lot of files which need the projection assigning you can use:

python assign_projection.py --wkt 'UTM50S' *.kea

This will work under both Unix-like and Windows operating systems.

As well as a WKT file you can also use Proj4 strings, for example:

python assign_projection.py \
 --proj4 '+proj=utm +zone=55 +south +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs' \
  image_without_projection.kea

A list of WKT files and Proj4 strings are available from spatialreference.org. You can also get the WKT string from an existing dataset using:

gdalinfo image_with_projection.kea

Or a Proj4 string using:

gdalinfo -proj4 image_with_projection.kea

Clip and Reproject an image to a master file using RSGISLib and GDAL

Nathan Thomas

Through my research I handle a plethora of raster datasets, each with a variety of resolutions and projections that I have to clip, reproject and resample to a set of consistent parameters, derived from a ‘master’ dataset. This would usually require multiple processing steps and a series of inputs, such as the extents of the image and the input projection (possibly a WKT file). The greater the number of inputs that are required the greater the chance that exists of an error occurring.

Here is an alternative method using a simple combination of RSGISLib and GDAL, which requires very little user input. This method relieves the user from having to retrieve the parameters of the input dataset from the image file, if these parameters are unknown. This is particularly useful if a user wishes to clip a dataset at a series of different study sites, each with a different spatial extent and possibly even different projection. This method is aimed at being simple to implement and would suit a beginner in Python looking to get multiple input datasets to a set of standard parameters, without requiring the knowledge to open an image and retrieve the required extent, resolution and projection.

Create Copy Image

The first step implements a command in RSGISLib to create a blank copy of a master dataset, populated with 0 values. This creates a blank image of a single value that has the parameters of the input file such as projection information, extent and resolution. The number of bands can also be specified so that a 7-band image can be created with the parameters from a single band image input. The RSGISLib command is as follows:

import rsgislib

inMask = 'inMaskImage.kea'
outName = 'outImage.kea'
numberOfBands = 1
GDALformat = 'KEA'
GDALtype = rsgislib.TYPE_32FLOAT

rsgislib.imageutils.createCopyImage(inMask, outName, \
 numberOfBands, 0, GDALformat, GDALtype)

GDAL Reproject Image

The second step implements a GDAL command to open a second input dataset (the one to clip) as read only and effectively populate the blank image created in step one with the values from the second input dataset. The output of this is a clipped image with the parameters of the master input file. The process is as follows:

from osgeo import gdal

ClipFile = 'largeRaster.kea'
outName = 'outImage.kea'
interpolationMethod = gdal.GRA_CubicSpline

inFile = gdal.Open(ClipFile, gdal.GA_ReadOnly)
outFile = gdal.Open(outName, gdal.GA_Update)

gdal.ReprojectImage(inFile, outFile, None, None, \
     interpolationMethod)

Implementation

The script is simple to implement and requires a number of inputs:

  • ‘-m’ Input Mask – The input file with the desired parameters
  • ‘-I’ Input Image – The image you wish to clip
  • ‘-o’ Output Image – The output image
  • ‘-n’ Number of Bands – The number of bands that the clipped scene requires (ie, to clip an RGB dataset the number of bands would be 3)
  • ‘-of’ GDAL format – The format of the output image (i.e. KEA)

An example of the implementation may look like:

python CutReproject.py –m N00E103_HH.kea \
        –i GlobalSRTMmosaic.kea \
        –o N00E103_SRTM.kea –n 1 –of KEA

CutReproject.py is available to download from bitbucket.org/nathanmthomas/bucket-of-rs-and-gis-scripts/

Object-based classification using Random Forests

In our recent paper on an open source system for object based image classification [1] we mentioned linking with scikit-learn [2] to apply different classification algorithms. This post presents an example using Random Forests to give an idea of all the steps required. Random Forests is an ensamble learning algorithm which utilises many decision trees, each of which ‘vote’ for the final class. To reduce correlation between trees only a subset of predictors (data layers) and training samples are used for each class. Random forests is increasing in popularity within remote sensing, an example of usage is the pixel-based classification of Alaska by Whitcombe et al 2009 [3]

Taking [3] as an example this post demonstrates how the classification can be applied at the object level.

  1. Install software
  2. As detailed in previous posts the software required can be installed under Linux/OS X through conda using

    conda install -c osgeo \
      rsgislib rios tuiview scikit-learn
    
  3. Define training data
  4. Random Forests works best with a large number of training samples. For this example we’re using the National Wetlands Inventory (NWI) which are available as an ESRI Shapefile. For each class an integer code (WETCLASS) has been assigned and stored within the attribute table. The polygons are converted to a raster, where the pixel ID is the class using:

    gdal_rasterize -of KEA -ot Byte -a WETCLASS \
         -tr 100 100 nwi_shp.shp nwi_raster.kea
    
  5. Stack bands
  6. To stack the bands (assumed to have the same projection and resolution) the stack bands command, within RSGISLib, is used.

    #/usr/bin/env python
    import rsgislib
    from rsgislib import imageutils
    
    # Segmentation files
    palsarList = ['palsar_hh.kea',
                'also_hv.kea']
    
    # All files
    datalayersList = palsarList + \
                     ['palsar_hh_tex.kea',
                      'palsar_hv_tex.kea',
                      'elevation.kea',
                      'slope.kea']
    
    # Band names
    bandNamesList = ['hh','hv',
                     'hh_tex','hv_tex'
                     'elevation', 'slope']
    
    # Out file type and format
    gdalformat = 'KEA'
    dataType = rsgislib.TYPE_32FLOAT
    
    # Create stack for segmentation
    outputImage = 'palsar_hhhv.kea'
    imageutils.stackImageBands(palsarList, None, \
         outputImage, None, 0, gdalformat, dataType)
    
    # Create stack of all data layers
    outputDataStack = 'classification_stack.kea'
    imageutils.stackImageBands(datalayersList, bandNamesList, \
         outputDataStack, None, 0, gdalformat, dataType)
    

    Two stacks are created, one for the classification, which contains all data layers, and a second for segmentation which contains only SAR data. As the training data raster is categorical it is kept as a separate layer, because a separate function is required to attribute the segments.

  7. Image segmentation
  8. The following code is used for segmentation (for more detail see earlier post)

    #/usr/bin/env python
    from rsgislib.segmentation import segutils
     
    inputImage = 'palsar_hhhv.kea'
    clumpsFile = 'palsar_hhhv_clumps_elim_final.kea'
    meanImage = 'palsar_hhhv_clumps_elim_final_mean.kea'
    
    # Run segmentation
    segutils.runShepherdSegmentation(inputImage, clumpsFile,
                        meanImage, numClusters=100, minPxls=100)
    

    The method is similar to that described in an earlier post, with the addition of a step to include categorical data.

  9. Attribute segments
  10. #!/usr/bin/env python
    
    from rsgislib import rastergis
    from rsgislib.rastergis import ratutils
    from rsgislib import imageutils
     
    dataStack = 'classification_stack.kea'
    classFile = 'nwi_raster.kea'
    clumpsFile = 'palsar_hhhv_clumps_elim_final.kea'
     
    # Attribute segments with data
    ratutils.populateImageStats(dataStack, clumpsFile,
                        calcMean=True)
    
    # Convert training data to RAT
    codeStats = list()
    codeStats.append(rastergis.BandAttStats(band=1, minField='wetCode'))
    rastergis.populateRATWithStats(classFile, classFile, codeStats)
    
    # Attribute segments with class
    rastergis.strClassMajority(clumpsFile, classFile, \
                  'wetCode', 'wetCode', False)
    
  11. Perform classification
  12. Much of the classification code is sorting the data from the RAT so it can be input into scikit-learn.

    #!/usr/bin/env python
    from rios import rat
    import osgeo.gdal as gdal
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    
    # Open RAT
    inRatFile = 'palsar_hhhv_clumps_elim_final.kea'
    ratDataset = gdal.Open(inRatFile, gdal.GA_Update)
    
    # Set column names
    x_col_names = ['hh','hv',
                   'hh_tex','hv_tex'
                   'elevation', 'slope']
    
    y_col_name = 'wetCode'
    
    # Set up list to hold data
    X = []
    
    # Read in data from each column
    for colName in x_col_names:
        X.append(rat.readColumn(ratDataset, colName))
    
    # Read in training data
    y = rat.readColumn(ratDataset, y_col_name) 
    # Set NA values to 0
    y = np.where(y == b'NA',0,y)
    y = y.astype(np.int16)
    
    X.append(y)
    
    X = np.array(X)
    X = X.transpose()
    
    # Remove rows with 0 (NA) for wetCode
    X_train = X[X[:,-1] != 0]
    
    # Remove non-finite values
    X_train = X_train[np.isfinite(X_train).all(axis=1)]
    
    # Split into variables (X) and class (y)
    y_train = X_train[:,-1]
    X_train = X_train[:,0:-1]
    
    # Train Random Forests
    clf = RandomForestClassifier(n_estimators=500, max_features=3, \
          oob_score=True, n_jobs=6, verbose=2)
    
    clf.fit(X_train, y_train)
    
    # Set NaN values to 0
    X = np.where(np.isfinite(X),X,0)
    
    # Apply classification
    predictClass = clf.predict(X[:,0:-1])
    
    # Write out data to RAT
    rat.writeColumn(ratDataset, 'predictClass', predictClass)
    ratDataset = None
    

There are other algorithms in scikit-learn which can also be applied instead of Random Forests once the data is in the correct format. The big advantage of this system is the entire process can be applied within a single Python script so multiple algorithms / parameters can be easily tested and the performance evaluated.

References

[1] Clewley, D.; Bunting, P.; Shepherd, J.; Gillingham, S.; Flood, N.; Dymond, J.; Lucas, R.; Armston, J.; Moghaddam, M. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing 2014, 6, 6111-6135.
[2] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[3] Whitcomb, J., Moghaddam, M., McDonald, K., Podest, E., Kellndorfer, J., Wetlands Map of Alaska Using L-Band Radar Satellite Imagery, Canadian Journal of Remote Sensing, 2009, Vol. 35, pp. 54-72

A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables – Bonus Features

This is the first post, of what I hope will be a regular feature, with some ‘Bonus Features’ of recently published papers I have been involved with. The idea is to provide some of the details considered too technical (or trivial) for an academic publication, code snippets, a bit of backstory and other things I think might be of interest.

This first post is on:

Clewley, D.; Bunting, P.; Shepherd, J.; Gillingham, S.; Flood, N.; Dymond, J.; Lucas, R.; Armston, J.; Moghaddam, M. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing 2014, 6, 6111-6135.

The article is open access and is available to download from:

http://www.mdpi.com/2072-4292/6/7/6111

  1. Installing the software.

    Binaries for OS X and Linux are made available through conda. See the software page for more instructions.

  2. RFC40 and RSGISLib 2.1 / 2.2 differences

    Whilst we were writing the paper there were some major changes in RSGISLib due to the release of GDAL 1.11 which included the RFC40 changes proposed and implemented by Sam and Pete. These changes meant that rather than loading the entire RAT to memory (which was fast but the size of the RAT which could be processed was limited by the amount of available RAM), rows were accessed from disk (which was slower but removed the RAM limitation). For RSGISLib Pete and myself removed all the RAT functions after the 2.1 release and started adding them back to take advantage of the new RAT interface in GDAL 1.11. For some functions this required reworking the algorithm. After a pretty intense couple of days coding we managed to port most of the main functions across, and we’re gradually adding the rest. Therefore, some of the functions listed in the paper as available in 2.1 aren’t available in the latest release yet.

    In RIOS, Neil and Sam added a ‘ratapplier’ interface to process large RATs chunks at a time, similar to the ‘applier’ interface for processing images. More information is in the RIOS documentation. The ratapplier interface is backwards compatible with pre-RFC40 versions of GDAL (although all rows are loaded at once).

    The RFC40 changes in TuiView allowed massive RATs (e.g., segmentations for Australia and Alaska) to be easily visualised on moderate specification machines. Being able to open and navigate a RAT with 10s of millions of rows on a laptop really is an impressive feat!

  3. Software Comparison

    To run the segmentation in RSGISLib and OTB the following scripts were used (note OTB isn’t currently available in conda, for the paper we installed through the ubuntugis-unstable package repository).

    RSGISLib (Python):

    from rsgislib.segmentation import segutils
     
    inputImage = '../Data/naip_newhogansouth_2012_sub.tif'
    clumpsFile = 'naip_newhogansouth_2012_clumps_elim_final.kea'
      
    # Run segmentation
    segutils.runShepherdSegmentation(inputImage, clumpsFile,
                               numClusters=60, minPxls=100, 
                               distThres=100, bands=None, 
                               sampling=100, kmMaxIter=200)
    

    OTB (Bash):

    otbcli_Segmentation -in ../Data/naip_newhogansouth_2012_sub.tif \
        -mode.vector.out naip_newhogansouth_2012_otb_seg.shp
    

    To time the segmentation the UNIX ‘time’ command was used.

    There are lots of algorithms in OTB for segmentation so if you find the algorithm in RSGSILib doesn’t quite fit your requirements I’d highly recommend trying OTB. Although OTB can produce a raster output the vector output algorithm is able to process larger datasets and utilise multiple cores. To convert to a raster gdal_rasterize can be used. For example:

    gdal_rasterize -a DN -tr 30 30 -ot UInt32 \
       -of KEA segmentation_vector.shp \
       segmentation_raster.kea
    

    To convert to a RAT the following Python function can be used:

    from rsgislib import rastergis
    clumps='segmentation_raster.kea'
    pyramids=True
    colourtable=True
    rastergis.populateStats(clumps, colourtable, pyramids)
    

    This can then be used exactly the same as the segmentation produced in RSGISLib. We think the ability to use different segmentation algorithms, from different packages, is a real benefit of the system.

  4. Examples

    The ‘Change in Mangroves Extent’ example was part of a course Pete and Myself gave at JAXA’s 20th Kyoto and Carbon Meeting (agenda and presentations available here, full workshop available to download from SourceForge). For segmenting the image and attributing objects there are two utility functions available in RSGISLib, described in an earlier post. For the course and paper we used the old RIOS RAT interface. However, it is recommended to use the new ratapplier interface. As an example the classification of water would be:

    from rios import ratapplier
    import numpy
    
    def classifyWater(info, inputs, outputs):
        # Read the 1996 JERS-1 Mean dB values for the clumps
        # The column is represented as a numpy array of size 
        # block length x 1.
        HH96MeandB = inputs.inrat.HH96MeandB
        # Create a new numpy array with the same dimensions (i.e., length)
        # as the 'HH96MeandB' array. The data type has been defined as
        # an 8 bit integer (i.e., values from -128 to 128).
        # All pixel values will be initialised to zero
        Water96 = numpy.zeros_like(HH96MeandB, dtype=numpy.int8)
        # Similar to an SQL where selection the where numpy where function
        # allows a selection to be made. In this case all array elements 
        # with a 1996 HH value less than -12 dB are being selected and 
        # the corresponding elements in the Water96 array will be set to 1.
        Water96 = numpy.where((HH96MeandB < -12), 1, Water96)
        # Save out to column 'Water96'
        outputs.outrat.Water96 = Water96
    
    # Set up inputs and outputs for ratapplier
    inRats = ratapplier.RatAssociations()
    outRats = ratapplier.RatAssociations()
    
    inRats.inrat = ratapplier.RatHandle('N06W053_96-10_segs.kea')
    outRats.outrat = ratapplier.RatHandle('N06W053_96-10_segs.kea')
    
    print('Classifying water')
    ratapplier.apply(classifyWater, inRats, outRats)
    

If you have any questions or comments about the system described in the paper email the RSGISLib support group (rsgislib-support@googlegroups.com).