Tag Archives: RIOS

Calculations on large Raster Attribute Tables using ratapplier

The RIOS Library (http://rioshome.org) provides two methods of manipulating columns within a Raster Attribute Table (RAT):

  1. The ‘rat’ module can read an entire column into memory.
    from osgeo import gdal
    from rios import rat
    
    # Open RAT dataset
    rat_dataset = gdal.Open("clumps.kea", gdal.GA_Update)
    
    # Get columns with average red and NIR for each object
    red = rat.readColumn(rat_dataset, "RedAvg")
    nir = rat.readColumn(rat_dataset, "NIR1Avg")
    
    ndvi = (nir - red) / (nir + red)
    
    # Write out column
    rat.writeColumn(rat_dataset, "NDVI", ndvi)
    
    # Close RAT dataset
    rat_dataset = None
    
  2. The newer ‘ratapplier’ module is modelled on the ‘applier’ module for images and allows a function to be applied to chunks of rows, making it particularly useful for a RAT which is too large to load into memory.
    from rios import ratapplier
    
    def _ratapplier_calc_ndvi(info, inputs, outputs):
        """
        Calculate NDVI from RAT.
    
        Called by ratapplier
        """
        # Get columns with average red and NIR for each object
        # within block
        red = getattr(inputs.inrat, "RedAvg")
        nir = getattr(inputs.inrat, "NIR1Avg")
    
        # Calculate NDVI
        ndvi = (nir - red) / (nir + red)
    
        # Save to 'NDVI' column (will create if doesn't exist)
        setattr(outputs.outrat,"NDVI", ndvi)
    
    if __name__ == "__main__":
    
        # Set up rat applier for input / output
        in_rats = ratapplier.RatAssociations()
        out_rats = ratapplier.RatAssociations()
    
        # Pass in clumps file
        # Same file is used for input and output to write
        # to existing RAT
        in_rats.inrat = ratapplier.RatHandle("clumps.kea")
        out_rats.outrat = ratapplier.RatHandle("clumps.kea")
    
        # Apply function to all rows in chunks
        ratapplier.apply(_ratapplier_calc_ndvi, in_rats, out_rats)
    

Although using ratapplier looks slightly more complicated at first, writing scripts to use it rather than the ‘rat’ interface means they will scale much better to larger datasets.

Both these examples are available to download from https://bitbucket.org/snippets/danclewley/7E666

Using appliercontrols for more control over output in RIOS

For more control over writing files in RIOS, you can use the controls class, documented in the RIOS applier documentation. The basic usage is to create a ‘controls’ object and pass it to the ‘apply’ function:

from rios import applier
infiles = applier.FilenameAssociations()
infiles.inimage = 'in_image.kea'

outfiles = applier.FilenameAssociations()
outfiles.outimage = 'out_image.kea'

controls = applier.ApplierControls()
applier.apply(userFunc, infiles, outfiles, controls=controls)

Display progress

from rios import cuiprogress
controls.progress = cuiprogress.CUIProgressBar()

Set output driver and creation options

controls.setOutputDriverName("ENVI")
controls.setCreationOptions(["INTERLEAVE=BIL"])

For more information on this, see the previous post. For the supported creation options for each driver see http://www.gdal.org/formats_list.html.

Re-project data on-the-fly
If the data are in different projections you can reproject them to match a reference image using:

controls.setReferenceImage(infiles.inimage)
controls.setResampleMethod("near")

Any of the resampling methods supported by GDAL can be used; a list is available in the gdalwarp documentation.

Set output layer as thematic

controls.setThematic(True)

Set output layer names

controls.setLayerNames(['HH','VV','HV'])

Set no data value

controls.setStatsIgnore(0)

This sets the global default value to use as the null value when calculating stats.

Other
For a full list, open an interactive Python session and type:

from rios import applier
help(applier.ApplierControls)

Object-based classification using Random Forests

In our recent paper on an open source system for object based image classification [1] we mentioned linking with scikit-learn [2] to apply different classification algorithms. This post presents an example using Random Forests to give an idea of all the steps required. Random Forests is an ensemble learning algorithm which utilises many decision trees, each of which ‘votes’ for the final class. To reduce correlation between trees, only a subset of predictors (data layers) and training samples is used for each tree. Random Forests is increasing in popularity within remote sensing; an example of its use is the pixel-based classification of Alaska by Whitcomb et al. 2009 [3].

Taking [3] as an example, this post demonstrates how the classification can be applied at the object level.

  1. Install software

    As detailed in previous posts the software required can be installed under Linux/OS X through conda using:

    conda install -c osgeo \
      rsgislib rios tuiview scikit-learn
    
  2. Define training data

    Random Forests works best with a large number of training samples. For this example we’re using the National Wetlands Inventory (NWI), which is available as an ESRI Shapefile. For each class an integer code (WETCLASS) has been assigned and stored within the attribute table. The polygons are converted to a raster, where the pixel value is the class code, using:

    gdal_rasterize -of KEA -ot Byte -a WETCLASS \
         -tr 100 100 nwi_shp.shp nwi_raster.kea
    
  3. Stack bands

    To stack the bands (assumed to have the same projection and resolution) the stack bands command, within RSGISLib, is used.

    #!/usr/bin/env python
    import rsgislib
    from rsgislib import imageutils
    
    # Segmentation files
    palsarList = ['palsar_hh.kea',
                  'palsar_hv.kea']
    
    # All files
    datalayersList = palsarList + \
                     ['palsar_hh_tex.kea',
                      'palsar_hv_tex.kea',
                      'elevation.kea',
                      'slope.kea']
    
    # Band names
    bandNamesList = ['hh','hv',
                     'hh_tex','hv_tex',
                     'elevation', 'slope']
    
    # Out file type and format
    gdalformat = 'KEA'
    dataType = rsgislib.TYPE_32FLOAT
    
    # Create stack for segmentation
    outputImage = 'palsar_hhhv.kea'
    imageutils.stackImageBands(palsarList, None, \
         outputImage, None, 0, gdalformat, dataType)
    
    # Create stack of all data layers
    outputDataStack = 'classification_stack.kea'
    imageutils.stackImageBands(datalayersList, bandNamesList, \
         outputDataStack, None, 0, gdalformat, dataType)
    

    Two stacks are created, one for the classification, which contains all data layers, and a second for segmentation which contains only SAR data. As the training data raster is categorical it is kept as a separate layer, because a separate function is required to attribute the segments.

  4. Image segmentation

    The following code is used for segmentation (for more detail see an earlier post):

    #!/usr/bin/env python
    from rsgislib.segmentation import segutils
     
    inputImage = 'palsar_hhhv.kea'
    clumpsFile = 'palsar_hhhv_clumps_elim_final.kea'
    meanImage = 'palsar_hhhv_clumps_elim_final_mean.kea'
    
    # Run segmentation
    segutils.runShepherdSegmentation(inputImage, clumpsFile,
                        meanImage, numClusters=100, minPxls=100)
    

    The method is similar to that described in an earlier post, with the addition of a step to include categorical data.

  5. Attribute segments

    #!/usr/bin/env python
    
    from rsgislib import rastergis
    from rsgislib.rastergis import ratutils
    from rsgislib import imageutils
     
    dataStack = 'classification_stack.kea'
    classFile = 'nwi_raster.kea'
    clumpsFile = 'palsar_hhhv_clumps_elim_final.kea'
     
    # Attribute segments with data
    ratutils.populateImageStats(dataStack, clumpsFile,
                        calcMean=True)
    
    # Convert training data to RAT
    codeStats = list()
    codeStats.append(rastergis.BandAttStats(band=1, minField='wetCode'))
    rastergis.populateRATWithStats(classFile, classFile, codeStats)
    
    # Attribute segments with class
    rastergis.strClassMajority(clumpsFile, classFile, \
                  'wetCode', 'wetCode', False)
    
  6. Perform classification

    Much of the classification code is concerned with getting the data out of the RAT into a form which can be passed to scikit-learn.

    #!/usr/bin/env python
    from rios import rat
    import osgeo.gdal as gdal
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    
    # Open RAT
    inRatFile = 'palsar_hhhv_clumps_elim_final.kea'
    ratDataset = gdal.Open(inRatFile, gdal.GA_Update)
    
    # Set column names
    x_col_names = ['hh','hv',
                   'hh_tex','hv_tex',
                   'elevation', 'slope']
    
    y_col_name = 'wetCode'
    
    # Set up list to hold data
    X = []
    
    # Read in data from each column
    for colName in x_col_names:
        X.append(rat.readColumn(ratDataset, colName))
    
    # Read in training data
    y = rat.readColumn(ratDataset, y_col_name) 
    # Set NA values to 0
    y = np.where(y == b'NA',0,y)
    y = y.astype(np.int16)
    
    X.append(y)
    
    X = np.array(X)
    X = X.transpose()
    
    # Remove rows with 0 (NA) for wetCode
    X_train = X[X[:,-1] != 0]
    
    # Remove non-finite values
    X_train = X_train[np.isfinite(X_train).all(axis=1)]
    
    # Split into variables (X) and class (y)
    y_train = X_train[:,-1]
    X_train = X_train[:,0:-1]
    
    # Train Random Forests
    clf = RandomForestClassifier(n_estimators=500, max_features=3, \
          oob_score=True, n_jobs=6, verbose=2)
    
    clf.fit(X_train, y_train)
    
    # Set non-finite values to 0
    X = np.where(np.isfinite(X),X,0)
    
    # Apply classification
    predictClass = clf.predict(X[:,0:-1])
    
    # Write out data to RAT
    rat.writeColumn(ratDataset, 'predictClass', predictClass)
    ratDataset = None
    

There are other algorithms in scikit-learn which can be applied instead of Random Forests once the data are in this format. The big advantage of this system is that the entire process can be run within a single Python script, so multiple algorithms / parameters can be easily tested and their performance evaluated.
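As an illustration, the sketch below (my assumption, not from the original script) reuses the X_train, y_train and X arrays prepared above and swaps Random Forests for Gradient Boosting; only the import and constructor change:

from sklearn.ensemble import GradientBoostingClassifier

# Train a Gradient Boosting classifier on the same training data
clf = GradientBoostingClassifier(n_estimators=100)
clf.fit(X_train, y_train)

# Apply to all rows (all columns except the class column)
predictClass = clf.predict(X[:,0:-1])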

References

[1] Clewley, D.; Bunting, P.; Shepherd, J.; Gillingham, S.; Flood, N.; Dymond, J.; Lucas, R.; Armston, J.; Moghaddam, M. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing 2014, 6, 6111-6135.
[2] Pedregosa, F.; et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 2011, 12, 2825-2830.
[3] Whitcomb, J.; Moghaddam, M.; McDonald, K.; Podest, E.; Kellndorfer, J. Wetlands Map of Alaska Using L-Band Radar Satellite Imagery. Canadian Journal of Remote Sensing 2009, 35, 54-72.

Create a slope layer from a DEM

To calculate slope from a DEM the gdaldem utility can be used:

gdaldem slope -of KEA in_dem.kea out_slope.kea

There is an option to pass in a scale factor to convert the horizontal spacing from degrees to metres. However, if the DEM covers a range of latitudes it is desirable to convert the horizontal spacing on a per-pixel basis. To accomplish this I wrote a Python script, with RIOS used to read and write the data in blocks, making it efficient to process large datasets.
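The underlying conversion is straightforward: one degree of latitude corresponds to a roughly constant ground distance (about 111 km), while the ground distance of one degree of longitude shrinks with the cosine of latitude. A minimal sketch of the per-pixel conversion (the function name and spherical Earth radius are my assumptions, not taken from the script):

import numpy as np

def pixel_spacing_metres(lat_deg, pixel_size_deg):
    """Approximate x and y pixel spacing in metres for a pixel
    size given in degrees (assumes a spherical Earth of radius
    6371 km)."""
    m_per_deg = 6371000.0 * np.pi / 180.0  # ~111 km per degree
    y_spacing = pixel_size_deg * m_per_deg
    # East-west spacing shrinks with the cosine of latitude
    x_spacing = pixel_size_deg * m_per_deg * np.cos(np.radians(lat_deg))
    return x_spacing, y_spacing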

Implementation
As the slope calculation requires looping through a block of data, a task which is slow in pure Python, two strategies are used to improve the speed:

  1. Numba is used to compile the Python code, which is a lot faster than pure Python (a sketch is shown after this list)
  2. Fortran code is used to perform the actual slope calculation, compiled as a Python module using f2py
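As an illustration of the first strategy, here is a minimal sketch of a Numba-compiled slope function (a simple central-difference slope with assumed names, not the exact code from the repository):

import math

import numpy as np
from numba import jit

@jit(nopython=True)
def slope_degrees(dem, x_res, y_res):
    """Slope in degrees from central differences over a 3 x 3
    window; dem is assumed to be a float array and edge pixels
    are left as zero."""
    ysize, xsize = dem.shape
    slope = np.zeros_like(dem)
    for y in range(1, ysize - 1):
        for x in range(1, xsize - 1):
            dzdx = (dem[y, x + 1] - dem[y, x - 1]) / (2 * x_res)
            dzdy = (dem[y + 1, x] - dem[y - 1, x]) / (2 * y_res)
            slope[y, x] = math.degrees(math.atan(
                math.sqrt(dzdx * dzdx + dzdy * dzdy)))
    return slope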

To provide a comparison of the time required for each implementation, slope was calculated from an 1800 x 1800 pixel DEM using each method:

Implementation    Time (s)
Pure Python       136.7
Numba             2.6
Fortran           2.1

Tests used Python 3.4 with Numba 0.13.4, running under OS X on a 1.8 GHz Intel i5 processor; the Fortran code was compiled with gfortran. Times are the average of 3 runs.

The Fortran version is faster than the Numba version, but only slightly, and it required a lot more effort to implement. Both are significantly faster than the pure Python version. The code provides a nice example of applying more complicated algorithms to images, with RIOS used to handle the I/O.

To account for noise in the DEM there is also a version of the slope calculation which uses least squares fitting to fit a plane over a window of pixels and calculates the slope from this. As the Python version spends most of its time in NumPy’s linear fitting code there is no improvement from using Numba. The Fortran version uses Accelerate under OS X or ATLAS under Linux for the plane fitting.
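As a sketch of the plane-fitting idea (illustrative only, with assumed names; the repository code differs), the plane z = ax + by + c can be fitted over a window with NumPy’s least squares routine and the slope taken from the fitted coefficients:

import numpy as np

def plane_slope_degrees(window, x_res, y_res):
    """Fit z = a*x + b*y + c to a window of DEM values by least
    squares and return the slope of the fitted plane in degrees."""
    ysize, xsize = window.shape
    x, y = np.meshgrid(np.arange(xsize) * x_res,
                       np.arange(ysize) * y_res)
    # Design matrix: one row per pixel, columns for x, y and the constant
    A = np.column_stack([x.ravel(), y.ravel(), np.ones(window.size)])
    (a, b, c), _, _, _ = np.linalg.lstsq(A, window.ravel(), rcond=None)
    return np.degrees(np.arctan(np.hypot(a, b)))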

The code is available to download from https://github.com/MiXIL/calcSlopeDegrees

Usage
To calculate slope from a DEM where the horizontal spacing is in degrees:

calcSlopeDegrees.py --spacing_degrees inDEM.kea outSlope.kea

To calculate slope by fitting a plane over a 9 x 9 window:

calcSlopeDegrees.py --plane_ls --window_size 9 inDEM.kea outSlope.kea

A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables – Bonus Features

This is the first post, of what I hope will be a regular feature, with some ‘Bonus Features’ of recently published papers I have been involved with. The idea is to provide some of the details considered too technical (or trivial) for an academic publication, code snippets, a bit of backstory and other things I think might be of interest.

This first post is on:

Clewley, D.; Bunting, P.; Shepherd, J.; Gillingham, S.; Flood, N.; Dymond, J.; Lucas, R.; Armston, J.; Moghaddam, M. A Python-Based Open Source System for Geographic Object-Based Image Analysis (GEOBIA) Utilizing Raster Attribute Tables. Remote Sensing 2014, 6, 6111-6135.

The article is open access and is available to download from:

http://www.mdpi.com/2072-4292/6/7/6111

  1. Installing the software.

    Binaries for OS X and Linux are made available through conda. See the software page for more instructions.

  2. RFC40 and RSGISLib 2.1 / 2.2 differences

    Whilst we were writing the paper there were some major changes in RSGISLib due to the release of GDAL 1.11, which included the RFC40 changes proposed and implemented by Sam and Pete. These changes meant that rather than loading the entire RAT into memory (which was fast, but the size of RAT that could be processed was limited by the available RAM), rows are accessed from disk (slower, but without the RAM limitation). For RSGISLib, Pete and I removed all the RAT functions after the 2.1 release and started adding them back to take advantage of the new RAT interface in GDAL 1.11; for some functions this required reworking the algorithm. After a pretty intense couple of days of coding we managed to port most of the main functions across, and we’re gradually adding the rest. Therefore, some of the functions listed in the paper as available in 2.1 aren’t available in the latest release yet.

    In RIOS, Neil and Sam added a ‘ratapplier’ interface to process large RATs a chunk at a time, similar to the ‘applier’ interface for processing images. More information is in the RIOS documentation. The ratapplier interface is backwards compatible with pre-RFC40 versions of GDAL (although all rows are loaded at once).

    The RFC40 changes in TuiView allow massive RATs (e.g., segmentations for Australia and Alaska) to be easily visualised on moderate specification machines. Being able to open and navigate a RAT with tens of millions of rows on a laptop really is an impressive feat!

  3. Software Comparison

    To run the segmentation in RSGISLib and OTB the following scripts were used (note OTB isn’t currently available in conda; for the paper we installed it through the ubuntugis-unstable package repository).

    RSGISLib (Python):

    from rsgislib.segmentation import segutils
     
    inputImage = '../Data/naip_newhogansouth_2012_sub.tif'
    clumpsFile = 'naip_newhogansouth_2012_clumps_elim_final.kea'
      
    # Run segmentation
    segutils.runShepherdSegmentation(inputImage, clumpsFile,
                               numClusters=60, minPxls=100, 
                               distThres=100, bands=None, 
                               sampling=100, kmMaxIter=200)
    

    OTB (Bash):

    otbcli_Segmentation -in ../Data/naip_newhogansouth_2012_sub.tif \
        -mode.vector.out naip_newhogansouth_2012_otb_seg.shp
    

    To time the segmentation the UNIX ‘time’ command was used.

    There are lots of algorithms in OTB for segmentation, so if you find the algorithm in RSGISLib doesn’t quite fit your requirements I’d highly recommend trying OTB. Although OTB can produce a raster output, the vector output algorithm is able to process larger datasets and utilise multiple cores. To convert the output to a raster, gdal_rasterize can be used. For example:

    gdal_rasterize -a DN -tr 30 30 -ot UInt32 \
       -of KEA segmentation_vector.shp \
       segmentation_raster.kea
    

    To convert to a RAT the following Python function can be used:

    from rsgislib import rastergis
    clumps='segmentation_raster.kea'
    pyramids=True
    colourtable=True
    rastergis.populateStats(clumps, colourtable, pyramids)
    

    This can then be used exactly the same as the segmentation produced in RSGISLib. We think the ability to use different segmentation algorithms, from different packages, is a real benefit of the system.

  4. Examples

    The ‘Change in Mangroves Extent’ example was part of a course Pete and I gave at JAXA’s 20th Kyoto and Carbon Meeting (agenda and presentations available here; the full workshop is available to download from SourceForge). For segmenting the image and attributing objects there are two utility functions available in RSGISLib, described in an earlier post. For the course and paper we used the old RIOS RAT interface; however, it is now recommended to use the new ratapplier interface. As an example, the classification of water would be:

    from rios import ratapplier
    import numpy
    
    def classifyWater(info, inputs, outputs):
        # Read the 1996 JERS-1 Mean dB values for the clumps
        # The column is represented as a numpy array of size 
        # block length x 1.
        HH96MeandB = inputs.inrat.HH96MeandB
        # Create a new numpy array with the same dimensions (i.e., length)
        # as the 'HH96MeandB' array. The data type has been defined as
        # an 8 bit integer (i.e., values from -128 to 127).
        # All pixel values will be initialised to zero
        Water96 = numpy.zeros_like(HH96MeandB, dtype=numpy.int8)
        # Similar to a WHERE selection in SQL, the numpy where function
        # allows a selection to be made. In this case all array elements
        # with a 1996 HH value less than -12 dB are selected and
        # the corresponding elements in the Water96 array are set to 1.
        Water96 = numpy.where((HH96MeandB < -12), 1, Water96)
        # Save out to column 'Water96'
        outputs.outrat.Water96 = Water96
    
    # Set up inputs and outputs for ratapplier
    inRats = ratapplier.RatAssociations()
    outRats = ratapplier.RatAssociations()
    
    inRats.inrat = ratapplier.RatHandle('N06W053_96-10_segs.kea')
    outRats.outrat = ratapplier.RatHandle('N06W053_96-10_segs.kea')
    
    print('Classifying water')
    ratapplier.apply(classifyWater, inRats, outRats)
    

If you have any questions or comments about the system described in the paper email the RSGISLib support group (rsgislib-support@googlegroups.com).

Plot 2D Histogram of Pixel Values from Two Images

Recently I wanted to plot the pixel values of two images against each other. I thought it would be good to combine extracting the pixel values (using RIOS) and plotting the data (using matplotlib) in a single script.

The script I wrote (two_band_scatter_plot.py) is available to download from the RSGIS Scripts repository

There are three main steps:

  1. RIOS is used to iterate through the image in small blocks; for each block only pixels that have data in both bands are retained.
  2. The numpy.histogram2d function is used to produce a 2D histogram.
  3. The histogram is plotted using matplotlib.

The 2D histogram was based on code from http://oceanpython.org/2013/02/25/2d-histogram/. As there are a lot of points, displaying the data as a 2D histogram makes the information easier to see; a sketch of the core steps is shown below.
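A rough sketch of steps 2 and 3 (with random stand-in data rather than values extracted with RIOS, and assumed variable names):

import numpy as np
import matplotlib.pyplot as plt

# Stand-ins for the valid pixel values extracted in step 1
dataA = np.random.normal(0.2, 0.05, 100000)
dataB = dataA + np.random.normal(0.0, 0.02, 100000)

# Bin the pixel pairs into a 2D histogram (step 2)
counts, xedges, yedges = np.histogram2d(dataA, dataB, bins=100)

# Mask empty bins so they are not drawn, then plot (step 3)
counts = np.ma.masked_where(counts == 0, counts)
plt.pcolormesh(xedges, yedges, counts.T)
plt.xlabel('Image A')
plt.ylabel('Image B')
plt.savefig('outplot.pdf')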

The simplest usage of the script is:

python two_band_scatter_plot.py \
--imageA imageA.kea \
--imageB imageB.kea \
--plot outplot.pdf 

Which will plot the first band of each image. To change the bands used, the labels for the plot, and the scaling, additional flags can be passed in:

python two_band_scatter_plot.py \
--imageA imageA.kea \
--imageB imageB.kea \
--bandA 1 --bandB 1 \
--plot outplot.pdf \
--plotMin 0 --plotMax 0.5 \
--labelA "13157 - 004 (HH)" \
--labelB "13157 - 006 (HH)"

The output looks something like this:
[Figure: TonziR_13157_003_vs_007_hh, a 2D histogram of the two images]

The script assumes both images have the same projection and resolution, and RIOS takes care of the spatial information. It would be possible to adapt the script so that RIOS resamples the data; see the RIOS wiki for more details.
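A minimal sketch of that adaptation, reusing the controls interface shown earlier in this archive (the attribute names follow the script’s --imageA / --imageB flags and are my assumption):

from rios import applier

infiles = applier.FilenameAssociations()
infiles.imageA = 'imageA.kea'
infiles.imageB = 'imageB.kea'

# Resample imageB to match imageA on the fly
controls = applier.ApplierControls()
controls.setReferenceImage(infiles.imageA)
controls.setResampleMethod("near")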

Convert SSURGO soil data to a Raster Attribute Table

SSURGO (Soil Survey Geographic database) provides soil information across the United States. The data are provided as Shapefiles containing the mapping units. The attributes for each polygon are stored as text files, which are designed to be imported into an Access database and linked with the Shapefile.

An alternative for working with SSURGO data is to convert the Shapefile to a raster, parse the text files, and store the attributes for each mapping unit in a Raster Attribute Table (RAT).

To do this the following steps are required:

  1. Use gdal_rasterize to create a raster.
  2. Use RSGISLib to convert to a RAT.
  3. Add a column for each attribute using RIOS, as sketched below.
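The third step boils down to a rat.writeColumn call, as used earlier in this archive. A sketch with assumed file and column names (the real script parses the values from the SSURGO text files):

import numpy as np
from osgeo import gdal
from rios import rat

# Open the rasterised SSURGO map units with their RAT
ratDataset = gdal.Open('ssurgo_raster.kea', gdal.GA_Update)

# One value per RAT row (i.e., per map unit), e.g. parsed clay content
clay = np.array([0.0, 12.5, 30.2, 8.7])

rat.writeColumn(ratDataset, 'claytotal_', clay)
ratDataset = None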

A Python script to perform these steps can be downloaded from https://github.com/MiXIL/SSURGO-Utilities.

An example of usage is:

python convertSSURGO2RAT.py --indir CA669 \
  --colname claytotal_ \
  --outformat KEA

A list of all available columns can be viewed using:

python convertSSURGO2RAT.py --printcols

To export the columns as a standard raster (using RSGISLib) pass in the ‘--raster’ flag.

SSURGO data can be downloaded from the USDA NRCS Geospatial Data Portal (http://datagateway.nrcs.usda.gov/)