Tag Archives: Python

Create a slope layer from a DEM

To calculate slope from a DEM the gdaldem utility can be used:

gdaldem slope -of KEA in_dem.kea out_slope.kea

There is an option to pass in a scale to convert the horizontal spacing from degrees to metres. However, if the DEM covers a range of latitudes it is desirable to convert the horizontal spacing on a per-pixel basis. To accomplish this I wrote a Python script, with RIOS used to read and write out data in blocks, making it efficient to process large datasets.
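The per-pixel conversion rests on the fact that a degree of longitude spans fewer metres the further you move from the equator, while a degree of latitude is roughly constant. A minimal sketch of the calculation, assuming a spherical Earth (the function name and radius value here are illustrative, not from the script):

```python
import numpy as np

def degrees_to_metres(pixel_size_deg, latitude_deg, earth_radius=6371000.0):
    """Approximate ground distance (m) of a pixel spacing given in
    degrees, at a given latitude, assuming a spherical Earth."""
    # A degree of latitude spans a roughly constant distance
    y_spacing = np.radians(pixel_size_deg) * earth_radius
    # A degree of longitude shrinks with the cosine of the latitude
    x_spacing = y_spacing * np.cos(np.radians(latitude_deg))
    return x_spacing, y_spacing

# A ~3 arc-second pixel at 51.5 N: ~93 m north-south but only ~58 m east-west
x_m, y_m = degrees_to_metres(0.000833, 51.5)
```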

Implementation
As the slope calculation requires looping through a block of data, a task which is slow in pure Python, two strategies are used to improve the speed:

  1. Numba is used to compile the Python code – which is a lot faster than pure Python
  2. Fortran code is used to perform the actual slope calculation, compiled as a Python module using f2py
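To give a flavour of the Numba approach, below is a simplified sketch of a 3 x 3 slope kernel (this is not the code from the repository — it uses Horn's method, and a fallback decorator is included so the sketch still runs, slowly, without Numba installed):

```python
import math
import numpy as np

try:
    from numba import jit
except ImportError:
    # Fall back to plain (slow) Python if Numba is not installed
    def jit(**kwargs):
        return lambda func: func

@jit(nopython=True)
def slope_degrees(dem, x_res, y_res):
    """Slope in degrees from a DEM array using a 3x3 Horn kernel.
    Edge pixels are left as zero."""
    out = np.zeros(dem.shape)
    for y in range(1, dem.shape[0] - 1):
        for x in range(1, dem.shape[1] - 1):
            # Gradients in x and y from the 3x3 neighbourhood
            dzdx = ((dem[y-1, x+1] + 2*dem[y, x+1] + dem[y+1, x+1])
                    - (dem[y-1, x-1] + 2*dem[y, x-1] + dem[y+1, x-1])) / (8 * x_res)
            dzdy = ((dem[y+1, x-1] + 2*dem[y+1, x] + dem[y+1, x+1])
                    - (dem[y-1, x-1] + 2*dem[y-1, x] + dem[y-1, x+1])) / (8 * y_res)
            out[y, x] = math.degrees(math.atan(math.sqrt(dzdx*dzdx + dzdy*dzdy)))
    return out
```

With Numba installed, the first call compiles the function and subsequent calls run at close to compiled speed; the Python source is unchanged.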

To provide a comparison of the time required for each implementation, slope was calculated from an 1800 x 1800 pixel DEM using each method:

Implementation   Time (s)
Pure Python      136.7
Numba            2.6
Fortran          2.1

Tests used Python 3.4 with Numba 0.13.4, running under OS X on a 1.8 GHz Intel i5 processor; the gfortran Fortran compiler was used. Times are the average over 3 runs.

So the Fortran version is slightly faster than the Numba version, but required a lot more effort to implement. Both are significantly faster than the pure Python version. The code provides a nice example of applying more complicated algorithms to images, with RIOS used to handle the I/O.

To account for noise in the DEM there is also a version of the slope calculation which will use least squares fitting to fit a plane over a window of pixels and calculate the slope from this. As the Python version requires calls to the NumPy linear fitting code there is no improvement using Numba. The Fortran version uses Accelerate under OS X or ATLAS under Linux for the plane fitting.
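To illustrate the plane-fitting approach, the sketch below fits z = a*x + b*y + c to a window of DEM values with NumPy's least squares routine and derives the slope from the fitted coefficients (a simplified stand-in for the script's implementation, not the code from the repository):

```python
import numpy as np

def slope_from_plane(window, x_res, y_res):
    """Fit a plane z = a*x + b*y + c to a square window of DEM values
    by least squares and return the slope of that plane in degrees."""
    ysize, xsize = window.shape
    x, y = np.meshgrid(np.arange(xsize) * x_res, np.arange(ysize) * y_res)
    # Design matrix: one row per pixel, columns for x, y and the constant
    A = np.column_stack([x.ravel(), y.ravel(), np.ones(window.size)])
    (a, b, c), _, _, _ = np.linalg.lstsq(A, window.ravel(), rcond=None)
    # Slope follows from the gradient (a, b) of the fitted plane
    return np.degrees(np.arctan(np.sqrt(a**2 + b**2)))
```

Because the plane is fitted to all the pixels in the window, isolated noisy values have far less influence than in a 3 x 3 kernel.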

The code is available to download from https://github.com/MiXIL/calcSlopeDegrees

Usage
To calculate slope from a DEM where the horizontal spacing is in degrees:

calcSlopeDegrees.py --spacing_degrees inDEM.kea outSlope.kea

To calculate slope by fitting a plane over a 9 x 9 window:

calcSlopeDegrees.py --plane_ls --window_size 9 inDEM.kea outSlope.kea

Add a colour table in RSGISLib

For thematic rasters (e.g., classification) it is useful to save a colour scheme with the data for visualisation. This can be accomplished using a colour table saved as fields in an attribute table. There is a function in RSGISLib which will add colours to an existing RAT. To use the function on rasters which don’t already have an attribute table (i.e., the class is stored as the pixel value) one must be created and the pixel values copied to a column.

To accomplish this in RSGISLib the following is used:

import collections
from rsgislib import rastergis

classification='brigalow_regrowth_classification.kea'

# Add histogram (automatically adds attribute table)
rastergis.populateStats(classification, addclrtab=False, \
        calcpyramids=False, ignorezero=False)

# Add pixel values to attribute table
bandStats = []
bandStats.append(rastergis.BandAttStats(band=1, maxField='Class'))

rastergis.populateRATWithStats(classification, \
                                classification, bandStats)

# Add colour table
classcolours = {}
colourCat = collections.namedtuple('ColourCat', \
                        ['red', 'green', 'blue', 'alpha'])
classcolours[0] = colourCat(red=0, green=0, blue=0, alpha=0)
classcolours[1] = colourCat(red=255, green=0, blue=0, alpha=255)
classcolours[2] = colourCat(red=0, green=0, blue=255, alpha=255)
classcolours[3] = colourCat(red=0, green=200, blue=0, alpha=255)
classcolours[4] = colourCat(red=0, green=100, blue=0, alpha=255)
rastergis.colourClasses(classification, 'Class', classcolours)

# Add pyramids (for fast display)
rastergis.populateStats(classification, addclrtab=False, \
          calcpyramids=True, ignorezero=False)

If you don’t have an existing colour scheme you can add a random colour to each class by running ‘rastergis.populateStats’ and setting ‘addclrtab’ to True.

Note: if you add a colour table to a KEA file it will be recognised in TuiView and ArcMap (using the KEA driver) but not QGIS. Colour tables in ERDAS Imagine format work fine in QGIS.

Plot 2D Histogram of Pixel Values from Two Images

Recently I wanted to plot the pixel values of two images against each other. I thought it would be good to combine extracting the pixel values (using RIOS) and plotting the data (using matplotlib) in a single script.

The script I wrote (two_band_scatter_plot.py) is available to download from the RSGIS Scripts repository.

There are three main steps:

  1. RIOS is used to iterate through the image in small blocks, for each block only pixels that have data in each band are retained.
  2. The numpy.histogram2d function is used to produce a 2D histogram.
  3. The histogram is plotted using matplotlib.

The 2D histogram was based on code from http://oceanpython.org/2013/02/25/2d-histogram/. As there are a lot of points, displaying as a 2D histogram makes the information easier to see.
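The core of steps 2 and 3 can be sketched as follows, assuming the paired pixel values have already been extracted into two 1D arrays (random data stands in for the image bands here; the full script handles the extraction with RIOS):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the plot is saved to file
import matplotlib.pyplot as plt

# Stand-in for the paired pixel values extracted from the two images
rng = np.random.default_rng(0)
dataA = rng.normal(0.2, 0.05, 10000)
dataB = dataA + rng.normal(0, 0.02, 10000)

# Bin the paired values into a 2D histogram
counts, xedges, yedges = np.histogram2d(dataA, dataB, bins=100)

# Mask empty bins so they are not drawn
counts = np.ma.masked_where(counts == 0, counts)

fig, ax = plt.subplots()
# histogram2d returns counts indexed [x, y], so transpose for plotting
mesh = ax.pcolormesh(xedges, yedges, counts.T)
fig.colorbar(mesh, label='Count')
ax.set_xlabel('Image A')
ax.set_ylabel('Image B')
fig.savefig('outplot.pdf')
```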

The simplest usage of the script is:

python two_band_scatter_plot.py \
--imageA imageA.kea \
--imageB imageB.kea \
--plot outplot.pdf 

This will plot the first band of each image. To change the bands used, the labels for the plot, and the scaling, additional flags can be passed in:

python two_band_scatter_plot.py \
--imageA imageA.kea \
--imageB imageB.kea \
--bandA 1 --bandB 1 \
--plot outplot.pdf \
--plotMin 0 --plotMax 0.5 \
--labelA "13157 - 004 (HH)" \
--labelB "13157 - 006 (HH)"

The output looks something like this:
[Plot: TonziR_13157_003_vs_007_hh]

The script assumes both images are in the same projection and at the same resolution; RIOS takes care of the spatial information. It would be possible to adapt the script so that RIOS resamples the data; see the RIOS wiki for more details.

Import CSV into Python using Pandas

One of the features I like about R is that when you read a CSV file into a data frame you can access columns using the names from the header row. The Python Data Analysis Library (pandas) aims to provide a similar data frame structure in Python and also has a function to read a CSV file. Once pandas has been installed, a CSV file can be read using:

import pandas
data_df = pandas.read_csv('in_data.csv')

To get the names of the columns use:

print(data_df.columns)

And to access columns use:

colHH = data_df['colHH']

Or if the column name is a valid Python variable name:

colHH = data_df.colHH

This is only a tiny part of pandas; there are lots of features available (which I’m just getting into). One interesting one is the ability to create pivot table reports from a data frame, similar to Excel.
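As a quick illustration of the pivot table feature (the column names and values below are made up for the example):

```python
import pandas

# Hypothetical data frame: backscatter measurements by site and polarisation
data_df = pandas.DataFrame({
    'site': ['tonzi', 'tonzi', 'brigalow', 'brigalow'],
    'polarisation': ['HH', 'HV', 'HH', 'HV'],
    'backscatter': [-7.2, -12.5, -6.8, -11.9]})

# Rows become sites, columns polarisations, cells the mean backscatter
report = data_df.pivot_table(values='backscatter', index='site',
                             columns='polarisation', aggfunc='mean')
print(report)
```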

Export quicklooks with vector overlay using TuiView

Often when you have a large stack of images (e.g., a time series) it’s useful to create quick looks so you can flick through them. Normally I use gdal_translate with the ‘-scale’ and ‘-outsize’ flags, as described in a previous post.

Recently I wanted to quickly export all bands of an image for a presentation, with some points overlain. The TuiView manual describes how it’s possible to automate the ‘Save Current Display’ menu option using Python, which sounded ideal for this task.

Following the instructions to export a single image in the manual, I added a loop to export each band and name the output file with the band name, which produced exactly the result I needed. As this is something I’m likely to need to do again, I tidied the script up, made it a bit more general and added some more of the options available. The script can be downloaded from RSGIS Scripts on Bitbucket.

To export all bands in an image (original task):

python export_quicklook_tuiview.py --inimage tonzi_data_layers.kea \
       --outimage tonzi_data_layers_ql.png \
       --invector sensor_locations.shp \
       --allbands

I also added the options to export an image using the default stretch:

python export_quicklook_tuiview.py --inimage tonzi_data_layers.kea \
       --invector sensor_locations.shp \
       --outimage tonzi_data_layers_ql.png

Or, if the data has already been stretched and RGB mapped to bands 1,2,3:

python export_quicklook_tuiview.py --inimage tonzi_data_2sd_rgb.kea \
       --invector sensor_locations.shp \
       --outimage tonzi_data_rgb_2sd_ql.png --rgb --nostretch

To add text to the quick looks (e.g., the filename) imagemagick can be used:

convert in_image.png -gravity north -strokewidth 10 \
-pointsize 100 -annotate 0 "Title" out_image_with_title.png

To run for all images in a directory GNU Parallel can be used:

ls *kea | parallel \
python export_quicklook_tuiview.py --inimage {} \
       --invector sensor_locations.shp \
       --outimage {.}_ql.png

Recursively finding files in Python using the find command

The find command on a UNIX system is great for searching through directories for files. There are lots of options available, but the simplest usage is:

find PATH -name 'SEARCH'

Combined with the subprocess module in Python, it’s easy to use this search capability within a Python script:

import subprocess

# Set up find command
findCMD = 'find . -name "*.kea"'
out = subprocess.Popen(findCMD, shell=True, stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# Get standard out and error
(stdout, stderr) = out.communicate()

# Save found files to list
filelist = stdout.decode().split()

The above code can be used to get the output from any command line tool in Python.
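One addition worth making to the pattern above is checking the return code, so a failed command doesn't go unnoticed. A sketch of the same pattern, using the Python interpreter itself as the command so the example runs anywhere:

```python
import subprocess
import sys

# Any command line tool works here; the Python interpreter is used
# so the example is portable
cmd = [sys.executable, '--version']
out = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = out.communicate()

# A non-zero return code indicates the command failed
if out.returncode != 0:
    raise RuntimeError(stderr.decode())

version = stdout.decode().strip()
```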

Update
As noted in the comment by Sam, a more portable way of finding files is to use the os.walk function combined with fnmatch:

import os, fnmatch

inDIR = '/home/dan/'
pattern = '*kea'
fileList = []

# Walk through directory
for dName, sdName, fList in os.walk(inDIR):
    for fileName in fList:
        if fnmatch.fnmatch(fileName, pattern): # Match search string
            fileList.append(os.path.join(dName, fileName))

Converting HDF5 format AirMOSS data to a GDAL image

As HDF5 is NASA’s preferred file format, a lot of remote sensing data is now delivered in it. However, as there are multiple ways of storing geospatial information within HDF5, reading the data with GDAL often requires some effort. Luckily, with the h5py and GDAL Python libraries, converting to another GDAL format is easy.

The instructions here are for reading Synthetic Aperture Radar (SAR) data from the AirMOSS mission, a polarimetric P-band system. More details about the AirMOSS mission and links to download data are available from airmoss.jpl.nasa.gov (registration required to download data). However, the general process should be similar for most HDF5 formats.

  1. Import the file

    import h5py

    inHDF5File = "indata.h5"

    # Read in data
    indata = h5py.File(inHDF5File, 'r')

  2. Get some information about the data

    # List keys
    for key in indata.keys():
        print(key)

    # List items
    for item in indata.items():
        print(item)

    # List attributes
    for att in indata.attrs:
        print(att)

  3. Extract the data

    hhData = indata['HHHH']
    vvData = indata['VVVV']
    hvData = indata['HVHV']

  4. Get the geospatial information

    For AirMOSS data the geospatial information is stored as attributes in the HDF5 file and is read using ‘indata.attrs’:

    # Set image size
    inXSize = hhData.shape[1]
    inYSize = hhData.shape[0]

    # Get coordinates
    minLon = indata.attrs['northwest longitude']
    maxLat = indata.attrs['northwest latitude']

    # Get pixel resolution
    pixelX = indata.attrs['longitude spacing']
    pixelY = indata.attrs['latitude spacing']

  5. Create a new GDAL file

    You can select any format GDAL can write to as the output format; I’ve chosen to stick with HDF5 and go with the KEA format (which provides an implementation of the full GDAL specification).

    from osgeo import gdal, osr

    # Set geospatial information
    geoTransform = [0.0] * 6
    geoTransform[0] = minLon # top left x
    geoTransform[1] = pixelX
    geoTransform[2] = 0
    geoTransform[3] = maxLat # top left y
    geoTransform[4] = 0
    geoTransform[5] = pixelY

    srs = osr.SpatialReference()
    srs.SetWellKnownGeogCS("WGS84")

    # Create output image
    numBands = 3
    gdalDriver = getGDALFormatFromExt(outFileName)
    driver = gdal.GetDriverByName(gdalDriver)
    newDataset = driver.Create(outFileName, inXSize, inYSize,
                               numBands, gdal.GDT_Float32)
    newDataset.SetGeoTransform(geoTransform)
    newDataset.SetProjection(srs.ExportToWkt())

  6. Copy the data to the new file

    newDataset.GetRasterBand(1).WriteArray(hhData)
    newDataset.GetRasterBand(1).SetDescription('HH')

    # Close dataset
    newDataset = None

The data should now be readable in anything which can read the output format you selected (e.g., TuiView).

The full script can be downloaded from https://github.com/MiXIL/AirMOSS-GDAL-Utilities/. There are some differences in the final script: a loop is used to read and write each band, rather than loading all the data into memory, and the no data value of -9999 is changed to 0.
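The per-band loop with the no data substitution can be sketched as follows (a dict of NumPy arrays stands in for the HDF5 datasets here, and the GDAL write calls are shown as comments, so the sketch is self-contained):

```python
import numpy as np

# Stand-in for the open HDF5 file: band name -> array
indata = {'HHHH': np.full((3, 3), -9999.0),
          'VVVV': np.ones((3, 3)),
          'HVHV': np.ones((3, 3))}

cleaned = {}
for bandNum, name in enumerate(['HHHH', 'VVVV', 'HVHV'], start=1):
    data = np.array(indata[name])   # read one band into memory
    data[data == -9999] = 0         # swap the no data value for 0
    cleaned[name] = data
    # In the real script the band would now be written out with:
    # newDataset.GetRasterBand(bandNum).WriteArray(data)
    # newDataset.GetRasterBand(bandNum).SetDescription(name)
```

Processing one band at a time keeps the memory footprint down, which matters for full-size AirMOSS scenes.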

Reading MATLAB files with Python

If you need to read MATLAB (.mat) data files, there is a function within scipy.io which allows you to do this. Usage is simple and well explained in the tutorial:

  1. Import the file:

    from scipy import io

    inMATFile = 'ssurgo_data.mat'
    soildata = io.loadmat(inMATFile)

  2. Get a list of keys:

    soildata.keys()

  3. Extract data to a NumPy array:

    soildata_varname = soildata['varname']
    

Reading tabular data in a non-standard format

This post was going to be called ‘faffing around with data’, which might summarise many people’s feelings towards dealing with data that isn’t in the format they need it to be in. An earlier post described adding an ENVI header to binary files so they could be read into GDAL. This post details using Python to read tabular data stored in a text file, although many of the functions would apply to extracting any data from a text file.

Built in functions

Before setting out it’s a good idea to confirm the data can’t be read with any of the functions available in Python. If it opens in Excel, it’s likely these will work, and they may be an easier option (no point reinventing the wheel!).

The CSV module in the standard Python library is really easy to use and can handle various delimiters:

import csv

# Open file
f = open('soil_moisture.csv', 'rU')
csvfile = csv.reader(f, delimiter=',')

# Skip first line (header)
headerline = next(csvfile)

# Iterate through file
for line in csvfile:
 print('First col = ',line[0])

f.close()

In NumPy the genfromtxt function can be used to read a file directly into a NumPy array.

import numpy as np

smdata = np.genfromtxt('soil_moisture.csv', delimiter=',', skip_header=1)

Depending on how many files you have to process, a bit of clever use of find and replace in a text editor or using the sed command might allow you to get your data into a format that can be read by the built in functions. If this does end up being too messy it will at least provide some insight into the steps needed to write a script to read the data.

Writing a Custom Reader

The basic structure of a custom reader is to iterate through each line in the file and split it into columns.


# Open file
infile = open('soil_moisture.csv', 'rU')

# Read header to separate variable (repeat for multiple lines)
header1 = next(infile)

# Iterate through file
for line in infile:
    # Count delimiters (:)
    ncolon = line.count(':')
    # Check we have 3 (4 columns)
    if ncolon == 3:
        # Separate
        elements = line.split(':')
        print('First col = ', elements[0])
infile.close()

If the data is fixed width and separated by multiple spaces, I use the re module in Python to replace consecutive spaces with another character.

import re
line = 'tonzi  2013   12  18 1  1 9  1   20   15.69'
linesub = re.sub(r'\s+', ':', line)
elements = linesub.split(':')

Where ‘\s+’ is a regular expression for one or more spaces. When choosing the character to use as a replacement, make sure it’s not used anywhere else in the file.

Time Data

Time can be stored in a range of different formats. Python offers ways of storing and manipulating time and date data as part of the time module. One of the easiest ways of getting a time from a text file into the Python representation is to use a ‘struct_time’ object. If each component of the date (year, month, etc.) is stored as a separate column you can use the following:

import time
year = int('20' + line[1])
month = int(line[2])
day = int(line[3])
hour = int(line[4])
minute = int(line[5])
second = int(line[6])

mTimeTS = time.struct_time((year,month,day,hour,minute,second,0,0,0))

If the date is stored in a single column the ‘strptime’ function can be used to convert to a Python time structure. For example:

import time
timeStr = '2014-07-10 08:14:01'
mTimeTS = time.strptime(timeStr,'%Y-%m-%d %H:%M:%S')

Writing data out

The CSV library, mentioned earlier, can also be used to write formatted data out to a CSV file.

outfile = open('outdata.csv', 'w')

outcsv = csv.writer(outfile)

# Write header row
outcsv.writerow(['x','y'])

# Iterate through input file
for line in infile:
    # Read and split data into 'elements'
    # as described earlier
    
    # Write first two columns out to CSV
    outcsv.writerow([elements[0],elements[1]])

outfile.close()

Multi-core processing with RSGISLib using tiles

Within RSGISLib there are functions within imageutils for tiling and mosaicking images. These can be combined to split large datasets up for processing using multiple cores or nodes on a HPC. The Python binding for the createTiles function was written so that it returns a list of all the tiles created, allowing the list of tiles to be passed to a separate function for processing. Combining this with the multiprocessing module in Python provides a simple way of processing large datasets on multiple cores, with the createImageMosaic function called at the end to join the tiles back together.

The example below shows how to:

  1. Split an image into temporary tiles (in a directory created using the tempfile module).
  2. Run an RSGISLib function on each tile, in this case imageMath.
  3. Re-mosaic the tiles.
  4. Remove the temp files created.

# Import RSGISLib
import rsgislib
from rsgislib import imageutils
from rsgislib import imagecalc

# Import multiprocessing
from multiprocessing import Pool
# Import tempfile
import tempfile
# Import os
import os

outFormat = 'KEA'
outType = rsgislib.TYPE_32INT

def addOne(inImage):

    outputImage = inImage.replace('.kea','add1.kea')
    expression = 'b1+1'
    imagecalc.imageMath(inImage, outputImage,
                    expression, outFormat, outType)

    return outputImage

if __name__ == '__main__':

    inputImage = 'N06W053_PALSAR_08_HH_utm.kea'
    outImage = 'N06W053_PALSAR_08_HH_utm_addOne.kea'

    # Create temporary directory
    tempDIR = tempfile.mkdtemp(dir='.')
    outBase = os.path.join(tempDIR, 'tile')

    # Create Tiles
    width = 1000
    height = width
    overlap = 5 # Set a 5 pixel overlap between tiles
    offsettiling = 0
    ext='kea'
    temptiles = imageutils.createTiles(inputImage, outBase, width,
             height, overlap, offsettiling, outFormat, outType, ext)

    # Run process on tiles
    pool = Pool()
    temptilesP = pool.map(addOne, temptiles)

    # Mosaic tiles
    backgroundVal = 0.
    skipVal = 0.
    skipBand = 1
    overlapBehaviour = 0

    imageutils.createImageMosaic(temptilesP, outImage, backgroundVal,
               skipVal, skipBand, overlapBehaviour, outFormat, outType)

    # Remove temp tiles and DIR
    removeList = temptiles + temptilesP
    for tile in removeList:
        os.remove(tile)

    os.removedirs(tempDIR)

As all the processes try to write to stdout at the same time, the output looks a little messy (I still need to figure out a nice way to improve this). Also, although I didn’t run any tests, the overhead of creating and mosaicking the tiles will likely make this image maths function (which is not particularly CPU intensive) take longer than running on the entire image in one go. However, it shows the potential for combining the RSGISLib Python bindings with the multiprocessing module.