Tag Archives: os x

UNIX Commands I wish I’d known earlier

I’ve been using command line UNIX / Linux for a while but, like many people, have just picked up bits as and when I’ve needed them. Here are some tips I wish I’d known when I started out.

  1. Use tab to autocomplete.
     This one is really basic but not mentioned in some tutorials. As you start typing a command or file path, just hit tab to complete it. You only need to type enough letters for the command / path to be identified.

  2. Use Ctrl+r to search previous commands.
     You can use the up arrow to cycle through previous commands; a slightly cooler trick is Ctrl+r, which searches through your command history. Just press Ctrl+r and start typing.

  3. Learn to use an editor from the command line (it doesn’t have to be vi!).
     If you’re logged into a server using ssh, being able to quickly open a text file and edit a few lines is very handy, and a lot easier than downloading the file, editing it and uploading it again. A lot of guides use vi, which is amazingly powerful and pretty much universally available, but it has a really steep learning curve. I love vi (or more accurately vim, which is an updated version) but it took a lot of effort to get to a stage where I was proficient enough for it to be useful. There are much more user-friendly alternatives such as nano or ne. Nano is installed with OS X and most Linux distributions; ne will need to be installed, but it has more features and familiar commands (Ctrl+s to save etc.), and you can double-tap escape to bring up a menu bar.

  4. Use tmux, screen or byobu to keep sessions running when you log out.
     If you’re logged into a machine over ssh and running an interactive process, it will often stop when you close the ssh connection. Using GNU screen, tmux or byobu allows you to detach your session so the process continues in the background; you can reattach later to check progress. These tools also allow you to have multiple terminals within the same ssh session.

  5. Think before you press enter.
     An obvious one, but on a UNIX system with the right commands you can do pretty much anything, which means you can also do some ridiculously stupid things, especially with the sudo command. Poor use of rm with wildcards has caught me out before (luckily by this time I’d developed a somewhat paranoid backup system!). You can always use ls before running rm to check which files will be removed.

     If you’re worried about breaking something on your computer while learning the command line, you could set up a virtual machine (e.g., using VirtualBox) with Linux; then if anything goes badly wrong you can just delete the machine and start again.

  6. Googling bits as you need them is no substitute for actually sitting down and learning it.
     If it looks like you’re going to be spending a lot of time using the command line of a UNIX / Linux system (and it’s a very useful skill to have), as with anything worth learning you need to invest the time. There are lots of tutorials on the internet and books available; you may find some more suitable than others. My personal favourite is The Linux Command Line by William Shotts; you can download the PDF for free or buy the hard copy, more information is available here.
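The ls-before-rm habit from the tips above can be practised safely in a throwaway directory (all paths and file names here are illustrative):

```shell
# Make a scratch directory with some example files.
mkdir -p /tmp/rm_demo
cd /tmp/rm_demo
touch scene1.txt scene2.txt notes.csv

# Check what the wildcard actually matches BEFORE deleting anything.
ls scene*.txt

# Happy with the list? Then remove.
rm scene*.txt

# notes.csv is untouched.
ls
```

The same pattern works before mv, cp or any other destructive command: test the wildcard with ls first.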

Managing Software & Libraries with EnvMaster

If you’ve compiled software from source you’re probably used to the following sequence:

./configure
make
sudo make install

Which will configure the software to look for libraries and install itself to the default location (normally /usr/local), build it, and install it. As root privileges are required to install to /usr/local, sudo is required.

This is fine unless:

  1. You’re not in the sudoers list (e.g., on a shared computer you’re not an administrator for).
  2. You want to have different versions of things (e.g., stable and development versions).

then things start to get complicated. Installing all the software to a single folder you have permission to write to, e.g., ~/software, by passing the --prefix=~/software flag to configure, then adding ~/software/bin to your $PATH and ~/software/lib to $LD_LIBRARY_PATH, would solve the first problem but will still cause problems if you want different versions of things.
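As a sketch, the single-prefix approach described above amounts to the following (~/software is the example prefix from the text; the configure / make lines assume a typical autotools-style package):

```shell
# Build and install into a prefix you own -- no sudo required:
#   ./configure --prefix=$HOME/software
#   make
#   make install
# Then make the results visible to the shell and the dynamic linker,
# typically by adding these lines to ~/.bashrc or ~/.bash_profile:
export PATH=$HOME/software/bin:$PATH
export LD_LIBRARY_PATH=$HOME/software/lib:$LD_LIBRARY_PATH
```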

Ideally you’d install everything into its own versioned directory, something like:

~/software/gdal/1.10.1
~/software/rsgislib/2.0.0
~/software/rsgislib/20131019
which means you’re going to be spending a lot of time hacking around with environmental variables!

This is where EnvMaster comes in, it allows you to install different versions of software and libraries, in a nicely organised directory structure, and sorts out all the paths for you. When it’s properly set up you can load and swap software / libraries round using:

# Load in RSGISLib
envmaster load rsgislib

# Swap to use the development version of RSGISLib
envmaster swap rsgislib/2.0.0 rsgislib/20131019

To install EnvMaster clone the source from the EnvMaster Bitbucket page and install:

hg clone https://bitbucket.org/chchrsc/envmaster envmaster

export ENVMASTER_ROOT=~/software/envmaster
python setup.py install --prefix=$ENVMASTER_ROOT

Note: as with the rest of the examples, I’m installing to ~/software (where ~ is your home directory). You can use anywhere you have permission to write; just do a mental find and replace, substituting ~/software with the path you’re using. We normally install things to /share/osgeo/fw/

EnvMaster uses a corresponding text file for each library; these need to be stored in a separate directory (ideally on their own). We’ll make one called modules.

mkdir -p ~/software/modules

Once this is set up, create a text file (let’s call it ~/software/setupenvmaster) containing the following:

export PYTHONPATH=$ENVMASTER_ROOT/lib/python2.7/site-packages
# Set up path to modules files
export ENVMASTERPATH=~/software/modules
# Initialize EnvMaster
. $ENVMASTER_ROOT/init/bash

And source it using:

. ~/software/setupenvmaster

If you add this line to .bashrc / .bash_profile, EnvMaster will be available in every new terminal.

Note: if you used a different version of Python to install EnvMaster (e.g., python3.3), you need to change python2.7 in the PYTHONPATH line to reflect this.

Now if you type

envmaster avail

You should see ‘~/software/modules’ and ‘No module files found’.

EnvMaster is now all set up and it’s time to start installing things.

Let’s install GDAL, first download from here and untar.

./configure --prefix=/home/dan/software/gdal/1.10.1
make
make install

Note, you probably want to install GDAL with other options (e.g., HDF5), see the RSGISLib documentation for the options recommended if building for RSGISLib.

You then need to set up the files for EnvMaster. Within ~/software/modules make a directory called gdal and create two text files, one called 1.10.1 (the version of gdal) and the other called version.py. In 1.10.1 put the following:

#%EnvMaster1.0
module.SetAll()

The first line tells EnvMaster this is an EnvMaster file; module.SetAll() sets environmental variables (PATH etc.) based on the contents of ~/software/gdal/1.10.1.
In version.py put the following:


version = '1.10.1'

This tells EnvMaster the default version of GDAL is 1.10.1

If you run envmaster avail again it should list gdal. You can load GDAL using:

envmaster load gdal

To unload GDAL (and unset paths) use:

envmaster unload gdal

You can try running gdal_translate to check.
To see the environmental variables GDAL set use:

envmaster disp gdal

As well as adding the path to general variables (e.g., PATH), EnvMaster has created variables specific to GDAL (e.g., GDAL_LIB_PATH), which are useful when linking from other software.
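For example, a hedged sketch of pointing another package’s build at the EnvMaster-managed GDAL (GDAL_LIB_PATH is the variable named above; the package name and prefix are hypothetical, and the exact variable names should be checked with envmaster disp gdal on your system):

```shell
# Assumes EnvMaster is initialised as described earlier in this post.
envmaster load gdal
# Pass the GDAL library path to a (hypothetical) package's build:
export LDFLAGS="-L$GDAL_LIB_PATH"
./configure --prefix=$HOME/software/mypackage/1.0.0
make
make install
```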

To see the envmaster modules you have loaded you can use:

envmaster list

There are loads more options available in EnvMaster than shown here. Whilst it does take a bit of time to set up, it allows you to build a highly organised and very flexible system. The user guide for EnvMaster, in LyX format, is available with the source code from here.

A comprehensive list of instructions for building software with EnvMaster under Linux is available from Landcare (here). Note: These were developed for their system and have very generously been made available. Use at your own risk.
Pete Bunting’s instructions for building under OS X are also available here, these have been tested under OS X 10.9.

View the first lines in a text file

A really simple, but useful, UNIX command is head, which by default displays the first 10 lines of a text file, great for quickly checking outputs. You can also use the -n flag to specify the number of lines.

# View first 10 lines (the default)
head bigtextfile.csv

# View first 5 lines
head -n 5 bigtextfile.csv

There is also a corresponding tail command to view the last lines.

# View last 10 lines (the default)
tail bigtextfile.csv

# View last 5 lines
tail -n 5 bigtextfile.csv

GNU Parallel

GNU Parallel is a utility for executing commands in parallel. It provides a really easy way of running a command over multiple files and utilising multiple cores, using only a single line.

You can download the latest version from the GNU Parallel website.
Or check the package manager for your distro if you’re on linux.

Installation should just be:

./configure
make
sudo make install

There are lots of options and different ways of using parallel; here are a few examples:

1. Uncompress and untar all files in a directory.

ls *tar.gz | parallel tar -xf

2. Recursively find RSGISLib XML scripts and run using two cores (-j 2)

find ./ -name 'run_*.xml' | parallel -j 2 rsgisexe -x

3. Find all KEA files and create stats and pyramids using gdalcalcstats from https://bitbucket.org/chchrsc/gdalutils/

ls *kea | parallel "gdalcalcstats {} -ignore 0"

4. Convert KEA (.kea) files to ERDAS imagine format using gdal_translate, removing ‘.kea’ and adding ‘_img.img’

ls *kea | parallel "gdal_translate -of HFA {} {.}_img.img"

# Calculate stats and pyramids after translating
ls *kea | parallel "gdal_translate -of HFA {} {.}_img.img; \
gdalcalcstats {.}_img.img -ignore 0"

Backup with rsync

There are a lot of great tools out there for backups. However, for large remote sensing datasets not all of them are appropriate. I use the command line tool rsync to back up my data to external hard drives. It only copies data that has changed, making it efficient to run.

The command I use is:

rsync -r -u -p -t --delete --force --progress \
    /data/Australia/ /media/Backup1/Australia/

This will copy recursively (-r), updating only files that have changed (-u), preserving permissions (-p) and time stamps (-t). Files on the backup drive that no longer exist in the source are removed (--delete), including directories (--force). As it can take a while to run, I print the progress to the screen (--progress).

My system is to back up the data on my office computer regularly (weekly or after getting / processing new data) to external hard drives, which I keep at home. To save having to remember the command I have a shell script (backup.sh) in the root directory of the external hard drives. This system works well for me as my lab has a NAS drive all the data is stored on, and all my scripts are stored separately from the data and backed up a lot more regularly.

If you leave your external hard drive connected to your computer you can create a cron job to run your backup script at regular intervals. To create a cron job use:

crontab -e

This will open a text file for editing; the comments (lines beginning with #) explain the format. To back up at 4 pm every day, add the following line.

0 16 * * * sh /media/Backup1/backup.sh

Remember to change the path of the backup script. You can change the time (the second number, the hour) as needed. You can also duplicate the line and set a different time to run twice a day (e.g., in the morning and afternoon).
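For instance, a crontab running the backup twice a day might look like this (the script path is the example from above; adjust it to your drive):

```
# minute hour day-of-month month day-of-week command
0 8  * * * sh /media/Backup1/backup.sh
0 16 * * * sh /media/Backup1/backup.sh
```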

There are many options for rsync so you can customise the backup to suit your requirements. To see these options type:

rsync --help

As rsync only updates files that have changed and preserves the time stamp, I also use it if I have a folder with lots of large files to copy. Then, if the copy gets interrupted it can be resumed.