
Add all scripts within a repository to $PATH using envmaster

I have a couple of general script repositories, some just for myself and some shared with colleagues. These are for scripts which are useful but don’t fit into existing projects or justify having their own repository. An example is rsgis_scripts on Bitbucket. The scripts are split into different directories which aren’t available on the main path. To make them available when needed I use EnvMaster (described in a previous post). I have an envmaster module which will search the repository for folders containing executables and add them to $PATH. It will also add folders containing Python packages (subdirectories with an ‘__init__.py’) to $PYTHONPATH.

#%EnvMaster1.0
import os
import glob
 
REPO_PATH = "/home/dan/Documents/Development/rsgis_scripts"
 
# Walk through the repository
for d_name, sd_name, f_list in os.walk(REPO_PATH):
    # Prune hidden directories (e.g., .git, .hg) so they aren't searched
    sd_name[:] = [sd for sd in sd_name if not sd.startswith(".")]
    file_list = glob.glob(os.path.join(d_name, "*"))
    # If a directory contains executable files add it to $PATH
    if any(os.path.isfile(f) and os.access(f, os.X_OK) for f in file_list):
        module.setBin(d_name)
    # If a directory contains Python packages add it to $PYTHONPATH
    if len(glob.glob(os.path.join(d_name, "*", "__init__.py"))) > 0:
        module.setPython(d_name)

The script is saved as ‘rsgis_scripts’ in ‘$ENVMASTERPATH’.
To load the module and prepend all the folders to ‘$PATH’ use:

envmaster load rsgis_scripts

They will remain on $PATH until you close the terminal or unload the module using:

envmaster unload rsgis_scripts

This script is just an example; it would be easy to modify for different use cases. If you have multiple repositories, a module file can be created for each one.
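For example, a module file for a second repository (the name and path here are hypothetical) is identical apart from REPO_PATH:

#%EnvMaster1.0
import os
import glob

# Hypothetical second repository; everything below is unchanged
REPO_PATH = "/home/dan/Documents/Development/other_scripts"

for d_name, sd_name, f_list in os.walk(REPO_PATH):
    sd_name[:] = [sd for sd in sd_name if not sd.startswith(".")]
    file_list = glob.glob(os.path.join(d_name, "*"))
    if any(os.path.isfile(f) and os.access(f, os.X_OK) for f in file_list):
        module.setBin(d_name)
    if len(glob.glob(os.path.join(d_name, "*", "__init__.py"))) > 0:
        module.setPython(d_name)

Saving this as ‘other_scripts’ in ‘$ENVMASTERPATH’ allows the two repositories to be loaded and unloaded independently.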

Bash one-liner to untar downloaded Landsat data into separate directories

To make it easier to read I’ve split the command over separate lines for this post. You could remove the ‘;\’ at the end of each line to create a multi-line bash script.

When you download Landsat data it comes as a .tar.gz archive (or .tar.bz if you download from Google). To uncompress the files into a separate folder for each scene the following series of bash commands can be used:

for scene in `ls *tar.gz | sed 's/.tar.gz//'`;\
   do echo "Uncompressing ${scene}";\
   mkdir ${scene};\
   mv ${scene}.tar.gz ${scene};\
   cd ${scene};\
   tar -xvf ${scene}.tar.gz;\
   cd -;\
done

This will list all the files matching ‘*.tar.gz’, remove the extension (using sed), print the scene name (echo), make a directory (mkdir), move the .tar.gz archive into it (mv), untar (tar -xvf) and then change back to the original directory (cd -).
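If you would rather do the same from Python, here is a minimal sketch of the equivalent steps using only the standard library, assuming the archives sit in the current directory:

import glob
import os
import shutil
import tarfile

for archive in glob.glob('*.tar.gz'):
    scene = archive.replace('.tar.gz', '')
    print('Uncompressing {}'.format(scene))
    # Make a directory for the scene and move the archive into it
    os.makedirs(scene, exist_ok=True)
    shutil.move(archive, scene)
    # tarfile auto-detects compression, so this also works for .tar.bz2
    with tarfile.open(os.path.join(scene, archive)) as tar:
        tar.extractall(scene)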

You can also use the ‘arcsiextractdata.py’ tool, as detailed in an earlier post.

Recursively search for a string within files using grep

To recursively search all files in a directory for ‘SearchText’ and print the names of matching files, you can use the grep command with the ‘--files-with-matches / -l’ and ‘--recursive / -r’ flags. For example, to search the current directory (‘.’):

grep -rl "SearchText" .

Note, if you run this on a directory with lots of subdirectories (e.g., your home folder) it will take a long time to run.

You can restrict the search to files matching a pattern using the ‘--include’ flag:

grep -rl --include "*.py" "SearchText" .

or exclude files matching a pattern using the ‘--exclude’ flag:

grep -rl --exclude "*.pyc" "SearchText" .

For more info see the man page (‘man grep’) or the full manual available at http://www.gnu.org/software/grep/manual/.

Recursively finding files in Python using the find command

The find command on a UNIX system is great for searching through directories for files. There are lots of options available but the simplest usage is:

find PATH -name 'SEARCH'

Combined with the subprocess module in Python, it’s easy to use this search capability within a Python script:

import subprocess

# Set up find command
findCMD = 'find . -name "*.kea"'
out = subprocess.Popen(findCMD, shell=True, stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# Get standard out and error
(stdout, stderr) = out.communicate()

# Save found files to a list (find prints one path per line)
filelist = stdout.decode().splitlines()

The above code can be used to get the output from any command line tool in Python.
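As a minimal sketch of that idea, the same pattern can be wrapped in a small helper (the function name and example command are just illustrations):

import subprocess

def run_command(cmd):
    """Run a command through the shell and return its standard output as text."""
    out = subprocess.Popen(cmd, shell=True,
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = out.communicate()
    return stdout.decode()

# Example usage: list the contents of the current directory
print(run_command('ls -l'))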

Update
As noted in the comment by Sam, a more portable way of finding files is to use the os.walk function combined with fnmatch:

import os
import fnmatch

inDIR = '/home/dan/'
pattern = '*.kea'
fileList = []

# Walk through directory
for dName, sdName, fList in os.walk(inDIR):
    for fileName in fList:
        if fnmatch.fnmatch(fileName, pattern): # Match search string
            fileList.append(os.path.join(dName, fileName))
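
On Python 3.5 or later the standard glob module can perform the same recursive search directly; a short sketch using the same example path and pattern:

import glob
import os

# '**' matches any number of subdirectories when recursive=True (Python 3.5+)
fileList = glob.glob(os.path.join('/home/dan/', '**', '*.kea'), recursive=True)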