A script to batch download files using PycURL

Downloading a lot of large files (e.g., remote sensing data) through a browser is awkward: you don’t want to set all the files downloading at once, and sitting around waiting for one download to finish before starting the next is tedious. You might also want to download files to somewhere other than your ‘Downloads’ folder.

To make downloading files easier, you can use the CURLDownloadFileList.py script, available on Bitbucket.

The script uses PycURL, which is available to install through conda using:

conda install pycurl

It takes as input a text file containing a list of files to download. If you would normally download the files using a browser, you can right click on each link and select ‘Copy Link Location’ (the exact text will vary depending on the browser you are using) instead of downloading the file. Some files follow a logical pattern, so you could copy a couple of links and then create the rest by copying and pasting, changing parts as required (e.g., the number of the file, location, date, etc.).
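When the links follow a regular pattern, the list can also be generated rather than pasted by hand. The following is a minimal sketch, not part of CURLDownloadFileList.py: the URL template, the tile IDs and the function names are hypothetical, and it assumes the list file holds one URL per line.

```python
from pathlib import Path


def build_url_list(template, tile_ids):
    """Fill the URL template with each tile ID, keeping the input order."""
    return [template.format(tile=tile) for tile in tile_ids]


def write_url_list(urls, list_path):
    """Write one URL per line (assumed layout of the list file)."""
    Path(list_path).write_text("\n".join(urls) + "\n")


if __name__ == "__main__":
    # Hypothetical pattern: only the tile ID changes between links.
    template = "https://example.com/palsar/2010/{tile}_10_MOS.tar.gz"
    tiles = ["N00E010", "N00E015", "N05E010"]
    for url in build_url_list(template, tiles):
        print(url)
```

Calling write_url_list with the generated URLs and a path of your choice saves the list to a text file ready to pass to the script.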

Once you have the list of files to download, the script is run using:

python CURLDownloadFileList.py \
     --filelist ~/Desktop/JAXAFileNamesCut.txt \
     --failslist ~/Desktop/JAXAFileNamesFails.txt \
     --outputpath ~/Desktop/JAXA_2010_PALSAR/ 

Any downloads which fail will be added to the file specified by the ‘--failslist’ argument.

By default, the script checks whether a file has already been downloaded and won’t download it again; you can skip this check using ‘--nofilecheck’. It is also possible to set a time to pause between downloads with ‘--pause’, to avoid being rejected by a server when doing big downloads. For all the available options run:

python CURLDownloadFileList.py --help
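To illustrate the behaviour described above (skipping files that already exist, pausing between downloads and recording failures), here is a rough sketch of how such a loop could be written with PycURL. It is not the code from CURLDownloadFileList.py: the function names, the parameters and the choice to delete partial files are all assumptions made for this example.

```python
import os
import time


def fetch_with_pycurl(url, out_file):
    """Download url to out_file with PycURL; raises pycurl.error on failure."""
    import pycurl  # imported here so the rest of the sketch loads without it

    curl = pycurl.Curl()
    try:
        with open(out_file, "wb") as handle:
            curl.setopt(pycurl.URL, url)
            curl.setopt(pycurl.FOLLOWLOCATION, True)  # follow redirects
            curl.setopt(pycurl.FAILONERROR, True)  # treat HTTP errors (>= 400) as failures
            curl.setopt(pycurl.WRITEDATA, handle)
            curl.perform()
    finally:
        curl.close()


def download_list(urls, out_dir, fetch=fetch_with_pycurl, pause=0, check_existing=True):
    """Download each URL into out_dir and return the URLs that failed."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for url in urls:
        # Assumes the last part of the URL is a usable file name.
        out_file = os.path.join(out_dir, os.path.basename(url))
        if check_existing and os.path.exists(out_file):
            continue  # already downloaded, nothing to do
        try:
            fetch(url, out_file)
        except Exception:
            # Remove any partial file so a later run tries this URL again.
            if os.path.exists(out_file):
                os.remove(out_file)
            failed.append(url)
        if pause:
            time.sleep(pause)  # be polite to the server between requests
    return failed
```

The fetch argument is there so the loop can be tried out with a stand-in function before pointing it at a real server. The returned list of failed URLs could be written to a text file and used as the input list for a second attempt.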

As the script runs from the command line, you can set it running on one machine (e.g., a desktop at the office) and check on the progress remotely (e.g., from your laptop at home) using ssh. To keep the script running after you close the session, you can run it within GNU Screen, which is installed by default on OS X. To start it type:

screen

then start the download script and type ctrl+a ctrl+d to detach the session. You can reattach using:

screen -R

to check the progress of your downloads.

Alternatively, you can use tmux, although it isn’t installed by default.