Using HPC Environments

Note

Prerequisites: This section uses the Myriad compute cluster to run Apptainer. An active account on Myriad is required. Before using Apptainer on another HPC system, consult the user guide for that system; but most of this content will apply to any system that supports Apptainer.

Conducting analyses on high performance computing clusters happens through very different patterns of interaction than running analyses on a VM or on your own laptop. When you login, you are on a node that is shared with lots of people. Trying to run jobs on that node is not “high performance” at all. Those login nodes are just intended to be used for moving files, editing files, and launching jobs.

Most jobs on a HPC cluster are neither interactive, nor realtime. When you submit a job to the scheduler, you must tell it what resources you need (e.g. how many nodes, what type of nodes) and what you want to run. Then the scheduler finds resources matching your requirements, and runs the job for you when it can.

For example, if you want to run the command:

apptainer exec docker://python:latest /usr/local/bin/python --version

On a HPC system, your job submission script would look something like:

#!/bin/bash -l

# Batch script to run a serial job under SGE.

# Request ten minutes of wallclock time (format hours:minutes:seconds).
#$ -l h_rt=0:10:0

# Request 1 gigabyte of RAM (must be an integer followed by M, G, or T)
#$ -l mem=1G

# Set the name of the job.
#$ -N Apptainer

# Set the working directory to somewhere in your scratch space.
# This is a necessary step as compute nodes cannot write to $HOME.
# Replace "<your_UCL_id>" with your UCL user ID.
#$ -wd /home/<your_UCL_id>/Scratch/apptainer

module load apptainer
apptainer exec docker://python:latest /usr/local/bin/python --version

This example is for the SGE scheduler as implemented at UCL. Each of the #$ lines looks like a comment to the bash kernel, but the scheduler reads all those lines to know what resources to reserve for you. These lines are called directives.

Note

Every HPC cluster is a little different, but they almost universally have a “User’s Guide” that serves both as a quick reference for helpful commands and contains guidelines for how to be a “good citizen” while using the system. For UCL Research Computing’s Myriad system, the user guide is at: UCL Research Computing Documentation

How do HPC systems fit into the development workflow?

A couple of things to consider when using HPC systems:

Using ‘sudo’ is not allowed on HPC systems, and building an Apptainer container from scratch requires sudo. That means you have to build your containers on a different development system, which is why we started this course developing Docker on your own laptop. You can pull a docker image on HPC systems.
If you need to edit text files, command line text editors don’t support using a mouse, so working efficiently has a learning curve. There are text editors that support editing files over SSH. This lets you use a local text editor and just save the changes to the HPC system.

In general, most developers that work with containers develop their code locally and then deploy their containers to HPC systems to do analyses at scale. If the containers are written in a way that accommodates the small differences between the Docker and Apptainer runtimes, the transition is fairly seamless.

Differences between Docker and Apptainer

Host Directories

Docker: None by default. Use -v <source>:<destination> to mount a source host directory to an arbitrary destination within the container.

Apptainer: Mounts your current working directory, $HOME directory, and some system directories by default. Other defaults may be set in a system-wide configuration. The --bind flag is supported but rarely used in practice.

User ID

Docker: Defined in the Dockerfile, but containers run as root unless a different user is defined or specified on the command line. This user ID only exists within the container, and care must be taken when working with files on the host filesystem to make sure permissions are set correctly.

Apptainer: Containers are run in “userspace”, so you are the same user and user ID both inside and outside the container.

Image Format

Docker: Containers are stored in layers and managed in a repository by Docker. The docker images command will show you what containers are on your local machine and images are always referenced by their repository and tag name.

Apptainer: Containers are files. Apptainer can build a container on the fly if you specify a repository, but ultimately they are stored as individual files, with all the benefits and dangers inherent to files.

Running a Batch Job on Myriad

Please login to the Myriad system, just like we did at the start of the previous section. You should be on one of the login nodes of the system.

We will not be editing much text directly on Myriad, but we need to do a little. If you have a text editor you prefer, use it for this next part. If not, the nano text editor is probably the most accessible for those new to Linux.

Create a file called “pi.sh”:

$ cd
$ cd Scratch
$ mkdir containers-at-arc
$ cd containers-at-arc
$ nano pi.slurm

Those commands should open a new file in the nano editor. Either type in (or copy and paste) the following job script.

#!/bin/bash -l

# Batch script to run a serial job under SGE.

# Request ten minutes of wallclock time (format hours:minutes:seconds).
#$ -l h_rt=0:10:0

# Request 1 gigabyte of RAM (must be an integer followed by M, G, or T)
#$ -l mem=1G

# Set the name of the job.
#$ -N calculate-pi

# Set the working directory to somewhere in your scratch space.
# This is a necessary step as compute nodes cannot write to $HOME.
# Replace "<your_UCL_id>" with your UCL user ID.
#$ -wd /home/<your_UCL_id>/Scratch/containers-at-arc

module load apptainer

echo "running the lolcow container:"
apptainer run docker://godlovedc/lolcow:latest

echo "estimating the value of Pi:"
apptainer exec docker://<your_DockerHub_username>/pi-estimator:0.1 pi.py 10000000

Don’t forget to replace <your_DockerHub_Username> with your DockerHub username! If you didn’t publish a pi-estimator container from the previous sections, you are welcome to use “cdkharris” as the username to pull Camilla Harris’s container.

Once you are done, try submitting this file as a job to SGE.

$ qsub pi.sh

You can check the status of your job with the command qstat.

Once your job has finished, take a look at the output:

$ cat calculate-pi.o*

If your containers ran successfully, then congratulations! While this was just a toy example, you have now gone through all the motions of a development lifecycle:

capturing your code and requirements as a Docker recipe
deploying your own code to run on your laptop and a HPC system
using someone else’s container both on your laptop and a HPC system
publishing your code to DockerHub so that it can be shared with others