Accessing Compute Resources

The Lovelace Cluster uses Slurm to schedule jobs and allocate resources.

Writing a submission script is typically the most convenient way to submit your job to the scheduler. Example submission scripts (with explanations) for the most common job types are provided below.

Interactive jobs are also available and can be particularly useful for developing and debugging applications. More details are available below.

If you have any questions about how to run jobs on Lovelace, do not hesitate to contact HPC Support at hpcsupport@plymouth.ac.uk.

Using Slurm

You typically interact with Slurm by (1) specifying Slurm directives in job submission scripts (see examples below) and (2) issuing Slurm commands from the login nodes.

There are three key commands used to interact with Slurm on the command line:

  • sbatch

  • squeue

  • scancel

Check the Slurm man page for more advanced commands:

man slurm
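
Each of the commands above also has its own man page, for example:

man sbatch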

The sbatch command

The sbatch command submits a job to Slurm:

sbatch job_script

This will submit your job script job_script to the job queues. See the sections below for details on how to write job scripts.
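
On success, sbatch prints the ID assigned to your job; you can then track it with squeue or cancel it with scancel (see below). As a minimal sketch, directives such as --job-name and --time can be added at the top of the script to name the job and request a wall-time limit. Here my_program and the values shown are placeholders for your own program and requirements:

#!/bin/bash
#SBATCH -p cpu                # partition (queue) to run on
#SBATCH --job-name=my_job     # a name to identify the job in squeue
#SBATCH --time=01:00:00       # requested wall-time limit (hh:mm:ss)

./my_program                  # replace with your own program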

The squeue command

Use the command squeue to view the job queue. For example:

squeue

will list all jobs on Lovelace.

You can view just your jobs by using:

squeue -u <username>
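
squeue accepts further filters; for example, to list only the jobs in a particular partition, or to check on a single job:

squeue -p cpu
squeue -j <jobid>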

The scancel command

Use this command to delete a job from Lovelace’s job queue. For example:

scancel <jobid>

will remove the job with ID <jobid> from the queue.
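
scancel can also act on several jobs at once; for example, to cancel all of your own jobs:

scancel -u <username>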

Queues

Please note that the Slurm job scheduler uses the term ‘partitions’ to refer to queues, so you may see the two words used interchangeably both here and on other sites.

There are eight partitions available on the Lovelace cluster, arranged as two sets of four: the standard queues (cpu, cpu_highmem, gpu_h100, gpu_l40s) and the billed queues (cpu-billed, cpu_highmem-billed, gpu_h100-billed, gpu_l40s-billed).

You may only schedule jobs on the billed queues if there is funding associated with your account. Jobs submitted to these queues will be prioritised over those submitted to the standard queues.
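
For example, to target the billed CPU partition rather than the standard one, set the partition in the header of your submission script (the account line is only needed if you belong to more than one project; see Accounts below):

#SBATCH -p cpu-billed
#SBATCH --account=<account name>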

cpu

To run on the cpu queue, add the following to the header of your submission script:

#SBATCH -p cpu

cpu_highmem

To run on the cpu_highmem queue, add the following to the header of your submission script:

#SBATCH -p cpu_highmem

gpu_h100

To run on the gpu_h100 queue, add the following to the header of your submission script:

#SBATCH -p gpu_h100
#SBATCH --gpus 1

Each job must request exactly one GPU if running on this partition.

gpu_l40s

To run on the gpu_l40s queue, add the following to the header of your submission script:

#SBATCH -p gpu_l40s
#SBATCH --gpus 1

Each job must request exactly one GPU if running on this partition.

Sometimes some nodes are “down” and fewer nodes than usual are available.

If you have a special request, contact hpcsupport@plymouth.ac.uk.

Accounts

Users are allocated to Slurm Accounts based on the projects they are part of. Users that are only part of a single project (and thus a single Slurm Account) will have jobs automatically allocated against this project for accounting purposes. Users that are part of multiple projects should include a line in their submission scripts that sets the account for the job as follows.

#SBATCH --account=<account name>
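
If you are unsure which Slurm Accounts you belong to, you can usually list your associations with sacctmgr, for example:

sacctmgr show associations user=<username> format=Account,User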

Output from Slurm jobs

Slurm writes the output of your job to a file named slurm-<Job ID>.out. This file is created in the working directory from which you submitted the job with sbatch. You can view the output using the less utility as below:

less -r slurm-<Job ID>.out
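
If you prefer a different file name, set it in your submission script with the --output directive; %j is replaced by the job ID. For example, to write output to myjob-<Job ID>.out:

#SBATCH --output=myjob-%j.out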

Examples of Job Submission Scripts

Some examples are given below:

Run a Python Script using Numpy on a CPU Node

The submission script below assumes that your Python script is called myscript.py and is stored in your home folder.

#!/bin/bash
#SBATCH -p cpu

# Move to the home directory, where myscript.py is stored
cd
# Load the NumPy module and run the script
module load py-numpy
python3 myscript.py

If the submission script is called py-numpy.sbatch, you can submit it by running:

sbatch py-numpy.sbatch

Run a GPU Application on a GPU Node

This job runs the nvidia-smi utility on a GPU node, returning details about the allocated GPU.

#!/bin/bash
#SBATCH -p gpu_l40s
#SBATCH --gpus 1

# Report details of the GPU allocated to this job
nvidia-smi
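
If this script is saved as, for example, gpu-test.sbatch, submit it in the same way as the CPU example above:

sbatch gpu-test.sbatch

The output of nvidia-smi will then appear in the corresponding slurm-<Job ID>.out file.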