Running Containerised MPI Workloads with Slurm
Running containerised MPI workloads is not recommended for most users due to its complexity. It is possible, however, and provides some advantages in reproducibility, and allows the use of applications that do not support the environment on the Lovelace cluster (e.g. an application that does not support the current version of the Red Hat Enterprise Linux distribution).
This process consists of building a container with Podman that is compatible with the host environment (including Slurm, PMIx, and OFED). We extend the container to include the MPI application, using HiRep as an example. We then convert the container to a Singularity container; this makes it easier for the container to access MPI, network, and device information from the host, as this is allowed by default by Singularity but restricted by Podman. We will finally write a Job Submission script to run the container and the workload within it.
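At a high level, the workflow covered in the rest of this section is summarised by the commands below (a sketch only; the folder, image, and file names are those used later, and the Job Submission script name hirep_job.sh is an assumption):
# Build the base MPI image and the HiRep application image with Podman
podman build -t openmpi openmpi
podman build -t hirep hirep
# Convert the Podman image to a Singularity image
podman save --format oci-archive hirep | singularity build hirep.sif oci-archive:///dev/stdin
# Submit the Slurm Job Submission script (script name assumed)
sbatch hirep_job.sh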
Building the OpenMPI Container
We follow the process given in Podman. We start by creating a folder called openmpi containing a file called Dockerfile with the contents below:
FROM registry.access.redhat.com/ubi9:9.5
RUN dnf groupinstall -y "Development Tools"
RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && crb enable && dnf config-manager --set-enabled codeready-builder-for-rhel-9-x86_64-rpms
RUN dnf install -y perl-sigtrap lsof pciutils ethtool gcc-gfortran tcl numactl-libs pciutils-libs tk libnl3
ADD MLNX_OFED_LINUX-24.10-2.1.8.0-rhel9.5-x86_64.tgz /
RUN yes | /MLNX_OFED_LINUX-24.10-2.1.8.0-rhel9.5-x86_64/mlnxofedinstall --user-space-only --without-fw-update
RUN rpm -i /MLNX_OFED_LINUX-24.10-2.1.8.0-rhel9.5-x86_64/RPMS/ucx-knem-1.18.0-1.2410068.x86_64.rpm /MLNX_OFED_LINUX-24.10-2.1.8.0-rhel9.5-x86_64/RPMS/knem-1.1.4.90mlnx3-OFED.23.10.0.2.1.1.rhel9u5.x86_64.rpm
RUN dnf install -y environment-modules wget hwloc hwloc-devel libevent libevent-devel python3 python3-devel pam-devel readline-devel mariadb-devel perl bzip2-devel logrotate numactl-devel
WORKDIR /root
COPY pmix-5.0.6-1.src.rpm /root
COPY .rpmmacros /root
RUN rpmbuild --rebuild --noclean pmix-5.0.6-1.src.rpm
RUN rpm -i /root/rpmbuild/RPMS/x86_64/pmix-5.0.6-1.el9.x86_64.rpm
RUN rm .rpmmacros
COPY prrte-3.0.8-1.src.rpm /root
COPY .rpmmacros /root
RUN rpmbuild --rebuild --noclean prrte-3.0.8-1.src.rpm
RUN rpm -i /root/rpmbuild/RPMS/x86_64/prrte-3.0.8-1.el9.x86_64.rpm
RUN rm .rpmmacros
COPY openmpi-5.0.6-1.src.rpm /root
RUN rpmbuild --rebuild --noclean --define 'configure_options --with-slurm --with-verbs --with-knem=/opt/knem-1.1.4.90mlnx3' openmpi-5.0.6-1.src.rpm
RUN rpm -e openmpi mpitests_openmpi && rpm -i /root/rpmbuild/RPMS/x86_64/openmpi-5.0.6-1.el9.x86_64.rpm
Note that the container above is built against a Red Hat Universal Base Image. Users may choose to use rockylinux instead. We also expect the installation files for NVIDIA OFED, the PMIx Reference Library (OpenPMIx), the PMIx Reference RunTime Environment (PRRTE), and OpenMPI to be in the openmpi folder. Each file can be downloaded from the linked sources.
Additionally, a file called .rpmmacros with the following contents is expected in the folder:
%_lto_cflags %nil
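For reference, before building, the openmpi folder should therefore contain roughly the following files (names taken from the Dockerfile above; the versions on your system may differ):
openmpi/
    Dockerfile
    .rpmmacros
    MLNX_OFED_LINUX-24.10-2.1.8.0-rhel9.5-x86_64.tgz
    pmix-5.0.6-1.src.rpm
    prrte-3.0.8-1.src.rpm
    openmpi-5.0.6-1.src.rpm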
We then build the container with:
podman build -t openmpi openmpi
Note that we expect the openmpi folder mentioned previously to be in the current working directory.
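As an optional sanity check (a suggestion, not part of the documented procedure), you can confirm that the rebuilt MPI stack is installed inside the image:
# Confirm the rebuilt PMIx, PRRTE and OpenMPI RPMs are present in the image
podman run --rm localhost/openmpi rpm -q pmix prrte openmpi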
Extending the OpenMPI Container with the HiRep Application
We then extend the container with the HiRep application. We create a folder called hirep containing a file called Dockerfile with the contents below:
FROM localhost/openmpi
RUN yum install -y bsdtar
COPY 57bac424dec078bbccb0d3eeb7e32a027d023685.zip /root/hirep.zip
WORKDIR /hirep
RUN bsdtar xvf /root/hirep.zip --strip-components=1
RUN ln -s /usr/bin/python3 /usr/bin/python
# Configure and build HiRep: point the build at the container's MPI compiler
# wrapper and the system C compiler, then build the HMC program.
RUN CC="${CC:-$(which cc)}" && \
    mpicc -show && \
    sed -i "s|^CFLAGS =.*|CFLAGS = -Wall -Wshadow -std=c11 -O3 -march=native -pipe |g" Make/MkFlags && \
    sed -i "s|^MPICC = .*|MPICC = $(which mpicc) |g" Make/MkFlags && \
    sed -i "s|^CC = .*|CC = ${CC} |g" Make/MkFlags && \
    sed -i "s|^INCLUDE = .*|INCLUDE = |g" Make/MkFlags && \
    cd HMC && \
    make -j
RUN cd TestProgram/DiracOperator && make -j
We also expect the source files for HiRep at a specific commit to be in the hirep folder. These can be downloaded from https://github.com/claudiopica/HiRep/archive/57bac424dec078bbccb0d3eeb7e32a027d023685.zip.
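For example, the archive can be fetched directly into the hirep folder with wget; the resulting file name matches the one referenced by the Dockerfile above:
wget -P hirep https://github.com/claudiopica/HiRep/archive/57bac424dec078bbccb0d3eeb7e32a027d023685.zip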
Next we build the container with:
podman build -t hirep hirep
Note that we expect the hirep folder mentioned previously to be in the current working directory.
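As before, an optional check (not part of the documented procedure) can confirm that the test binary used later is present in the image:
# List the contents of the DiracOperator test directory built inside the image
podman run --rm localhost/hirep ls /hirep/TestProgram/DiracOperator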
Converting the Container to a Singularity Container
We now convert the container to a Singularity container. This can be done with the following command:
podman save --format oci-archive hirep | singularity build hirep.sif oci-archive:///dev/stdin
This will create a file called hirep.sif which contains the container.
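Optionally (again a suggested check rather than part of the documented procedure), the image can be inspected from the login node:
# Show image metadata and confirm the MPI stack inside the Singularity image
singularity inspect hirep.sif
singularity exec hirep.sif rpm -q pmix prrte openmpi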
Submitting the Job with Slurm
Finally, we will create a job submission file for the DiracOperator component of HiRep. First we create an input file called hirep_input_file with the following contents:
GLB_T = 96
GLB_X = 48
GLB_Y = 48
GLB_Z = 48
NP_T = 8
NP_X = 4
NP_Y = 2
NP_Z = 2
rlx_level = 1
rlx_seed = 12345
We can then submit the job using the following Job Submission script. Note that the process grid NP_T × NP_X × NP_Y × NP_Z = 8 × 4 × 2 × 2 = 128 matches the 128 MPI tasks requested from Slurm:
#!/bin/bash
#SBATCH -N 2
#SBATCH -n 128
export PMIX_MCA_psec=native
srun --mpi=pmix singularity run -B /users,/scratch hirep.sif /hirep/TestProgram/DiracOperator/speed_test_diracoperator -i hirep_input_file -o hirep_output_file
Note that we expect hirep_input_file, hirep.sif and the Job Submission script to be in the current working directory. This example uses PMIx with OpenMPI, but other MPI implementations (such as Intel MPI) may work better with --mpi=pmi2.
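The MPI plugin types supported by the installed Slurm can be listed with:
srun --mpi=list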
We can submit the job above as normal and, upon completion, the output of the DiracOperator test should appear in a file called hirep_output_file in the current working directory.
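For completeness, a minimal submission and monitoring sequence might look like the following; the script name hirep_job.sh is an assumption, so use whatever name the Job Submission script was saved under:
# Submit the Job Submission script and check its state in the queue
sbatch hirep_job.sh
squeue -u $USER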