In order to create a resource allocation and launch tasks, you need to submit a batch script. A batch script should contain job specifications such as the partition, the number of nodes, the number of cores, and the walltime needed.
To submit a job, use the sbatch command:
sbatch job_script.sub
Submitting jobs is arguably the most important part of using the scheduling system. Because you need to express the requirements of your job so that it can be properly scheduled, it is also the most complicated. For SLURM, the lines specifying the job requirements should begin with #SBATCH.
The following table translates some of the most commonly used job specifications:
Description | Job Specification |
---|---|
Job Name | --job-name=<job_name> or -J <job_name> |
Partition/Queue | --partition=<queue_name> or -p <queue_name> |
Account/Project | --account=<account_name> or -A <account_name> |
Number of nodes | --nodes=<number_of_nodes> or -N <number_of_nodes> |
Number of cores (tasks) per node | --ntasks-per-node=<number_of_tasks> |
Walltime Limit | --time=<timelimit> or -t <timelimit> |
Number of GPUs per node | --gres=gpu:<number_of_gpus> or --gpus-per-node=<number_of_gpus> |
Number of GPUs per job | --gpus=<number_of_gpus_per_job> |
Memory requirements per job | --mem=<memory_in_MB> |
Memory requirements per core | --mem-per-cpu=<memory_in_MB> |
Memory requirements per GPU | --mem-per-gpu=<memory_in_MB> |
Standard Output File | --output=<filename> or -o <filename> |
Standard Error File | --error=<filename> or -e <filename> |
Combine stdout/stderr | use -o without -e |
Email Address | --mail-user=<email_address> |
Email Type | --mail-type=NONE, BEGIN, END, FAIL, REQUEUE, ALL |
Exclusive Job not sharing resources | --exclusive |
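As an illustration, a minimal job script combining several of these specifications could look like the following sketch (the job name, partition, output filenames and the executable my_exec are placeholders to adapt to your own project):
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00
#SBATCH --output=my_job.%j.out
#SBATCH --error=my_job.%j.err

./my_exec
Here %j is replaced by the job ID, so each run gets its own output and error files.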
SLURM provides environment variables for most of the values used in the #SBATCH directives.
Environment Variable | Description |
---|---|
$SLURM_JOBID | Job ID |
$SLURM_JOB_NAME | Job Name |
$SLURM_SUBMIT_DIR | Submit directory |
$SLURM_SUBMIT_HOST | Submit host |
$SLURM_JOB_NODELIST | Node list |
$SLURM_JOB_NUM_NODES | Number of nodes allocated to job |
$SLURM_CPUS_ON_NODE | Number of cores per node |
$SLURM_NTASKS_PER_NODE | Number of tasks requested per node |
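These variables can be used inside a job script, for instance to log where a job ran (a small illustrative snippet, not required by SLURM):
echo "Job $SLURM_JOBID ($SLURM_JOB_NAME) was submitted from $SLURM_SUBMIT_DIR"
echo "Running on $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"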
Users also have the option to get interactive access on the compute nodes using the salloc command:
salloc -N <number_of_nodes> --ntasks-per-node=<number_of_cores_per_node> -t <timelimit>
To exit an interactive job before it has reached its time limit, just type exit.
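For example, to request an interactive session on one node with four cores for 30 minutes (the values here are purely illustrative):
salloc -N1 --ntasks-per-node=4 -t 00:30:00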
When running under the CPU partition you need to specify the number of tasks per node (number of cores per node). If not specified, the default value is 1. In your job script or salloc command you need the option --ntasks-per-node=<number_of_cores_per_node>.
For example, to submit a job using two cores on one node, your job script should contain:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
The equivalent command for interactive access is
salloc -N1 --ntasks-per-node=2
To submit a job using four whole Cyclone nodes (40 cores per node), your job script should contain:
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=40
The equivalent command for interactive access is
salloc -N4 --ntasks-per-node=40
When running under the GPU partition you need to specify the number of GPUs per node or per job. In your job script or salloc command you need one of the following options:
--gres=gpu:<number_of_gpus_per_node>
--gpus-per-node=<number_of_gpus_per_node>
--gpus=<number_of_gpus_per_job>
For example, to submit a job using two GPUs on one node, your job script should contain:
#SBATCH --nodes=1
#SBATCH --gres=gpu:2
The equivalent command for interactive access is
salloc -N1 --gres=gpu:2
To submit a job using all GPUs on two nodes on Cyclone (4 GPUs per node), your job script should contain:
#SBATCH --nodes=2
#SBATCH --gpus=8
or
#SBATCH --nodes=2
#SBATCH --gpus-per-node=4
The equivalent command for interactive access is
salloc -N2 --gpus=8
or
salloc -N2 --gpus-per-node=4
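For reference, a minimal GPU job script combining these options could look like the following sketch (the job name and the executable my_gpu_exec are placeholders, and the partition name gpu is taken from the memory table below):
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:2
#SBATCH --time=01:00:00

./my_gpu_exec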
There are default memory amounts per core and per GPU. A user can ask for less or more memory per core or per GPU, but if not explicitly requested, the default amounts will be given to each job.
The table below shows the default memory amounts per partition.
Partition | Default memory per core (MB) | Default memory per GPU (MB) |
---|---|---|
cpu | 4800 | N/A |
gpu | 4800 | 48000 |
milan | 2000 | N/A |
p100 | 2000 | 62000 |
nehalem | 4000 | N/A |
a100 | 10000 | 124000 |
skylake | 2000 | N/A |
The above defaults can be changed using the --mem-per-cpu=<size[units]> or --mem-per-gpu=<size[units]> options respectively, either in your salloc command or in your job script.
For example, a CPU job on a Cyclone node needing 10 cores and 2000 MB of memory per core, can be allocated as below:
salloc -N1 --ntasks-per-node=10 --mem-per-cpu=2000
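The equivalent directives in a job script would be (a sketch, assuming the same single-node CPU job):
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=10
#SBATCH --mem-per-cpu=2000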
The default time limit for a job is 1 hour. If --time is not specified, the submitted job will only be allowed to run for one hour before it is killed. The maximum time a job can run for is 24 hours.
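For example, to request a two-hour limit using the HH:MM:SS format accepted by SLURM, a job script would contain:
#SBATCH --time=02:00:00
and the equivalent interactive request would be:
salloc -N1 -t 02:00:00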
X11 is now enabled through SLURM. Users can get interactive access on a compute node with the possibility of running programs with a graphical user interface (GUI) directly on the compute node. To achieve that, the --x11 flag needs to be given, for example:
salloc -N1 --x11
Note that for X11 to work on the compute nodes, users need to login to the cluster with X11 forwarding enabled. For example for Cyclone:
ssh -X username@cyclone.hpcf.cyi.ac.cy
Instead of using the mpirun or mpiexec commands, you can launch MPI jobs using the SLURM srun command.
When using OpenMPI you can simply use the srun command instead of the mpirun command. For example, to run an executable called my_exec on 40 cores with mpirun we would type:
mpirun -np 40 ./my_exec
Using srun you should type:
srun -n 40 ./my_exec
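Putting this together, a full OpenMPI batch job could be submitted with a script along the following lines (a sketch assuming the executable my_exec; loading the appropriate MPI environment, e.g. via a site-specific module, is left to the user):
#!/bin/bash
#SBATCH --job-name=mpi_job
#SBATCH --partition=cpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=01:00:00

srun -n 40 ./my_exec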
When using IntelMPI you first need to set the I_MPI_PMI_LIBRARY variable and then use srun:
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
srun -n 40 ./my_exec
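In a batch script, the IntelMPI case would look like the following sketch (again assuming the executable my_exec and that the Intel MPI environment itself has already been set up in a site-specific way):
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=01:00:00

export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
srun -n 40 ./my_exec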
To use mpirun when you have already set the I_MPI_PMI_LIBRARY variable, you first need to unset it:
unset I_MPI_PMI_LIBRARY