# Running Jobs
Learn how to submit, monitor, and manage jobs on REPACSS using SLURM.
## Job Types

REPACSS supports two kinds of jobs: interactive jobs, which give you a live shell on a compute node for testing and debugging, and batch jobs, which run a script unattended once resources become available.

### Interactive Jobs
```bash
interactive -c 8 -p h100   # 8 cores on the h100 (GPU) partition
```
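If the `interactive` wrapper is not available in your environment, a roughly equivalent session can be requested directly from SLURM. This is a sketch, not the site's recommended method; defaults such as the time limit depend on the cluster configuration:

```bash
# Request a pseudo-terminal shell on a compute node with resources
# matching the interactive command above (--time here is illustrative):
srun -p h100 -c 8 --time=01:00:00 --pty bash
```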
### Batch Jobs
```bash
sbatch job.sh           # Submit to the default partition
sbatch -p zen4 job.sh   # Submit to the zen4 (CPU) partition
sbatch -p h100 job.sh   # Submit to the h100 (GPU) partition
```
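When scripting around submissions, `sbatch --parsable` prints just the job ID, which makes it easy to build dependency chains. A small sketch (`postprocess.sh` is a hypothetical follow-up script):

```bash
# Submit a job, capture its ID, then queue a follow-up job that
# runs only if the first one succeeds:
JOBID=$(sbatch --parsable job.sh)
sbatch --dependency=afterok:"$JOBID" postprocess.sh
```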
## Job Scripts

The templates below cover the most common cases; adjust the resource requests to match your workload.

### Basic Template
```bash
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=test.out
#SBATCH --error=test.err
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G

# Load the modules your program needs
module load <module_name>

# Run program
./my_program
```
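Save the template to a file and submit it with `sbatch`; the file name `basic_job.sh` below is just an illustrative choice:

```bash
sbatch basic_job.sh   # prints something like "Submitted batch job 12345"
squeue -u $USER       # watch the job in the queue
cat test.out          # stdout appears here once the job runs
```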
### Python Template
```bash
#!/bin/bash
#SBATCH --job-name=python_job
#SBATCH --output=python_job.out
#SBATCH --error=python_job.err
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

# Load required modules
module load gcc

# Activate conda environment
source ~/miniforge3/etc/profile.d/conda.sh
conda activate myenv

# Run Python script
python script.py
```
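Inside the job, SLURM exposes the allocation through environment variables, which your script can use to size its work. A sketch (the `--workers` flag is a hypothetical argument your own script would need to accept):

```bash
# These variables are set by SLURM for every job:
echo "Job ${SLURM_JOB_ID} on node(s): ${SLURM_JOB_NODELIST}"
echo "CPUs available to this task: ${SLURM_CPUS_PER_TASK}"

# Hypothetical: pass the allocated CPU count to your own script
python script.py --workers "${SLURM_CPUS_PER_TASK}"
```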
### GPU Template
```bash
#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --output=gpu_test.out
#SBATCH --error=gpu_test.err
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=h100
#SBATCH --gres=gpu:1

# Load modules
module load cuda

# Run program
./gpu_program
```
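A quick way to confirm the GPU allocation took effect is to run `nvidia-smi` at the top of the script; its output lands in the job's `.out` file (this assumes the driver utilities are on the node's default path):

```bash
# Print the GPUs visible to this job; with --gres=gpu:1 you should
# see exactly one device listed:
nvidia-smi
```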
## Job Management

### Submission
```bash
sbatch job.sh                              # Submit a job
sbatch --array=1-10 job.sh                 # Submit an array job (tasks 1-10)
sbatch --dependency=afterok:12345 job.sh   # Run only after job 12345 completes successfully
```
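Each task in an array job receives its own `SLURM_ARRAY_TASK_ID`, which is typically used to select a different input per task. A sketch (the numbered input files are hypothetical names):

```bash
# Inside job.sh: pick this task's input from input_1.txt ... input_10.txt
INPUT="input_${SLURM_ARRAY_TASK_ID}.txt"
./my_program "$INPUT"
```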
### Monitoring
```bash
squeue -u $USER   # Your jobs
squeue -p zen4    # Zen4 partition queue
squeue -p h100    # H100 partition queue
```
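For more detail than the default `squeue` columns, `scontrol` shows the full record for a single job; the job ID below is illustrative:

```bash
scontrol show job 12345   # Full details for one job (state, nodes, limits)
squeue -u $USER --start   # Estimated start times for your pending jobs
```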
### Control
```bash
scancel 12345      # Cancel a specific job
scancel -u $USER   # Cancel all of your jobs
scancel -p zen4    # Cancel your jobs in the zen4 partition
```
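`scancel` filters can be combined, for example to clear out queued work without killing anything already running:

```bash
# Cancel only your PENDING jobs; RUNNING jobs are left alone:
scancel --state=PENDING -u $USER
```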
## Resource Requests
### Resource Types
- **CPU jobs:** use `--nodes`, `--ntasks`, `--cpus-per-task`, and `--mem`
- **GPU jobs:** add `--gres=gpu:1` (or `gpu:2`, `gpu:4`)
- **Python jobs:** see Python Environment Setup for specific configurations
- Consider using `--cpus-per-task` for parallel Python processing
- Adjust `--mem` based on your data processing needs (the sketch below shows how these flags combine)
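As an illustration of how these flags fit together, here is a hedged header for a single multithreaded task; every value is an example, not a recommendation:

```bash
#!/bin/bash
#SBATCH --job-name=resource_demo   # illustrative name
#SBATCH --nodes=1                  # one node
#SBATCH --ntasks=1                 # one task...
#SBATCH --cpus-per-task=16         # ...with 16 cores for threads/processes
#SBATCH --mem=64G                  # size to your data
#SBATCH --gres=gpu:1               # only for GPU work (h100 partition)
#SBATCH --time=02:00:00
```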