Python Environment Setup

Learn how to set up and manage Python environments on REPACSS using Miniforge.

Overview

REPACSS provides Python through the module system, but for better control and reproducibility, users are encouraged to set up their own conda environments. This guide will help you create and manage your Python environments using Miniforge.

Miniforge is a minimal installer for Conda that includes the Mamba libraries. It allows users to install the Conda package manager with conda-forge set as the default channel. We recommend Miniforge over Miniconda or Anaconda for several reasons:

  • Community-Focused: Miniforge is a community-driven project specifically designed to work seamlessly with the conda-forge channel, ensuring access to a vast and frequently updated collection of packages.
  • No Default Channels: Anaconda and Miniconda include the "defaults" channel, which may lead to outdated packages or conflicts. Miniforge configures conda-forge as its only channel, providing a more consistent and reliable package management experience.
  • Lightweight Installation: Miniforge provides a minimal installer, which means it occupies less disk space and allows users to install only the packages they need.
  • Enhanced Compatibility: Using conda-forge as the primary channel improves compatibility with numerous scientific packages that may not be available in the default channels.

Prerequisites

  • Access to REPACSS login nodes
  • Basic knowledge of Linux commands
  • Understanding of Python package management

Removing Existing Conda Installations

Before installing Miniforge, it's recommended to remove any existing conda installations to avoid conflicts.

Check for Existing Conda Installations

Check if conda is installed:

conda --version

If conda is not installed, you will receive a "command not found" error and can proceed with the installation.

List any conda-related files and directories at the top level of your home directory:

ls -al ~ | grep conda

Remove any existing conda-related directories and files:

find ~ -maxdepth 1 -name '*conda*' -exec rm -ir {} +

This command will ask for confirmation before deleting each file or directory. Answer:

  • y to remove the item
  • n to keep the item
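
Before deleting anything, it can help to preview which entries would match. The following is a read-only sketch in Python, not part of any REPACSS tooling. The function name find_conda_entries is hypothetical, and it assumes the same matching rule as the find command above: any top-level entry whose name contains "conda".

```python
from pathlib import Path


def find_conda_entries(directory):
    """Return sorted names of top-level entries whose name contains 'conda'."""
    # iterdir() also yields hidden entries such as .conda or .condarc
    return sorted(p.name for p in Path(directory).iterdir() if "conda" in p.name)


if __name__ == "__main__":
    # Preview only: nothing is deleted here
    for name in find_conda_entries(Path.home()):
        print(name)
```

Anything listed that you want to keep should be answered with n when the removal command prompts for it.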

Setting Up Miniforge

Installation

1. Download Miniforge:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh

2. Make the installer executable:

chmod +x Miniforge3-Linux-x86_64.sh

3. Run the installer:

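./Miniforge3-Linux-x86_64.sh

  • Follow the prompts and accept the license agreement
  • When asked for the installation location, use your home directory: /home/$USER/miniforge3
  • If the installer offers to update your shell profile to initialize conda, answer yes; the next step relies on it

4. Reload your shell configuration so that the conda command becomes available:

source ~/.bashrc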

Basic Usage

Create a new environment:

conda create -n myenv python=3.11

Activate your environment:

conda activate myenv

Install packages:

conda install numpy pandas scipy
# or
pip install package_name
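
After installing, you may want to confirm that the packages are visible to the interpreter of the active environment. The sketch below uses only the Python standard library (importlib.metadata, available in Python 3.8 and later). The function name installed_versions is hypothetical, and the package list simply mirrors the example above.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["numpy", "pandas", "scipy"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the environment activated. A package reported as not installed usually means it was installed into a different environment.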

Deactivate environment:

conda deactivate

Best Practices

Environment Management

  • Create separate environments for different projects
  • Use environment.yml files for reproducibility
  • Keep your base environment clean
  • Regularly update conda

# Export environment
conda env export > environment.yml

# Create environment from file
conda env create -f environment.yml

# Update conda
conda update conda
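
For readers who have not seen an environment.yml file, the sketch below prints the general shape of a minimal one. It is an illustration only: render_environment_yml is a hypothetical helper, the package list is an assumption, and a file produced by conda env export will typically contain many more pinned entries and a prefix line.

```python
def render_environment_yml(name, dependencies, channels=("conda-forge",)):
    """Build the text of a minimal environment.yml file."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical environment matching the examples in this guide
    print(render_environment_yml("myenv", ["python=3.11", "numpy", "pandas"]), end="")
```

A short hand-written file like this can be passed to conda env create -f in the same way as an exported one.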

Storage Considerations

  • Store environments in your home directory
  • Use scratch space for large datasets
  • Clean up unused environments

# Remove unused environment
conda env remove -n envname

Running Python Jobs

Interactive Sessions

Request an interactive session:

interactive -c 8 -p h100

Activate your environment:

conda activate myenv

Run Python:

python script.py
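
If you are unsure which interpreter a session is using, a short script along these lines can report it. This is a minimal sketch: describe_environment is a hypothetical name, and it assumes conda has set the CONDA_DEFAULT_ENV variable, which conda activate normally does.

```python
import os
import platform
import sys


def describe_environment():
    """Collect basic facts about the running interpreter and active conda env."""
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Set by "conda activate"; None when no environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

With myenv active, the executable path would be expected to sit inside the environment's directory under your Miniforge installation.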

Batch Jobs

Create a job script (e.g., run_python.sh):

#!/bin/bash
#SBATCH --job-name=python_job
#SBATCH --output=python_job.out
#SBATCH --error=python_job.err
#SBATCH --time=01:00:00
#SBATCH --partition=h100
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

# Load any required modules
module load gcc

# Activate conda environment
source ~/miniforge3/etc/profile.d/conda.sh
conda activate myenv

# Run Python script
python script.py

Submit the job:

sbatch run_python.sh
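
Inside script.py, the number of CPUs requested with --cpus-per-task can be read from the environment so that the script does not start more workers than the job was allocated. The sketch below assumes Slurm exports SLURM_CPUS_PER_TASK, which it does when --cpus-per-task is set. The function name worker_count and the fallback of one worker are illustrative choices.

```python
import os


def worker_count(default=1):
    """Return the CPU count allocated by Slurm, or a default outside a job."""
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    if value is None:
        return default
    # Guard against a zero or negative value
    return max(1, int(value))


if __name__ == "__main__":
    print(f"Using {worker_count()} worker(s)")
```

The result can then be passed to a worker pool, for example as the processes argument of multiprocessing.Pool.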

Common Issues and Solutions

Environment Activation Fails

  • Ensure Miniforge is properly initialized
  • Check if the environment exists: conda env list

Package Installation Issues

  • Try installing explicitly from the conda-forge channel: conda install -c conda-forge package_name
  • Use pip as a fallback

Memory Issues

  • Monitor memory usage: top or htop
  • Adjust the memory request in your batch job script (the #SBATCH --mem line)
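
To choose a memory request, it helps to know how much a script actually used. The sketch below reports the peak memory of the current Python process using the standard library resource module. It assumes a Linux system, where ru_maxrss is reported in KiB, and peak_memory_mib is a hypothetical helper name.

```python
import resource


def peak_memory_mib():
    """Peak resident set size of this process in MiB (Linux reports KiB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


if __name__ == "__main__":
    data = [0] * 1_000_000  # stand-in for the real workload
    print(f"Peak memory: {peak_memory_mib():.1f} MiB")
```

Calling this at the end of a job and writing the value to the job's output file gives a starting point for the next request, with some headroom added.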