Installation

This guide covers the installation process for the Entomological Label Information Extraction system.

Prerequisites

System Requirements

  • Operating System: Windows 10+, macOS 10.14+, or Linux (Ubuntu 18.04+)

  • Python: 3.10 or higher

  • Memory: 8GB RAM minimum, 16GB recommended

  • Storage: 5GB free space minimum

  • Conda: Required for Python environment management

  • Gemini API Key: Required for the Gemini pipeline (free from Google AI Studio)

  • Tesseract OCR: Required only for the traditional pipelines (not needed for Gemini)

  • Docker: Optional (for containerized execution or HPC)
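The Python version and disk-space requirements above can be checked programmatically. The following is a minimal sketch, not part of the project: the helper names `python_ok` and `free_space_gb` are hypothetical, and the thresholds are copied from the list above.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum Python version from the requirements list
MIN_FREE_GB = 5       # minimum free storage from the requirements list


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def free_space_gb(path="."):
    """Return the free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    print(f"Python {major}.{minor}:", "OK" if python_ok() else "too old (need 3.10+)")
    free = free_space_gb()
    print(f"Free disk space {free:.1f} GB:",
          "OK" if free >= MIN_FREE_GB else "below the 5 GB minimum")
```

Run it with the interpreter you plan to use for the installation, from the drive where the repository will live.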

Software Dependencies

Conda Installation

Conda is required for managing the Python environment.

All Platforms

# Download and install Miniconda
# Visit: https://conda.io/miniconda.html

# Verify installation
conda --version

Gemini API Key

A Gemini API key is required for the Gemini pipeline (recommended). It is free to obtain.

# Get your key from https://aistudio.google.com/apikey
# Replace the placeholder with your own key (macOS/Linux shell syntax)
export GEMINI_API_KEY="your-api-key"
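To confirm that the key is visible to Python before running a pipeline, a check along these lines may help. This is a sketch under the assumption that the key is read from the `GEMINI_API_KEY` environment variable, as the export above suggests. The function name `get_gemini_api_key` is hypothetical and is not the project's API.

```python
import os


def get_gemini_api_key(env=None):
    """Return the Gemini API key from the environment, or raise if it is unset."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Get a key from "
            "https://aistudio.google.com/apikey and export it first."
        )
    return key


if __name__ == "__main__":
    try:
        get_gemini_api_key()
        print("GEMINI_API_KEY is set.")
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is usually easier to debug than an authentication error raised in the middle of a run.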

Tesseract OCR Installation

Tesseract is required only for the traditional pipelines (not needed for the Gemini pipeline).

macOS

brew install tesseract

# Verify installation
tesseract --version

Windows

# Download installer from:
# https://github.com/UB-Mannheim/tesseract/wiki

# After installation, verify:
tesseract --version

Linux (Ubuntu/Debian)

sudo apt update
sudo apt install tesseract-ocr

# Verify installation
tesseract --version

Docker Installation (Optional)

Docker is optional: it is only needed for containerized execution or HPC environments.

macOS

brew install --cask docker
open /Applications/Docker.app

Windows

# Download from: https://docker.com
# Or: winget install Docker.DockerDesktop

Linux

sudo apt install docker.io docker-compose
sudo systemctl start docker
sudo usermod -aG docker $USER  # Optional: run docker without sudo (log out and back in to apply)

Installation Methods

Option 2: pip Installation

# Clone the repository
git clone https://github.com/MargotBelot/entomological-label-information-extraction.git
cd entomological-label-information-extraction

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install package
pip install -e .

Option 3: Development Installation

For developers who want to contribute:

# Clone the repository
git clone https://github.com/MargotBelot/entomological-label-information-extraction.git
cd entomological-label-information-extraction

# Create conda environment
conda env create -f environment.yml
conda activate ELIE

# Install with development dependencies (quoted so that zsh does not expand the brackets)
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Option 4: HPC/Cluster Installation (Apptainer)

For high-performance computing environments:

# Build Apptainer container
cd pipelines
apptainer build elie.sif elie.def

# Transfer to HPC cluster
scp elie.sif username@hpc.cluster.edu:/path/on/hpc/

# Run on HPC
apptainer run --bind /scratch/data:/app/data elie.sif mli

See pipelines/HPC_QUICKSTART.md for complete HPC documentation including SLURM job scripts.

Verification

Test Installation

# Verify conda environment
conda activate ELIE

# Check that the package is installed
python -c "import label_processing; print('✅ Installation successful!')"

# Verify Tesseract is installed (traditional pipelines only)
tesseract --version

# Optional: Check Docker (if using containerized execution)
docker --version

# Run health check
python scripts/health_check.py
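The command-line tool checks above can also be run in one pass, on any platform, with a short script. This is an illustrative sketch and not the project's `scripts/health_check.py`: the `TOOLS` table and both function names are hypothetical, and the required/optional flags follow the System Requirements list in this guide.

```python
import shutil

# Tools named in this guide; True marks the ones the guide lists as required.
TOOLS = {
    "conda": True,
    "tesseract": False,  # traditional pipelines only
    "docker": False,     # containerized execution only
}


def find_tools(tools=TOOLS, which=shutil.which):
    """Map each tool name to its location on PATH, or None if it is not found."""
    return {name: which(name) for name in tools}


def missing_required(found, tools=TOOLS):
    """Return the sorted names of required tools that were not found."""
    return sorted(name for name, path in found.items()
                  if tools[name] and path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
    missing = missing_required(found)
    if missing:
        print("Missing required tools:", ", ".join(missing))
```

Passing `which` as a parameter keeps the lookup easy to test without touching the real PATH.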

Test Basic Functionality

# Launch the GUI to test the interface
python launch_gui.py

# Or confirm that the detection script runs by printing its help text
python scripts/processing/detection.py --help

Data Directory Setup

The system expects the following directory structure:

# These directories should already exist in the repository
ls data/MLI/input    # Multi-label input directory
ls data/MLI/output   # Multi-label output directory
ls data/SLI/input    # Single-label input directory
ls data/SLI/output   # Single-label output directory
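On systems without `ls` (for example the Windows command prompt), the same check can be done in Python. This is a minimal sketch, assuming it is run from the repository root. The directory names are taken from the listing above, and the function name `missing_data_dirs` is hypothetical.

```python
from pathlib import Path

# Directory names taken from the listing above.
EXPECTED_DIRS = [
    "data/MLI/input",
    "data/MLI/output",
    "data/SLI/input",
    "data/SLI/output",
]


def missing_data_dirs(repo_root="."):
    """Return the expected data directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All data directories are present.")
```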

Troubleshooting

Common Issues

Conda not found

Install Miniconda from https://conda.io/miniconda.html and restart your terminal.

Tesseract not found (traditional pipelines only)

Install Tesseract: brew install tesseract (macOS) or sudo apt install tesseract-ocr (Linux). Not needed for the Gemini pipeline.

Docker not found (optional)

Only needed for containerized execution. Install from https://docker.com if needed.

Permission denied with Docker (Linux)

Add your user to the docker group with sudo usermod -aG docker $USER, then log out and back in.

Conda environment creation fails

Update conda with conda update conda, then retry.

Import errors

Make sure you've activated the environment: conda activate ELIE.

Memory errors

Ensure you have sufficient RAM available. Close other applications if needed.

Getting Help

If you encounter issues:

  1. Check the Troubleshooting guide

  2. Review the error messages carefully

  3. Check that the system requirements are met

  4. Consult the GitHub issues page

  5. Contact the maintainers

Next Steps

After successful installation:

  1. Read the Quick Start Guide

  2. Review the User Guide

  3. Check the API Reference documentation

  4. Try processing some sample images