pipelines Package
The pipelines package provides Docker-based processing pipelines and workflow management.
Package Contents
This package contains a unified Docker configuration and requirements for different processing pipelines:
Dockerfile - Consolidated multi-stage Dockerfile for all pipeline components
docker-compose.yml - Unified Docker Compose file with profiles for the Gemini, MLI, and SLI pipelines and for standalone services
requirements/ - Directory containing the dependency files for individual pipeline components
Pipeline Configurations
Gemini Pipeline (Recommended)
The Gemini pipeline uses the Google Gemini API for all vision tasks. It is the recommended pipeline for most users and handles both printed and handwritten labels:
Detection + Classification: Gemini detects all labels in a specimen image, classifies each one (printed, handwritten, mixed, identifier, empty), and determines its rotation angle
OCR / HTR: Gemini reads the text on each label (printed and handwritten alike)
Post-processing: Text cleaning and consolidation
Entity Recognition: Gemini extracts structured entities (scientific names, collectors, dates, localities)
GBIF + OSM Enrichment: Validates names against GBIF Backbone Taxonomy and geocodes localities with OpenStreetMap
Darwin Core Export: Outputs standardised Darwin Core records (JSON and CSV)
Crop & Cleanup: Optional label cropping and intermediate file removal
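The enrichment and export stages can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the pipeline's own code: the entity keys (`scientific_name`, `collector`, `date`, `locality`) and the helpers `gbif_match_url`, `nominatim_url`, `to_dwc`, and `export_records` are hypothetical. The GBIF species-match endpoint and the Nominatim search endpoint are real public services, and `scientificName`, `recordedBy`, `eventDate`, `locality`, `decimalLatitude`, and `decimalLongitude` are standard Darwin Core terms, but how the pipeline actually queries those services and which terms it emits are assumptions. The request URLs are only built here, so the example runs offline.

```python
import csv
import io
import json
from urllib.parse import urlencode

# Public endpoints; how the pipeline actually queries them is an assumption.
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"
NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"

# Hypothetical entity keys mapped to standard Darwin Core terms.
ENTITY_TO_DWC = {
    "scientific_name": "scientificName",
    "collector": "recordedBy",
    "date": "eventDate",
    "locality": "locality",
}
DWC_FIELDS = list(ENTITY_TO_DWC.values()) + ["decimalLatitude", "decimalLongitude"]


def gbif_match_url(name):
    """Build a GBIF Backbone Taxonomy name-match request URL."""
    return f"{GBIF_MATCH_URL}?{urlencode({'name': name})}"


def nominatim_url(locality):
    """Build an OpenStreetMap Nominatim geocoding request URL."""
    return f"{NOMINATIM_URL}?{urlencode({'q': locality, 'format': 'json', 'limit': 1})}"


def to_dwc(entities, coords=None):
    """Map extracted entities (plus optional lat/lon) onto one Darwin Core record."""
    record = {term: entities.get(key, "") for key, term in ENTITY_TO_DWC.items()}
    lat, lon = coords if coords else ("", "")
    record["decimalLatitude"] = lat
    record["decimalLongitude"] = lon
    return record


def export_records(records):
    """Serialise records to a JSON string and a CSV string with a header row."""
    as_json = json.dumps(records, ensure_ascii=False, indent=2)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DWC_FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return as_json, buf.getvalue()


entities = {"scientific_name": "Carabus nemoralis", "collector": "Smith",
            "date": "1923-05-14", "locality": "Berlin"}
record = to_dwc(entities, coords=(52.52, 13.405))  # coords as if geocoded
as_json, as_csv = export_records([record])
print(gbif_match_url(entities["scientific_name"]))
print(as_csv)
```

Keeping the entity-to-term mapping in one table means the JSON and CSV outputs share a single column order, and adding a Darwin Core term is a one-line change.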
Multi-Label Pipeline
The multi-label pipeline processes full specimen images with multiple labels:
Label Detection: Uses Faster R-CNN to detect individual labels
Label Cropping: Extracts detected labels as separate images
Classification: Determines label types (empty, handwritten, printed, identifier)
Processing: Routes each label to the processing step suited to its type
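The cropping and routing steps can be sketched with NumPy array slicing. This assumes the detector returns boxes as `(x1, y1, x2, y2)` pixel coordinates, which is the format torchvision's Faster R-CNN uses; whether this pipeline uses torchvision is not stated. The `ROUTES` table, its destination names (`ocr`, `htr`), and the helpers `crop_labels` and `route` are hypothetical illustrations, not the pipeline's actual routing.

```python
import numpy as np

# Hypothetical routing table: which downstream step each label class goes to.
# The source only says labels are routed to appropriate processing.
ROUTES = {
    "printed": "ocr",
    "identifier": "ocr",
    "handwritten": "htr",
    "empty": None,  # nothing to read, so empty labels are dropped
}


def crop_labels(image, boxes, pad=0):
    """Cut each (x1, y1, x2, y2) pixel box out of an H x W (x C) image array."""
    h, w = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        # Pad the box, then clamp it to the image bounds.
        x1, y1 = max(0, int(x1) - pad), max(0, int(y1) - pad)
        x2, y2 = min(w, int(x2) + pad), min(h, int(y2) + pad)
        crops.append(image[y1:y2, x1:x2].copy())
    return crops


def route(crops, label_types):
    """Group crops by destination step, skipping classes with no route."""
    queues = {}
    for crop, label_type in zip(crops, label_types):
        dest = ROUTES.get(label_type)
        if dest is not None:
            queues.setdefault(dest, []).append(crop)
    return queues


image = np.zeros((100, 200, 3), dtype=np.uint8)  # stand-in for a specimen photo
boxes = [(10, 20, 60, 50), (150, 80, 220, 120)]   # e.g. detector output
queues = route(crop_labels(image, boxes), ["printed", "empty"])
```

Clamping matters because detector boxes can extend past the image edge; the second box above is trimmed to the 200 x 100 frame rather than raising an error.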
Single-Label Pipeline
The single-label pipeline processes pre-cropped individual label images:
Classification: Determines label types
Rotation Correction: Corrects label orientation
OCR Processing: Extracts text using Tesseract or the Google Vision API
Post-processing: Cleans and structures extracted text
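Rotation correction and text clean-up can be sketched as below. This is a minimal sketch, assuming the rotation model predicts one of four clockwise angles (0, 90, 180, 270); the helpers `correct_rotation` and `clean_text` and their clean-up rules are hypothetical and are not taken from the pipeline's post-processing code.

```python
import re

import numpy as np


def correct_rotation(image, angle):
    """Undo a clockwise rotation of 0, 90, 180 or 270 degrees.

    Assumes the rotation model reports how far the label is turned clockwise;
    np.rot90 turns counter-clockwise, so angle // 90 quarter turns undo it.
    """
    if angle % 90 != 0:
        raise ValueError(f"unsupported angle: {angle}")
    return np.rot90(image, k=(angle // 90) % 4)


def clean_text(raw):
    """Illustrative OCR clean-up before the text is structured."""
    # Rejoin words hyphenated across a line break: "Carab-\nus" -> "Carabus".
    text = re.sub(r"([A-Za-z])-\n([a-z])", r"\1\2", raw)
    # Drop control characters such as form feeds, keeping newlines and tabs.
    text = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    # Collapse runs of spaces/tabs, then runs of blank lines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n\s*\n+", "\n", text)
    return text.strip()


label = np.arange(6).reshape(2, 3)        # stand-in for a label image
turned = np.rot90(label, k=-1)            # simulate a 90-degree clockwise scan
restored = correct_rotation(turned, 90)
print(clean_text("Carab-\nus  nemoralis\n\n\nleg. Smith\x0c"))
```

The hyphen rule only joins a letter to a lowercase continuation, so date ranges such as "1890-\n95" on specimen labels are left intact.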
Requirements Structure
The requirements/ directory contains specialized dependency files:
gemini.txt - Dependencies for the Gemini pipeline (lightweight: google-genai, opencv, pandas, requests, numpy, nltk)
classifier.txt - Dependencies for classification models (traditional)
empty_labels.txt - Dependencies for empty label detection (traditional)
postprocess.txt - Dependencies for text post-processing
rotation.txt - Dependencies for rotation correction (traditional)
segmentation.txt - Dependencies for label segmentation (traditional)
tesseract.txt - Dependencies for Tesseract OCR (traditional)
Docker Usage
To run the pipelines:
# Gemini pipeline (recommended — lightweight, API-based)
cd pipelines
GEMINI_API_KEY=<your-key> docker-compose --profile gemini up
# Multi-label processing (MLI, traditional)
cd pipelines
docker-compose --profile mli up
# Single-label processing (SLI, traditional)
cd pipelines
docker-compose --profile sli up
# Run individual services (traditional)
cd pipelines
docker-compose up segmentation # Detection only
docker-compose up rotation # Rotation correction only
docker-compose up tesseract # OCR only
docker-compose up classification_nuri # ID/Description classification
docker-compose up classification_hp # Handwritten/Printed classification