scripts Package
The scripts package contains standalone utilities, evaluation scripts, and processing tools.
Package Contents
scripts.health_check: Health Check Script for Entomological Label Information Extraction. Validates system requirements and provides diagnostic information.
Modules
Health Check
Health Check Script for Entomological Label Information Extraction. Validates system requirements and provides diagnostic information.
- scripts.health_check.check_python_version()
Check Python version and provide recommendations.
- scripts.health_check.check_docker()
Check Docker installation and status.
- scripts.health_check.check_project_structure()
Check if we’re in the correct project directory.
- scripts.health_check.check_system_resources()
Check available system resources.
- scripts.health_check.check_dependencies()
Check for optional dependencies.
- scripts.health_check.main()
Run comprehensive health check.
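The pattern behind these checks can be sketched as follows. This is a minimal illustration, not the script's actual code: the minimum version `MIN_PYTHON`, the boolean return value, and the exit-code convention in `main()` are all assumptions made for the example.

```python
import sys

# Hypothetical minimum version; the threshold used by the real script is not documented here.
MIN_PYTHON = (3, 8)


def check_python_version(minimum=MIN_PYTHON):
    """Return True if the running interpreter meets the minimum version."""
    current = sys.version_info[:2]
    ok = current >= minimum
    status = "OK" if ok else "too old"
    print(f"Python {current[0]}.{current[1]}: {status} "
          f"(minimum {minimum[0]}.{minimum[1]})")
    return ok


def main():
    """Run every registered check and return an exit code (0 means all passed)."""
    # Only one check is registered in this sketch; the real script runs several.
    checks = {"python_version": check_python_version}
    results = {name: check() for name, check in checks.items()}
    return 0 if all(results.values()) else 1
```

Keeping each check as a small function that returns a boolean makes it easy for `main()` to aggregate the results into one exit status.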
Evaluation Scripts
The evaluation subpackage contains comprehensive evaluation and analysis tools:
- scripts.evaluation.analysis_eval.parse_arguments()
Parse command-line arguments using argparse.
- Returns:
Parsed command-line arguments.
- Return type:
- scripts.evaluation.analysis_eval.evaluate_labels(empty_folder, not_empty_folder)
Evaluate predicted labels against ground truth labels.
- scripts.evaluation.analysis_eval.main()
Main function to execute label evaluation.
- scripts.evaluation.classifiers_eval.parse_arguments()
Parse command-line arguments and return the parsed arguments.
- Returns:
Parsed command-line arguments.
- Return type:
- scripts.evaluation.classifiers_eval.main()
Main function to evaluate classifier accuracy and generate reports.
- scripts.evaluation.cluster_eval.parse_arguments()
Parse command-line arguments and return the parsed arguments.
- Returns:
Parsed command-line arguments.
- Return type:
- scripts.evaluation.cluster_eval.is_word(token)
Checks whether a token is a valid word (not punctuation or too short).
- Parameters:
token (str) – The token to check.
- scripts.evaluation.cluster_eval.tokenize_text(labels, ground_truth)
Tokenizes and lowercases text fields from labels.
- Parameters:
labels (List[Dict[str, str]] or Dict[str, tuple[str, str]]) – Labels to tokenize.
ground_truth (bool) – Whether the labels are ground truth data.
- scripts.evaluation.cluster_eval.build_word_vectors(labels, ground_truth)
Builds a Word2Vec model from the tokenized labels.
- Parameters:
labels (List[Dict[str, str]] or Dict[str, tuple[str, str]]) – Labels to build vectors from.
ground_truth (bool) – Whether the labels are ground truth data.
- Returns:
A tuple containing the trained Word2Vec model and the tokenized labels.
- Return type:
- scripts.evaluation.cluster_eval.build_mean_label_vector(model, labels)
Computes the mean vector for each label using the Word2Vec model. Also tracks labels that have no valid tokens (and thus no vector).
- Parameters:
model (gensim.models.Word2Vec) – The trained Word2Vec model.
labels (List[Dict[str, List[str]]]) – Tokenized labels with IDs.
- Returns:
A tuple containing a dictionary of mean vectors and a list of skipped IDs.
- Return type:
- scripts.evaluation.cluster_eval.load_json(path)
Loads the ground truth JSON file.
- Parameters:
path (str) – Path to the JSON file.
- scripts.evaluation.cluster_eval.load_cluster_csv(path)
Loads cluster assignments from a CSV file.
- Parameters:
path (str) – Path to the CSV file.
- scripts.evaluation.cluster_eval.plot_tsne(label_vectors, clusters, out_path, verbose, skipped_ids)
Generates and saves a t-SNE scatter plot with cluster coloring and hover text. Also includes skipped labels (no vectors) as a separate “No vector” cluster.
- Parameters:
label_vectors (Dict[str, np.ndarray]) – Dictionary of label IDs to their mean vectors.
clusters (Dict[str, List[str]]) – Dictionary mapping label IDs to their cluster ID and transcript.
out_path (str) – Path to save the t-SNE plot HTML file.
verbose (bool) – Whether to print verbose output.
skipped_ids (List[str]) – List of label IDs that had no valid tokens and thus no vector.
- scripts.evaluation.cluster_eval.main(args)
Main entry point for clustering visualization. Loads data, trains embeddings, computes vectors, runs t-SNE, and saves plot.
- Parameters:
args (argparse.Namespace) – Parsed command-line arguments.
- Returns:
None
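The token filter and the per-label mean vector can be sketched in plain Python. This is an illustration under stated assumptions, not the module's code: a plain dict of token vectors stands in for the gensim Word2Vec model, the `"id"` and `"tokens"` keys and the `min_length` threshold are hypothetical, and lists are used where the real function presumably returns NumPy arrays.

```python
import string


def is_word(token, min_length=2):
    """Treat a token as a word if it is long enough and not pure punctuation."""
    if len(token) < min_length:
        return False
    return not all(ch in string.punctuation for ch in token)


def build_mean_label_vector(vectors, labels):
    """Average the known token vectors per label; collect IDs with none."""
    mean_vectors, skipped_ids = {}, []
    for label in labels:
        # Keep only tokens that the (stand-in) embedding table knows about.
        known = [vectors[t] for t in label["tokens"] if t in vectors]
        if not known:
            skipped_ids.append(label["id"])
            continue
        dim = len(known[0])
        mean_vectors[label["id"]] = [
            sum(vec[i] for vec in known) / len(known) for i in range(dim)
        ]
    return mean_vectors, skipped_ids


# Placeholder data for illustration only.
vectors = {"berlin": [1.0, 0.0], "museum": [0.0, 1.0]}
labels = [
    {"id": "a.jpg", "tokens": ["berlin", "museum"]},
    {"id": "b.jpg", "tokens": ["??"]},
]
means, skipped = build_mean_label_vector(vectors, labels)
print(means)    # {'a.jpg': [0.5, 0.5]}
print(skipped)  # ['b.jpg']
```

Returning the skipped IDs alongside the vectors is what lets a plotting step show those labels as a separate group instead of silently dropping them.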
- scripts.evaluation.detection_eval.parse_arguments()
Parse command-line arguments and return the parsed arguments.
- Returns:
Parsed command-line arguments.
- Return type:
- scripts.evaluation.detection_eval.main()
Main function to evaluate IOU scores and generate visualizations.
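The IOU (intersection over union) metric is standard: the area shared by two boxes divided by the area they cover together. A minimal sketch follows, assuming boxes are given as `(xmin, ymin, xmax, ymax)` tuples. The function name `iou` and this box layout are illustrative assumptions; the script's internal helpers are not documented here.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero so disjoint boxes yield no overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, giving an IOU of 1/7.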
- scripts.evaluation.ocr_eval.parse_arguments()
Parse command-line arguments and return the parsed arguments.
- Returns:
Parsed command-line arguments.
- Return type:
- scripts.evaluation.ocr_eval.main()
Main function to evaluate OCR predictions and save results.
- scripts.evaluation.redundancy.parse_arguments()
Parse command-line arguments and return the parsed arguments.
- Returns:
Parsed command-line arguments.
- Return type:
- scripts.evaluation.redundancy.main()
Main function to evaluate redundancy in a dataset and save results.
- scripts.evaluation.rotation_eval.parse_arguments()
Parse command-line arguments and return the parsed arguments.
- Returns:
Parsed command-line arguments.
- Return type:
- scripts.evaluation.rotation_eval.load_images(input_image_dir)
Load images from the given directory and extract ground truth labels.
- scripts.evaluation.rotation_eval.rotate_image(img_path, angle)
Rotate the image by the given angle and save it back to the same path.
- scripts.evaluation.rotation_eval.evaluate_rotation_model(input_image_dir, output_folder_path)
Load model, predict rotations, and evaluate performance.
- scripts.evaluation.rotation_eval.main()
Main function to execute rotation model evaluation.
Post-processing Scripts
The postprocessing subpackage provides tools for result consolidation and processing:
Consolidate Pipeline Results Script
Creates a single JSON file that links all per-label results across the pipeline (detection → classification → rotation → OCR → post‑processing).
Supports both the traditional (TensorFlow-based) pipeline and the Gemini pipeline.
Output is a flat list of per-label entries, each containing: source_image,
label_filename, label_index, category, bounding-box coordinates,
rotation_angle, and ocr (method, text, confidence).
- scripts.postprocessing.consolidate_results.parse_arguments()
Parse command-line arguments.
- Return type:
- scripts.postprocessing.consolidate_results.consolidate_results(output_dir)
Auto-detect pipeline type and consolidate all results.
- scripts.postprocessing.consolidate_results.main()
Main entry point.
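The shape of one consolidated entry might look like the sketch below. The top-level field names and the `ocr` sub-fields come from the description above; the `bbox` key with `xmin`/`ymin`/`xmax`/`ymax`, the helper name `make_entry`, and all values are placeholders assumed for illustration.

```python
import json


def make_entry(source_image, label_filename, label_index, category,
               bbox, rotation_angle, ocr_method, ocr_text, ocr_confidence):
    """Build one per-label entry for the flat consolidated list."""
    xmin, ymin, xmax, ymax = bbox
    return {
        "source_image": source_image,
        "label_filename": label_filename,
        "label_index": label_index,
        "category": category,
        # Key names for the bounding box are an assumption of this sketch.
        "bbox": {"xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax},
        "rotation_angle": rotation_angle,
        "ocr": {
            "method": ocr_method,
            "text": ocr_text,
            "confidence": ocr_confidence,
        },
    }


# Placeholder values only; they do not come from a real pipeline run.
entries = [
    make_entry("specimen_001.jpg", "specimen_001_label_1.jpg", 1, "printed",
               (10, 20, 110, 80), 90, "tesseract", "Berlin", 0.93),
]
print(json.dumps(entries, indent=2))
```

A flat list keeps every label self-describing, so downstream tools can filter by `source_image` or `category` without walking nested per-stage files.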
- scripts.postprocessing.process.parse_arguments()
Parse command-line arguments using argparse.
- Returns:
Parsed command-line arguments.
- Return type:
- scripts.postprocessing.process.process_ocr_output(ocr_output, outdir)
Process OCR output to identify Nuri labels, empty labels, and correct plausible labels.
- scripts.postprocessing.process.main()
Main function to parse arguments and execute OCR processing.
Processing Scripts
The processing subpackage contains core processing utilities:
- scripts.processing.analysis.parse_arguments()
Parse command-line arguments using argparse.
- Returns:
Parsed command-line arguments.
- Return type:
- scripts.processing.analysis.validate_directories(input_dir, output_dir)
Validate that the input directory exists and create the output directory if needed.
- scripts.processing.analysis.main()
Main execution function. Parses command-line arguments, validates directories, processes images, and prints the execution duration.
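The directory validation step can be sketched with `pathlib`. This is a minimal illustration: whether the real function raises an exception, exits, or returns a status when the input directory is missing is not documented here, so raising `FileNotFoundError` is an assumption.

```python
from pathlib import Path


def validate_directories(input_dir, output_dir):
    """Require an existing input directory; create the output directory if absent."""
    input_path = Path(input_dir)
    if not input_path.is_dir():
        # Assumed failure mode for this sketch.
        raise FileNotFoundError(f"Input directory does not exist: {input_path}")
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
```

Validating both paths before any image is processed means a mistyped path fails immediately rather than after a long run.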
- scripts.processing.classifiers.parse_arguments()
Parse command-line arguments for the classification script.
- Returns:
Parsed command-line arguments.
- Return type:
- scripts.processing.classifiers.resolve_default_model_path(model_int)
Get the default model path based on model number using centralized configuration.
- Parameters:
model_int (int) – Model number (1-3)
- Returns:
Path to the default model
- Return type:
Path
- scripts.processing.classifiers.get_class_names_by_model(model_int)
Return default class names for the selected model number using centralized configuration.
- scripts.processing.classifiers.load_class_names_from_file(path)
Load class names from a text file (one per line).
- scripts.processing.classifiers.main()
Main function to execute classification using a TensorFlow model.
- Return type:
None
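Reading class names from a one-per-line text file can be sketched as below. Stripping whitespace and skipping blank lines are assumptions of this example; the documented behavior is only "one per line".

```python
def load_class_names_from_file(path):
    """Return the non-empty lines of a text file as a list of class names."""
    with open(path, encoding="utf-8") as handle:
        # Strip trailing newlines and ignore blank lines (assumed behavior).
        return [line.strip() for line in handle if line.strip()]
```

The order of the lines matters in practice, since a classifier's output index is typically mapped back to a name by position.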
- class scripts.processing.detection.OptimizedPredictLabel(path_to_model, classes, threshold=0.8, use_cache=True)
Bases: object
Optimized version of PredictLabel with caching and streamlined loading.
- load_model_optimized()
Load model with optimized strategy.
- Return type:
detecto.core.Model
- scripts.processing.detection.parse_arguments()
Parse command-line arguments using argparse.
- Returns:
Parsed command-line arguments.
- Return type:
- scripts.processing.detection.clear_model_cache()
Clear all cached models.
- scripts.processing.detection.setup_device(device_arg)
Set up the optimal device for inference.
- scripts.processing.detection.main()
Main execution function with performance optimizations.
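One common way to implement this kind of model caching is a module-level dictionary keyed by model path. The sketch below shows that general pattern only; it is not the module's implementation. The `_MODEL_CACHE` name, the `load_model_cached` helper, and the injected `loader` callable are hypothetical, and the real class loads a detecto model rather than calling an arbitrary function.

```python
# Hypothetical module-level cache: model path -> loaded model object.
_MODEL_CACHE = {}


def load_model_cached(path_to_model, loader, use_cache=True):
    """Load a model through `loader`, reusing a cached instance when allowed."""
    key = str(path_to_model)
    if use_cache and key in _MODEL_CACHE:
        return _MODEL_CACHE[key]
    model = loader(path_to_model)
    if use_cache:
        _MODEL_CACHE[key] = model
    return model


def clear_model_cache():
    """Drop every cached model so the next load reads from disk again."""
    _MODEL_CACHE.clear()
```

Caching pays off when the same weights are requested repeatedly in one process; clearing the cache releases the memory those models hold.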