label_processing.utils

Utility functions for the entomological label processing pipeline.

Provides image validation, filename generation, JSON/CSV I/O, NURI format checking, and model integrity verification helpers used across all pipeline variants.

Functions

check_dir(directory)

Check whether the directory contains valid JPG files, validating the integrity of each file.

check_nuri_format(transcript)

Check the NURI format in the OCR transcription "text".

generate_filename(original_path, appendix[, ...])

Take the path to a file or directory and return it with an appendix added to the end.
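As an illustration of the idea only, here is a minimal sketch of one plausible reading of "appendix added to the end". The helper name append_to_path is hypothetical, the underscore separator and the choice to keep a file's extension are assumptions, and the elided optional parameters of generate_filename are not covered.

```python
from pathlib import Path


def append_to_path(original_path: str, appendix: str) -> str:
    """Hypothetical stand-in: add `appendix` to the final path component."""
    path = Path(original_path)
    # Assumption: the appendix is joined with an underscore and placed before
    # the suffix, so a file keeps its extension. A path without a suffix
    # (typically a directory) simply gets the appendix at the end of its name.
    return str(path.with_name(f"{path.stem}_{appendix}{path.suffix}"))
```

Under these assumptions, a file path such as data/labels.jpg with the appendix "cropped" becomes data/labels_cropped.jpg, and the directory data/labels becomes data/labels_cropped.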

load_dataframe(filepath_csv)

Load a CSV file using pandas.

load_jpg(filepath)

Load a JPG file using OpenCV.

load_json(file)

Load JSON data from a file and deserialize it.

read_vocabulary(file)

Read a CSV file containing vocabulary and convert it to a dictionary.
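The CSV-to-dictionary conversion could look roughly like the sketch below. The layout of the vocabulary file is not documented here, so the sketch assumes two columns per row (key, then value) and no header row; the name read_two_column_csv is hypothetical and is not the module's own function.

```python
import csv


def read_two_column_csv(file: str) -> dict[str, str]:
    """Hypothetical stand-in: map the first CSV column to the second."""
    with open(file, newline="", encoding="utf-8") as handle:
        # Assumption: each row is "key,value". Rows with fewer than two
        # cells are skipped; if a key repeats, the last row wins.
        return {row[0]: row[1] for row in csv.reader(handle) if len(row) >= 2}
```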

replace_nuri(transcript)

Correct NURI format in OCR transcription JSON output.

save_json(data, filename, path)

Save data as a JSON file in a human-readable format.
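A minimal sketch of what human-readable JSON output typically means, using only the standard library. The name write_pretty_json, the four-space indent, the UTF-8 encoding, and the returned file path are all assumptions made for illustration; the actual behavior of save_json may differ.

```python
import json
import os


def write_pretty_json(data, filename: str, path: str) -> str:
    """Hypothetical stand-in: write `data` as indented JSON to path/filename."""
    # Assumption: the target directory is created if it does not exist.
    os.makedirs(path, exist_ok=True)
    filepath = os.path.join(path, filename)
    with open(filepath, "w", encoding="utf-8") as handle:
        # indent=4 puts each entry on its own line; ensure_ascii=False keeps
        # accented characters readable instead of escaping them.
        json.dump(data, handle, indent=4, ensure_ascii=False)
    return filepath
```

A file written this way can be read back with the standard json.load call.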

validate_image_integrity(filepath[, ...])

Validate image file integrity with strict memory safety limits.

verify_model_integrity(model_path[, ...])

SECURITY: Mandatory model file integrity verification using SHA256 checksums.
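For readers unfamiliar with checksum verification, the general technique can be sketched with hashlib as follows. The helper names sha256_of_file and matches_checksum are hypothetical, and how verify_model_integrity obtains its expected checksum or reacts to a mismatch is not documented here, so this sketch only compares a file against a digest the caller supplies.

```python
import hashlib
import hmac


def sha256_of_file(model_path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(model_path, "rb") as handle:
        # Reading in fixed-size chunks keeps memory use flat for large models.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(model_path: str, expected_sha256: str) -> bool:
    """Hypothetical check: does the file hash equal the expected hex digest?"""
    # Assumption: the expected value is a hex string; case is normalized first.
    return hmac.compare_digest(sha256_of_file(model_path), expected_sha256.lower())
```

A caller would typically refuse to load the model when matches_checksum returns False.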