scripts.processing.gemini_ocr

Gemini OCR / HTR Script

Performs text recognition on label images using the Gemini API. Unlike Tesseract and Google Vision, which handle only printed text, Gemini can process printed, handwritten, and mixed labels.

The output format matches the existing pipeline: a JSON list of {ID, text, confidence} objects.
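As a rough illustration of that output shape, the sketch below builds a list of records with the three keys named above and serializes it to JSON. The helper name `to_pipeline_records`, the sample file name, the sample text, and the value types (string ID, string text, float confidence) are assumptions made for this example; they are not taken from the script itself.

```python
import json


def to_pipeline_records(results):
    """Turn (image_id, text, confidence) tuples into a list of record dicts.

    Hypothetical helper: only the key names ID, text and confidence come
    from the module description; everything else is illustrative.
    """
    return [
        {"ID": image_id, "text": text, "confidence": confidence}
        for image_id, text, confidence in results
    ]


# Made-up sample data, used only to show the structure.
records = to_pipeline_records([
    ("label_001.jpg", "Coll. J. Smith 1902", 0.93),
    ("label_002.jpg", "det. A. Example", 0.71),
])

# ensure_ascii=False keeps accented characters readable in the output file.
serialized = json.dumps(records, indent=2, ensure_ascii=False)
print(serialized)
```

Keeping every record to the same three keys means downstream steps can read Gemini results the same way they read results from the other OCR back ends.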

Usage:

python gemini_ocr.py -d <image_dir> -o <output_dir>
python gemini_ocr.py -d <output_dir> -o <output_dir> --categories printed handwritten mixed

Functions

main()

parse_arguments()

Parse command-line arguments.
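A minimal sketch of how such a parser could look, based only on the flags shown in the usage lines (`-d`, `-o`, `--categories`). The attribute names `image_dir`, `output_dir` and `categories`, the help strings, the `argv` parameter, and the choice to make both directories required are assumptions for this example; the real function may differ.

```python
import argparse


def parse_arguments(argv=None):
    """Parse command-line arguments (illustrative sketch, not the actual code).

    Passing argv=None makes argparse read sys.argv[1:], while an explicit
    list makes the function easy to exercise in tests.
    """
    parser = argparse.ArgumentParser(
        description="Run Gemini OCR / HTR on a directory of label images."
    )
    # -d and -o appear in the usage lines; the dest names are assumed.
    parser.add_argument("-d", dest="image_dir", required=True,
                        help="directory containing the input images")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="directory where the JSON output is written")
    # --categories takes one or more values, e.g. printed handwritten mixed.
    parser.add_argument("--categories", nargs="+", default=None,
                        help="optional list of label categories to process")
    return parser.parse_args(argv)


# Example call with an explicit argument list instead of sys.argv.
args = parse_arguments(
    ["-d", "images", "-o", "out", "--categories", "printed", "mixed"]
)
print(args.image_dir, args.output_dir, args.categories)
```

Using `nargs="+"` lets the category names be given as space-separated values after a single flag, which matches the second usage line.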