label_evaluation Package

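The label_evaluation package contains metrics and analysis tools for evaluating the pipeline's output: classification accuracy, text transcription error rates, bounding box IoU scores, and transcription redundancy.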

Package Contents

accuracy_classifier

evaluate_text

iou_scores

redundancy

Modules

Accuracy Classifier

label_evaluation.accuracy_classifier.metrics(target, pred, gt, out_dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/entomological-label-information-extraction/checkouts/latest/docs'))

Build a text report of the main classification metrics, which measure the quality of the classification model's predictions, and save it to a text file.

Parameters:
  • target (list) – Names matching the classes.

  • pred (pd.DataFrame) – Predicted classes.

  • gt (pd.DataFrame) – Ground truth classes.

  • out_dir (Path) – Directory where the report file will be saved.

Returns:

Classification report as a text output.

Return type:

str
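
The metrics in such a report are typically per-class precision, recall, and F1. The following is a minimal sketch of how those values are derived from label lists, not the package's implementation. The helper name per_class_scores and the class names "typed" and "handwritten" are hypothetical, and plain lists stand in for the pd.DataFrame inputs.

```python
from collections import Counter


def per_class_scores(gt, pred, target):
    """Return {class_name: (precision, recall, f1)} for two label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gt, pred):
        if g == p:
            tp[g] += 1      # correct prediction for class g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was missed
    scores = {}
    for name in target:
        predicted = tp[name] + fp[name]
        actual = tp[name] + fn[name]
        precision = tp[name] / predicted if predicted else 0.0
        recall = tp[name] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[name] = (precision, recall, f1)
    return scores


# Hypothetical labels, for illustration only.
gt = ["typed", "typed", "handwritten", "handwritten"]
pred = ["typed", "handwritten", "handwritten", "handwritten"]
scores = per_class_scores(gt, pred, ["typed", "handwritten"])
for name, (p, r, f) in scores.items():
    print(f"{name:12s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```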

label_evaluation.accuracy_classifier.cm(target, pred, gt, out_dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/entomological-label-information-extraction/checkouts/latest/docs'), title='Classifier')

Compute confusion matrix to evaluate the performance of the classification.

Parameters:
  • target (list) – Names matching the classes.

  • pred (pd.DataFrame) – Predicted classes.

  • gt (pd.DataFrame) – Ground truth classes.

  • out_dir (Path) – Path to the target directory to save the confusion matrix plot.

  • title (str) – Title for the confusion matrix plot.

Return type:

None
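
A confusion matrix counts, for every pair of true and predicted class, how often that pair occurs. The sketch below builds the raw counts with plain lists, under the assumption that rows are ground truth classes and columns are predicted classes. The helper name confusion_counts and the class names are hypothetical. Plotting and saving to out_dir are left out.

```python
def confusion_counts(gt, pred, target):
    """Return a matrix where matrix[i][j] counts items of true class
    target[i] that were predicted as target[j]."""
    index = {name: i for i, name in enumerate(target)}
    matrix = [[0] * len(target) for _ in target]
    for g, p in zip(gt, pred):
        matrix[index[g]][index[p]] += 1
    return matrix


# Hypothetical labels, for illustration only.
gt = ["typed", "typed", "handwritten", "handwritten"]
pred = ["typed", "handwritten", "handwritten", "handwritten"]
matrix = confusion_counts(gt, pred, ["typed", "handwritten"])
for name, row in zip(["typed", "handwritten"], matrix):
    print(name, row)
```

The diagonal holds the correct predictions; every off-diagonal cell is one kind of confusion between two classes.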

Text Evaluation

exception label_evaluation.evaluate_text.EmptyReferenceError(message=None)

Bases: Exception

Custom exception for handling cases where the reference string is empty.

label_evaluation.evaluate_text.calculate_cer(reference, hypothesis)

Calculate the Character Error Rate (CER) between reference and hypothesis.

Parameters:
  • reference (list) – List of reference (ground truth) strings.

  • hypothesis (list) – List of hypothesis (predicted) strings.

Returns:

The computed CER value.

Return type:

float
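
CER is commonly defined as the character-level edit distance between hypothesis and reference, divided by the number of reference characters. The sketch below illustrates that definition and is not the package's code. Summing the edits and characters over all pairs is an assumption about how the list inputs are aggregated. The names levenshtein and cer are hypothetical, and a plain ValueError stands in for the empty-reference case.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Total character edits divided by total reference characters."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("reference is empty")
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return edits / total_chars


print(cer(["Berlin"], ["Berlim"]))  # one substitution over six characters
```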

label_evaluation.evaluate_text.get_gold_transcriptions(filename, sep=',')

Load ground truth transcriptions from a CSV file into a dictionary.

Parameters:
  • filename (str) – Path to the CSV file.

  • sep (str, optional) – Delimiter used in the CSV file. Defaults to ','.

Returns:

Dictionary with keys as unique identifiers and values as transcription text.

Return type:

dict
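
The column layout of the CSV file is not specified here. The sketch below assumes a header row followed by rows whose first column is the identifier and whose second column is the transcription. The helper name read_transcriptions, the column names, and the sample rows are all hypothetical. It reads from an open file handle so that it can be run without a file on disk.

```python
import csv
import io


def read_transcriptions(handle, sep=","):
    """Map the first column (identifier) to the second (transcription)."""
    reader = csv.reader(handle, delimiter=sep)
    next(reader, None)  # skip the header row (assumed to be present)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# Hypothetical sample data standing in for a ground truth file.
sample = io.StringIO(
    "ID,text\n"
    "label_1.jpg,Museum Berlin\n"
    'label_2.jpg,"Brasilien, 1902"\n'
)
gold = read_transcriptions(sample)
print(gold)
```

Using the csv module rather than splitting on the delimiter keeps quoted fields that contain the delimiter intact, as in the second sample row.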

label_evaluation.evaluate_text.load_json_predictions(filename)

Load predictions from a JSON file.

Parameters:

filename (str) – Path to the JSON file.

Returns:

List of predictions from the JSON file.

Return type:

list

label_evaluation.evaluate_text.calculate_scores(gold_text, predicted_text)

Calculate Word Error Rate (WER) and Character Error Rate (CER) between ground truth and prediction.

Parameters:
  • gold_text (str) – Ground truth transcription.

  • predicted_text (str) – Predicted transcription.

Returns:

(WER, CER) both rounded to two decimal places.

Return type:

tuple
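
WER applies the same edit-distance idea as CER, but to whitespace-separated words instead of characters. The following sketch shows one way to obtain both values for a single pair of strings, rounded to two decimal places as described above. It is an illustration under those assumptions, not the package's code; the names edit_distance and wer_and_cer and the sample strings are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def wer_and_cer(gold_text, predicted_text):
    """Return (WER, CER), each rounded to two decimal places."""
    gold_words = gold_text.split()
    if not gold_words:
        raise ValueError("reference is empty")
    wer = edit_distance(gold_words, predicted_text.split()) / len(gold_words)
    cer = edit_distance(gold_text, predicted_text) / len(gold_text)
    return round(wer, 2), round(cer, 2)


# One wrong word out of four; one missing character out of 29.
print(wer_and_cer("Museum fuer Naturkunde Berlin",
                  "Museum fur Naturkunde Berlin"))  # (0.25, 0.03)
```

A single character error costs a whole word in WER, which is why WER is usually the higher of the two scores for OCR output.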

label_evaluation.evaluate_text.create_plot(data, score_name, file_name)

Create and save a violin plot for the given error scores.

Parameters:
  • data (list) – List of numerical scores to visualize.

  • score_name (str) – Name of the score (e.g., "CER" or "WER").

  • file_name (str) – Path to save the plot image.

Return type:

None

label_evaluation.evaluate_text.evaluate_text_predictions(ground_truth_file, predictions_file, out_dir)

Evaluate OCR predictions against a ground truth dataset.

Parameters:
  • ground_truth_file (str) – Path to the ground truth CSV file.

  • predictions_file (str) – Path to the predictions JSON file.

  • out_dir (str) – Output directory for results.

Returns:

(List of WER scores, List of CER scores)

Return type:

tuple

IoU Scores

label_evaluation.iou_scores.calculate_iou(pred_coords, gt_coords)

Calculate the Intersection over Union (IOU) score between a predicted and a ground truth bounding box.

Parameters:
  • pred_coords (tuple) – Coordinates for the predicted bounding box (xmin, ymin, xmax, ymax).

  • gt_coords (tuple) – Coordinates for the ground truth bounding box (class, xmin, ymin, xmax, ymax).

Returns:

IOU score.

Return type:

float
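
IOU is the area of overlap between two boxes divided by the area of their union. The sketch below works on plain (xmin, ymin, xmax, ymax) tuples and is not the package's code. Because the ground truth tuple documented above carries a leading class entry, the example strips it before the computation. The helper name iou and the sample coordinates are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0


pred_coords = (0, 0, 10, 10)
gt_coords = ("label", 5, 5, 15, 15)      # leading class entry, then the box
score = iou(pred_coords, gt_coords[1:])  # drop the class before comparing
print(round(score, 3))                   # 25 / 175
```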

label_evaluation.iou_scores.comparison(df_pred_filename, df_gt_filename)

Compare bounding box coordinates and calculate IOU scores.

Parameters:
  • df_pred_filename (pd.DataFrame) – DataFrame with predicted labels.

  • df_gt_filename (pd.DataFrame) – DataFrame with ground truth labels.

Returns:

DataFrame with added IOU scores.

Return type:

pd.DataFrame

label_evaluation.iou_scores.concat_frames(df_pred, df_gt)

Concatenate predicted and ground truth datasets with IOU scores.

Parameters:
  • df_pred (pd.DataFrame) – DataFrame with predicted bounding boxes.

  • df_gt (pd.DataFrame) – DataFrame with ground truth bounding boxes.

Returns:

Concatenated DataFrame with calculated IOU scores.

Return type:

pd.DataFrame

label_evaluation.iou_scores.box_plot_iou(df_concat, accuracy_txt_path=None)

Generate a box plot for IOU scores.

Parameters:
  • df_concat (pd.DataFrame) – DataFrame with IOU scores.

  • accuracy_txt_path (str, optional) – Path to save accuracy percentages.

Returns:

Plotly figure object.

Return type:

go.Figure

Redundancy Analysis

label_evaluation.redundancy.clean_data(data)

Preprocess the dataset by converting text to lowercase, removing punctuation and whitespace, and excluding entries containing 'http'.

Parameters:

data (list of dict) – List of dictionaries with labels' transcription.

Returns:

Preprocessed list of dictionaries.

Return type:

list of dict

label_evaluation.redundancy.redundancy(data)

Identify duplicate entries in a preprocessed dataset.

Parameters:

data (list of dict) – Preprocessed list of dictionaries with labels' transcription.

Returns:

List of dictionaries containing duplicate entries.

Return type:

list of dict

label_evaluation.redundancy.per_redundancy(data)

Calculate the percentage of transcription redundancy in a dataset.

Parameters:

data (list of dict) – Preprocessed list of dictionaries with labels' transcription.

Returns:

Percentage of redundant text.

Return type:

int
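
The three redundancy steps (clean, find duplicates, compute a percentage) can be sketched as follows. This is an illustration, not the package's code, and it rests on several assumptions: the transcription is stored under a "text" key, whitespace removal means stripping and collapsing whitespace, and the percentage is the share of entries whose cleaned text occurs more than once. The function names and sample entries are hypothetical.

```python
import string
from collections import Counter


def clean(entries):
    """Lowercase, drop punctuation, normalise whitespace, skip URLs."""
    table = str.maketrans("", "", string.punctuation)
    cleaned = []
    for entry in entries:
        text = entry["text"].lower()
        if "http" in text:
            continue  # exclude entries that contain a link
        text = " ".join(text.translate(table).split())
        cleaned.append({**entry, "text": text})
    return cleaned


def duplicates(entries):
    """Return every entry whose text appears more than once."""
    counts = Counter(e["text"] for e in entries)
    return [e for e in entries if counts[e["text"]] > 1]


def percent_redundant(entries):
    """Share of duplicate entries, as a whole-number percentage."""
    if not entries:
        return 0
    return round(100 * len(duplicates(entries)) / len(entries))


# Hypothetical transcriptions, for illustration only.
data = [
    {"ID": "a", "text": "Coll. Mus. Berlin"},
    {"ID": "b", "text": "coll mus berlin "},
    {"ID": "c", "text": "http://example.org/label"},
    {"ID": "d", "text": "Brasilien, 1902"},
]
cleaned = clean(data)
print(percent_redundant(cleaned))  # 2 of 3 remaining entries -> 67
```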