label_evaluation.redundancy

Functions

clean_data(data)

Preprocess the dataset by converting text to lowercase, removing punctuation and whitespace, and excluding entries containing 'http'.
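
The steps named above can be sketched as follows. This is a minimal illustration and not the module's actual implementation: the name clean_data_sketch is hypothetical, and it assumes that data is an iterable of strings, that "removing whitespace" means stripping all whitespace characters, and that the 'http' check ignores case.

```python
import string


def clean_data_sketch(data):
    """Hypothetical stand-in for clean_data; assumes an iterable of strings."""
    # Translation table that deletes every ASCII punctuation character.
    strip_punct = str.maketrans("", "", string.punctuation)
    cleaned = []
    for text in data:
        lowered = text.lower()
        # Skip entries that contain a link (assumed to be a case-insensitive check).
        if "http" in lowered:
            continue
        no_punct = lowered.translate(strip_punct)
        # Assumption: all whitespace is dropped, not merely collapsed.
        cleaned.append("".join(no_punct.split()))
    return cleaned


sample = ["Hello, World!", "see http://example.org", " Foo  bar "]
print(clean_data_sketch(sample))
```

Normalizing this aggressively means that entries differing only in case, punctuation, or spacing compare as equal in later duplicate checks.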

per_redundancy(data)

Calculate the percentage of transcription redundancy in a dataset.
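
One plausible reading of this percentage is the share of entries that repeat an earlier entry. The sketch below uses that definition as an assumption; the module may compute it differently. The name per_redundancy_sketch is hypothetical, and the input is assumed to be a list of already preprocessed strings.

```python
def per_redundancy_sketch(cleaned):
    """Hypothetical stand-in for per_redundancy; assumes a list of cleaned strings."""
    total = len(cleaned)
    if total == 0:
        # Avoid division by zero; an empty dataset is treated as 0% redundant.
        return 0.0
    # Entries beyond the first occurrence of each distinct value count as redundant.
    redundant = total - len(set(cleaned))
    return 100.0 * redundant / total


print(per_redundancy_sketch(["a", "b", "a", "a"]))
```

Under this definition, a dataset with no repeated entries scores 0, and the score approaches 100 as more entries become copies of one another.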

redundancy(data)

Identify duplicate entries in a preprocessed dataset.
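
Duplicate detection of this kind is often done by counting occurrences. The following is a sketch under assumptions, not the module's code: the name redundancy_sketch and the returned mapping of entry to count are hypothetical, and the input is assumed to be an iterable of preprocessed, hashable strings.

```python
from collections import Counter


def redundancy_sketch(cleaned):
    """Hypothetical stand-in for redundancy; assumes preprocessed strings."""
    counts = Counter(cleaned)
    # Keep only entries that occur more than once, with their occurrence counts.
    return {entry: n for entry, n in counts.items() if n > 1}


print(redundancy_sketch(["a", "b", "a", "c", "b", "a"]))
```

Returning counts rather than a bare list of duplicates keeps enough information to see which entries dominate the repeated material.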