scripts.evaluation.cluster_eval

Functions

build_mean_label_vector(model, labels)

Computes the mean vector for each label using the Word2Vec model.
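
As a rough illustration of per-label averaging, the sketch below uses a plain dict in place of the Word2Vec model's word-vector lookup. The name `mean_label_vectors_sketch`, the input shapes, and the choice to skip labels with no known tokens are assumptions for illustration, not the module's actual behavior.

```python
from typing import Dict, List, Sequence


def mean_label_vectors_sketch(
    vectors: Dict[str, Sequence[float]],
    tokenized_labels: Dict[str, List[str]],
) -> Dict[str, List[float]]:
    """Average the vectors of each label's in-vocabulary tokens."""
    means: Dict[str, List[float]] = {}
    for label, tokens in tokenized_labels.items():
        # Keep only tokens that have a vector (hypothetical vocabulary check).
        known = [vectors[t] for t in tokens if t in vectors]
        if not known:
            # Assumption: labels with no known tokens are skipped.
            continue
        dim = len(known[0])
        means[label] = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    return means


# Toy data standing in for a trained model's vectors.
toy_vectors = {"red": [1.0, 0.0], "apple": [0.0, 1.0]}
toy_labels = {"red apple": ["red", "apple"], "unknown": ["zzz"]}
result = mean_label_vectors_sketch(toy_vectors, toy_labels)
```

With a real model, the dict lookup would be replaced by whatever vector access the trained Word2Vec object provides.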

build_word_vectors(labels, ground_truth)

Builds a Word2Vec model from the tokenized labels.

is_word(token)

Checks whether a token is a valid word, that is, neither punctuation nor too short.
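
A check of this kind might look like the sketch below. The minimum length of 2 and the helper name `is_word_sketch` are assumptions; the module's actual threshold and punctuation rules are not stated here.

```python
import string

MIN_LENGTH = 2  # assumed cutoff; the real threshold is not documented here


def is_word_sketch(token: str, min_length: int = MIN_LENGTH) -> bool:
    """Return True for tokens that are long enough and not pure punctuation."""
    stripped = token.strip()
    if len(stripped) < min_length:
        return False
    # Reject tokens made up entirely of ASCII punctuation characters.
    return not all(ch in string.punctuation for ch in stripped)
```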

load_cluster_csv(path)

Loads cluster assignments from a CSV file.
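
The CSV layout is not described here, so the following is only a sketch under assumed column names (`label` and `cluster`) with integer cluster ids. It writes a throwaway file so that it runs without any project data.

```python
import csv
import os
import tempfile
from typing import Dict


def load_cluster_csv_sketch(path: str) -> Dict[str, int]:
    """Read a CSV with assumed 'label' and 'cluster' columns into a dict."""
    assignments: Dict[str, int] = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            assignments[row["label"]] = int(row["cluster"])
    return assignments


# Demo on a temporary file; the contents are made-up sample rows.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "clusters.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as handle:
        handle.write("label,cluster\nred apple,0\ngreen pear,1\n")
    demo_assignments = load_cluster_csv_sketch(demo_path)
```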

load_json(path)

Loads the ground truth JSON file.

main(args)

Main entry point for clustering visualization.

parse_arguments()

Parses the command-line arguments and returns them.
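
The script's actual options are not listed here. As a sketch of how such a parser is commonly built with `argparse`, the example below uses hypothetical flags (`--clusters`, `--ground-truth`, `--out`) and a hypothetical default output name.

```python
import argparse
from typing import List, Optional


def parse_arguments_sketch(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse hypothetical options for a cluster-evaluation script."""
    parser = argparse.ArgumentParser(description="Cluster evaluation (sketch)")
    # All flag names below are illustrative assumptions.
    parser.add_argument("--clusters", required=True, help="cluster CSV path")
    parser.add_argument("--ground-truth", required=True, help="ground truth JSON path")
    parser.add_argument("--out", default="tsne.html", help="output plot path")
    return parser.parse_args(argv)


# Passing an explicit list keeps the demo independent of sys.argv.
demo_args = parse_arguments_sketch(["--clusters", "c.csv", "--ground-truth", "g.json"])
```

Accepting an optional `argv` list makes the parser easy to exercise in tests; with `None`, `argparse` falls back to the process arguments.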

plot_tsne(label_vectors, clusters, out_path, ...)

Generates and saves a t-SNE scatter plot with cluster coloring and hover text.

tokenize_text(labels, ground_truth)

Tokenizes and lowercases text fields from labels.
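
The structure of `labels` and `ground_truth` is not specified here. The sketch below assumes `ground_truth` maps each label to a dict of text fields, and that the fields of interest are called `title` and `description`; both the field names and the regex tokenizer are illustrative choices.

```python
import re
from typing import Dict, Iterable, List, Sequence

# Assumed token definition: runs of ASCII letters and digits.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+")


def tokenize_text_sketch(
    labels: Iterable[str],
    ground_truth: Dict[str, Dict[str, str]],
    fields: Sequence[str] = ("title", "description"),  # hypothetical field names
) -> Dict[str, List[str]]:
    """Lowercase and tokenize the chosen text fields for each label."""
    tokenized: Dict[str, List[str]] = {}
    for label in labels:
        record = ground_truth.get(label, {})
        text = " ".join(str(record.get(field, "")) for field in fields)
        tokenized[label] = TOKEN_PATTERN.findall(text.lower())
    return tokenized


# Made-up sample record for demonstration.
demo_truth = {"a1": {"title": "Red Apple!", "description": "Fresh, crisp."}}
demo_tokens = tokenize_text_sketch(["a1", "missing"], demo_truth)
```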