Information Theoretic Metrics for Multi-Class Predictor Evaluation
How do you decide if a predictive model you have built is any good? How do you compare the performance of two models? As time goes on, data changes and you have to rebuild your models — how do you compare the new model’s behavior on the new data with the old model’s behavior on the old data?
The most common metrics used to evaluate a classifier are accuracy, precision, recall, and F1 score. These metrics are widely used in machine learning, information retrieval, and text analysis (e.g., text categorization). Each of them is imperfect in some way: each captures only one aspect of predictor performance, and each can be fooled by a pathological dataset (for example, a highly imbalanced one).
None of them can be used to compare predictors across different datasets.
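To make the failure mode concrete, here is a small illustration (my own example, not from the talk): on a dataset with 95% negatives, a trivial predictor that always answers "negative" scores 95% accuracy while its precision, recall, and F1 for the positive class are all zero.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 95 negatives, 5 positives; a predictor that always says "negative"
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                             # 0.95 -- looks great
print(precision_recall_f1(y_true, y_pred))  # (0.0, 0.0, 0.0) -- useless model
```

The same model evaluated on a balanced dataset would score 50% accuracy, which is why raw accuracy numbers are not comparable across datasets.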
In April, at the NYC Machine Learning meetup, I presented an information-theoretic performance metric that does not suffer from the aforementioned flaws and can be used in both classification (binary and multi-class) and categorization (each example can be placed in several categories) settings.
The code to compute the metric is available under the Apache open-source license.
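To give a flavor of the information-theoretic approach (this is my own illustrative sketch, not necessarily the exact metric from the talk or the released code), one common construction is the mutual information between true and predicted labels, normalized by the entropy of the true labels, computed from a confusion matrix. It is 1 for a perfect predictor, 0 for a predictor independent of the labels, and does not reward always guessing the majority class.

```python
import math

def normalized_mutual_information(confusion):
    """Mutual information between true labels (rows) and predicted labels
    (columns), divided by the entropy of the true labels.

    Illustrative sketch only -- not the exact metric from the talk."""
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]          # true-label counts
    col_sums = [sum(col) for col in zip(*confusion)]    # predicted-label counts
    mi = 0.0
    for i, row in enumerate(confusion):
        for j, n in enumerate(row):
            if n:
                p_ij = n / total
                # p(i,j) * log2( p(i,j) / (p(i) * p(j)) )
                mi += p_ij * math.log2(n * total / (row_sums[i] * col_sums[j]))
    h_true = -sum((r / total) * math.log2(r / total) for r in row_sums if r)
    return mi / h_true if h_true else 0.0

# Perfect 3-class predictor: score is 1.0 (up to floating-point rounding)
perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
print(normalized_mutual_information(perfect))

# Predictor that ignores the input and always answers class 0: score is 0.0
constant = [[10, 0, 0], [10, 0, 0], [10, 0, 0]]
print(normalized_mutual_information(constant))
```

Unlike accuracy, this score measures how much information the predictions carry about the true labels, which is what makes comparisons across datasets meaningful.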
Hakka Labs recorded the video: