Area under ROC and PR curves for weighted and unweighted data
by Jens Keilwagen, Ivo Grosse, and Jan Grau
Precision-recall and ROC curves are highly informative about the performance of binary classifiers, and the areas under these curves are popular scalar performance measures for comparing different classifiers. In many applications, class labels are not known with absolute certainty but only with some degree of confidence, which is often expressed as weights or soft labels assigned to the data points. Here, we provide a command-line program that computes the areas under precision-recall and ROC curves using an interpolation of precision-recall curves (and ROC curves) that also applies to weighted test data.
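To illustrate what "weighted test data" means for a ROC curve, here is a minimal sketch. It is not the code used by AUC.jar, and the function name `weighted_roc_auc` is hypothetical. The sketch assumes that each data point has a score, a weight for the positive class, and a weight for the negative class. It sweeps a threshold from high to low scores, accumulates weighted true and false positives, and integrates the resulting ROC curve with the trapezoidal rule, so that tied scores are joined linearly. The dedicated precision-recall interpolation that AUC.jar implements is not reproduced here.

```python
from itertools import groupby


def weighted_roc_auc(scores, fg_weights, bg_weights):
    """Area under the ROC curve for weighted data (trapezoidal rule).

    Hypothetical helper: point i contributes fg_weights[i] to the positive
    class and bg_weights[i] to the negative class. Unweighted data is the
    special case where every point has weights (1, 0) or (0, 1).
    """
    total_fg = sum(fg_weights)
    total_bg = sum(bg_weights)
    if total_fg <= 0 or total_bg <= 0:
        raise ValueError("both classes need a positive total weight")

    # Sort by decreasing score so the threshold sweeps from high to low.
    points = sorted(zip(scores, fg_weights, bg_weights), key=lambda p: -p[0])

    tp = 0.0  # accumulated positive weight above the threshold
    area = 0.0
    # Points with equal scores are added in one step, i.e. a straight
    # line segment is drawn across ties.
    for _, group in groupby(points, key=lambda p: p[0]):
        group = list(group)
        d_tp = sum(p[1] for p in group)
        d_fp = sum(p[2] for p in group)
        # Trapezoid between the previous and the new (FP, TP) point.
        area += d_fp * (2 * tp + d_tp) / 2
        tp += d_tp
    # Normalise from (weight x weight) units to the unit square.
    return area / (total_fg * total_bg)


# Unweighted example: three positives and two negatives.
auc = weighted_roc_auc([0.9, 0.8, 0.4, 0.7, 0.3], [1, 1, 1, 0, 0], [0, 0, 0, 1, 1])
```

For unweighted data, this gives the fraction of (positive, negative) pairs ranked correctly, which is 5/6 in the example. Soft labels simply place fractional weight on both classes.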
After downloading AUC.jar, you can compute the areas under the precision-recall and ROC curves from lists of scores provided in one file (weighted data) or two files (unweighted data).
For unweighted data, please use:
java -jar AUC.jar <fg> <bg>
where <fg> and <bg> are files with one classification score per line for the positive (fg) and negative (bg) class, respectively.
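As an illustration of the input format, the following sketch writes the two score files for the unweighted case. The file names `fg.txt` and `bg.txt`, the helper `write_score_file`, and the scores are hypothetical examples. The sketch also assumes that AUC.jar accepts a trailing newline at the end of each file.

```python
from pathlib import Path


def write_score_file(path, scores):
    """Write one classification score per line (format of <fg> and <bg>)."""
    Path(path).write_text("".join(f"{s}\n" for s in scores))


# Hypothetical scores for positive (fg) and negative (bg) test points.
write_score_file("fg.txt", [0.9, 0.8, 0.4])
write_score_file("bg.txt", [0.7, 0.3])
# Afterwards, run: java -jar AUC.jar fg.txt bg.txt
```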
For weighted data, please use:
java -jar AUC.jar <weighted>
where <weighted> is a tab-delimited file with one data point per line, giving its classification score, its weight for the positive class (fg), and its weight for the negative class (bg).
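A weighted input file can be produced from soft labels as sketched below. The file name `weighted.txt`, the helper `write_weighted_file`, and the example values are hypothetical. The mapping of a soft label p (the confidence that a point is positive) to an fg weight of p and a bg weight of 1 - p is an assumption of this sketch, not a requirement of AUC.jar, which accepts any pair of weights per line.

```python
import csv


def write_weighted_file(path, scores, fg_probs):
    """Write tab-delimited lines: score, fg weight, bg weight.

    Assumption: soft label p becomes fg weight p and bg weight 1 - p.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for score, p in zip(scores, fg_probs):
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"soft label out of range: {p}")
            writer.writerow([score, p, 1.0 - p])


# Hypothetical scores and soft labels for three test points.
write_weighted_file("weighted.txt", [0.9, 0.6, 0.2], [1.0, 0.75, 0.0])
# Afterwards, run: java -jar AUC.jar weighted.txt
```

Points with a soft label of exactly 1 or 0 reduce to ordinary positive or negative examples. A point with a soft label of 0.75 contributes to both classes.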