Model selection should be easy. And it is – if you know how to calculate and interpret ROC curves and AUC scores. That’s what you’ll learn in this article – in 10 minutes if you’re coding along. In 5 if you aren’t.
After reading, you’ll know:
ROC and AUC demystified
You can use ROC (Receiver Operating Characteristic) curves to evaluate different thresholds for classification machine learning problems. In a nutshell, a ROC curve summarizes the confusion matrix at every threshold by plotting the true positive rate against the false positive rate.
But what are thresholds?
Every time you train a classification model, you can access prediction probabilities. If a probability is greater than 0.5, the instance is classified as positive. Here, 0.5 is the decision threshold. You can adjust it to reduce the number of false positives or false negatives.
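Here's a minimal sketch of what moving the threshold does. The probabilities and the `classify` helper are made up for illustration, so treat them as assumptions rather than output from a real model:

```python
# Hypothetical predicted probabilities for the positive class
probabilities = [0.10, 0.35, 0.50, 0.65, 0.90]


def classify(probs, threshold=0.5):
    """Label an instance positive (1) when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probs]


default_labels = classify(probabilities)       # threshold of 0.5
strict_labels = classify(probabilities, 0.7)   # fewer predicted positives
lenient_labels = classify(probabilities, 0.3)  # more predicted positives

print(default_labels)  # [0, 0, 0, 1, 1]
print(strict_labels)   # [0, 0, 0, 0, 1]
print(lenient_labels)  # [0, 1, 1, 1, 1]
```

A stricter threshold tends to cut false positives at the cost of more false negatives, and a more lenient one does the opposite.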
The ROC curve shows the false positive rate (FPR) on the X-axis. This metric tells you the proportion of the negative class that was classified as positive (read: COVID negative classified as COVID positive).
On the Y-axis, it shows the true positive rate (TPR). This metric is sometimes called recall or sensitivity, so keep that in mind. It tells you the proportion of the positive class that was correctly classified (read: COVID positive and classified as COVID positive).
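If you'd rather see the two rates as code, here's a small sketch that counts the four confusion matrix cells by hand. The labels and the `rates` helper are hypothetical and only there to show the arithmetic:

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) computed from binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    tpr = tp / (tp + fn)  # share of actual positives that were caught
    fpr = fp / (fp + tn)  # share of actual negatives flagged as positive
    return tpr, fpr


# Hypothetical ground truth and predictions: 4 positives, 6 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tpr, fpr = rates(y_true, y_pred)
print(f"TPR: {tpr:.2f}, FPR: {fpr:.2f}")  # TPR: 0.75, FPR: 0.17
```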
Refer to the following image for a refresher on the confusion matrix and the TPR/FPR calculation:
Great, but what is AUC?
AUC represents the area under the ROC curve. The higher the AUC, the better the model is at separating the positive class from the negative class. Ideally, the ROC curve should extend to the top left corner. The AUC score would be 1 in that scenario.
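To make "area under the ROC curve" concrete, here's a rough sketch that builds the curve one threshold at a time and then sums the area with the trapezoidal rule. The helper names and the four toy scores are assumptions for illustration. Libraries such as scikit-learn do this for you:

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct score used as a threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # a threshold above every score predicts no positives
    for threshold in sorted(set(scores), reverse=True):
        # Here an instance counts as positive when its score is >= the threshold
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Sum the area of the trapezoids between consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and model scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
auc_value = trapezoid_auc(points)
print(points)     # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_value)  # 0.75
```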
Let’s go over a couple of examples. Below you’ll see random data drawn from a normal distribution. Means and variances differ to represent centers for different classes (positive and negative).
For a great model, the distributions are entirely separated:
You can see that this yields an AUC score of 1, indicating that the model classifies every instance correctly.
Can AUC be 0? Yes – it means the model is inverting the classes. In other words, it’s predicting positive instances as negative and vice versa. Take a look at the image below:
Can you think of a quick way of turning a 0% accurate model into a 100% one? Let me know in the comment section below.
Finally, there’s a scenario when AUC is 0.5. It means the model is no better than random guessing. Just think about it: you ask a model whether someone is positive or negative, and it tells you: well, maybe it’s positive, maybe it’s negative (50:50). That’s useless for binary classification tasks.
Here’s what the ROC curve looks like when AUC is 0.5:
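You can reproduce the three scenarios (AUC of 1, 0, and roughly 0.5) with synthetic scores. The sketch below is an approximation of the idea, not the code behind the images: the means, the shared standard deviation of 1, the sample size, and the seed are all assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 2000  # samples per class (assumption)

# First n labels are the negative class, last n are the positive class
y_true = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Scenario 1: positives score far higher than negatives, with no overlap
separated = np.concatenate([rng.normal(-5, 1, n), rng.normal(5, 1, n)])
auc_separated = roc_auc_score(y_true, separated)

# Scenario 2: the centers are swapped, so positives score far lower than negatives
swapped = np.concatenate([rng.normal(5, 1, n), rng.normal(-5, 1, n)])
auc_swapped = roc_auc_score(y_true, swapped)

# Scenario 3: both classes are drawn from the same distribution
overlapping = np.concatenate([rng.normal(0, 1, n), rng.normal(0, 1, n)])
auc_overlapping = roc_auc_score(y_true, overlapping)

print(f"Separated:   {auc_separated:.2f}")
print(f"Swapped:     {auc_swapped:.2f}")
print(f"Overlapping: {auc_overlapping:.2f}")
```

The first two scores land on 1 and 0, while the third hovers around 0.5 and shifts slightly with the seed.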
Now you know the theory. Let’s connect it with practice next.
Using ROC and AUC in Python
You’ll use the White wine quality dataset for the practical part. Here’s how to load it with Python:
The first couple of rows look like this:
Initially, this is not a binary classification dataset, but you can convert it to one. Let’s say the wine is Good if the quality is 7 or above, and Bad otherwise:
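One way to do the conversion with pandas is sketched below. The tiny DataFrame is a hypothetical stand-in for the wine data, and the `quality` column name is an assumption, so adjust it to match what you loaded:

```python
import pandas as pd

# Hypothetical stand-in for the wine data: only the quality column matters here
df = pd.DataFrame({"quality": [5, 6, 7, 8, 4, 7]})

# Good if quality is 7 or above, Bad otherwise
df["quality"] = ["Good" if score >= 7 else "Bad" for score in df["quality"]]

print(df["quality"].tolist())  # ['Bad', 'Bad', 'Good', 'Good', 'Bad', 'Good']
```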
There’s your binary classification dataset. Let’s visualize the counts of good and bad wines next. Here’s the code:
And here’s the chart:
And there’s nothing more to do with regard to preparation. You can make a train/test split next:
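A split along these lines should work. To keep the sketch self-contained, it uses generated stand-in data instead of the wine dataset, and the 25% test size, stratification, and seed are assumptions rather than the settings used for the results in this article:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 1,000 rows, 11 numeric features, imbalanced classes
X, y = make_classification(
    n_samples=1000, n_features=11, weights=[0.8, 0.2], random_state=42
)

# Hold out 25% for testing and keep the class ratio the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)  # (750, 11) (250, 11)
```

Stratifying is worth considering when one class is much rarer than the other, since it keeps the rare class represented in the test set.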
Great! The snippet below shows you how to train logistic regression, decision tree, random forest, and extreme gradient boosting models. It also shows you how to grab probabilities for the positive class, which you’ll need for the ROC curves later:
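Here's a sketch of that training loop. Two things differ from the setup described above, so read it as an approximation: it runs on generated stand-in data, and it swaps XGBoost for scikit-learn's `GradientBoostingClassifier` to avoid an extra dependency. The hyperparameters are assumptions too.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption), not the wine dataset
X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # Swapped in for XGBoost in this sketch
    "Gradient boosting": GradientBoostingClassifier(random_state=42),
}

positive_probs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Columns follow model.classes_, so column 1 holds the probability of class 1
    positive_probs[name] = model.predict_proba(X_test)[:, 1]

for name, probs in positive_probs.items():
    print(name, probs[:3].round(2))
```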
You can visualize the ROC curves and calculate the AUC now. The only requirement is to remap the Good and Bad class names to 1 and 0, respectively.
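The remapping itself is a one-liner. The labels and probabilities below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical test labels and positive-class probabilities
y_test = np.array(["Good", "Bad", "Bad", "Good", "Bad", "Good"])
probs = np.array([0.9, 0.2, 0.6, 0.7, 0.1, 0.4])

# Good becomes 1 (positive class), Bad becomes 0 (negative class)
y_test_int = np.where(y_test == "Good", 1, 0)

auc_score = roc_auc_score(y_test_int, probs)
print(y_test_int.tolist())  # [1, 0, 0, 1, 0, 1]
print(round(auc_score, 3))  # 0.889
```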
The following code snippet visualizes the ROC curve for the four trained models and shows their AUC score on the legend:
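A plotting sketch along those lines is shown here. For brevity it trains only two of the model types on generated stand-in data, so the curves and scores it produces will not match the ones reported in this article. The figure size, labels, and `max_depth` are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption), already using 1/0 class labels
X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(max_depth=3, random_state=42),
}

fig, ax = plt.subplots(figsize=(8, 6))
auc_scores = {}
for name, model in models.items():
    probs = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, probs)  # one (FPR, TPR) point per threshold
    auc_scores[name] = auc(fpr, tpr)        # trapezoidal area under the curve
    ax.plot(fpr, tpr, label=f"{name} (AUC = {auc_scores[name]:.2f})")

# Diagonal baseline: a model that guesses at random
ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Baseline (AUC = 0.50)")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curves")
ax.legend(loc="lower right")
# In an interactive session, drop the Agg line above and call plt.show()
```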
Here’s the corresponding visualization:
No perfect models here, but all of them are far away from the baseline (unusable model). The random forest algorithm is the best, with a 0.93 AUC score. That’s impressive given how little preparation and feature engineering we did.
In a nutshell, you can use ROC curves and AUC scores to choose the best machine learning model for your dataset. Image 7 shows you how easy it is to interpret the ROC curves, even when there are multiple curves on the same chart.
If you need a completely automated solution, look only at the AUC and select the model with the highest score.
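That automated selection boils down to picking the maximum. In the sketch below, `select_best_model` is a hypothetical helper and the scores are placeholder values, not results from this article:

```python
def select_best_model(auc_scores):
    """Return the (name, score) pair with the highest AUC."""
    best_name = max(auc_scores, key=auc_scores.get)
    return best_name, auc_scores[best_name]


# Placeholder scores for illustration only
example_scores = {"model_a": 0.81, "model_b": 0.93, "model_c": 0.88}

best_name, best_score = select_best_model(example_scores)
print(best_name, best_score)  # model_b 0.93
```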
What’s your approach to model selection? Let me know in the comment section.