Black-box models aren’t cool anymore. It’s easy to build great models nowadays, but what’s going on inside? That’s what Explainable AI and LIME try to uncover.
Knowing why the model makes predictions the way it does is essential for tweaking. Just think about it – if you don’t know what’s going on inside, how the hell will you improve it?
LIME isn’t the only option for machine learning model interpretation; SHAP is a popular alternative.
As with SHAP, the goal today is to train the model quickly and focus on interpretation, so the same dataset and modeling process are used.
After reading this article, you shouldn’t have any problems with explainable machine learning. Interpreting models and the importance of each predictor should become second nature.
What is LIME?
The acronym LIME stands for Local Interpretable Model-agnostic Explanations. The project is about explaining what machine learning models are doing (source). At the time of writing, LIME supports explanations for tabular models, text classifiers, and image classifiers.
To install LIME, execute the following line from the Terminal:
pip install lime
In a nutshell, LIME is used to explain predictions of your machine learning model. The explanations should help you to understand why the model behaves the way it does. If the model isn’t behaving as expected, there’s a good chance you did something wrong in the data preparation phase.
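To make the "local" and "model-agnostic" parts concrete, here is a from-scratch sketch of the core idea in plain Python. It is not how the lime package works internally (the library also discretizes tabular features, selects a subset of features, and uses its own kernel defaults); the black_box function, the perturbation scale, and the kernel width are illustrative assumptions. The recipe: perturb one instance, ask the model for predictions on the perturbed points, weight them by closeness to the original, and fit a weighted linear model whose coefficients act as the explanation. The model is only ever called, never inspected, which is what makes the approach model-agnostic.

```python
import math
import random


def black_box(x1, x2):
    """Hypothetical nonlinear model standing in for any classifier's probability."""
    return 1.0 / (1.0 + math.exp(-(x1 ** 2 - 3.0 * x2)))


def solve(a, b):
    """Solve the square system a @ x = b via Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def local_surrogate(f, x0, n_samples=2000, scale=0.5, kernel_width=0.75, seed=0):
    """Fit g(z) = b0 + b1*z1 + b2*z2 to f in the neighborhood of x0.

    1. Perturb x0 with Gaussian noise.
    2. Query the black box on every perturbed point.
    3. Weight each point by proximity to x0 (exponential kernel).
    4. Solve weighted least squares: (X^T W X) b = X^T W y.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [x0[0] + rng.gauss(0.0, scale), x0[1] + rng.gauss(0.0, scale)]
        dist_sq = (z[0] - x0[0]) ** 2 + (z[1] - x0[1]) ** 2
        rows.append([1.0, z[0], z[1]])  # leading 1.0 models the intercept
        targets.append(f(z[0], z[1]))
        weights.append(math.exp(-dist_sq / kernel_width ** 2))
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(3)]
            for i in range(3)]
    xtwy = [sum(w * r[i] * t for r, t, w in zip(rows, targets, weights)) for i in range(3)]
    return solve(xtwx, xtwy)


intercept, slope_x1, slope_x2 = local_surrogate(black_box, x0=(1.0, 0.0))
print(f"local weights around (1, 0): x1={slope_x1:+.3f}, x2={slope_x2:+.3f}")
```

The signs of the slopes are the explanation: near (1, 0), increasing x1 pushes the black-box output up and increasing x2 pushes it down, even though the black box itself is nonlinear.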
You’ll now train a simple model and then begin with the interpretations.
You can’t interpret a model before you train it, so that’s the first step. The Wine quality dataset is easy to train on and comes with a bunch of interpretable features. Here’s how to load it into Python:
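The loading code isn't shown here, so below is a minimal sketch that assumes the data lives in a local CSV file. The file name wine.csv and the load_wine helper are placeholders, not the original code; the only thing the sketch checks is that the quality target column the rest of the article relies on is present.

```python
import pandas as pd


def load_wine(path="wine.csv"):
    """Load the wine quality data set into a pandas DataFrame.

    Assumption: the data sits in a local CSV (the default path is a
    placeholder) with numeric predictor columns and a 'quality' column
    whose values are 'good' or 'bad'.
    """
    wine = pd.read_csv(path)
    if "quality" not in wine.columns:
        raise ValueError("expected a 'quality' target column")
    return wine


# wine = load_wine()
# wine.head()
```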
The first couple of rows look like this:
All predictors are numeric, and there are no missing values, so you can cross data preparation off the list.
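If you'd rather confirm that on your own copy of the data than eyeball it, a quick check could look like this (ready_for_modeling is a hypothetical helper name, and the target column is assumed to be quality):

```python
import pandas as pd


def ready_for_modeling(df, target="quality"):
    """True when every predictor is numeric and no cell is missing."""
    predictors = df.drop(columns=[target])
    all_numeric = all(pd.api.types.is_numeric_dtype(dtype) for dtype in predictors.dtypes)
    no_missing = not df.isna().to_numpy().any()
    return all_numeric and no_missing


# ready_for_modeling(wine)
```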
Train/test split is the next step. The quality column is the target variable, with possible values of good and bad. Set the random_state parameter to 42 if you want to get the same split:
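A sketch of the split using Scikit-Learn's train_test_split. Only random_state=42 comes from the text; the 80/20 ratio (test_size=0.2) and the split_wine helper are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split


def split_wine(wine, target="quality", test_size=0.2, random_state=42):
    """Separate predictors from the target and make a reproducible split."""
    X = wine.drop(columns=[target])
    y = wine[target]
    return train_test_split(X, y, test_size=test_size, random_state=random_state)


# X_train, X_test, y_train, y_test = split_wine(wine)
```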
Model training is the only thing left to do.
Scikit-Learn will do the job: fit a classifier on the training set, and you’ll get an 80% accurate model out of the box.
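The article doesn't name the algorithm, so the sketch below assumes a RandomForestClassifier with default settings. The train_model helper is hypothetical, and the exact accuracy you see depends on the model and the split.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def train_model(X_train, y_train, X_test, y_test, random_state=42):
    """Fit a random forest and report its accuracy on the held-out test set."""
    model = RandomForestClassifier(random_state=random_state)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# model, accuracy = train_model(X_train, y_train, X_test, y_test)
```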
And that’s all you need to start with model interpretation. You’ll learn how in the next section.
To start explaining the model, you first need to import the LIME library and create a tabular explainer object. It expects the following parameters:
training_data – our training data generated with the train/test split. It must be a NumPy array.
feature_names – column names from the training set
class_names – distinct classes from the target variable
mode – type of problem you’re solving (classification in this case)
Here’s the code:
And that’s it – you can start interpreting! A bad wine comes in first: the second row of the test set represents a wine classified as bad. You can call the explain_instance function of the explainer object to, well, explain the prediction. The following parameters are required:
data_row – a single observation from the dataset
predict_fn – a function used to make predictions. The model’s predict_proba method is a great option because it returns class probabilities
Here’s the code:
The show_in_notebook function shows the prediction interpretation in the notebook environment:
The model is 81% confident this is a bad wine. The values of alcohol, sulphates, and total sulfur dioxide increase the wine’s chance of being classified as bad. Volatile acidity is the only one that decreases it.
Let’s take a look at a good wine next. You can find one at the fifth row of the test set:
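The fifth row is a good wine for the split used here, but with another split or model it may not be. A hypothetical helper like the one below lists candidate rows for a given class; keep in mind that the fifth row is position 4 with 0-based indexing.

```python
import numpy as np


def rows_predicted_as(model, X_test, label):
    """0-based positions of test rows that the model assigns to `label`."""
    predictions = np.asarray(model.predict(X_test))
    return np.flatnonzero(predictions == label).tolist()


# candidates = rows_predicted_as(model, X_test, "good")
# exp = explainer.explain_instance(X_test.iloc[4].to_numpy(), model.predict_proba)
```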
Here’s the corresponding interpretation:
Now that’s the wine I’d like to try. The model is 100% confident it’s a good wine, and the top three predictors show it.
That’s how LIME works in a nutshell. There are different visualizations available, and you are not limited to interpreting only a single instance, but this is enough to get you started. Let’s wrap things up in the next section.
With LIME, interpreting machine learning models is simple, and it gives you a great way to explain what’s going on below the surface to non-technical folks. You don’t have to worry about data visualization, as the LIME library handles that for you.
This article should serve as a basis for more advanced interpretations and visualizations, which you can explore further on your own.
What are your thoughts on LIME? Do you want to see a comparison between LIME and SHAP? Please let me know.