On Local Interpretable Model-agnostic Explanations

Vishnu Sharma
6 min read · Dec 26, 2020

In this blog post, I am going to give a quick overview of the Local Interpretable Model-agnostic Explanation (LIME) tool (paper and code) by Ribeiro et al. It is aimed at explaining the decision-making process of machine/deep learning models. Before we jump into the specifics, let’s first understand what each of the terms means.

  • Local: Local explanations seek to explain the decision taken for a specific example. The other type, global explanations, tries to explain the model itself (in other words, the decisions taken for all examples).
  • Interpretable: Interpretability and Explainability are the two types of approaches for model transparency. Interpretability means being able to explain how a decision was reached (e.g. printing the nodes in a decision tree). Explainability means generating post-hoc explanations for a decision. For explainability, we try to infer a relationship between the input and output and do not need to worry about the exact process of reaching a decision.
  • Model-agnostic: A model-agnostic method/tool is one whose output does not depend on the type of the model. In contrast, a model-dependent or model-specific method can explain/interpret only a specific type of model (e.g. a linear classifier).

Why?

The need for model transparency can be described well by a single word: Trust.

Machine/Deep Learning has shown tremendous potential for solving problems and performing tasks, even outperforming humans at times. We have already seen AI at work in many applications (Spotify/Netflix recommendations, Alexa/Google Home, photo enhancement, to name a few). Yet the rate of adoption is not nearly as fast in areas like healthcare and autonomous driving. Why?

Applications like recommendations do not cause us much harm when the AI fails. However, a wrong prescription or a failure to stop a vehicle at the right moment may be life-threatening. It is natural for people to fear the unknown and hence lack trust in AI. Even experts often cannot tell how these models reach a decision, which further deepens the lack of trust.

While this somber and alarming tone resonates with a multitude of news articles, I assure you that we are years away from realizing HAL 9000 and SkyNet.

How does LIME work?

The premise is simple: train an interpretable model in the neighborhood of the example and use it to explain the decision.

How LIME works

In the example above, the white and blue surfaces are the decision regions learned by a machine/deep learning model. As can be seen, the decision boundary is quite complex. The example for which we wish to explain the decision is shown as the bold red + sign. The brownish + signs and blue discs are neighborhood examples predicted as the positive and negative classes, respectively. Their sizes represent how much weight each is given, based on its distance from the original example.

The steps are as follows:

1. Feature Representation: Group similar things together; these groups will act as the features in the interpretable domain. For an image, they can be superpixels.
The paper represents the original d-dimensional data as x ∈ ℝᵈ and the corresponding example in the interpretable domain as x′ ∈ {0,1}ᵈ′. This means we have a d′-dimensional vector whose values (1/0) indicate whether each feature is present or not.
Input image, superpixel segments, and superpixel boundaries

For examples, we are going to look at the authors’ tutorial notebook on images.

The images on the left show examples of such features. In this example, we use a superpixel algorithm to generate segments in the image. A superpixel algorithm labels pixels by grouping them by similarity in color and location. In the second image, we have 172 such labels/segments (hence d′ = 172). The boundaries generated by these segments are shown in the last image.

Each of these superpixels acts as a feature to be used as input for LIME. Thus, for the original image, we have a 172-dimensional vector of all 1s (all features present). We generate training data for LIME by perturbing the presence of these features, as sketched below.
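Here is a minimal sketch of this step, assuming scikit-image is available for the segmentation. The function name `build_interpretable_features` and the quickshift parameters are illustrative choices, not the exact settings from the authors’ notebook.

```python
import numpy as np
from skimage.segmentation import quickshift

def build_interpretable_features(image):
    """`image` is an RGB image as an (H, W, 3) float array."""
    # Group pixels into superpixels by similarity in color and location.
    segments = quickshift(image, kernel_size=4, max_dist=200, ratio=0.2)
    n_features = np.unique(segments).shape[0]  # this is d'
    # The original image maps to the all-ones vector: every feature present.
    x_prime = np.ones(n_features, dtype=int)
    return segments, x_prime
```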

2. Training Data Generation: The data for training the interpretable model requires three things: (a) the input data (perturbations) in the interpretable domain, z′ ∈ {0,1}ᵈ′; (b) the output data, which is the prediction of the machine/deep learning model (NOT the ground truth); and (c) a weight/distance metric πₓ(z) quantifying the distance of the perturbed data from the example under consideration. Perturbation here means variations of the data with the presence/absence of features in the interpretable domain. Thus, for an actual example with a 3-dimensional representation {1,1,1}, some of the perturbations could be {0,0,1}, {1,0,1}, {1,1,0}.

Perturbed examples

To get the output data, we make predictions over the perturbed data in the original domain (z ∈ ℝᵈ) using the machine/deep learning model. A perturbed example in the original domain can be represented by removing the corresponding superpixel segments, or by replacing them with the mean value of all the pixels in that superpixel. The reason we use the model outputs here instead of the ground truth is that we want to explain the model’s predictions.

Some examples of perturbed images are shown on the left. Some of the superpixels have been removed from these images. The corresponding input vectors for LIME would be 172-dimensional vectors with 0s at the locations where the superpixels are missing in the perturbed image. The corresponding output would be the probability of a dog being present when we feed the perturbed image to the model.
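A rough sketch of this data-generation step is below. `generate_lime_dataset` and `model_predict` are hypothetical names (the latter standing in for the black-box classifier’s probability function), and the exponential kernel with a fixed width is one common choice for πₓ(z), not necessarily the notebook’s exact setup.

```python
import numpy as np

def generate_lime_dataset(image, segments, model_predict,
                          num_samples=1000, kernel_width=0.25):
    segment_ids = np.unique(segments)
    n_features = segment_ids.shape[0]  # d'

    # (a) Random binary perturbations z' in {0,1}^d'.
    Z_prime = np.random.randint(0, 2, size=(num_samples, n_features))
    Z_prime[0, :] = 1  # keep the original (unperturbed) example

    # Mean color of each superpixel, used to "remove" it from the image.
    mean_image = image.copy()
    for s in segment_ids:
        mean_image[segments == s] = image[segments == s].mean(axis=0)

    # (b) Map each z' back to the original domain and query the black-box model.
    preds = []
    for z in Z_prime:
        perturbed = image.copy()
        for s in segment_ids[z == 0]:
            perturbed[segments == s] = mean_image[segments == s]
        preds.append(model_predict(perturbed[np.newaxis, ...])[0])
    preds = np.array(preds)

    # (c) Weights pi_x(z): an exponential kernel over the distance from the
    # all-ones vector (the original example).
    distances = np.linalg.norm(Z_prime - 1, axis=1) / np.sqrt(n_features)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    return Z_prime, preds, weights
```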

3. Model Training: Train a linear model (or another interpretable model) on the inputs, outputs, and weights obtained in the last step. This model should approximate the original model faithfully in the neighborhood of the example. The example in the notebook uses Ridge Regression.
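A minimal sketch of the fitting step, assuming scikit-learn; the function name `fit_local_model`, the regularization strength, and regressing on a single class’s probability are illustrative choices.

```python
from sklearn.linear_model import Ridge

def fit_local_model(Z_prime, preds, weights, class_idx):
    # Weighted Ridge regression: perturbations closer to the original example
    # (higher pi_x(z)) influence the fit more.
    local_model = Ridge(alpha=1.0)
    local_model.fit(Z_prime, preds[:, class_idx], sample_weight=weights)
    return local_model.coef_  # one coefficient per superpixel
```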

4. Interpretation: The weights/coefficients of the linear model fitted around the example under consideration are used to explain the prediction. The larger a coefficient’s magnitude, the more important the corresponding feature is for the prediction; its sign indicates whether the feature supports or opposes the predicted class.
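Continuing the sketch above, reading off the explanation is just a matter of ranking coefficients; the choice of the top 10 mirrors the figure below, and `top_segments` is a hypothetical helper.

```python
import numpy as np

def top_segments(coefs, top_k=10):
    # Sort superpixels by the absolute value of their coefficient; the sign
    # says whether a segment supports or opposes the predicted class.
    order = np.argsort(np.abs(coefs))[::-1][:top_k]
    return [(int(i), float(coefs[i])) for i in order]
```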

Final explanation

The image on the left shows the explanation generated by LIME; only the top-10 segments are shown. The green segments are the parts that contributed the most towards predicting a dog. The red segments are the ones that contributed the least or negatively (negative weight).
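For completeness, here is a hedged sketch of producing this kind of explanation with the lime package itself, roughly mirroring the authors’ image tutorial; `image` and `predict_fn` (a function returning class probabilities for a batch of images) are assumed to be defined elsewhere.

```python
from lime import lime_image
from skimage.segmentation import mark_boundaries

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(image, predict_fn,
                                         top_labels=5, hide_color=0,
                                         num_samples=1000)
# Overlay the 10 most influential superpixels (positive and negative)
# for the top predicted label.
temp, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                            positive_only=False,
                                            num_features=10,
                                            hide_rest=False)
overlay = mark_boundaries(temp, mask)  # scaling of `temp` depends on the model's preprocessing
```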

My Takeaways

The authors give several examples (along with notebooks) to show that LIME works with a variety of datasets. Their experiments with human users show that the generated explanations are useful in understanding the model. One part not covered in this post is that LIME can be combined with a submodular pick procedure (SP-LIME) to move towards a global explanation.

The formulation of the interpretability problem is a great contribution of this paper. It opens the door to using other interpretable models rather than restricting the approach to linear models. Follow-up work on this tool proposes using Shapley values (Kernel SHAP) [2].

There are two main limitations of this work in my opinion:

  1. The selection of the interpretable domain is an important part of the process and depends on the user; if the domain is not chosen well, the explanations may not be useful. The authors acknowledge this in the paper.
  2. Similar to other interpretability methods, LIME generates explanations based only on whether a feature is present or absent. Replacing one feature with other feature(s) could give more information, since it would generate more data in the neighborhood.

References

[1] Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “‘Why should I trust you?’: Explaining the predictions of any classifier.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

[2] Lundberg, Scott M., and Su-In Lee. “A unified approach to interpreting model predictions.” Advances in Neural Information Processing Systems. 2017.
