Arun Balajiee

PhD Student, Intelligent Systems Program, University of Pittsburgh

Using rationales and influential training examples to (attempt to) explain neural predictions in NLP

09 Sep 2020 - Arun Balajiee

Talk Speaker: Byron C Wallace

Talk Date: 2020-09-09

This talk was divided into three sections: Dr. Wallace first explained the idea of interpretability of NLP models, then discussed influence functions for NLP models, and finally concluded with sequence tagging.

Interpretability of a model means understanding what information gets encoded from the input, so that we can explain the outcome of the NLP model. In many settings, such as healthcare applications, it matters a great deal to be able to interpret NLP models correctly, and it is just as essential when we want to eliminate bias. Understanding whether the model works well because of its training data is also important for saying whether the training set is helping us achieve the outcomes we want. One popular idea here is to introduce “rationales” to explain classifications: a binary, token-level justification over the input. In the use case of sentiment analysis, if the model predicts a positive sentiment, we want to be able to say why or how it arrived at that. The idea started with the work of Zaidan et al. and has been extended considerably by the people in Dr. Wallace’s lab. Two aspects are considered for a rationale: “plausibility” (whether the selected text and the reasoning it suggests for the outcome make sense to a human) versus “faithfulness” (whether the rationale actually reflects what the model used to produce the outcome). The idea, then, is to separate the input-encoding (rationale extraction) phase from the prediction phase, to be able to understand how the model processes information.
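To make that encode/select-then-predict separation concrete for myself, here is a minimal PyTorch sketch (my own toy code, not anything shown in the talk; all class and variable names are made up): an extractor scores each token, a hard binary mask keeps only the selected rationale tokens, and the predictor never sees the rest.

```python
import torch
import torch.nn as nn

class Extractor(nn.Module):
    """Scores each token; high-scoring tokens become the rationale."""
    def __init__(self, vocab_size, emb_dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.score = nn.Linear(emb_dim, 1)

    def forward(self, token_ids):
        h = self.emb(token_ids)                           # (batch, seq, emb)
        return torch.sigmoid(self.score(h)).squeeze(-1)   # (batch, seq) selection probabilities

class Predictor(nn.Module):
    """Classifies using only the tokens the extractor kept."""
    def __init__(self, vocab_size, emb_dim=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(emb_dim, n_classes)

    def forward(self, token_ids, mask):
        h = self.emb(token_ids) * mask.unsqueeze(-1)      # zero out non-rationale tokens
        pooled = h.sum(1) / mask.sum(1, keepdim=True).clamp(min=1.0)
        return self.out(pooled)

vocab_size = 1000
extractor, predictor = Extractor(vocab_size), Predictor(vocab_size)
tokens = torch.randint(0, vocab_size, (4, 20))            # toy batch of 4 "sentences"
rationale_mask = (extractor(tokens) > 0.5).float()        # hard, human-readable selection
logits = predictor(tokens, rationale_mask)
print(rationale_mask[0], logits[0])
```

The point of the split is that the prediction is faithful to the rationale by construction: the predictor literally cannot use tokens the extractor dropped.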

The discussion then turned to generating discrete rationales for a given text with a model that can still be trained by gradient descent (the extracted rationale is a discrete, non-differentiable variable, while the model itself is a continuous function). The contribution of Dr. Wallace and his collaborators is achieving this with high accuracy and high plausibility in their model named FRESH.
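As I understood it, the trick is to let a “support” model’s saliency scores pick the rationale tokens, so nothing has to be differentiated through the discrete selection; a separate model is then trained on the rationale alone. Below is a rough, self-contained sketch of that recipe in PyTorch. It is only my approximation of the idea, not the actual FRESH implementation, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyClassifier(nn.Module):
    """Toy stand-in for both the support model and the prediction model."""
    def __init__(self, vocab_size=1000, emb_dim=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(emb_dim, n_classes)

    def forward(self, token_ids, mask=None):
        h = self.emb(token_ids)                            # (batch, seq, emb)
        if mask is not None:
            h = h * mask.unsqueeze(-1)                     # hide non-rationale tokens
        return self.out(h.mean(1))

def saliency_rationale(model, token_ids, labels, k=5):
    """Keep the k tokens per example whose embeddings most move the loss."""
    emb = model.emb(token_ids)
    emb.retain_grad()                                      # we want gradients w.r.t. the embeddings
    loss = F.cross_entropy(model.out(emb.mean(1)), labels)
    loss.backward()
    saliency = emb.grad.norm(dim=-1)                       # (batch, seq) importance scores
    keep = saliency.topk(k, dim=1).indices                 # hard thresholding: top-k per example
    return torch.zeros_like(saliency).scatter_(1, keep, 1.0)

support_model = TinyClassifier()                           # its gradients pick the rationale
predict_model = TinyClassifier()                           # would be trained on rationale tokens only
tokens = torch.randint(0, 1000, (4, 20))
labels = torch.randint(0, 2, (4,))
mask = saliency_rationale(support_model, tokens, labels)
logits = predict_model(tokens, mask)                       # no gradient ever flows through the discrete mask
```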

The second and third parts of the talk were about identifying, much like an ablation study, which training examples and which parts of the input influence the model’s output. These ideas have larger implications in areas such as natural language inference: identifying artifacts in the data, and handling cases where a few words in the input text are swapped (for example, substituting different nationalities) while the model still produces accurate predictions.
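For the influence part, the flavor I took away is ranking training examples by how much they push the model toward a particular test prediction. The toy sketch below scores each training point by the dot product between its loss gradient and the test example’s loss gradient, which is only a cheap stand-in for the Hessian-based influence functions discussed in the talk; everything here (model, data, names) is made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def loss_grad(model, x, y):
    """Flattened gradient of the loss at one example w.r.t. all parameters."""
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def rank_training_examples(model, train_xs, train_ys, test_x, test_y):
    """Higher score = training example whose gradient aligns with the test loss."""
    g_test = loss_grad(model, test_x, test_y)
    scores = [torch.dot(loss_grad(model, x, y), g_test).item()
              for x, y in zip(train_xs, train_ys)]
    return sorted(range(len(scores)), key=lambda i: -scores[i])

# Toy usage with a linear classifier over 50-dimensional features.
model = nn.Linear(50, 2)
train_xs = [torch.randn(1, 50) for _ in range(100)]
train_ys = [torch.randint(0, 2, (1,)) for _ in range(100)]
ranking = rank_training_examples(model, train_xs, train_ys,
                                 torch.randn(1, 50), torch.randint(0, 2, (1,)))
print("most influential training indices:", ranking[:5])
```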

I think this talk was very relevant to my understanding of how NLP models work. Many of these questions were things I had wondered about for a while, and I got some of them answered by attending this talk. For example, I have always wondered how I can know that, even if my model performs well, it is actually learning the things I assume it is learning; this talk was largely the answer to that. Going forward, I will consider incorporating these ideas into most of the AI implementations in my research.