Fragile interpretations and interpretable models in NLP
Material type: Book
Publication details: Bangalore: Indian Institute of Science, 2023
Description: xi, 66p.: col. ill.; e-Thesis (2.047 MB)
Dissertation: MTech(Res); 2023; Computer Science and Automation
Subject(s): DDC classification: 001.535 KOH
Item type | Current library | Call number | URL | Status | Date due | Barcode
---|---|---|---|---|---|---
Thesis | JRD Tata Memorial Library | 001.535 KOH | Link to resource | Available | | ET00257
Includes bibliographical references and index.
Deployment of deep learning models in critical areas, where a wrong decision can cause substantial financial loss (as in banking) or even loss of life (as in medicine), remains limited: we cannot entirely rely on deep learning models because they act as black boxes. Explainable AI addresses this problem by aiming to explain these black boxes. There are two approaches: post-hoc explainability techniques, and designing inherently interpretable models. These two approaches form the basis of our work.

In the first part, we study the instability of post-hoc explanations, which leads to fragile interpretations. This work focuses on the robustness of NLP models along with the robustness of their interpretations. We propose an algorithm that perturbs the input text such that the generated text is semantically, conceptually, and grammatically similar to the input, yet the interpretations produced are fragile. Through our experiments, we show how the interpretations of two very similar sentences can vary significantly. We show that post-hoc explanations can be unstable, inconsistent, unfaithful, and fragile, and therefore cannot be trusted. Finally, we conclude whether to trust robust NLP models or their post-hoc explanations.

In the second part, we design two inherently interpretable models: one for offensive language detection, framed as multi-task learning over three subtasks that share a hierarchical relationship, and the other for the question pair similarity task. Our offensive language detection model achieves an F1 score of 0.78 on the OLID dataset and 0.85 on the SOLID dataset. Our question pair similarity model achieves an F1 score of 0.83. We also provide a detailed analysis of model interpretability as well as prediction interpretability.
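The fragility the abstract describes can be quantified by comparing the top-ranked tokens of two explanations before and after a meaning-preserving perturbation. The following is a minimal sketch of that idea only; the attribution scores below are toy stand-ins, and the metric (`topk_overlap`) is an illustrative choice, not the thesis's actual algorithm.

```python
def topk_overlap(attr_a, attr_b, k=3):
    """Fraction of the top-k attributed tokens shared by two explanations."""
    top_a = {w for w, _ in sorted(attr_a.items(), key=lambda x: -x[1])[:k]}
    top_b = {w for w, _ in sorted(attr_b.items(), key=lambda x: -x[1])[:k]}
    return len(top_a & top_b) / k

# Toy attribution maps (token -> importance) for an original sentence
# and a semantically equivalent perturbation ("movie" -> "film").
attr_orig = {"the": 0.05, "movie": 0.60, "was": 0.10, "great": 0.80}
attr_pert = {"the": 0.55, "film": 0.20, "was": 0.70, "great": 0.15}

# Near-identical inputs with low top-k overlap signal a fragile interpretation.
print(topk_overlap(attr_orig, attr_pert, k=2))  # prints 0.0
```

In practice the attribution maps would come from a post-hoc explainer (e.g. gradient- or occlusion-based saliency) applied to the same model on the original and perturbed inputs.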