Fragile interpretations and interpretable models in NLP

Material type: Book; e-Thesis (2.047 MB)
Publication details: Bangalore: Indian Institute of Science, 2023
Description: xi, 66p.: col. ill.
Dissertation note: MTech(Res); 2023; Computer Science and Automation
DDC classification: 001.535 KOH
Summary: Deployment of deep learning models in critical areas where a wrong decision can cause substantial financial loss, as in banking, or even loss of life, as in medicine, remains limited because we cannot rely on these models entirely: they act as black boxes for us. Explainable AI addresses this problem by aiming to explain these black boxes, either through posthoc explainability techniques or by designing inherently interpretable models; these two approaches are the basis of our work. In the first part, we discuss the instability of posthoc explanations, which leads to fragile interpretations. This work focuses on the robustness of NLP models along with the robustness of interpretations. We propose an algorithm that perturbs the input text such that the generated text is semantically, conceptually, and grammatically similar to the input text, yet the interpretations produced are fragile. Through our experiments, we show how the interpretations of two very similar sentences can vary significantly. We show that posthoc explanations can be unstable, inconsistent, unfaithful, and fragile, and therefore cannot be trusted; finally, we conclude whether to trust the robust NLP models or the posthoc explanations. In the second part, we design two inherently interpretable models: one for offensive language detection, framed as multi-task learning over three hierarchically related subtasks, and the other for the question pair similarity task. Our offensive language detection model achieves an F1 score of 0.78 on the OLID dataset and 0.85 on the SOLID dataset, and our question pair similarity model achieves an F1 score of 0.83. We also provide a detailed analysis of model interpretability as well as prediction interpretability.
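
The fragility described in the summary can be illustrated with a small sketch (not code from the thesis): given token-importance scores produced by any posthoc explainer (e.g. LIME or Integrated Gradients) for an original sentence and a lightly perturbed, semantically similar one, a low overlap between their most important tokens signals a fragile interpretation. The function name and the attribution numbers below are illustrative placeholders only, not results reported in the work.

    # Hypothetical sketch: comparing posthoc explanations of two very similar
    # sentences by measuring how much their top-k important tokens agree.

    def topk_overlap(attr_a, attr_b, k=3):
        """Jaccard overlap between the k most important tokens of two explanations."""
        top_a = {t for t, _ in sorted(attr_a.items(), key=lambda kv: -abs(kv[1]))[:k]}
        top_b = {t for t, _ in sorted(attr_b.items(), key=lambda kv: -abs(kv[1]))[:k]}
        return len(top_a & top_b) / len(top_a | top_b)

    # Token-importance scores for two near-identical inputs (made-up numbers;
    # in practice they would come from a posthoc explainer such as LIME).
    original  = {"the": 0.02, "movie": 0.10, "was": 0.05, "terrible": 0.80}
    perturbed = {"the": 0.45, "film": 0.15, "was": 0.30, "terrible": 0.12}

    # Low overlap despite near-identical inputs indicates a fragile interpretation.
    print(f"top-3 overlap: {topk_overlap(original, perturbed):.2f}")

A rank correlation over the full attribution vectors would serve the same purpose; top-k overlap is used here only because it is the simplest measure to read.
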
Holdings
Item type: Thesis
Current library: JRD Tata Memorial Library
Call number: 001.535 KOH
URL: Link to resource
Status: Available
Barcode: ET00257

Includes bibliographical references and index


