A learnable distillation approach for model-agnostic explainability with multimodal applications

Material type: Book. Language: English. Publication details: Bengaluru: IISc, 2023. Description: xv, 66 p.: col. ill.; 29.1 cm × 20.5 cm; e-Thesis (35.4 MB). Dissertation: MTech (Res); 2023; Electrical engineering. DDC classification:
  • 621 DEB
Online resources: Dissertation note: MTech (Res); 2023; Electrical engineering. Summary: Deep neural networks are the most widely used examples of sophisticated mapping functions from feature space to class labels. In recent years, several high-impact decisions in domains such as finance, healthcare, law, and autonomous driving have been made with deep models. In these tasks, the model decisions lack interpretability, which makes it difficult to hold the models accountable. Hence, despite the astounding performance improvements observed across fields including computer vision and natural language processing, there is a strong demand for explainable approaches that can elicit how a deep neural architecture generates its output decisions. The current frameworks for explainability of deep models are based on gradients (e.g., Grad-CAM, Guided Grad-CAM, Integrated Gradients) or on locally linear assumptions (e.g., LIME). Some of these approaches require knowledge of the deep model architecture, which may be restrictive in many applications. Further, most prior works in the literature illustrate the performance of these XAI methods on a small number of examples, often without statistical evaluation.

This thesis proposes a new approach for explainability based on mask estimation, called the Distillation Approach for Model-agnostic Explainability (DAME). DAME is a saliency-based explainability model that is post-hoc, model-agnostic (applicable to any black-box architecture), and requires only query access to the black box. DAME follows a student-teacher modeling approach, where the teacher is the original model whose predictions are to be explained and the student is the mask estimation model. The input sample is augmented with various data augmentation techniques to produce numerous samples in its immediate vicinity. Using these samples, the mask estimation model learns to generate a saliency map of the input that suffices to predict the labels. A distillation loss is used to train the DAME model, so that the student locally approximates the original model. Once trained, DAME generates the region of the input (in the spatial domain for images, in the time domain for audio samples) that best explains the model predictions.

We also propose an evaluation framework for both image and audio tasks, in which XAI models are evaluated statistically on a set of held-out examples using the Intersection-over-Union (IoU) metric. We validate DAME on vision, audio, and biomedical tasks. First, we deploy DAME to explain a ResNet-50 classifier pre-trained on the ImageNet dataset for object recognition. Second, we explain the predictions of a ResNet-50 classifier fine-tuned on the Environmental Sound Classification (ESC-10) dataset for audio event classification. Finally, we validate DAME on a COVID-19 classification task using cough audio recordings. In these tasks, DAME is shown to outperform existing benchmarks for explainable modeling. The thesis concludes with a discussion of the limitations of the DAME approach and potential future directions.
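As a reading aid, the following is a minimal PyTorch sketch of the student-teacher recipe the summary describes: a mask estimation student is fitted on augmented copies of a single input, the frozen teacher is queried only for soft targets (matching the query-access constraint), and a distillation loss pulls the student's predictions on the masked input toward the teacher's. The KL-divergence loss, the sparsity penalty that keeps masks compact, the tiny student architecture, and the crop/flip augmentations are all illustrative assumptions, not the exact design in the thesis; an IoU helper of the kind used in the proposed evaluation framework is included.

# Hypothetical sketch of DAME-style mask distillation; details are assumptions.
import torch
import torch.nn.functional as F
import torchvision.transforms as T
from torchvision.models import resnet50, ResNet50_Weights

teacher = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval()  # black box

class MaskStudent(torch.nn.Module):
    """Student: predicts a saliency mask, then classifies the masked input
    with its own small head, so the teacher is never differentiated."""
    def __init__(self, n_classes=1000):
        super().__init__()
        self.masker = torch.nn.Sequential(
            torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 1, 3, padding=1), torch.nn.Sigmoid())
        self.head = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, 3, stride=2, padding=1), torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
            torch.nn.Linear(32, n_classes))
    def forward(self, x):
        mask = self.masker(x)              # per-pixel saliency in [0, 1]
        return mask, self.head(x * mask)   # predict labels from the masked input

augment = T.Compose([T.RandomResizedCrop(224, scale=(0.8, 1.0)),
                     T.RandomHorizontalFlip()])

def explain(x, steps=200, batch=8, sparsity=1e-3):
    """Fit a fresh student around one input x (a 3 x 224 x 224 tensor)."""
    student = MaskStudent()
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(steps):
        xa = torch.stack([augment(x) for _ in range(batch)])  # local vicinity
        with torch.no_grad():                      # query access only
            target = F.softmax(teacher(xa), dim=1)
        mask, logits = student(xa)
        loss = F.kl_div(F.log_softmax(logits, dim=1), target,
                        reduction="batchmean")     # distillation loss
        loss = loss + sparsity * mask.mean()       # assumed compactness penalty
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        mask, _ = student(x.unsqueeze(0))
    return mask[0, 0]                              # saliency map for x

def iou(pred, truth, thr=0.5):
    """IoU between a thresholded saliency map and a ground-truth region,
    as in the thesis's statistical evaluation framework."""
    p, t = pred > thr, truth > 0.5
    return ((p & t).sum() / (p | t).sum().clamp(min=1)).item()

For the audio tasks in the record, the same recipe would presumably apply with masks over time frames of a spectrogram rather than pixels.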
Holdings
Item type   Current library             Call number   Status      Date due   Barcode
E-BOOKS     JRD Tata Memorial Library   621 DEB       Available              ET00121

Includes bibliographical references and index.


