Graph Clustering Approaches for Speaker Diarization of Conversational Speech (Record no. 431573)

MARC details
000 -LEADER
fixed length control field 03571nam a22002177a 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 240319b |||||||| |||| 00| 0 eng d
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 621.3822
Item number SIN
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Singh, Prachi
245 ## - TITLE STATEMENT
Title Graph Clustering Approaches for Speaker Diarization of Conversational Speech
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc Bangalore:
Name of publisher, distributor, etc Indian Institute of Science,
Date of publication, distribution, etc 2023.
300 ## - PHYSICAL DESCRIPTION
Extent xxiii, 115p.:
Other physical details col. ill.
Accompanying material e-Thesis
Size of unit 4.715Mb
500 ## - GENERAL NOTE
General note Includes bibliographical references
502 ## - DISSERTATION NOTE
Dissertation note PhD;2023;Electrical Engineering
520 ## - SUMMARY, ETC.
Summary, etc In this era of advanced machine intelligence, real-world speech applications need to be equipped to deal with conversations involving multiple speakers. An essential first step in extracting information from conversational speech is the task of finding “who spoke when”, also referred to as speaker diarization. The focus of this doctoral thesis is to describe our efforts in investigating graph clustering techniques for this problem. While graph models have been used in several other domains, their application to temporal segmentation of speech is the first of its kind. The thesis is divided into three main parts. In the first part of the thesis, we propose a novel self-supervised learning approach that performs joint representation learning and clustering, called self-supervised clustering (SSC) for diarization. On the learned representations, we explore path integral clustering (PIC), a graph-based clustering algorithm. PIC is an agglomerative graph clustering method that clusters nodes based on a measure of their edge connections, called the path integral. The proposed SSC with path integral clustering (SSC-PIC) is shown to achieve state-of-the-art performance on benchmark datasets. The second part of the thesis is an extension of SSC-PIC to incorporate metric learning. We design a neural version of the probabilistic linear discriminant analysis (PLDA) approach with learnable parameters to compute a log-likelihood score between embeddings from two segments of the recording. We propose a joint self-supervised representation learning and metric learning approach called SelfSup-PLDA-PIC. In the third part of the thesis, we introduce an end-to-end supervised graph clustering approach. We develop a supervised learning setup using labeled conversational data for training this model. In this setting, we propose a supervised clustering approach called Supervised HierArchical gRaph Clustering (SHARC) for speaker diarization.
This approach uses Graph Neural Networks (GNN) to capture the similarity between the speaker embeddings and performs hierarchical clustering. An extension of this work is the joint training of the speaker embedding extractor along with the GNN module, referred to as end-to-end SHARC (E-SHARC). To incorporate overlapped speech detection, the E-SHARC model is extended for diarization of overlapped speech recordings. In summary, this thesis introduces innovative self-supervised and supervised methods that utilize hierarchical graph clustering to improve diarization performance. These approaches have demonstrated state-of-the-art results on different benchmark datasets. Despite their success, we also acknowledge the limitations of these methods, which open up opportunities for further advancements in diarization, especially in complex and challenging environments.
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Speaker Diarization
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Self-supervised clustering
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Graph Clustering
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name Advised by Ganapathy, Sriram
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier https://etd.iisc.ac.in/handle/2005/6442
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type Thesis

No items available.

Copyright © 2024. J.R.D. Tata Memorial Library, Indian Institute of Science, Bengaluru - 560012

Contact   Phone: +91 80 2293 2832