Normal view MARC view ISBD view

Algorithmic Approaches to Pangenome Graph Problems / (Record no. 433719)

MARC details
000 -LEADER
fixed length control field	04391nam a2200277 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	250616b \|\|\|\|\|\|\|\| \|\|\|\| 00\| 0 eng d
041 ## - LANGUAGE CODE
Language code of text/sound track or separate title	en
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	572.86
Item number	CHA
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name	Chandra, Ghanshyam
245 ## - TITLE STATEMENT
Title	Algorithmic Approaches to Pangenome Graph Problems /
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc	Bangalore :
Name of publisher, distributor, etc	Indian Institute of Science,
Date of publication, distribution, etc	2025.
300 ## - PHYSICAL DESCRIPTION
Extent	xix, 114 p. :
Other physical details	col. ill. ;
Accompanying material	e-Thesis
Size of unit	7.445 Mb
500 ## - GENERAL NOTE
General note	Includes bibliographical references
502 ## - DISSERTATION NOTE
Dissertation note	PhD ; 2025 ; Department of Computational and Data Sciences
520 ## - SUMMARY, ETC.
Summary, etc	The human reference genome serves as a foundational baseline for comparing newly sequenced human genomes. With the growing availability of high-quality human genome assemblies, there is now an opportunity to modernize the reference genome by incorporating genome sequences from thousands of individuals. By capturing genetic variation of diverse populations, a pangenome reference promises to improve equity in human genetics and genomics. An efficient way to represent a pangenome reference is a graph data structure where the vertices are labeled with sequences and the edges connect two sequences that appear consecutively in a genome. Existing works have discussed the construction and the benefits of a pangenome reference, but most methods use ad-hoc heuristics that lack strong theoretical foundations. In this thesis, we introduce novel problem formulations and algorithms to address the following questions: (1) How to align sequences to a pangenome graph? (2) How to infer a newly sequenced genome using a pangenome reference? and (3) How to accelerate whole-genome alignment, a crucial step in pangenome graph construction? The first two parts of this thesis focus on solving the problem of aligning sequencing reads to a pangenome graph. Given a set of exact substring matches between a read and the vertex labels, chaining refers to identifying an ordered subset of matches that be combined together to form an alignment. Previous methods ignore distances between match locations because computing distances quickly on graphs is non-trivial. We propose the first chaining formulations and efficient algorithms that account for the pairwise distances between match locations. The time complexity of our algorithms is parameterized in the size of minimum path cover, which is known to be small for pangenome graphs. We empirically demonstrate improved accuracy in aligning long reads to graphs. In the second part, we further enhance the optimization criteria for sequence-to-graph alignment by penalizing recombinations, where a recombination refers to switching between genomes in a pangenome graph. This feature helps in improving the alignment quality, as most paths in a pangenome graph represent biologically unlikely recombinations. We develop efficient dynamic programming algorithms for chaining and alignment problems. We also give fine-grained reductions to prove that significantly faster algorithms are impossible under the strong exponential time hypothesis (SETH). The third part of the thesis introduces a novel problem formulation for inferring an individual's genome sequence as a path in a pangenome graph. This task is useful for variant discovery and genotyping applications. We give a proof of NP-hardness and design efficient integer programming algorithms. Using publicly available sequencing datasets, we show that our algorithm accurately infers major histocompatibility complex (MHC) sequences using low-coverage sequencing data, outperforming existing heuristic algorithms. In the final part, we propose parallel algorithms to accelerate whole-genome alignment, a fundamental problem in bioinformatics. We implement a multi-core parallel chaining algorithm and a fast mechanism for differentiating primary and secondary chains. These optimizations lead to runtime gains over a commonly used parallel alignment algorithm, minimap2. We discuss the generalization of our techniques for fast pangenome graph construction.
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Computational Biology
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Combinatorial Algorithms
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	High Performance Computing
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Pangenome graph
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Haplotype-aware pangenome graphs
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Human genome
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Strong exponential time hypothesis
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name	Jain, Chirag
Relator term	advisor
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier	https://etd.iisc.ac.in/handle/2005/6956
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type	Thesis

No items available.

Print
Save record
BIBTEX Dublin Core MARCXML MARC (non-Unicode/MARC-8) MARC (Unicode/UTF-8) MARC (Unicode/UTF-8, Standard) MODS (XML) RIS ISBD
More searches

Search for this title in:
Other Libraries (WorldCat) Other Databases (Google Scholar) Online Stores (Bookfinder.com) Open Library (openlibrary.org)