Time: March 24th 10:00am
Location: EB 3105
The growth of genome-scale sequence data in public databases is outpacing Moore's law. Fueled by this explosion of genomic data, comparative genomics seeks to annotate and understand the genomes of different organisms, leading to new biological and biomedical discoveries. Current comparative genomic studies typically use a computational pipeline with several stages: (1) aligning orthologs, which are genomic sequences related by evolutionary descent from a common ancestral sequence; (2) examining evolutionary relationships among orthologs and other genomic features; and (3) using the resulting insights to reason about the biological function and significance of the genomic features under study. Modern comparative genomic studies face three primary challenges: a biological challenge (the co-occurrence of multiple complex evolutionary events), a mathematical challenge (the need to devise realistic yet tractable models of genome evolution), and a computational challenge (the need to develop accurate and scalable algorithms and tools for conducting large-scale analyses).
My research addresses all three challenges. In this talk, I will begin by describing my Ph.D. work on large-scale sequence alignment and phylogenetic estimation (stages (1) and (2), respectively). Among the primary contributions of this work are new divide-and-conquer techniques, which are essential to accurate inference and which improve scalability over previous methods by an order of magnitude or more (in terms of number of sequences). Next, I will discuss my postgraduate work on machine learning techniques for comparative genomics in the presence of complex evolutionary events, especially those that result in non-tree-like evolutionary histories (stages (2) and (3)).
Highlights include PhyloNet-HMM, a model-based inference method that newly combines two kinds of models: phylogenetic networks, which capture non-tree-like evolutionary relationships among genomes, and hidden Markov models (HMMs), which capture dependencies within genomes. I demonstrate the performance of PhyloNet-HMM through an empirical analysis of mouse genomes, which yielded a new clinical insight and a new biological discovery. Other important applications include the study of horizontal gene transfer in bacteria and its role in the spread of antibiotic resistance. I will conclude with general observations and directions for my future research.