DIMACS Seminar on Math and CS in Biology

Title:

Spliced Alignment: a New (and Naive) Approach to Gene Recognition

Speaker:

Pavel A. Pevzner

Departments of Mathematics and Computer Science

University of Southern California

Place:

Department of Computer Science, Room 402

35 Olden Street, Computer Science Building

Princeton University

Time:

1:30 PM

Tuesday, October 17, 1995

Please note the different time and place for this talk

Abstract:

Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve it were based on artificial intelligence methods; surprisingly, applications of theoretical computer science to gene recognition have remained almost unexplored. Recent advances in large-scale cDNA sequencing open the way to a new *combinatorial* approach to gene recognition. I describe a *spliced alignment* algorithm and a software tool that explore all possible exon assemblies in polynomial time and find the multi-exon gene structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully assembles exons even when they are short or have unusual codon usage; we also report correct assemblies for genes with more than 10 exons, provided a homologous protein is already known. On a test sample of human genes with known mammalian relatives, the average overlap between the predicted and the actual genes was 98%, which compares remarkably well with other existing methods. Moreover, the algorithm reconstructed 87% of the genes exactly. The rare discrepancies between the predicted and real exon-intron structures were either restricted to extremely short initial or terminal exons or proved to be results of alternative splicing and errors in database feature tables.
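To illustrate how all exon assemblies can be explored in polynomial time, here is a minimal sketch of a chained dynamic program. It is not the software tool described in the talk. It rests on several assumptions made only for illustration: the candidate exon blocks are supplied as half-open (start, end) intervals, the target is a nucleotide string rather than a protein, and scoring is a simple match/mismatch/gap scheme. The function name spliced_alignment and all parameter names are hypothetical.

```python
def spliced_alignment(genome, blocks, target, match=1, mismatch=-1, gap=-1):
    """Return (score, chain) for the best chain of non-overlapping blocks.

    Illustrative sketch only. `blocks` are non-empty half-open (start, end)
    intervals in `genome`; `target` is a nucleotide string (an assumption,
    the talk aligns to a related protein). The score is the global
    alignment score of the concatenated chain against `target`.
    """
    n = len(target)
    by_score = lambda cell: cell[0]
    # Sort by end so every possible predecessor is processed first.
    blocks = sorted(blocks, key=lambda b: (b[1], b[0]))
    # Empty chain: aligning nothing to target[:j] costs j gaps.
    empty = [(j * gap, ()) for j in range(n + 1)]
    finished = []  # (block, last DP row of that block)
    best = empty[n]
    for start, end in blocks:
        # Best row to start from: the empty chain, or any chain whose
        # last block ends at or before this block starts.
        prev = list(empty)
        for (_, e2), row2 in finished:
            if e2 <= start:
                prev = [max(p, r, key=by_score) for p, r in zip(prev, row2)]
        # Each cell carries the chain of blocks used on its best path.
        row = [(s, chain + ((start, end),)) for s, chain in prev]
        for i in range(start, end):
            new = [(row[0][0] + gap, row[0][1])]
            for j in range(1, n + 1):
                sub = match if genome[i] == target[j - 1] else mismatch
                candidates = [
                    (row[j - 1][0] + sub, row[j - 1][1]),  # align g[i] with t[j-1]
                    (row[j][0] + gap, row[j][1]),          # g[i] against a gap
                    (new[j - 1][0] + gap, new[j - 1][1]),  # t[j-1] against a gap
                ]
                new.append(max(candidates, key=by_score))
            row = new
        finished.append(((start, end), row))
        best = max(best, row[n], key=by_score)
    return best


# Toy example: two true exons, "ATG" and "TGA", plus two decoy blocks.
genome = "ATGGTAAGTGA"
blocks = [(0, 3), (3, 6), (5, 8), (8, 11)]
score, chain = spliced_alignment(genome, blocks, "ATGTGA")
print(score, chain)
```

In this toy example the sketch selects the chain (0, 3), (8, 11), whose concatenation matches the target exactly. The number of possible chains grows exponentially with the number of blocks, but the work done here is bounded by the square of the number of blocks plus the total block length, times the target length, because each block is aligned only once against the best-scoring predecessor row.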

This is joint work with Mikhail S. Gelfand, Andrey A. Mironov and Sing-Hoi Sze.

Document last modified on October 11, 1995