DIMACS Seminar on Math and CS in Biology


Spliced Alignment: a New (and Naive) Approach to Gene Recognition


Pavel A. Pevzner
Departments of Mathematics and Computer Science
University of Southern California


Department of Computer Science, Room 402
35 Olden Street, Computer Science Building
Princeton University


1:30 PM
Tuesday, October 17, 1995
Please note the different time and place for this talk


Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on artificial intelligence and, surprisingly enough, applications of theoretical computer science methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way towards a new *combinatorial* approach to gene recognition. I describe a *spliced alignment* algorithm and a software tool which explores all possible exon assemblies in polynomial time and finds the multi-exon gene structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully performs exons assemblies even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for the genes with more than 10 exons provided a homologous protein is already known. On a test sample of human genes with known mammalian relatives the average overlap between the predicted and the actual genes was 98%, which is remarkably well as compared to other existing methods. At that, the algorithm absolutely correctly reconstructed 87% of genes. The rare discrepancies between the predicted and real exon-intron structures were restricted either to extremely short initial or terminal exons or proved to be results of alternative splicing and errors in database feature tables.

This is a joint work with Mikhail S. Gelfand, Andrey A. Mironov and Sing-Hoi Sze.

Document last modified on October 11, 1995