DIMACS Seminar on Math and CS in Biology
Title:
Spliced Alignment: a New (and Naive) Approach to Gene Recognition
Speaker:
- Pavel A. Pevzner
- Departments of Mathematics and Computer Science
- University of Southern California
Place:
- Department of Computer Science, Room 402
- 35 Olden Street, Computer Science Building
- Princeton University
Time:
- 1:30 PM
- Tuesday, October 17, 1995
- Please note the different time and place for this talk
Abstract:
Gene recognition is one of the most important problems in
computational molecular biology. Previous attempts to solve this
problem were based on artificial intelligence; surprisingly
enough, applications of theoretical computer science methods to gene
recognition have remained almost unexplored. Recent advances in
large-scale cDNA sequencing open the way to a new *combinatorial*
approach to
gene recognition. I describe a *spliced alignment* algorithm and a
software tool which explores all possible exon assemblies in
polynomial time and finds the multi-exon gene structure with the best
fit to a related protein. Unlike other existing methods, the algorithm
successfully assembles exons even in the case of short exons
or exons with unusual codon usage; we also report correct assemblies
for genes with more than 10 exons, provided a homologous protein is
already known. On a test sample of human genes with known mammalian
relatives, the average overlap between the predicted and the actual
genes was 98%, which compares remarkably well with other existing
methods. Moreover, the algorithm reconstructed 87% of the genes
exactly. The rare discrepancies between the predicted and real
exon-intron structures were restricted either to extremely short
initial or terminal exons or proved to be results of alternative
splicing and errors in database feature tables.
This is joint work with Mikhail S. Gelfand, Andrey A. Mironov
and Sing-Hoi Sze.
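
The idea of exploring all exon assemblies in polynomial time can be
illustrated with a small dynamic program. What follows is only a
minimal sketch under simplifying assumptions, not the software
described in the talk: the target is a nucleotide string rather than a
protein, scoring is a plain match/mismatch/indel scheme, candidate
exons are given as half-open intervals, and the function name
`spliced_alignment` and its parameters are hypothetical.

```python
def spliced_alignment(genome, blocks, target, match=1, mismatch=-1, indel=-1):
    """Find a chain of non-overlapping blocks that best aligns to target.

    blocks: iterable of half-open (start, end) intervals into genome
            (candidate exons; an assumption of this sketch).
    Returns (best_score, chain), where chain is a list of blocks.
    """
    m = len(target)
    by_score = lambda cell: cell[0]

    # A cell is (score, chain). Aligning target[:j] against an empty
    # chain costs j indels.
    empty_row = [(j * indel, ()) for j in range(m + 1)]
    finished = []  # (end, row at the last position) for each processed block

    # Sorting by end guarantees every possible predecessor is done first.
    for start, end in sorted(set(blocks), key=lambda b: (b[1], b[0])):
        if start >= end:
            continue  # skip empty or malformed intervals

        # Entry row: for each target prefix, the best chain that ends
        # at or before this block's start (or the empty chain).
        entry = list(empty_row)
        for prev_end, prev_row in finished:
            if prev_end <= start:
                entry = [max(a, b, key=by_score)
                         for a, b in zip(entry, prev_row)]

        # Append the current block to every candidate chain.
        row = [(score, chain + ((start, end),)) for score, chain in entry]

        # Standard global-alignment recurrence inside the block.
        for i in range(start, end):
            new = [(row[0][0] + indel, row[0][1])]
            for j in range(1, m + 1):
                sub = match if genome[i] == target[j - 1] else mismatch
                candidates = [
                    (row[j - 1][0] + sub, row[j - 1][1]),    # align g[i] with t[j-1]
                    (row[j][0] + indel, row[j][1]),          # g[i] against a gap
                    (new[j - 1][0] + indel, new[j - 1][1]),  # t[j-1] against a gap
                ]
                new.append(max(candidates, key=by_score))
            row = new
        finished.append((end, row))

    # The whole target must be consumed; pick the best final cell.
    finals = [empty_row[m]] + [row[m] for _, row in finished]
    score, chain = max(finals, key=by_score)
    return score, list(chain)


genome = "ATGCCTTTTGGTAACCC"
blocks = [(0, 5), (3, 8), (5, 9), (9, 14), (14, 17)]
print(spliced_alignment(genome, blocks, "ATGCCGGTAA"))
```

In this toy input the blocks (0, 5) and (9, 14) spell out the target
exactly, so the sketch recovers that two-block chain with a score of
10 and ignores the decoy blocks. The work is bounded by the total
block length times the target length, plus a pass over earlier blocks
for each block, which is what keeps the search polynomial even though
the number of possible chains is exponential.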
Document last modified on October 11, 1995