DIMACS Theory Seminar
Title:
Locating Protein Coding Regions in Human DNA using Decision Trees
Speaker:
- Steven Salzberg
- Johns Hopkins University
Place:
- Computer Science Building, 35 Olden Street, Room 302
- Princeton University
Time:
- 1:30 - 2:30 PM
- Thursday, April 6, 1995
Abstract:
Genes in eukaryotic DNA stretch across hundreds or thousands of base
pairs, while the regions of those genes that code for proteins usually
occupy only a small percentage of the sequence. Identifying the
coding regions is vital to understanding mammalian genetics. Using
the growing body of publicly available DNA
sequences, researchers have begun experimenting with computational
methods for distinguishing between coding and non-coding regions, and
several promising results have been reported. Existing methods
have the most difficulty classifying short DNA sequences, for which
the available statistics are quite limited. We
describe here a new approach, based on a randomized decision tree
algorithm, for identifying coding regions in DNA. This approach
produces consistently higher accuracies than previous methods on short
DNA subsequences. The algorithm can easily be trained for DNA
sequences of any length. The talk will review the gene identification problem
as background material before presenting details of the decision tree
algorithm and the experiments on human DNA sequences.
A reception follows the talk at 2:30 in the Tea Room.
Host: Simon Kasif (kasif@cs.princeton.edu)
Document last modified on March 31, 1995