DIMACS Seminar on Math and CS in Biology
Title:
Locating Protein Coding Regions in Human DNA using Decision Trees
Speaker:
- Steven Salzberg
- Johns Hopkins University
Place:
- 431 CoRE Building, Busch Campus
- Rutgers University
Time:
- 3:00 PM
- Monday, April 10, 1995
Abstract:
Genes in eukaryotic DNA stretch across hundreds or thousands of base
pairs, while the regions of those genes that code for proteins usually
occupy only a small percentage of the sequence. Identifying the
coding regions is vital to understanding mammalian genetics. Using
the growing body of publicly available DNA
sequences, researchers have begun experimenting with computational
methods for distinguishing between coding and non-coding regions, and
several promising results have been reported. Existing methods
have the greatest difficulty classifying short DNA sequences, for
which the available statistics are quite limited. We
describe here a new approach, based on a randomized decision tree
algorithm, for identifying coding regions in DNA. This approach
produces consistently higher accuracies than previous methods on short
DNA subsequences. The algorithm can easily be trained for DNA sequences
of any length. The talk will review the gene identification problem
as background material before presenting details of the decision tree
algorithm and the experiments on human DNA sequences.
Upcoming Talks:
- April 17: Dr. Charles Cantor, Boston U. (distinguished lecture)
- April 24: Dr. Richard Lipton, Princeton (seminar will be held at U. Penn)
Document last modified on April 6, 1995