Samuel Karlin

Waksman Institute - Rutgers University
Busch Campus
Piscataway, New Jersey
October 20, 1994 at 4:00 PM

Topic of Discussion

Assessments of Heterogeneities in DNA Sequences

The accelerating accumulation of DNA and protein sequences poses challenges and provides opportunities in analyzing genomic organization and evolution.

Methods and concepts described in this talk provide means for assessment and interpretation of heterogeneities within and between DNA sequences. We will focus on the following topics: (1) patterns and anomalies of di-, tri-, and tetranucleotides; (2) phylogenetic reconstructions based on distance measures of dinucleotide relative abundances; (3) identification of exceptional peptides and oligonucleotides (e.g., rare and frequent words) in protein and genomic sequences; and (4) counts and spacings of various marker arrays such as specific words, purine tracts, regulatory motifs, nucleosome placements, and restriction targets.
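As a rough illustration of item (2), the sketch below computes dinucleotide relative abundances in one commonly used form, rho_XY = f_XY / (f_X f_Y), and a simple distance between two sequences taken as the mean absolute difference over the 16 dinucleotides. The function names, the single-strand counting, and the restriction to the A, C, G, T alphabet are assumptions made for this example; they are not taken from the talk and may differ from the speaker's exact definitions.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def relative_abundances(seq):
    """Return rho[XY] = f_XY / (f_X * f_Y) for all 16 dinucleotides.

    Assumes seq contains only A, C, G, T and that each base occurs
    at least once (illustrative single-strand version).
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    if any(mono[b] == 0 for b in BASES):
        raise ValueError("every base must occur at least once")
    # Overlapping dinucleotide counts: n - 1 windows in total.
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for x, y in product(BASES, repeat=2):
        f_xy = di[x + y] / (n - 1)
        rho[x + y] = f_xy / ((mono[x] / n) * (mono[y] / n))
    return rho

def abundance_distance(seq_a, seq_b):
    """Mean absolute difference of relative abundances (assumed form)."""
    rho_a = relative_abundances(seq_a)
    rho_b = relative_abundances(seq_b)
    return sum(abs(rho_a[k] - rho_b[k]) for k in rho_a) / len(rho_a)
```

Values of rho well above or below 1 mark dinucleotides that are over- or under-represented relative to the base composition; a matrix of pairwise distances of this kind could then be passed to any standard tree-building procedure.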

Three classes of statistical functionals can aid in identifying and evaluating distinctive sequence features: (a) r-scan analysis, used to discern anomalies (clustering, overdispersion, evenness) in the spacings of a specified marker along the sequence; (b) segmental quantile distributions compared across genomic data sets or with appropriate reference distributions; (c) score-based sequence analysis as a means of characterizing anomalies in sequence text and as applied in multiple sequence comparisons, in sensitivity measures of nucleotide distributions, and in gene predictions.
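To make method (a) concrete, here is a minimal sketch of the raw r-scan quantities, assuming the marker positions are already known: each r-scan is the distance spanned by r consecutive spacings, so an unusually small minimum suggests clustering and an unusually large maximum suggests overdispersion. The function name and the sample positions are hypothetical, and the sketch deliberately omits the significance assessment that a real analysis would need.

```python
def r_scans(positions, r):
    """Return the lengths spanned by each run of r consecutive spacings.

    positions: marker coordinates along the sequence (any order).
    r: number of consecutive spacings summed in each scan (r >= 1).
    """
    pos = sorted(positions)
    if r < 1 or len(pos) <= r:
        raise ValueError("need r >= 1 and more than r marker positions")
    # pos[i + r] - pos[i] equals the sum of r adjacent spacings.
    return [pos[i + r] - pos[i] for i in range(len(pos) - r)]

# Hypothetical marker positions, for illustration only.
markers = [10, 12, 15, 40, 41, 90]
scans = r_scans(markers, 2)
print(min(scans), max(scans))
```

The smallest and largest scans would then be compared with what is expected for markers placed at random along a sequence of the same length.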

In the first talk we will describe and apply methods (a) and (b). The second talk focuses on applications of method (c).

DIMACS Center - Rutgers University
CoRE Building Lecture Hall - Busch Campus
Piscataway, New Jersey
October 21, 1994 at 11:30 AM

Topic of Discussion

Statistical Studies of Biomolecular Sequences: Score Based Methods

This presentation reviews the method of score-based sequence analysis with the objectives of discerning distinctive segments in single sequences and identifying significant common segments in sequence comparisons. We will describe methods and results for both the theory and its applications. These include distributional theory involving several high-scoring segments in single sequences, useful in identifying transmembrane segments; distribution formulas for general scoring regimes, useful in multiple sequence comparisons; and applications to predicting exons and genes in DNA sequences and to identifying distinguished charge patterns in protein sequences.
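The basic object in score-based analysis, a maximal scoring segment, can be sketched as follows: each residue is assigned a score, and the contiguous segment with the largest total is reported. The scoring scheme below (+2 for K or R, -1 otherwise, intended to highlight positively charged runs) and the function names are assumptions chosen for this example only; the distributional theory used to judge whether such a segment is significant is not shown.

```python
def max_scoring_segment(scores):
    """Return (best_total, start, end) for the highest-scoring contiguous
    segment, with end exclusive. Returns (0, 0, 0) if no score is positive.
    """
    best, best_start, best_end = 0, 0, 0
    run, run_start = 0, 0
    for i, s in enumerate(scores):
        if run <= 0:
            # A non-positive running total cannot help; restart here.
            run, run_start = s, i
        else:
            run += s
        if run > best:
            best, best_start, best_end = run, run_start, i + 1
    return best, best_start, best_end

def charge_scores(protein):
    """Hypothetical scoring: +2 for K or R, -1 for any other residue."""
    return [2 if aa in "KR" else -1 for aa in protein.upper()]

protein = "AAKKRKAAD"  # made-up sequence for illustration
total, start, end = max_scoring_segment(charge_scores(protein))
print(total, protein[start:end])
```

Other targets, such as hydrophobic stretches, could be explored with the same routine by swapping in a different scoring function.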

Document last modified on October 31, 1994