DIMACS Seminar on Math and CS in Biology
Title:
The Statistics of Local Sequence Similarities and The Choice of Protein Alignment Scoring Systems
Speaker:
- Stephen Altschul
- National Center for Biotechnology Information
- National Library of Medicine
- National Institutes of Health
Place:
- CoRE Building, Room 431
- Busch Campus, Rutgers University
Time:
- 11:00 AM
- Monday, February 12, 1996
Abstract:
One simple form of protein sequence comparison aligns only segments of the
sequences being compared, and employs a "substitution matrix" to specify a
score for each aligned pair of amino acids [1,2]. Within the past six years,
a powerful statistical theory [3,4] has emerged for local alignments lacking
gaps [5]. Its main features [6] are: first, that any substitution matrix is
implicitly (if not explicitly) tailored to locating alignments with a
specific frequency of aligned residue pairs; second, that alignment scores may
be scaled so that they are expressed as bits of information; and finally, that
the information needed to distinguish an alignment from chance is directly
proportional to the log of the search space size. An extension of the theory
yields the ability to assess the significance of a collection of high-scoring
segment pairs [7]. Once gaps are allowed, the distribution of alignment
scores has not been established analytically. However, computational
experiments strongly suggest that the same basic theory covers this broader
class of alignments. While the relevant statistical parameters cannot be
calculated from first principles, they may be estimated by random simulation
[8,9] or database search [10,11]. How best to choose gap costs is an
important open question, currently amenable only to empirical study [12,13].
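
The three features of the ungapped theory listed above can be illustrated with a minimal Python sketch. It is not taken from the talk. It assumes the standard relations of the ungapped theory [3]: the scale parameter lambda is the positive root of sum_ij p_i p_j exp(lambda * s_ij) = 1, the implied target frequencies are q_ij = p_i p_j exp(lambda * s_ij), and a raw score S corresponds to (lambda * S - ln K) / ln 2 bits. The function names, the toy four-letter alphabet, the +1/-1 scores, and the value passed for K are all hypothetical choices made for illustration.

```python
import math


def solve_lambda(probs, score):
    """Positive root of sum_ij p_i p_j exp(lambda * s_ij) = 1, by bisection.

    probs: dict letter -> background probability (assumed to sum to 1).
    score: function (a, b) -> substitution score.
    """
    letters = list(probs)
    pairs = [(a, b) for a in letters for b in letters]
    expected = sum(probs[a] * probs[b] * score(a, b) for a, b in pairs)
    if expected >= 0 or max(score(a, b) for a, b in pairs) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    def f(lam):
        return sum(probs[a] * probs[b] * math.exp(lam * score(a, b))
                   for a, b in pairs) - 1.0

    hi = 1.0
    while f(hi) <= 0:          # grow until we are past the root
        hi *= 2.0
    lo = hi / 2.0
    while f(lo) >= 0:          # shrink until we are below the root
        lo /= 2.0
    for _ in range(200):       # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def target_frequencies(probs, score, lam):
    """Pair frequencies q_ij that the scoring system is implicitly tuned for."""
    return {(a, b): probs[a] * probs[b] * math.exp(lam * score(a, b))
            for a in probs for b in probs}


def bit_score(raw_score, lam, k):
    """Rescale a raw score to bits; k is the Karlin-Altschul K parameter."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def bits_for_search_space(m, n, e_value=1.0):
    """Bits needed so that about e_value chance hits are expected in an m x n search."""
    return math.log2(m * n / e_value)


# Toy system (hypothetical): uniform 4-letter alphabet, match +1, mismatch -1.
toy_probs = {c: 0.25 for c in "ACGT"}


def toy_score(a, b):
    return 1 if a == b else -1


if __name__ == "__main__":
    lam = solve_lambda(toy_probs, toy_score)
    q = target_frequencies(toy_probs, toy_score, lam)
    identity = sum(v for (a, b), v in q.items() if a == b)
    print("lambda:", lam)
    print("implied fraction of identical pairs:", identity)
    print("bits for raw score 10 (K = 0.1 assumed):", bit_score(10, lam, 0.1))
    print("bits needed for a 1024 x 1024 search:", bits_for_search_space(1024, 1024))
```

For this toy system the equation reduces to a quadratic in exp(lambda), so lambda = ln 3 and the implied target frequencies put three quarters of their weight on identical pairs. Doubling either sequence length raises the required score by one bit, which is the sense in which the information needed grows with the log of the search space size. Real protein matrices would require the 20 amino acid background frequencies and a properly computed K, neither of which is attempted here.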
- [1] Smith, T.F. & Waterman, M.S. (1981) J Mol Biol 147:195-197.
- [2] Pearson, W.R. & Lipman, D.J. (1988) Proc Natl Acad Sci USA 85:2444-2448.
- [3] Karlin, S. & Altschul, S.F. (1990) Proc Natl Acad Sci USA 87:2264-2268.
- [4] Dembo, A., Karlin, S. & Zeitouni, O. (1994) Ann Prob 22:2022-2039.
- [5] Altschul, S.F. et al. (1990) J Mol Biol 215:403-410.
- [6] Altschul, S.F. (1991) J Mol Biol 219:555-565.
- [7] Karlin, S. & Altschul, S.F. (1993) Proc Natl Acad Sci USA 90:5873-5877.
- [8] Waterman, M.S. & Vingron, M. (1994) Proc Natl Acad Sci USA 91:4625-4628.
- [9] Altschul, S.F. & Gish, W. (1996) Meth Enzymol 266:460-480.
- [10] Collins, J.F., Coulson, A.F.W. & Lyall, A. (1988) CABIOS 4:67-71.
- [11] Mott, R. (1992) Bull Math Biol 54:59-75.
- [12] Pearson, W.R. (1995) Prot Sci 4:1145-1160.
- [13] Vogt, G., Etzold, T. & Argos, P. (1995) J Mol Biol 249:816-831.
Document last modified on February 6, 1996