DIMACS TR: 2008-16

Probabilistic evolutionary model for substitution matrices of PAM and BLOSUM families.



Authors: Valentina V. Sulimova, Vadim V. Mottl, Casimir A. Kulikowski and Ilya B. Muchnik

ABSTRACT

Background: Almost all problems of protein analysis rest on comparing the types of amino acids from which protein sequences are composed. Similarities between amino acids are most commonly measured by two methods derived from very different approaches: the evolutionary substitution matrices of the PAM (Point Accepted Mutation) family, derived from phylogenetic trees, and the BLOSUM substitution matrices, which are statistically inferred from multiple alignments of groups of proteins and which, according to their authors, S. and J. Henikoff, are essentially different from the PAM family of matrices. Results: In this paper we prove that the statistical approach for computing substitution matrices of the BLOSUM family can be explained in terms of the PAM evolutionary model. This means that both approaches are actually based on similar types of evolutionary models, and the main difference between them lies in the initial data used to estimate their unknown model parameters. We also show that all PAM substitution matrices are kernel functions in their mathematical structure, and lose their positive semidefiniteness only because of the choice of final representation. Conclusions: Because the PAM and BLOSUM substitution matrices are originally positive semidefinite, they can easily be used for constructing kernels over a set of proteins, so, without loss of biological meaning, these similarity measures can be applied without correction. Furthermore, any new substitution matrix will automatically be a kernel if, first, it is estimated by either the Dayhoff or the Henikoff technique and, second, the final representation proposed in the present research is adopted.
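The kernel claim can be illustrated with a toy calculation. The sketch below is not the derivation from the report: it assumes a reversible Markov substitution model over a hypothetical three-letter alphabet (the weights W, the stationary distribution pi, and all function names are invented for illustration). Under that assumption, the two-step odds matrix R[i][j] = (P^2)[i][j] / pi[j] equals a Gram-type sum over a shared intermediate state, so it is symmetric and positive semidefinite.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical symmetric exchange weights (an assumption, not real PAM data).
W = [[8.0, 1.0, 1.0],
     [1.0, 6.0, 3.0],
     [1.0, 3.0, 4.0]]
n = len(W)
total = sum(sum(row) for row in W)

# Row-normalising symmetric weights gives a reversible chain:
# pi[i] * P[i][j] == W[i][j] / total == pi[j] * P[j][i].
pi = [sum(row) / total for row in W]
P = [[w / sum(row) for w in row] for row in W]

# Two-step odds matrix: R[i][j] = (P^2)[i][j] / pi[j].
P2 = matmul(P, P)
R = [[P2[i][j] / pi[j] for j in range(n)] for i in range(n)]

# Gram form: G[i][j] = sum_a P[i][a] * P[j][a] / pi[a].
# A sum of products of this shape is positive semidefinite by construction.
G = [[sum(P[i][a] * P[j][a] / pi[a] for a in range(n))
      for j in range(n)] for i in range(n)]

# Largest entrywise difference between the two forms (rounding error only).
max_gap = max(abs(R[i][j] - G[i][j]) for i in range(n) for j in range(n))

def quad_form(m, x):
    """Return x^T m x for a square matrix m and vector x."""
    return sum(x[i] * m[i][j] * x[j]
               for i in range(len(x)) for j in range(len(x)))

# Spot-check non-negativity of x^T R x on random vectors.
rng = random.Random(0)
min_q = min(quad_form(R, [rng.uniform(-1.0, 1.0) for _ in range(n)])
            for _ in range(1000))

print("max |R - G| =", max_gap)
print("min x^T R x =", min_q)
```

The random quadratic forms are only a spot check. The actual argument is the identity R = G, which follows from detailed balance in this toy model.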

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2008/2008-16.pdf