DIMACS TR: 2001-08

On the inter-residue correlation patterns and their role in classification of protein families

Authors: Boris Galitsky and Sergey Shelepin


We develop a novel method to calculate and analyze the correlations in mutational behavior between different positions in a multiple sequence alignment. The inter-dependence between residues in a protein family is represented as a matrix of correlation values that is invariant with respect to the specific amino acids, the number of sequences representing the family, the sequence length, residue variability, and the uniformity of data set representation. Based on the geometry of these correlation matrices, we reveal common and distinguishing properties of a few protein families, including immunoglobulins. We analyze the specific texture of the matrices inherent to particular families and suggest a way to distinguish protein sequences from non-protein sequences.
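As a rough illustration of what a position-by-position correlation matrix over an alignment can look like, here is a minimal sketch. It is not the authors' method, which this abstract does not specify. It assumes one simple way to make the measure independent of which amino acids occur: each column is reduced to a 0/1 vector marking whether each sequence deviates from that column's consensus, and columns are compared with the Pearson (phi) coefficient. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def consensus(column):
    """Most frequent residue in a column; ties go to the residue seen first."""
    return Counter(column).most_common(1)[0][0]


def mutation_profiles(alignment):
    """Per column, a 0/1 vector: 1 where a sequence deviates from the consensus.

    Assumption (not from the report): only "mutated vs. consensus" matters,
    so renaming residues within a column leaves the profile unchanged.
    """
    length = len(alignment[0])
    profiles = []
    for j in range(length):
        column = [seq[j] for seq in alignment]
        cons = consensus(column)
        profiles.append([0 if r == cons else 1 for r in column])
    return profiles


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A fully conserved column has no variance; treat correlation as 0.
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_matrix(alignment):
    """Square matrix of correlations between all pairs of alignment columns."""
    profiles = mutation_profiles(alignment)
    size = len(profiles)
    return [[pearson(profiles[i], profiles[j]) for j in range(size)]
            for i in range(size)]


if __name__ == "__main__":
    aln = ["ACDE", "ACDE", "GCFE", "GCFE"]
    for row in correlation_matrix(aln):
        print(row)
```

In the toy alignment, columns 0 and 2 mutate together, so their correlation is 1.0, while the conserved columns 1 and 3 contribute 0. Because the Pearson coefficient is normalized, the values do not grow with the number of sequences; other properties the abstract lists (e.g. uniformity of representation) would need additional weighting that this sketch does not attempt.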

We discuss the role of the correlation matrix technique in classification. We suggest that classification criteria should be based on the residues at the positions with the highest overall correlation with the other positions. Identifying positions with varying correlation strength helps to reconstruct the phylogeny of protein families.
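One way to read the suggestion about classification criteria is sketched below: score each position by its overall correlation with all other positions, keep the top-scoring ones, and use the residues found there as a per-sequence signature. This is a hedged illustration only. Scoring by the sum of absolute off-diagonal correlations is an assumption, not a definition taken from the report, and the function names are hypothetical.

```python
def overall_correlation(matrix):
    """Per position: sum of |correlation| with every other position.

    Assumption: "overall correlation" is taken as the sum of absolute
    off-diagonal values; the report may define it differently.
    """
    n = len(matrix)
    return [sum(abs(matrix[i][j]) for j in range(n) if j != i)
            for i in range(n)]


def top_positions(matrix, k):
    """Indices of the k positions with the highest overall correlation."""
    scores = overall_correlation(matrix)
    # Sort by descending score; break ties by position index.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def signature(sequence, positions):
    """Residues of one aligned sequence at the selected positions."""
    return "".join(sequence[p] for p in positions)


if __name__ == "__main__":
    m = [[1.0, 0.9, 0.1],
         [0.9, 1.0, -0.2],
         [0.1, -0.2, 1.0]]
    chosen = top_positions(m, 2)
    print(chosen, signature("ACD", chosen))
```

Here position 1 scores highest (0.9 + 0.2), followed by position 0 (0.9 + 0.1), so the signature of a sequence is built from its residues at those two columns.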

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2001/2001-08.ps.gz
