 
This special focus is jointly sponsored by the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS), the Biological, Mathematical, and Physical Sciences Interfaces Institute for Quantitative Biology (BioMaPS), and the Rutgers Center for Molecular Biophysics and Biophysical Chemistry (MB Center).
Title: Algorithms for Structural Comparison and Statistical Analysis of Three Dimensional Protein Motifs
The comparison of structural subsites in proteins is increasingly relevant to the prediction of their biological function. To address this problem, we have designed the Match Augmentation algorithm (MA). Given a structural motif of interest, such as a functional site, MA searches a target protein structure for a match: the set of atoms with the greatest geometric and chemical similarity. MA is extremely efficient because it exploits the fact that the amino acids in a structural motif are not equally important to function. Using motif residues ranked by functional significance via the Evolutionary Trace (ET), MA prioritizes its search by initially forming matches with functionally significant residues, then, guided by ET, it augments this partial match stepwise until the whole motif is found. With this hierarchical strategy, MA runs considerably faster than other methods, and almost always identifies matches in homologs known to have cognate functional sites. In order to interpret matches, we further developed a statistical method using nonparametric density estimation of the frequency distribution of structural matches. This data driven technique measures the statistical significance of matches in an effort to separate matches which identify functional homologs from functionally unrelated proteins. Preliminary results show that matches identifying cognate active sites are statistically significant.
This is joint work with V.Y. Fofanov, D.M. Kristensen, M. Kimmel, L. Kavraki and O. Lichtarge
Title: On Updating Torsion Angles of Molecular Conformations
Conformation of a molecule is defined by the relative positions of atoms in the molecule. There are three main representations for conformations of molecules: Cartesian coordinates, distance matrix and internal coordinates. In biochemistry, conformational changes of a molecule are usually described in terms of its internal coordinates. However, for many applications, such as molecular docking, the Cartesian coordinates of atoms are needed for computation. Although for each conformational change, the Cartesian coordinates of atoms can be updated in linear time (which is optimal asymptotically), the constant factor becomes significant if a large number of updates are needed. Zhang and Kavraki examined three methods, called simple rotations, Denavit-Hartenberg local frames and atomgroup local frames. Based on their implementations, they showed that atomgroup local frames are more efficient than the other two. In this paper, by expressing the torsion angle change as a composition of translations and rotations, we observe that the simple rotations can be implemented in an efficient way by taking advantage of consecutive operations. Both quantitative and experimental comparisons show that the improved simple rotations, in which the rotation operation is expressed either in unit quaternion or rotation matrix, is as efficient as atomgroup local frames and thus have the advantage of avoiding the need of precomputations of a set of local frames and transformations between them in the local frames approach. Our source code will be made available publicly.
Title: On global optimization of the Lennard-Jones potential functions
The minimization of molecular potential energy functions is one of the most challenging nonconvex global optimization problems and plays an important role in the determination of stable states of certain classes of molecular clusters and proteins. In this presentation we discuss several issues related to this problem including optimality conditions, equivalent formulations, and possible bounds on the global minimum of the total potential energy of the Lennard-Jones cluster. Some numerical results are also presented.
References
Title: Reverse Engineering of Protein and Gene Networks: how to design experiments efficiently?
In this talk we investigate the computational complexities of a combinatorial problem that arises in the reverse engineering of protein and gene networks. We abstract a combinatorial version of the problem and observe that this is ``equivalent'' to the set multicover problem. We discuss efficient randomized approximation algorithms for this problem and discuss why achieving signifacntly better approximations than ours may be computationally difficult.
Joint work with Piotr Berman (Pennsylvania State University) and Eduardo Sontag (Rutgers University).
Title: Computational identification of binding sites in proteins
Virtually all cellular processes depend on precisely orchestrated interactions between proteins and other molecules within cells. We are developing computational approaches for predicting which amino acids of a protein participate in protein-protein, protein-DNA and protein-RNA interactions. We have generated several non-redundant datasets of interfaces extracted from macromolecular complex structures in the PDB and used a variety of knowledge-based approaches to generate classifiers capable of predicting which residues of the protein partner(s) are located in the complex interface. In published work, we have shown that machine learning algorithms can be successfully used to identify binding sites in protein-protein complexes using a combination of sequence, structural, and phylogenetic information. More recently, we have used similar approaches to analyze protein-nucleic acid complexes and to generate classifiers capable of accurately predicting which amino acids of the protein partner participate in nucleic acid recognition, using only the amino acid sequence of the protein as input. The level of success obtained in our experiments suggests that these computational approaches will be valuable both for localizing binding domains in structural genomics targets for which structural, but not functional information is available and for functional classification of proteins when structural information is not available. The ability to reliably predict which residues of a protein directly contribute to ligand binding, using a fully automated approach, should advance our understanding of how proteins recognize their interactions partners in cells and potentially contribute to the development of novel therapies for both genetic and infectious diseases.
Title: Sequence order independent structural alignment
Protein structure alignment is an indispensable tool used for many different studies in bioinformatics. Most structural alignment algorithms assume that the structural units of two similar proteins will align sequentially. This assumption is flawed, and as a result proteins with similar structure but with a permutated sequence arrangement are often missed. We present a solution to the problem based on an approximation algorithm that finds a sequence-order independent structural alignment that is close to optimal. We first exhaustively fragment two proteins and calculate a novel similarity score between all possible aligned fragment pairs. We treat each aligned fragment pair as a node on a graph that is connected by an edge, if there are intra-residue sequence conflicts. By regarding the realignment of the fragment pairs as a special case of the maximum-weight independent set problem, we solve this computationally hard problem approximately by iteratively solving relaxations of an appropriate integer programming formulation. The resulting structural alignment is sequence order independent. Our method is insensitive to gaps, insertions, deletions and circular permutations. As an example, we show that our algorithm is capable of automatically detecting proteins that share similar structure but differ by circular permutation. The use of circular permutation studies in aspects of research from protein evolution to protein folding is growing rapidly in spite of the low number of known occurrences of circular permutations.
Title: Consistent sets of secondary structures in proteins
Several methods exist to predict secondary structures of a protein
from its sequence of aminoacids.  The methods of our interest make
their predictions based on local patterns and they seem to have
inherent limitation on their sensitivity/specificity relationship.  We
propose to apply such methods set to have high sensitivity and low
specificity, thus predicting ``too many'' structures and then to solve
a global optimization problem to extract a consistent set from the
``redundant'' family of predicted structures. In this talk we
formulate several possible optimization problems and heuristics to
solve them.
Ming-Yang Kao, Northwestern University 
 Title: Reconstructing a Circular Order from Inaccurate Adjacency 
  Information with Applications in NMR Data Interpretation
 Given the adjacency information of $n$ elements in a circular order, our goal is to reconstruct the order. If there are errors in the adjacency data, then the problem is NP-hard in the general case. However, if the adjacency information is derived from each element reporting some numerical values as identifiers for itself and its front neighbor, then an optimal ordering can be defined as the one which minimizes the sum of differences between the numerical identifier of an element from that reported by its neighbor. Our main result is that finding such an optimal ordering takes optimal $\Theta(n\log n)$ time. This setting arises naturally in NMR data interpretation for determining protein structures.
 Title: Characterizations of protein functions through evolutionary 
		   models of structural binding surfaces
                   
 Inferring biological roles of proteins and classifying them by their functions are challenging tasks, as global protein sequence and structure similarities are often unreliable for functional inference. Protein plays its role by interacting with other molecules, and local binding surfaces contain direct useful information.  To identify locally similar binding surfaces and to assess their biological similarity, scoring matrix such as Pam and Blosum are not suitable, because residues on protein functional surfaces experience different selection pressure than residues in folding core. We develop methods for estimating replacement rates of residues based on a continuous time Markov model using Bayesian Markov chain Monte Carlo. Combined with geometrically computed libraries of millions of binding surfaces, we show our method can infer protein functions from structures with accuracy, and can predict functional roles of proteins from structural genomics with unknown biological roles and with only hypothetical sequence homologs.  We further discuss how to construct libraries of binding surfaces for proteins of different functional classes based on a canonical basis set of binding surfaces, and show such an approach can provide general models for classifying proteins by their biochemical roles with sensitivity and specificity.
 Title: Novel Sequential Monte Carlo Techniques and Their Applications to Protein Structure Analysis
 I will describe the general sequential Monte Carlo (SMC) methodology, which has become popular in the engineering and statistics communities, and discribe its connections with molecular structure simulations and optimizations. Two new ideas for improving the efficiency of SMC methods will be introduced: using pilot samples to guide for resampling and incorporating a data-dependent optimal resampling criterion. We will illustrate these ideas using examples from Hydrophobic-Hydrophilic (HP) protein model simulation and near-native structure sampling.  By applying the new SMC scheme, we were able to characterize many important ensemble properties of NNS, including the size of NNS set, the probability of randomly sampling one NNS structure, and the occurrence of native contacts in NNS. We also found that the widely used pairwise potential function behaved surprisingly bad for  stabilizing near native protein structures. Based on the joint work with Junni Zhang, Jinfeng Zhang, Ming Lin, Rong Chen, and Jie Liang.
Johannes Rudolph, Duke University 
 Title: Protein-Protein Interfaces
 Protein-protein interactions form the basis for most cellular processes including events such as cell division and growth that are intimately linked to human diseases. Despite their importance and many previous studies, our understanding of the molecular basis for protein-protein interactions remains poor, lagging far behind that of protein folding and structure.  We are striving to fill this gap by elucidating universal descriptors of protein interfaces based on composition, shape, and forces.  We begin by defining and generating the interface surface between two or more complexed proteins using tools from computational geometry including weighted Voronoi diagrams, weighted Delauney triangulations, and topological persistence.  Simply described, the interface surface is a wrinkly (non-flat) sheet halfway between two proteins trimmed where it protrudes beyond the interface. The advantage of the single interface surface over separate surfaces of the contributing proteins is that it provides a well-defined object that can be readily manipulated in calculations and be easily visualized.  Analogous to protein structures, we define a hierarchy of structures on the interface surface. Primary structure describes the adjacencies (intra- and inter-molecular) of the residues contributing to the interface. Secondary structures are recurring geometric motifs such as cups and planes.  We believe that most interface surfaces consist of a few canonical motifs, equivalent to alpha-helices and beta-sheets in protein structures.  The landscape of these motifs forms the tertiary structure of an interface, which we refer to as the fold. Our goal is classifying interfaces through a deeper understanding of these three hierarchical structures and their interdependence.  We also study a variety of naturally defined functions on the interface surface including distance to neighboring proteins and electrostatic potentials. We have demonstrated the biochemical relevance of the interface surface by using it to identify hotspot residues and partition the interface into previously recognized fragments. Additionally, we have established a web-based Protein Interface Database for analyzing and disseminating our data.
 Title: Digital biology: Relations between data-mining
                   in biological sequences and physical chemistry
 The digital nature of biology is crucial to its functioning
as an information system.  The hierarchical development of biological components (translating DNA to proteins which form complexes in cells that aggregate to make tissue which form organs in different species) is discrete (or quantized) at each step. It is important to understand what makes proteins bind to other proteins predictably and not in a continuous distribution of places, the way grease forms into blobs.
 Data mining is a major technique in bioinformatics. It has been used on both genomic and proteomic data bases with significant success. One key issue in data mining is the type of lens that is used to examine the data. At the simplest level, one can just view the data as sequences of letters in some alphabet. However, it is also possible to view the data in a more sophisticated way using concepts and tools from physical chemistry. We will give illustrations of the latter and also show how data mining (in the PDB) has been used to derive new results in physical chemistry.  Thus there is a useful two-way interaction between data mining and physical chemistry.
 We will give a detailed description of how data mining in the PDB can give clues to how proteins interact. This work makes precise the notion of hydrophobic interaction in certain cases. It provides an understanding of how molecular recognition and signaling can evolve. We also illustrate how this can be used to understand enzymatic activity.  This work also introduces a new model of electrostatics for protein-solvent systems.
 This talk is based on joint work with Ariel Ferndandez and Kristina Rogale Plazonic.
 Title: Predicting protein interactions and their interfaces
 I will discuss  a general structural bioinformatics framework for predicting protein interactions that can be applied to specific structural motifs.  Our framework incorporates both sequence and experimental data, and we will demonstrate its effectiveness in predicting coiled-coil protein interactions.
 I will also discuss recent hardness results and computational methods for the side-chain positioning problem, an important component of predicting and designing protein structures and interfaces. I will present an integer linear programming formulation of the problem, and then show empirically that, surprisingly, in many interesting cases the linear program relaxation finds optimal solutions to the integer program. Our analysis demonstrates that LP-based approaches are highly effective in finding optimal (and near-optimal) solutions for the side-chain positioning problem.
 Title: Functional Annotation of Proteins with Known Structure by Structure and Sequence Similarity, DNA-protein Interaction Patterns and GO Framework
                  
 Following the discovery of an increasing number of proteins with known structures there is 
the need to provide functional annotation that is both highly accurate and consistent. 
Two approaches have been developed to address this issue: (i) Using sequence and structural 
similarity to assign function within the Gene OntologyTM (GO) framework; (ii) Using structural similarity 
and protein-DNA interaction patterns to assign protein function
 The results and additional information on both (i) and (ii) can be found at http://spdc.sdsc.edu.
  
 Title: An approach to semi flexible docking: a case study of the enzymatic reaction catalyzed by terpenoid cyclases
 A method is described for semi flexible molecular docking, when part of the small molecule is fixed within the protein matrix. The algorithm is based on the assumption that a part of the ligand molecule (at least 2 atoms) is defined as fixed in a predetermined position within the protein matrix. The approach is applied to model the first two reaction steps in the formation of 5-epi aristolochene by 5-epi-aristolochene cyclase from Nicotiana tabacum.
 Title: High throughput processing of the structural information of the Protein Data Bank
 The Protein Data Bank (PDB) is the largest, most comprehensive, freely available 
  depository of protein structural
  information, containing more than 30 000 deposited structures. 
  On one hand, the form and the organization of the
  PDB seems to be perfectly adequate for gathering information from 
  specific protein structures, by using the bibl
  iographic references and the informative remark fields. 
  On the other hand, however, it seems to be impossible to
  automatically review remark fields and journal references for 
  processing hundreds or thousands of PDB files.
   We present here a family of combinatorial algorithms to solve 
  this problem. Our algorithms are capable to automatically analyze PDB structural 
  information, identify missing atoms, repair chain ID information, and most importantly, 
  the algorithms are capable of identifying ligands and binding sites, facilitating 
  data mining studies of pro
  ven protein-ligand binding and testing virtual docking algorithms. 
  The algorithm can also be used for predicting
  flexible sub-chains of the structure.
 Title: Computational mapping of proteins for exploring the role of binding site plasticity
		 in molecular recognition and information transfer
 Computational mapping of proteins moves molecular probes - small organic molecules containing various functional groups . around the protein surface, finds favorable positions, clusters the conformations, and ranks the clusters on the basis of the average free energy (1-3). We employ at least six different probes, retain the five lowest free energy clusters for each probe, and define the positions at which several clusters overlap as the consensus sites. Mapping a number of enzymes has shown that the largest consensus sites (i.e., consensus sites with the highest number of overlapping clusters) are almost always localized to major subsites of the enzyme active site, and as a result, the amino acid residues that interact with the probes also bind the specific ligands of the protein (3). Thus, the method can provide detailed and reliable information on important amino acid residues in the binding site. It is important to note that the mapping finds the active site even when it is not the largest pocket (3), and in this sense it performs better than binding site identification methods based on purely geometric criteria.
 In this talk we describe the application of solvent mapping to two classes of proteins, which are the peroxisome proliferator activated receptor-...PPAR....representing nuclear receptors, and cytochromes P450. Results provide fundamental new insight on the mechanism of information processing by these important molecules, and show that the mapping can be used as a powerful tool of analysis to understand the role of relatively small but highly cooperative conformational changes in molecular recognition.
 References
 Title: Protein-protein Docking Algorithm Development and Testing
 Protein-protein docking is a important approach to study
protein-protein interactions. I will describe the efforts in my lab on
developing docking algorithms ZDOCK and RDOCK, a docking benchmark,
and the testing of the algorithms on the benchmark as well as through
the community-wide blind test CAPRI. I will also discuss analysis on
transient and obligatory complexes.
 
 Title: Modeling the chemosensing system of E. coli
 The chemotaxis network in E. coli is the best studied = signal-transduction network of any living organism. The network enables = E. coli to swim toward attractants such as amino acids or sugars, and = away from repellents. The cells perform chemotaxis by detecting = temporal changes in their chemical environment and transducing this = information into a decision to swim straight or change direction = (tumble). The system is remarkable for its high sensitivity and robust = adaptation over a wide range of external chemical concentrations. = Motivated by recent in vivo FRET studies [1], we model the chemosensing = system as a mixed array of interacting receptors akin to an Ising = lattice. Our results support and extend the robust-adaptation model of = Barkai and Leibler [2], in which receptor complexes function as = two-level systems.
 [1] V Sourjik and HC Berg, PNAS 99(1), 123-127 (2002); Ibid, Nature = 428(6981), 437-441 (2004).
 [2] N Barkai and S Leibler, Nature 387(6636), 913-917(1997).
 Title: Fast and Accurate Protein Side-Chain Packing
 If we know the primary sequence of a protein, can we predict its 
three-dimensional structure by computational
approaches? This is one of the most important and difficult problems 
in computational molecular biology and has
tremendous implications for the field of proteomics.
 In this talk, we will focus on protein side-chain packing, 
one key problem of protein structure prediction. This
problem has been proved to be NP-hard and hard to approximate. We will describe a novel approach to the protein side-chain packing problem, 
based on the tree-decomposition of
a protein structure. This algorithm not only runs very efficiently in practice, 
but also yields a
polynomial-time approximation scheme for some types of objective functions.
 We have implemented the proposed algorithm combined with several other 
components as one protein side-chain
prediction program TreePack. TreePack achieves an average speed five 
times faster than SCWRL 3.0, a widely-used
side-chain packing program, and runs up to 90 times faster on 
large proteins. TreePack not only runs very
efficiently, but also has the same accuracy as SCWRL.
 Title: Multi-Scale Structure Prediction of Helical Transmembrane 
		   Proteins
                   
 As the first step toward a multi-scale, hierarchical computational approach for 
membrane proteins structure prediction, the packing of transmembrane (TM) 
helices was modeled at the residual and atomistic levels respectively. 
For predictions at the residual level, the helix-helix and helix-lipid 
interactions were described by a set of knowledge-based energy functions. 
For predictions at the atomistic level, CHARMM19 force field was employed. 
To facilitate the system to overcome energy barriers, Wang-Landau sampling was 
carried out by performing a random walk in the energy and conformational spaces. 
Native-like structures were predicted at both levels for 2- and 7-helix systems. 
Interestingly, consistent results were obtained from simulations at residual and 
atomistic levels for the same system, strongly suggesting the feasibility of a 
hierarchical approach for membrane structure prediction. 
Jie Liang, UIC
  
Jun Liu, Harvard University 
L. Ridgway Scott, University of Chicago 
  
Mona Singh, Princeton University
Ilya N. Shindyalov, University of California at San Diego
Vladimir Sobolev, Weizmann Institute of Science, Israel 
Zoltan Szabadka, Eotvos University, Hungary 
Sandor Vajda, Boston University
Zhiping Weng, Boston University 
Ned Wingreen, Princeton University 
Jinbo Xu, MIT 
 
Ying Xu, University of Georgia 
 Previous: Program
 Previous: Program
 Workshop Index
 Workshop Index
 DIMACS Homepage
 DIMACS Homepage