This special focus is jointly sponsored by the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS), the Biological, Mathematical, and Physical Sciences Interfaces Institute for Quantitative Biology (BioMaPS), and the Rutgers Center for Molecular Biophysics and Biophysical Chemistry (MB Center).
Title: Mapping biological networks
I will present a preview of the results of a large-scale experimental screen for human protein-protein interactions and describe new algorithms for analyzing protein interaction and genetic interaction networks. We have developed new methods for segmenting networks into functional sub-networks and for estimating network complexity.
Title: Duplication and the evolution of the vertebrate insulin signaling pathway
Whole genome duplication is thought to be a major force driving the evolution of vertebrates. Although such duplications are rare, they can catalyze dramatic evolutionary change by duplicating entire pathways of interacting genes in a single event. The members of the duplicate pathway can then evolve in concert to take on a new functional role. For example, it has been suggested that the diversification of mammalian growth and metabolism resulted from the duplication of the insulin pathway in an early vertebrate. While such hypotheses are powerful and provocative, they are still largely speculative. I will present computational approaches to validating theories of large scale duplication using the insulin signaling pathway as an example.
Title: Reconstructing temporal ordering of proteins in biological pathways using protein-protein interactions
The recent advent of high-throughput biochemical methods make large data sets of protein-protein interactions available, offering the promise of new biological insights. Previous work on protein interactions focuses on protein function prediction and clustering related proteins. Instead, we proposed a graph-theoretic approach to mine temporal relations in biological pathways. Concretely, we model the assembly pathway of protein complexes as interval graphs and reconstructed the temporal ordering of proteins in the pathways by ordering the vertices in the protein interaction graph. We have developed a tool XRONOS to perform the interval graph fitting. We will present the results of applying our tool to the ribosomal assembly pathway, as well as statistical and biological validation of our results.
Title: Computational Proteins: Understanding Protein Function on a Genome-scale using Networks
My talk will be concerned with topics in proteomics, in particular predicting protein function on a genomic scale. We approach this through the prediction and analysis of biological networks -- both of protein-protein interactions and transcription-factor-target relationships. I will describe how these networks can be determined through Bayesian integration of many genomic features and how they can be analyzed in terms of various simple topological statistics.
A Bayesian networks approach for predicting protein-protein interactions from genomic data. R Jansen, H Yu, D Greenbaum, Y Kluger, NJ Krogan, S Chung, A Emili, M Snyder, JF Greenblatt, M Gerstein (2003) Science 302: 449-53. ExpressYourself: A modular platform for processing and visualizing microarray data. NM Luscombe, TE Royce, P Bertone, N Echols, CE Horak, JT Chang, M Snyder, M Gerstein (2003) Nucleic Acids Res 31: 3477-82. TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics. H Yu, X Zhu, D Greenbaum, J Karro, M Gerstein (2004) Nucleic Acids Res 32: 328-37. Genomic analysis of regulatory network dynamics reveals large topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA Teichmann, M Gerstein (2004) Nature 431: 308-12. Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han, N Bertin, S Chung, M Vidal, M Gerstein (2004) Genome Res 14: 1107-18.
Title: Modeling cells with protein networks
With the appearance of large networks of protein-protein and protein-DNA interactions as a new type of biological measurement, methods are needed for constructing cellular pathway models using interaction data as the central framework. The key idea is that, by querying the molecular interaction network with other biological data sets, it will be possible to organize the network into modules representing the repertoire of distinct functional processes in the cell. Three distinct types of network queries will be discussed, including queries to identify:
(1) Networks in control of gene expression changes (2) Networks correlating with systematic phenotypes and synthetic lethals (3) Protein interaction networks that are conserved across species
Using these computational modeling and query tools, we are constructing network models to explain the physiological response of yeast to an array of DNA damaging agents.
Relevant articles and links:
Title: Evolution of mammalian gene sequence and gene expression networks
The results of a network-based attempt to elucidate the evolutionary forces involved in shaping mammalian gene sequence and gene expression divergence will be presented. Gene sequence comparisons were done for human and mouse protein coding sequences (CDSs) as well as proximal promoter sequences. In addition, microarray data from the Novartis Gene Expression Atlas were used to analyze tissue-specific gene expression profiles. Relationships among gene sequences and gene expression patterns were resolved into networks and their topological properties were compared. The topology of a human gene coexpression network, derived from tissue-specific expression profiles, shows scale-free properties that imply evolutionary self-organization via preferential node attachment. Genes with numerous coexpressed partners (the hubs of the coexpression network) evolve more slowly on average than genes with fewer coexpressed partners, and genes that are coexpressed show similar rates of evolution. Levels of change in gene expression between human-mouse orthologs are correlated with levels of CDS divergence that are determined largely by purifying selection. In contrast, evolutionary changes of tissue-specific gene expression profiles do not show such a correlation with sequence divergence. Finally, comparison of gene expression profiles with promoter sequence divergence versus CDS divergence suggests convergent evolution of gene expression patterns that are driven by cis-regulatory sequences.
Title: Conserved Patterns of Protein-Protein Interaction in Yeast, Worm and Fly
To elucidate cellular machinery on a global scale, we performed a multiple comparison of the newly available protein-protein interaction networks of C. elegans, D. melanogaster and S. cerevisiae. This comparison integrated protein interaction and sequence information to reveal 71 network regions that were conserved across all three species and many exclusive to the metazoans. Using this conservation we found statistically significant support for many new protein functions and protein interactions. We tested sixty interaction predictions for yeast by two-hybrid analysis, confirming approximately half of these. Significantly, many of the predicted functions and interactions would not have been identified from sequence similarity alone, or from the interactions within a single species. Joint work with the research groups of Trey Ideker (Whitehead and UCSD), Roded Sharan (Tel Aviv) and Peter Uetz (Stuttgart).
Title: Degree Distributions of BFS trees in random graphs
In this talk, we will examine the degree distribution of BFS trees of random (multi-)graphs with a given degree distribution. A special case of our results will show that the degree distribution of a BFS tree in a random regular graph exhibits a power law, and thus that the BFS tree's distribution can be very different from that of the underlying graph.
The focus of this talk will be on the techniques used to prove this result. In particular, we will encounter continuous-time views of the exploration of a random graph, and Martingale-style tail bounds to prove concentration results. These techniques are likely useful in the analysis of other random graph models and their properties, including biological networks.
The results presented are joint work with Dimitris Achlioptas, Aaron Clauset, and Cris Moore.
Title: Multi-Molecular Assemblies by Multiple Combinatorial Docking and Biomolecular Networks
The vast majority of proteins function when associated in multimolecular assemblies. Yet, prediction of the structures of multimolecular complexes has largely not been addressed, probably due to the magnitude of the combinatorial complexity of the problem. Docking applications have traditionally been used to predict pairwise interactions between molecules. We have developed an algorithm that extends the application of docking to multi-molecular assemblies. We apply it to predict both the quaternary structures of oligomers and multi-protein complexes. The algorithm predicted well a near native arrangement of the input subunits for all cases in our data set, where the number of the subunits of the different target complexes varied from 3 to 10. In order to simulate a more realistic scenario, unbound cases were also tested. In these cases the input conformations of the subunits are either unbound conformations of the subunits or a model obtained by homology modelling technique. The successful predictions of the unbound cases, where the input conformations of the subunits are different from their conformations within the target complex, suggest that the algorithm is robust. We expect that this type of algorithm should be particularly useful to predict the structures of large macromolecular assemblies, difficult to solve by experimental structure determination. The method is based on the concept that binding and folding are similar processes, involving combinatorial assembly of protein (sub)parts.
I shall further highlight the applications of such a method within the context of biomolecular networks, focusing on their dynamic associations.
The research of R. Nussinov has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract number NO1-CO-12400.
Title: Functional organization of cellular networks
An important goal of network biology is the systematic uncovering of the organizational principles that define the functional organization of various cellular networks, with the ultimate goal of developing models of their integrated behavior. In this talk, I will review our recent progress on understanding the organization of flux states in cellular metabolism and initial insights to the mode by which environmental signals may modulate these states.
Joint work with E. Almaas, G. Balazsi and A.-L. Barabasi
Title: A Random Graph Approach to Protein Structure Determination
Graphs provide a natural encoding of protein three-dimensional structures --- vertices correspond to amino acid positions, and edges to pairs of sequentially adjacent or close-enough (``contact'') positions. Nuclear magnetic resonance (NMR) spectroscopy is a key experimental technique for protein structure determination, and provides data probing both contacts and sequential adjacencies. However, NMR data are noisy (many false edges) and the mapping (``assignment'') between NMR data and the protein itself is unknown and hard to determine. Such a mapping is an important step in the complex task of efficient determination of three dimensional structure of proteins via NMR. We are pursuing novel random graph models and algorithms to address several problems useful for determination of an NMR assignment under different experimental protocols. A graph model capturing the correlation structure of the noise in NMR data enables characterization of the underlying problem and gives a framework for rigorously analyzing algorithms and also determining the statistical significance of the output structures. We have studied the problem of finding long paths in a graph representation of sequential NMR data. We showed that a randomized search employing local patching of identified structure can efficiently discover high-quality solutions. We then extended our algorithm to handle higher-order structures (such as beta sheets) from contacts. Results show that our approach is able to overcome significant noise and local ambiguity in identifying significant secondary structure fragments of resonance assignment.
Title: Integrative Analyses of Interaction Networks Underlying the Cellular Circuitry in Yeast
Cellular processes are regulated by interactions between various types of molecules, such as proteins, genes, and metabolites. Among these, the interactions between proteins and the interactions between transcription factors and their target genes play a prominent role, controlling the activity of proteins and the expression levels of genes. A significant number of such interactions has been revealed recently via high throughput technologies. These data can be represented as a network of interactions describing the circuitry responsible for the regulation of a variety of cellular processes. Analysis of this cellular circuitry is one of the major research goals in the post genomic era.
Previous studies have analyzed aspects of this network concentrating on either transcription-regulation or protein-protein interactions separately. Here we search for composite network motifs: characteristic network patterns consisting of both transcription-regulation and protein-protein interactions that recur significantly more often than in random networks. To this end we developed algorithms for detecting motifs in networks with two or more types of interactions and applied them to an integrated data set of protein-protein interactions and transcription regulation in Saccharomyces cerevisiae. We found a two-protein mixed-feedback loop motif, five types of three-protein motifs exhibiting coregulation and complex formation, and many motifs involving four proteins. Virtually all four-protein motifs consisted of combinations of smaller motifs. This study presents a basic framework for detecting the building blocks of networks with multiple types of interactions.
Joint work with Esti Yeger-Lotem, Samuel Sattath, Nadav Kashtan, Shalev Itzkovitz, Ron Milo, Uri Alon, and Hanah Margalit.
Title: Network topology and evolution of hard to gain and hard to loose attributes
Recent studies of properties of various biological networks have been focusing on discovering characteristic features of such networks. Following the discovery that degree distribution in biological networks is typically roughly scale free researchers started to look for other network measurements that would be more powerful in discriminating between various types of networks. Such measurements include, for example, clustering coefficient, diameter, and most recently network motifs. Whenever one finds and differentiating characteristics, say an over-represented motif, new questions arise: (1) why the network evolved to have a given property, (2) is the property related to evolutionary constraints or is it a signature of functional selection, and in case of networks motif (3) is given motif pattern a signature or some global characteristic? In this talk we address the above questions form the perspective of evolution of hard to gain and hard to loose characters.
Title: Duplication Based Models for Proteome Network Evolution
Protein-protein interaction networks, particularly that of the yeast S. Cerevisiae, have recently been studied extensively. These networks seem to satisfy the small world property and their (1-hop) degree distribution seems to form a power law. More recently, a number of duplication based random graph models have been been proposed with the aim of emulating the evolution of protein-protein interaction networks and satisfying these two graph theoretical properties. In this talk, we will show that the proposed model of Pastor-Satorras et al. does not generate the power law degree distribution with exponential cutoff and the more restrictive model by Chung et al. cannot be interpreted unconditionally. It is possible to slightly modify these models to ensure that they generate a power law degree distribution. However, even after this modification, the more general k-hop degree distribution achieved by these models, for k>1, are very different from that of the yeast proteome network. We address this problem by introducing a new network growth model that takes into account the sequence similarity between pairs of proteins (as a binary relationship) as well as their interactions. The new model captures not only the k-hop degree distribution of the yeast protein interaction network for all k>0, but it also captures the 1-hop degree distribution of the sequence similarity network, which again seems to form a power law.
Title: Glimpses on the evolution of transcription regulation networks
Most genes are regulated by transcription factors (TFs), which bind to specific short strings at the promoter regions of their DNA sequences. We are developing methods that use sequence, expression and other data from multiple species, in order to identify such signals and to trace their evolution. I will discuss in this talk three studies:
Joint work, in parts, with Amos Tanay, Irit Gat-Viks, Chaim Linhart, Rani Elkon (TAU) and Aviv Regev (Harvard)
Title: Dynamical Properties of the Yeast Cell Cycle Network
Despite the complex environment in and outside of the cell, various cellular functions are carried out reliably by the underlying biomolecular networks. How is the stability of a cell state achieved? How can a biological pathway take the cell from one state to another reliably? Here we address these questions from a dynamic systems point of view. We study the network regulating the cell cycle of the budding yeast, investigating its global dynamical property and stability. We found that this network is extremely stable and robust for its function. The stationary state of the cell, or the state at a checkpoint in general, corresponds to a global attractor of the dynamics-almost all initial protein states flow to the biological stationary state. Furthermore, the biological pathway of the cell-cycle sequence-which is a particular trajectory in the state space-is a globally stable and attracting trajectory of the dynamics. These dynamic properties, arising from the underlying network connection, are also robust against small perturbations to the network and against parameter changes in the model.
Title: An informatics analysis of protein factors and binding sites on Yeast DNA
Transcription factors, and therefore their binding sites, play a role in the crucial process of regulating gene expression. To investigate the network topology among seemingly separate regulatory pathways, an informatics analysis of yeast transcription factor binding site matrices obtained from the TRANSFAC database has been performed. The binding site matrices were compared using multiple distance measures, and hierarchical clustering using DIANA (divisive analysis) was performed on the basis of these distance measures. The relative merits of the metrics considered were compared. Additionally, two interesting situations have been studied: cases for which a pair of transcription factor binding matrices have large separation but share a transcription factor; and cases for which a pair of transcription factor binding matrices have small separation but are bound by different factors. These cases serve as guides for future studies to elucidate understanding based on first-principle physics/chemistry.