DIMACS Mini-Workshop on System Based Modeling in Informatics

February 19 - 20, 2001
DIMACS Center, Rutgers University, Piscataway, New Jersey

Organizers:
Michael Liebman, Abramson Family Cancer Center, University of Pennsylvania, School of Medicine, liebmanm@mail.med.upenn.edu
Richard L.X. Ho, R.W. Johnson Pharmaceutical Research Institute, RHo@prius.jnj.com
Presented under the auspices of the Special Year on Computational Molecular Biology.

Abstracts:


1.

NIGMS Funding Opportunities for Quantitative Approaches to 
Biomedical Research

James J. Anderson, NIGMS

The National Institute of General Medical Sciences (NIGMS: a component of 
the U.S. National Institutes of Health), in conjunction with other 
Institutes and the National Science Foundation, has issued a number of 
program announcements (PA) and requests for applications (RFA) that provide 
for the support of cross-disciplinary research, education, and training 
involving quantitative approaches to complex biological and biomedical 
problems. The suite of initiatives has as its foci:
1. The understanding of system principles and dynamics in processes 
involving large numbers of interacting components, at all levels of 
biological organization.

2. The development of analytical methodologies to discover the genetic 
architecture of complex genetic traits.

3. The study of the evolutionary dynamics of pathogens and their hosts with 
their environments.

4. The development of enabling technologies useful for the study of 
metabolic processes and metabolic engineering.

5. The development of basic mathematical concepts and algorithms that have 
the potential for significantly advancing the state of the art in 
biomedical research.

The initiatives comprise mechanisms to fund research projects (traditional 
research project grants (R01) and program project grants (P01)), to fund 
establishment of integrative research efforts ("glue grants," R24), to fund 
extensive programs of research-related activities (Center grants (P50)), to 
provide support for short courses and workshops (R25 education grants) for 
both biologists and non-biologists, and to provide training at both the pre- 
and postdoctoral levels (T32 and T33 Training Grants).
Detailed information on these programs will be available, and can also be 
found at the following URL: http://www.nih.gov/nigms/funding


2. Analysis of the Global Role of IHF in Transcriptional Regulation

Craig J. Benham, Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, NY

In a collaboration with Dr. Wes Hatfield (UC-Irvine) we are investigating the IHF-regulated ilvPG promoter. We have shown that it is activated by a mechanism involving the transmission of stress-induced destabilization. Specifically, IHF binding causes a four-fold increase in transcriptional initiation rates, but only when the substrate DNA is negatively supercoiled. The mechanism for this activation was elucidated by a collaboration between computational analysis of stress-induced destabilization and experimental investigations. When negatively supercoiled, a region containing the IHF binding site experiences a destabilization of its duplex structure. IHF binding causes this region to reform the B-DNA structure, which transfers the destabilization to the -10 region of the promoter, thereby activating transcription.

This talk will briefly describe the computation of stress-induced destabilization and the collaboration by which the mechanism of the IHF-regulated ilvPG promoter was elucidated. We then indicate how this work is being extended to elucidate the global roles of high-affinity IHF binding in transcriptional regulation throughout the E. coli genome. Theoretical analysis of all other known IHF-regulated genes finds that they all have destabilization properties suggesting that they, too, could be regulated by the same transmission-type mechanism. A complete analysis of the entire genome finds a total of 125 ORFs with the catenation of properties needed for this mechanism. These predictions are currently being tested experimentally in Dr. Hatfield's lab using expression arrays. This is the first collaborative investigation of a global mechanism of regulation.

This and other work shows that stress-induced strand separation plays central roles in the initiation of gene expression. Indeed, recent results suggest that it may be among the most archaic regulators of this essential biological process. Speculations will be presented regarding the reasons for this importance.

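The destabilization profiles referred to above come from Benham's statistical-mechanical SIDD calculation, which sums over the denatured states of a superhelically stressed domain. As a much cruder illustration of what a per-position destabilization profile looks like, the sketch below uses A+T content in a sliding window as a stand-in (A:T pairs melt more easily); the window size and the example fragment are assumptions for illustration only, not the actual algorithm.

```python
# Toy sketch: a per-position "destabilization propensity" profile for a DNA
# sequence, using local A+T content as a crude proxy for how easily the duplex
# opens. Benham's real SIDD computation is a statistical-mechanical sum over
# denatured states at fixed superhelical stress; this only illustrates the
# idea of turning a sequence into a profile.

def at_richness_profile(seq, window=12):
    """Return the fraction of A/T bases in a window centered on each position."""
    seq = seq.upper()
    half = window // 2
    profile = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half)
        win = seq[lo:hi]
        profile.append(sum(base in "AT" for base in win) / len(win))
    return profile

# Hypothetical promoter-like fragment; the peak marks the AT-rich, easily melted region.
fragment = "GCGCGCGC" + "ATATATTTAAAT" * 3 + "GCGCGCGCGC"
profile = at_richness_profile(fragment)
peak = max(range(len(profile)), key=profile.__getitem__)
print("most destabilization-prone position (by this crude proxy):", peak)
```
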
3. An application framework for modelling biological processes

Carolyn Cho, Physiome Sciences, Princeton, NJ, USA

The recent increase in the generation and variety of biological data has stimulated demand for modeling and simulation to complement experimental investigation. The mathematical expertise required for model building, however, has limited its widespread application. Physiome Sciences is developing software and applications for modeling signal transduction and other biochemical pathways, intracellular and extracellular physiological processes, organs, and systems. These tools have expert-user capabilities but are also designed for use by researchers with no mathematical modeling expertise. In addition to providing hypothesis-testing and predictive capabilities, the resulting models become the entry point for accessing relevant data. Physiome software provides researchers a unified environment to filter information, analyze data, develop hypotheses, and create a shared knowledge base.

Physiome's modeling framework is designed around four core themes. This presentation focuses on the use of Physiome Sciences' In Silico Cell(TM) modeling environment to construct, analyze, and interpret biochemical pathway data. In Silico Cell supports the hierarchical modeling of biological systems and the creation of detailed models from simple ones. This process is enabled by the use of the CellML(TM) modeling language, an application of XML for describing biological processes at the cellular and sub-cellular level. In Silico Cell's Pathway Editor function allows the researcher to build and edit pathway diagrams and provides network analysis tools to identify possible drug targets at critical points in a biochemical pathway. This is done using a graphical user interface that automatically generates the mathematics and simulations from pathway maps. The pathway and its individual components are linked to a database that can be referenced and updated by the researcher.

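To make the idea of generating a simulation automatically from a pathway map concrete, the minimal sketch below turns a small list of mass-action reactions into rate equations and integrates them with forward-Euler steps. It illustrates the general principle only, assuming a hypothetical two-step pathway; it is not CellML or the In Silico Cell machinery itself.

```python
# Minimal sketch of "pathway map -> simulation": each reaction is a tuple of
# (substrates, products, rate constant); mass-action rates are assembled and
# integrated with forward-Euler steps.

def simulate(reactions, conc, dt=0.01, steps=1000):
    """reactions: list of (substrates, products, rate_constant); conc: initial concentrations."""
    conc = dict(conc)
    for _ in range(steps):
        delta = {species: 0.0 for species in conc}
        for substrates, products, k in reactions:
            rate = k
            for s in substrates:          # mass-action: rate = k * [S1] * [S2] * ...
                rate *= conc[s]
            for s in substrates:
                delta[s] -= rate
            for p in products:
                delta[p] += rate
        for species in conc:
            conc[species] += dt * delta[species]
    return conc

# Hypothetical two-step pathway: A -> B -> C
pathway = [(["A"], ["B"], 1.0), (["B"], ["C"], 0.5)]
print(simulate(pathway, {"A": 1.0, "B": 0.0, "C": 0.0}))
```
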
4. Innovative information management software for genetic analysis

Richard Groves, Genomica Corporation, Boulder, Colorado

Genomica provides a unified database structure in which genetic, phenotypic, molecular, and population-derived information can be stored in a single, purpose-built database. Stored data may be visualized, manipulated, and mined with simple and intuitive data management tools designed specifically for genetic and molecular data. All stored information can be rapidly transformed and exported in the rich formats required for further research and analysis. Benefits to users include rapid and sophisticated data stratification and mining, more time available for data analysis, less time spent on data formatting, and storage of all data in a single location, resulting in superior data integrity and security.

5. Applying In Silico, Animal, and Mental Models of Human Weight Control

Richard L.X. Ho, R.W. Johnson Pharmaceutical Research Institute

There are many ways to study the control of body weight, but all depend on having a model with which to construct hypotheses, design experiments, and integrate the information for new understanding. To refine their mental model of a disease process, researchers traditionally utilize data from clinical studies and in vivo animal models. Recently there has been growing interest in using in silico models, or computer simulations, to enhance understanding of disease pathophysiology. A few top-down computer models with nonlinear dynamics are now available commercially which allow creation of simulated patients and virtual interventions on those patients. We are now using such computer models with data mining software to integrate information from low- and high-throughput methods as well as to help us formulate novel hypotheses and experimental designs in the field of human weight control.

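As a deliberately simplified illustration of the kind of dynamic building block such simulations rest on, the sketch below integrates a textbook-style energy-balance equation for body weight and compares a baseline simulated patient with a virtual reduced-intake intervention. The equation form and all parameter values are illustrative assumptions, not those of any commercial model.

```python
# Toy energy-balance model: dW/dt = (intake - expenditure) / energy_density,
# with daily expenditure assumed proportional to current weight. Parameters
# (7700 kcal/kg, 30 kcal/kg/day) are rough illustrative values.

def simulate_weight(weight_kg, intake_kcal_per_day, days,
                    kcal_per_kg=7700.0, expenditure_per_kg=30.0):
    history = [weight_kg]
    for _ in range(days):
        expenditure = expenditure_per_kg * weight_kg          # kcal/day
        weight_kg += (intake_kcal_per_day - expenditure) / kcal_per_kg
        history.append(weight_kg)
    return history

# Virtual "intervention": reduce intake from 2600 to 2200 kcal/day for a year.
baseline = simulate_weight(85.0, 2600.0, 365)
treated = simulate_weight(85.0, 2200.0, 365)
print(f"final weight, baseline: {baseline[-1]:.1f} kg, treated: {treated[-1]:.1f} kg")
```
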
6. Pharmacosemiotics: An Emerging Physics/Chemistry/Linguistics Paradigm in Pharmacology

Sungchul Ji, Department of Pharmacology and Toxicology, Rutgers University, Piscataway, N.J. 08855

Semiotics is the scientific study of signs and symptoms that was developed originally in Ancient Greece as a means of medical diagnosis and prognosis. Signs can be macroscopic (e.g., written words and sentences) or microscopic (e.g., hormones and second messengers) in size. Therefore, `pharmacosemiotics,' a term coined by F. E. Yates in 1999, designates the field of study of drugs viewed as `molecular signs' with which biomedical scientists can communicate with living cells in patients to effectuate pharmacotherapy.

There are two distinct approaches to biomedical research: i) the PC paradigm based on the assumption that physics (P) and chemistry (C) are necessary and sufficient to solve all biomedical problems, and ii) the PCL paradigm postulating that P and C are necessary but not sufficient and hence that a new approach, linguistics (L), must be added to the traditional paradigm in order to completely describe and understand living phenomena. The PCL paradigm is synonymous with the semiotics paradigm, since the study of molecular signs entails integrating physics, chemistry, and linguistics. The semiotics paradigm appears to be strongly supported by the recent uncovering of the isomorphism between the molecule-based cell language and the word-based human language [BioSystems 44:17-39 (1997); Ann. N. Y. Acad. Sci. 870:411-417 (1999)].

The main objective of this contribution is to discuss a theoretical model of the living cell known as the Bhopalator that has been developed over the past two decades based on the PCL or semiotics paradigm [J. Theoret. Biol. 116:399-426 (1985); Comments Toxicol. 5(6): 571-585 (1997)]. Three groups of concepts, each derived from physics (i), chemistry (ii), and linguistics (iii), played crucial roles in the development of the Bhopalator model of the cell: i) the notion of solitons or soliton-like energetic entities entrapped in biopolymers, ii) the principle of self-organizing chemical reaction-diffusion systems, and iii) the concept of words and sentences and the associated principle of `double articulation.' The Bhopalator embodying these concepts predicted i) that biopolymers contain sequence-specific conformational strains, called `conformons,' that drive all biopolymeric functions, ii) that the final form of expression of structural genes is not polypeptides as is widely believed but patterns of concentration and mechanical stress gradients in the cytosol and the nucleus, collectively known as `intracellular dissipative structures' (IDS's), iiia) that noncoding regions of DNA carry `spatiotemporal genes' that control the space- and time-dependent evolution of gene expression, and iiib) that IDS's act as molecular analogs of `sentences' in the cell, which are essential for cells to execute molecular analogs of `propositions,' `arguments,' and `computations.'

Prediction i) was supported by the discovery by C. Benham of `stress-induced duplex destabilization' (SIDD) in DNA [PNAS 90:2999-3003 (1993)]. Prediction ii) is validated by the finding reported by Sawyer et al. that intracellular calcium waves drive chemotaxis in neutrophils [Science 230:663-666 (1985)]. Prediction iiia) is substantiated by the finding of N. Amano et al. that the number of noncoding bases per genome increases with the increasing number of transcription factors per structural gene in multicellular, but not in unicellular, organisms [Biol. Chem. 378:1397-1404 (1997)]. Finally, Prediction iiib) is consistent with the notion of "hyperstructures" proposed by Norris et al. as nonequilibrium, transient complexes of biopolymers, metabolites and ions, intermediate in size between individual macromolecules and the cell itself, that regulate bacterial structure and cell cycle [Biochimie 81:915-920 (1999)].


7. Biosimulation: Systems-Based Modeling of Human Physiology in Health and Disease

Robert J. Leipold, Ph.D., Modeler, Professional Services, Entelos, Inc.

Compared to other high-tech industries (automotive, aeronautics, electronics, chemical processing), the pharmaceutical industry has been slow to adopt systems-based modeling to improve the efficiency and effectiveness of its operations. The current process of drug discovery and development is characterized by long development times, great expense, and a high failure rate. Recent technological advances such as expression profiling and sequence databases have increased the rate at which prospective drug targets can be identified, but they have done nothing to improve the odds for success in the subsequent development process. Entelos offers comprehensive models of human physiology (PhysioLabs(TM)) within which one can test prospective targets for effects on clinically relevant endpoints. PhysioLabs can be used at every step of the drug discovery and development process, from target selection and prioritization through design and evaluation of clinical trials. Examples of these applications will be illustrated with case studies.

8. Effective Representation of Gene Families

Hugh B. Nicholas Jr., David W. Deerfield II, Alexander J. Ropelewski, and Jaclyn Schwizer, Pittsburgh Supercomputing Center, 440 Fifth Avenue, Pittsburgh, PA 15213

The explosion of sequence data has greatly increased the amount of information needed to identify distant homologues of a query sequence against the large, constantly increasing background of essentially random, unrelated sequences. One effective solution to this problem would be a non-redundant database of aligned gene-family protein products that preserves the knowledge of the sequence variation to which a query sequence must be compared. We have prepared alignments of gene families' protein products using different practical unguided, automatic methods, as well as refining these alignments by pattern-based methods. We have represented the alignments by three different profile methods (average, evolutionary, and Bayesian) and by hidden Markov models. We will present the results of a comparison of these different alignment and representation strategies.

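As a minimal illustration of one representation strategy named above, the sketch below builds an averaged, position-specific profile (per-column residue log-frequencies with pseudocounts) from a toy alignment and scores queries against it. The alignment and scoring scheme are assumptions for illustration; they are not the specific average, evolutionary, or Bayesian profiles or hidden Markov models compared in the talk.

```python
# Toy profile representation of a gene family: per-column residue
# log-frequencies with pseudocounts, used to score an ungapped query.
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Per-column log-frequencies from equal-length aligned sequences (gaps skipped)."""
    length = len(alignment[0])
    profile = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            if seq[col] in counts:
                counts[seq[col]] += 1.0
        total = sum(counts.values())
        profile.append({aa: math.log(c / total) for aa, c in counts.items()})
    return profile

def score(profile, query):
    """Sum of per-column log-frequencies for a query of the same length."""
    return sum(col[aa] for col, aa in zip(profile, query))

# Hypothetical three-member family; the family member scores higher than an unrelated query.
family = ["ACDKL", "ACDRL", "SCDKL"]
profile = build_profile(family)
print("family member:", round(score(profile, "ACDKL"), 2),
      " unrelated:", round(score(profile, "WWWWW"), 2))
```
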
9. KBTool: An approach to managing diverse biological information that provides simple, coherent, extensible storage and retrieval

Stephen Shaw, Experimental Immunology Branch, National Cancer Institute, Bethesda, MD

Information overload is pervasive in modern biology; finding satisfactory ways to efficiently capture and manage available information is essential both for human comprehension and for automated analysis. A small fraction of that biological knowledge is stored in structured databases that are optimal for automated retrieval, analysis, and computation. In contrast, a large fraction of that information is present in free-text documents and as implicit knowledge of biological experts. We have been experimenting with an intermediate approach (evolving in a software application we call "KBTool") that encodes biological information in a fashion more structured than free text, but more flexible and extensible than conventional relational databases. It is closest in concept to an entity-relationship approach. The current implementation has about 100 categories of entities (genes, proteins, peptides, binding sites, phosphorylation sites, pathways, molecular assemblies, references, drugs, types of malignancies, etc.), about 100 kinds of allowed relationships, and more than 50,000 specific entities. Use is sufficiently simple that we employ it to encode and retrieve information related to very diverse tasks: conduct of biological research, administration of an electronic journal, and management of references, personal contacts, bookmarks, etc. Because it provides linkage of public data, workgroup-specific data, and private data, KBTool maintains a coherence that we have been unable to achieve in any other way. In our experience, accumulation of biological expertise into a machine-readable form will occur not when experts are required to log information, but rather when they find it beneficial to do so. KBTool is making progress toward that goal of users storing information based on "enlightened self-interest."

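The entity-relationship flavor of such a store can be sketched in a few lines of code: typed entities plus typed relationships between them, more structured than free text but looser than a fixed relational schema. The category names, relationship names, and the reference identifier below are invented examples, not KBTool's actual vocabulary or API.

```python
# Sketch of an entity-relationship style knowledge store: entities have a
# category; relationships are typed links between named entities.
from collections import defaultdict

class KnowledgeBase:
    def __init__(self):
        self.entities = {}                      # entity name -> category
        self.relations = defaultdict(set)       # (subject, relation) -> set of objects

    def add_entity(self, name, category):
        self.entities[name] = category

    def relate(self, subject, relation, obj):
        self.relations[(subject, relation)].add(obj)

    def query(self, subject, relation):
        return sorted(self.relations[(subject, relation)])

kb = KnowledgeBase()
kb.add_entity("CD45", "protein")
kb.add_entity("T-cell activation", "pathway")
kb.add_entity("PMID:10000000", "reference")     # hypothetical reference identifier
kb.relate("CD45", "participates_in", "T-cell activation")
kb.relate("CD45", "described_in", "PMID:10000000")
print(kb.query("CD45", "participates_in"))
```
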
10. A Systems Approach to Post-Genomic Therapeutic Research

Roland Somogyi, Alan Ableson, Max Kotlyar, Ross Dickson, and Evan Steeg, Molecular Mining Corporation, Canada

Integrated analysis of genetic, phenotypic, and chemical data will be required to gain insight into complex diseases, discover novel drugs, and integrate diagnostics with treatments for individualized therapies. Recent developments in molecular biology, measurement technologies, and computational performance are making it possible to approach these challenges from a systems perspective. We must now develop flexible computational methods for discovering predictive connections between genes, phenotypes, and drugs. These relationships involve complex non-linear and combinatorial interactions, which cannot be found using traditional linear inference. We are currently applying our advanced data mining and modeling procedures to a) predictions in the areas of drug efficacy, genetic predispositions, diagnostics, and toxicology; b) systematic experimental design to provide the right set of facts that permit valid analysis; and c) network reverse engineering and in silico experimentation.

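A small example of why linear inference can miss such relationships: in the sketch below, mutual information computed on discretized values detects a quadratic dependence between two synthetic "expression profiles" that Pearson correlation scores near zero. This is a generic illustration on made-up data, not Molecular Mining's actual procedure.

```python
# Compare a linear measure (Pearson correlation) with mutual information on a
# synthetic nonlinear relationship y ~ x^2 + noise.
import math, random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = math.sqrt(sum((a - mx) ** 2 for a in x))
    vy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy)

def mutual_information(x, y, bins=4):
    def binned(values):
        lo, hi = min(values), max(values)
        return [min(int((v - lo) / (hi - lo + 1e-12) * bins), bins - 1) for v in values]
    bx, by = binned(x), binned(y)
    n = len(x)
    mi = 0.0
    for i in range(bins):
        for j in range(bins):
            pxy = sum(1 for a, b in zip(bx, by) if a == i and b == j) / n
            px, py = bx.count(i) / n, by.count(j) / n
            if pxy > 0:
                mi += pxy * math.log(pxy / (px * py))
    return mi

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(500)]
y = [a * a + random.gauss(0, 0.05) for a in x]          # nonlinear dependence
print(f"correlation ~ {pearson(x, y):.2f}, mutual information ~ {mutual_information(x, y):.2f}")
```
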
11. Transcriptome annotation using Gene Ontology nomenclature

Han Xie and Liat Mintz, Bioinformatics, Compugen, Jamesburg, NJ

Advances in genomic sequencing and computational biology have presented a unique challenge: to annotate the transcriptomes of different species. Here we report our efforts in systematically examining the human and rodent transcriptomes. EST, mRNA, and genomic sequences are clustered using the Compugen LEADS platform technology (www.labonweb.com, www.cgen.com) to predict gene clusters. The standardized Gene Ontology (GO) nomenclature (www.geneontology.org) was utilized to designate the functions, cellular localization, and pathway involvement of transcripts. Annotation procedures were centered on sequence and motif homology to GO-annotated genes. In addition, text-mining techniques and multi-parameter cellular localization modeling were used to increase annotation accuracy and to predict novel annotations. The majority of gene clusters containing mRNA sequences have been assigned GO terms. This systematic annotation of the transcriptome will help discover new gene functionality as well as facilitate higher-order analysis of biological systems.

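The core homology-based annotation step can be sketched simply: GO terms are transferred to an unannotated cluster from its sufficiently strong hits against GO-annotated genes, as below. The hit list, score threshold, and gene names are invented for illustration; the actual pipeline operates on LEADS clusters and adds text-mining and localization evidence on top of this step.

```python
# Sketch of homology-based GO term transfer: keep only hits above a score
# threshold and pool the GO terms of those annotated genes.

def transfer_go_terms(hits, go_annotations, min_score=100.0):
    """hits: list of (annotated_gene, alignment_score) for one query cluster."""
    assigned = set()
    for gene, alignment_score in hits:
        if alignment_score >= min_score:
            assigned.update(go_annotations.get(gene, set()))
    return assigned

go_annotations = {
    "geneA": {"GO:0016301"},          # kinase activity
    "geneB": {"GO:0005634"},          # nucleus
}
hits = [("geneA", 250.0), ("geneB", 80.0)]    # only geneA passes the threshold
print(transfer_go_terms(hits, go_annotations))
```
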
12. Recurrence Quantification as a Tool for Protein Bioinformatics

Joseph P. Zbilut, Professor, Molecular Biophysics and Physiology, Rush Medical College, Chicago, IL

Recurrence quantification analysis (RQA), a methodology related to contact maps which extends such analyses to higher dimensions, has been used to understand proteins in a variety of contexts. Unlike FFTs, its utility derives from its ability to quantify variables without first filtering them through a mathematical transform. Examples of its use will be given in the context of prion singularities, structure/activity relationships, and protein properties.

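A minimal sketch of the recurrence calculation RQA builds on: a numerical series derived from a protein sequence (here, a per-residue hydrophobicity value) is embedded in delay coordinates, pairs of embedded points closer than a chosen radius are marked as recurrent, and the recurrence rate is reported. The embedding parameters and the short made-up sequence are illustrative assumptions; full RQA also quantifies determinism, laminarity, entropy, and related measures.

```python
# Recurrence rate of a delay-embedded series, the basic quantity underlying RQA.

def recurrence_rate(series, dim=3, delay=1, radius=1.0):
    # Delay embedding: x_i = (s_i, s_{i+delay}, ..., s_{i+(dim-1)*delay})
    points = [series[i:i + dim * delay:delay]
              for i in range(len(series) - (dim - 1) * delay)]
    n = len(points)
    recurrent = 0
    for i in range(n):
        for j in range(n):
            dist = max(abs(a - b) for a, b in zip(points[i], points[j]))
            if dist <= radius:
                recurrent += 1
    return recurrent / (n * n)

# Kyte-Doolittle hydrophobicity values for the residues of a short, made-up sequence.
kd = {"A": 1.8, "L": 3.8, "K": -3.9, "E": -3.5, "G": -0.4, "V": 4.2}
sequence = "ALKVEGALKVEGALKVEG"
profile = [kd[aa] for aa in sequence]
print(f"recurrence rate: {recurrence_rate(profile):.2f}")
```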
