This special focus is jointly sponsored by the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS), the Biological, Mathematical, and Physical Sciences Interfaces Institute for Quantitative Biology (BioMaPS), the Rutgers Center for Molecular Biophysics and Biophysical Chemistry (MB Center), and the Division of Life Sciences.
This workshop is jointly sponsored with Robert Wood Johnson Medical School Center for Clinical and Translational Sciences and Amicus Therapeutics.
Title: Protein Intrinsic Disorder, Cell Signaling, and Alternative Splicing
Many proteins, including at least 2/3 of the protein structures in the Protein Data Bank, contain regions that lack specific 3-D structure; indeed some proteins lack specific 3-D structure in their entireties under physiological conditions and yet carry out function as indicated by appropriate biochemical assays. Such proteins and regions have been called natively unfolded, intrinsically unstructured, naturally disordered, and rheomorphic, and various combinations of these terms among others. Starting in 1996, we began to explore the prediction of structured and disordered regions from amino acid sequence. This is distinct from the prediction of irregular (also called random coil) regions. Disorder predictions by us and others suggest that a significant fraction of eukaryotic proteins contain significant-sized regions of disorder. Intrinsic disorder is found to be commonly used in cell signaling, with disorder-to-order-transitions upon binding having at least two advantages: 1. the ability to bind with high specificity coupled with weak affinity due to the flexibility in the unbound state; and 2. the ability to bind to two or more partners due to plasticity in the bound state. Alternative splicing is very common in multicellular eukaryotes but rare or perhaps non-existent in single-cell eukaryotes. The RNA removed by alternative splicing is found to code for regions of intrinsic disorder significantly more often than for regions of structure. Given that signaling segments in regions of disorder are formed from small numbers of contiguous amino acids, and given that many disordered regions have been shown to contain many signaling and regulatory segments in tandem, alternative splicing within regions of disorder provides a simple method for bringing about regulatory and signaling diversity. 
We propose that the combination of alternative splicing plus intrinsic disorder provided a means to "try out" alternative regulatory pathways, thus enabling the evolution of differentiated cells. Against this background, the formation of protein aggregates by some intrinsically disordered protein regions, such as polyQ, NACP, and others, may be one cost arising from the binding promiscuity of these proteins.
Title: Folding and Aggregation of a β-Clam Protein in the Test Tube and in the Cell
Ongoing studies will be presented that are directed at a mechanistic understanding of the folding of a predominantly β-sheet protein, cellular retinoic acid binding protein I (CRABP I), both in the test tube and in the cell, and how these processes may go wrong, leading to aggregation. CRABP I is a member of the large family of intracellular lipid-binding proteins, which share a conserved structure composed of a short helix-turn-helix and two nearly orthogonal five-strand β-sheets wrapped around a central ligand-binding cavity.(1) Kinetic analysis has provided a description of the energy landscape of in vitro refolding of CRABP I.(2,3) Significant secondary structure develops upon hydrophobic collapse in the earliest (~250 µs) folding event. Native-like topology forms in a ca. 100 msec kinetic phase. Strikingly, stable hydrogen bonding in the β-sheets forms in a fully cooperative manner in a later (1 sec) phase, during which side chain packing interactions also develop. A point mutation at the end of the helix-turn-helix segment of CRABP I (P39A) leads to a slow-folding mutant(4) that is aggregation-prone both in vitro and in vivo.(5,6) We hypothesize that this mutant populates a metastable intermediate state because the native helix extends beyond its normal stopping point into a region that must adopt a β-strand for native structure formation. We have mutated wild-type and mutant CRABP I to incorporate sequences that bind specifically to a biarsenical fluorescein dye, 'FlAsH', making it possible to observe unfolding by urea titration in cells and to follow in real time the formation of aggregates in vivo.(6) The kinetics of aggregation of the slow-folding mutant have been examined both in vitro and in vivo and found to be consistent with a nucleation-propagation mechanism, with a monomeric nucleus, consistent with the hypothesized metastable intermediate.(5) We have also explored how an adjacent polyglutamine tract influences the structure of CRABP I.
Strikingly, when the polyQ tract exceeds the length that would lead to pathology in triplet-repeat diseases, the structure of the adjacent, otherwise stable CRABP I is significantly perturbed; additionally, the construct with longer polyQ tracts has a high propensity to aggregate both in vitro and in vivo,(7) as reported for other polyQ-containing polypeptides. Recently, we have found that the osmolyte proline counteracts the aggregation tendencies of the slow-folding CRABP I mutant and prevents the formation of detergent-insoluble, fibrillar aggregates by the fusion constructs containing long polyQ tracts.(8) The implications of all of these findings will be discussed in terms of how a folding landscape can be altered by competing aggregation processes and what environmental factors may modulate the relative probability of different fates of the folding protein.
Title: Computer Simulation of Protein Fibrillization: Polyalanine, Polyglutamine and Beta Amyloid
Assembly of normally soluble proteins into amyloid fibrils is a cause or associated symptom of numerous human disorders, including Alzheimer's, Huntington's and the prion diseases. We report molecular-level simulations of spontaneous fibril formation. A novel off-lattice, intermediate-resolution protein model, PRIME, has been developed that is simple enough to allow the simulation of multi-protein systems over relatively long time scales, yet contains enough genuine protein-like character to mimic real protein dynamics when used in conjunction with discontinuous molecular dynamics (DMD) simulation, a fast alternative to conventional molecular dynamics. We are simulating the formation and properties of fibrillar protein aggregates by applying discontinuous molecular dynamics to PRIME. Simulations have been performed on systems containing 12 to 96 model polyalanine peptides, KA14K, and 24 to 48 model polyglutamine peptides Q16, Q32, Q36, Q40 and Q48. Polyalanine was chosen for study because synthetic polyalanine-based peptides, which form alpha-helical structures at low temperatures and low peptide concentrations, have been found to form beta-sheet complexes (fibrils) in vitro at high temperatures and high peptide concentrations (Blondelle et al., Biochem. 36, 8393, 1997). Polyglutamine was chosen for study because it is known to form fibrils in vitro and in vivo, and because of its connection with Huntington's and other so-called polyglutamine diseases. In our simulations on polyalanine we find that at low peptide concentration, a system of peptides initially in the random coil state forms alpha-helices at low temperatures and assembles into large beta-sheet structures at high temperatures. When the concentration is increased at high temperatures, the system again forms beta-sheets, but these assemble into fibrils (really protofilaments) as the simulation progresses.
The effect of temperature, peptide concentration and chain length on the kinetics and thermodynamics of fibril formation is explored. Movies of the simulation will be shown. In our simulations on polyglutamine, we find that at low temperature the peptides form amorphous aggregates, at intermediate temperatures the peptides form ordered tube-like aggregates with significant beta-sheet character, and at high temperatures the peptides form random coils. The optimal temperature for formation of beta sheets increases with chain length up to 36 residues but not beyond. This may help explain the experimental finding that (approximately) 36 residues appears to be a critical chain length for polyQ aggregation in vitro and in vivo.
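The event-driven idea behind discontinuous molecular dynamics can be illustrated with a minimal sketch. This is not the PRIME model or the authors' code: it assumes identical hard spheres with a hypothetical contact distance `sigma` and shows only how the next collision time is solved analytically instead of integrating forces over small time steps. The function names are illustrative.

```python
import math
from itertools import combinations


def hard_sphere_collision_time(r_rel, v_rel, sigma):
    """Time until two spheres reach contact distance sigma, or None.

    r_rel and v_rel are the relative position and velocity (particle j
    minus particle i). Solves |r_rel + v_rel * t| = sigma for the
    smallest positive t.
    """
    b = sum(r * v for r, v in zip(r_rel, v_rel))
    if b >= 0.0:
        return None  # the pair is not approaching
    v2 = sum(v * v for v in v_rel)
    r2 = sum(r * r for r in r_rel)
    disc = b * b - v2 * (r2 - sigma * sigma)
    if disc < 0.0:
        return None  # trajectories pass without touching
    return (-b - math.sqrt(disc)) / v2


def next_event(positions, velocities, sigma):
    """Earliest collision among all pairs as (time, i, j), or None."""
    best = None
    for i, j in combinations(range(len(positions)), 2):
        r_rel = [pj - pi for pi, pj in zip(positions[i], positions[j])]
        v_rel = [vj - vi for vi, vj in zip(velocities[i], velocities[j])]
        t = hard_sphere_collision_time(r_rel, v_rel, sigma)
        if t is not None and (best is None or t < best[0]):
            best = (t, i, j)
    return best
```

A full DMD code would advance all particles to the returned event time, update the velocities of the colliding pair, and repeat; square-well attractions and bonds add further event types that are handled in the same way.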
Title: Large-scale annotation of coding non-synonymous SNPs: theory and practice
Germ-line DNA variation that results in a single amino-acid residue change in the protein product of a gene may have a major impact on an individual's susceptibility to disease and sensitivity to drugs. The low population frequency of most such variants precludes the use of case/control or familial cosegregation studies to discriminate between those which are pathogenic/high clinical significance vs. neutral/low clinical significance. A promising alternative approach is to look directly at properties of individual nsSNPs in the context of protein structure, sequence, evolution, protein-protein, and protein-ligand interactions.
We consider a variety of candidate predictive features and describe a computational prediction method that uses comparative protein structure modeling, sequence analysis, and a supervised learning algorithm (support vector machine) to identify putative pathogenic nsSNPs. In cross-validated testing on a dataset of 2500 putatively benign nsSNPs from dbSNP and 1450 disease-associated nsSNPs from OMIM covering 1841 human proteins, the SVM predictions were 80.5% accurate. The method has been applied to whole-genome annotation of human nsSNPs (http://salilab.org/LS-SNP), to predict the clinical significance of rare nsSNPs in the BRCA1 and BRCA2 genes found in patients at high risk for familial breast cancer, and to predict which nsSNPs identified in the ABC transporter genes of a healthy multiethnic cohort may result in differential response to chemotherapy and efflux of other xenobiotic compounds.
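To put the reported accuracy in context, the sketch below computes the majority-class baseline for the stated class sizes (2500 benign, 1450 disease-associated) and builds class-balanced cross-validation folds. The number of folds (10) and the helper names are assumptions made for illustration; the abstract does not state how the cross-validation was set up, and no classifier is trained here.

```python
def majority_baseline(n_benign, n_disease):
    """Accuracy of always predicting the larger class."""
    return max(n_benign, n_disease) / (n_benign + n_disease)


def stratified_folds(labels, k):
    """Assign sample indices to k folds, round-robin within each class.

    Each fold receives nearly the same class proportions as the full
    dataset, so per-fold accuracy is comparable across folds.
    """
    folds = [[] for _ in range(k)]
    seen = {}  # how many samples of each label have been placed so far
    for idx, label in enumerate(labels):
        n = seen.get(label, 0)
        folds[n % k].append(idx)
        seen[label] = n + 1
    return folds


# Class sizes taken from the abstract; k = 10 is an assumed value.
labels = ["benign"] * 2500 + ["disease"] * 1450
folds = stratified_folds(labels, 10)
baseline = majority_baseline(2500, 1450)
```

Under these assumptions, a predictor that always answers "benign" would be right for 2500 of 3950 variants, so the reported 80.5% should be read against that baseline rather than against 50%.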
Title: Stability and compensated pathogenic deviations
A number of mutations that are pathogenic to humans exist in sequences of normal, sometimes closely related individuals. Such cases, called Compensated Pathogenic Deviations (CPDs), have been described in proteins and tRNA molecules. The evolutionary trajectory of a CPD involves at least two interacting events, the CPD itself, and a compensatory substitution that negates the negative effects of a would-be pathogenic mutation. I will present cases where the molecular basis of the interaction of compensated and compensatory substitutions is known, and outline the impact of CPD data on our understanding of the relationship between genotype and phenotype.
Title: Catalytic origins of protein misfolding in end-stage renal disease
Polypeptides have a generic capacity to self-associate into linear, aggregated assemblies termed amyloid fibers. Fibers are insoluble and protease resistant, and the intermediates of formation are cytotoxic. Proteins are therefore under selective pressure to avoid this phenomenon. Consequently, amyloid formation is most prevalent in diseases of advancing age, such as Alzheimer's, and diseases associated with medical therapy, such as dialysis related amyloidosis (DRA). High resolution structural insights into the mechanisms of assembly are elusive due to the transient and heterogeneous nature of fibrillogenesis. Here, we report the conformational changes which initiate fiber formation by the DRA associated amyloid protein, β-2 microglobulin (β2m). Access of β2m to this conformation is enabled by selective binding of divalent cation. The chemical basis of this process was determined to be backbone isomerization of a conserved proline. Based on this finding, we designed a β2m variant that closely adopts this intermediate state. The variant has kinetic, thermodynamic, and catalytic properties consistent with the fibrillogenic intermediate of WT β2m. Furthermore, it is stable and folded, enabling us to unambiguously determine the initiating conformational changes at atomic resolution.
Title: SNPs, Protein Structure, and Disease
We have developed structure and sequence based models of the impact of SNPs on protein function in vivo. The models have been applied to a set of single nucleotide variants known to cause monogenic disease, and to a set found in the human population and not known to be associated with disease. There are two surprising findings from the analysis. First, most monogenic disease causing variants act by mildly destabilizing protein structure. The results suggest that most proteins are only just sufficiently stable to operate effectively in vivo. Second, about a quarter of the SNPs found in the population and not known to be associated with disease appear to seriously impair function at the molecular level. Examination of a set of these cases suggests a variety of mechanisms that make the larger scale system robust with respect to component defects. Network level robustness analysis has the potential to identify those SNPs that most likely contribute to susceptibility to complex diseases. To facilitate this, we have integrated all the pertinent data into a 'knowledgenet' interface (www.snps3d.org), allowing rapid assessment of the known relationships between proteins relevant to a particular disease, as well as access to molecule level information and to the supporting literature.
Title: Toward a Molecular Level Understanding of Polyglutamine Aggregation
The physical principles that underlie the mechanism of polyglutamine aggregation are of direct relevance to the onset and progression of nine different neurological diseases. The reigning hypothesis is that disease risk and severity of disease are directly connected to the length-dependent propensity of polyglutamine domains to form aggregates rich in beta-sheets. There are several unanswered biophysical questions pertaining to the structure and stability of monomeric and oligomeric forms of polyglutamine. I will present a sampling of results that go toward answering some of these questions. I will also discuss new simulation methods that we are developing to obtain quantitative thermodynamic and structural information about peptide aggregation.
Title: Amyloid Formation by Islet Amyloid Polypeptide
Islet amyloid polypeptide (IAPP, amylin) is responsible for amyloid formation in type-2 diabetes. Mature IAPP is a 37 residue hormone that is co-secreted with insulin from the pancreatic islet β-cells. IAPP is both an important protein for study in its own right and an excellent model system for studies of fibrillation by non-globular precursors. IAPP is notoriously difficult to work with and is even more prone to aggregate than the Aβ peptide. Efficient new methods for preparing the peptide will be described and the results of our recent biophysical investigations into IAPP aggregation will be outlined. In vitro experiments designed to test a model for amyloid formation which involves incorrectly processed IAPP binding to components of the extracellular matrix will also be described.
Title: Structural Studies of Amyloid-like Fibrils
Amyloid or amyloid-like fibrils are elongated insoluble protein aggregates formed in association with neurodegenerative diseases in vivo or in vitro from soluble proteins, respectively. The underlying structure of the fibrillar or "cross-beta" state has presented long standing fundamental questions of protein structure. These include whether fibril-forming proteins have two structurally distinct stable states, native and fibrillar, and whether all or only part of the native protein refolds as it converts to the fibrillar state. We have designed an amyloid of the well-studied enzyme Ribonuclease A and shown that it consists of domain-swapped native-like molecules decorating a cross-beta spine. We are also determining atomic resolution structures of peptides predicted to participate in the beta-sheet rich spine of various amyloidogenic proteins including Sup35, tau and Abeta. These structures shed light on the interactions that stabilize the beta-sheet core of amyloid fibrils. They also provide starting points for the design of therapeutics against neurodegenerative diseases like Alzheimer's and Parkinson's.
Title: Sequence dependence of amyloid formation and toxicity
Due to the complexity of the events involved in the pathogenicity of amyloid formation, a simple model for studying the molecular basis of this process would be desirable. Peptide model systems have been very helpful in providing knowledge about the underlying factors in amyloid formation. Therefore, short peptides capable of polymerizing into fibrils with the same properties as natural amyloid proteins could be a successful alternative for studying the cytotoxic mechanism. Based on that, we have scanned all the human amyloidogenic proteins described so far with the amyloidogenic pattern described by our group. Peptide fibrillation assays showed that the amyloid stretches identified are able to form amyloid-like fibrils. Also, the fibrils formed by these hexapeptides are not pathogenic in PC12 cell culture. On the other hand, prefibrillar ordered aggregates of amyloid stretches were toxic. These toxic oligomers obtained from different hexapeptides displayed identical morphology by EM, suggesting that sequence does not play a general role in the toxicity mechanism, although it plays a role in determining which amyloid fibrils will make more of the toxic pre-fibrillar structures. This is further confirmed by the observation that D- and L-versions of the same sequence exhibit similar toxicities. Analysis of fluorescently labelled peptides showed attachment of the prefibrillar structures to the cell membrane, but no internalization. Thus toxicity seems to take place by some sequence-independent membrane interactions.
Title: The effect of missense mutations in the N-terminus of BRCA1 on ubiquitin ligase activity and their relationship to breast cancer susceptibility.
The N-terminus of the BRCA1 protein associates with the N-terminus of BARD1 to form a heterodimer which, in concert with E2 ubiquitin-conjugating enzyme UbcH5a, exhibits ubiquitin ligase activity. This activity is abrogated by some BRCA1 missense mutations which occur in the germline of individuals affected with breast cancer, and which segregate with the disease in other family members. However, the majority of missense substitutions reported in this region of BRCA1 are infrequent in the population and, although found in patients with a personal or family history of disease, do not have strong evidence of being disease-causing, usually through lack of family pedigree information; neither has there been functional understanding of their impact.
We have examined, by extensive missense substitution, the interaction of BRCA1 with BARD1 and UbcH5a. Selection from a randomly generated library of BRCA1 missense mutations for variants that inhibit the interaction with these components identified substitutions in residues found altered in patient DNA, indicating a correlation between loss of component binding and propensity to disease development. Patient variants that inhibit the BRCA1:E2 interaction show loss of ubiquitin ligase activity and correlate with disease susceptibility and theoretical predictions of pathogenicity. These data link loss of ubiquitin ligase activity, through loss of E2 binding, to the majority of non-polymorphic patient variants described within the N-terminus of BRCA1 and illustrate the likely significant role of BRCA1 ubiquitin ligase activity in tumour suppression.
Title: Human non-synonymous SNPs: molecular function, evolution and disease
Computational methods for predicting the functional effect of non-synonymous SNPs assume that a negative effect on molecular function results in a deleterious effect on phenotype and fitness. Evolutionary models of human disease suggest that this is not always the case. However, multiple rare deleterious alleles may be responsible for human complex phenotypes. This viewpoint is supported by our observation that the mutation target for new missense mutations of mildly deleterious effect is very large. The analysis of human disease mutations and human-chimpanzee divergence showed that only ~20% of new missense mutations result in complete loss of function and ~25% are effectively neutral. The remaining majority may lead to rare allelic variants segregating in the human population. Indeed, as seen from the results of massive re-sequencing projects, more than 50% of non-synonymous SNPs with frequency of 1% and below are deleterious. Surprisingly, this means that allele frequency alone may be a predictor of the functional effect of non-synonymous SNPs.
This result justifies the use of computational techniques for predicting SNP function. It also justifies an association study design based on complete re-sequencing rather than genotyping.
These and other practical applications of computational methods for predicting functional SNPs require high accuracy of predictions. We were able to greatly reduce the fraction of false-positive predictions in the new version of the earlier developed program PolyPhen. Applications to the studies of human complex phenotypes will be discussed based on the example of plasma levels of HDL-Cholesterol.
Title: Early events in aggregation of prion proteins and Aβ peptides
I will discuss scenarios for protein and peptide aggregation. These conceptual ideas will be illustrated by applications to the formation of the amyloidogenic form of the cellular prion protein. The mechanism of oligomerization of Aβ peptides will be presented.
Title: Molecular structures of fibrils associated with amyloid diseases and yeast prions
Amyloid fibrils are inherently noncrystalline and insoluble materials, making structure determination by x-ray crystallography or solution NMR difficult. In contrast, amyloid fibrils are ideal systems for structural studies by solid state NMR methods, supplemented by information from electron microscopy and fiber diffraction(1). I will briefly describe how we obtain experimental constraints on secondary, tertiary, and quaternary structure in amyloid fibrils from solid state NMR measurements. I will then describe recent results for amyloid fibrils formed by the 40-residue β-amyloid peptide associated with Alzheimer's disease(2), the 37-residue amylin peptide associated with type 2 diabetes, and the Ure2p yeast prion protein(3). These results provide insights into: (1) the intra- and inter-molecular interactions that stabilize amyloid structures in general; (2) the extent to which the amino acid sequence uniquely determines the molecular structure in an amyloid fibril; (3) the extent to which amyloid fibrils formed by different polypeptides share a common structure; (4) the validity of various proposals regarding amyloid structures that have been based primarily on modeling (rather than direct experimental) studies.
Title: Computational Method for Rapidly Predicting Amyloidogenic Sequences in Proteins and Polypeptides
Amyloid diseases such as Alzheimer's disease (AD) and Parkinson's disease (PD) are characterized by transformation of proteins from their proper native structure into an abnormal beta-rich structure known as amyloid fibril. This process is triggered by short peptide segments, i.e., core nucleation motifs, within the polypeptide sequence that readily convert from alpha-helix or random coil in their native state to beta-strand. We refer to this trait as "hidden beta-strand propensity" (HbetaP). A viable therapeutic strategy for fibril elimination is to design aggregation inhibitors that specifically target these core nucleation motifs; however, their identification in proteins remains a major bottleneck. To overcome this impediment, we have developed a novel computational tool that pinpoints these core nucleation motifs of input polypeptides or proteins by calculating the Contact-Dependent Secondary Structure Propensities (CSSP). To the best of our knowledge, no other computational method exists that possesses this capability. The core nucleation motifs identified by our CSSP tool then serve as targets for the rational design of aggregation inhibitors as potential therapeutic agents. Novel inhibitor design schemes based on multifunctional retro-inverso peptide bioconjugates are proposed that confer improved avidity, selectivity, in vivo stability, and transport (blood-brain and intestinal) properties. In addition, we introduce the Amyloidogenic Sequence Knowledge Base (ASKB: http://opal.umdnj.edu/) that provides access to the CSSP method and to a searchable library of CSSP-predicted amyloidogenic sequences in proteins.
Title: Kinetics and thermodynamics of amyloid fibril formation
The classical picture of non-native protein aggregation is of amorphous structures formed irreversibly from unfolded or misfolded states via colloidal coagulation kinetics and held together by non-specific hydrophobic interactions. We are finding, however, that amyloid assembly, perhaps consistent with the high degree of order in the product, exhibits a highly choreographed assembly mechanism, as well as a surprising degree of reversibility. One implication of these findings is that, at least in favorable circumstances, aspects of amyloid structure and assembly can be extracted by studying the thermodynamics and kinetics of amyloid assembly. Thermodynamics: Although non-native protein aggregation is generally found to be essentially irreversible, some amyloid assembly reactions arrive at measurable equilibrium positions associated with characteristic free energies. In vitro fibril formation by Aβ, the major peptide component of the Alzheimer's disease amyloid plaque, proceeds reproducibly to a critical concentration in the low micromolar range. By measuring the corresponding endpoints for structural variants of Aβ, we can derive free energies of elongation that contain information about amyloid fibril structure. Kinetics: In favorable cases, analysis of assembly kinetics can yield mechanistic and energetic information about key intermediates such as the aggregation nucleus. In vitro polyglutamine aggregation, associated with expanded CAG repeat diseases like Huntington's disease, proceeds via a nucleated growth polymerization mechanism in which the nucleation event is a highly unfavorable conformational change within the monomer. We estimate that, for a polyglutamine peptide of repeat length 47, which is well above the pathological repeat length threshold of 36, the equilibrium constant describing this unfavorable protein folding reaction is on the order of 10^-9.
More recent work is on how structural fluctuations within the polyglutamine monomer, as influenced, for example, by repeat length and by flanking sequences, influence aggregation kinetics and thermodynamics. These studies are helping us to understand fundamental principles of amyloid assembly and may provide useful data for testing computational models and for understanding disease processes.
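The link between the measured quantities in this abstract and free energies can be sketched with the standard relation ΔG = -RT ln K. The code below is an illustrative calculation, not the authors' analysis: it assumes a temperature of 310 K, a 1 M standard state, and that the elongation equilibrium constant equals the reciprocal of the critical concentration. The function names and any concentrations passed in are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def free_energy_from_k(k_eq, temp_k=310.0):
    """Standard free energy change (kcal/mol) for equilibrium constant k_eq."""
    return -R_KCAL * temp_k * math.log(k_eq)


def elongation_free_energy(critical_conc_molar, temp_k=310.0):
    """Free energy of adding a monomer to a fibril end.

    Assumes the association constant is 1 / (critical concentration),
    with concentrations expressed relative to a 1 M standard state.
    """
    return free_energy_from_k(1.0 / critical_conc_molar, temp_k)


def delta_delta_g(cr_variant, cr_reference, temp_k=310.0):
    """Change in elongation free energy of a variant relative to a reference.

    Positive values mean the variant's fibril is less stable.
    """
    return (elongation_free_energy(cr_variant, temp_k)
            - elongation_free_energy(cr_reference, temp_k))


# Nucleation equilibrium constant of order 10^-9, as quoted in the abstract.
nucleation_dg = free_energy_from_k(1e-9)
```

Under these assumptions, a tenfold rise in critical concentration for a variant corresponds to a destabilization of RT ln 10, which is the kind of per-residue information that comparing endpoints across structural variants can provide.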