DIMACS Workshop on Identifying Genetic Signatures for the Evolution of Complex Phenotypes

June 11 - 12, 2009
DIMACS Center, CoRE Building, Rutgers University

Organizers: Gyan Bhanot, Rutgers University, Cancer Institute of New Jersey and the Institute for Advanced Study, gyanbhanot at gmail.com; Raul Rabadan, Columbia University, rabadan at dbmi.columbia.edu

Presented under the auspices of the Special Focus on Computational and Mathematical Epidemiology and the DIMACS/BioMaPS/MB Center Special Focus on Information Processing in Biology.

This special focus is jointly sponsored by the Center for Discrete Mathematics and Theoretical Computer Science (DIMACS), the Biological, Mathematical, and Physical Sciences Interfaces Institute for Quantitative Biology (BioMaPS), and the Rutgers Center for Molecular Biophysics and Biophysical Chemistry (MB Center).

Abstracts:

Harmen Bussemaker, Columbia University

Title: Identifying the genetic determinants of transcription factor activity

The molecular mechanisms underlying the heritability of gene expression currently are poorly understood. However, they are highly likely to involve mediation by trans-acting regulatory factors. We here present a method for identifying genomic loci ("aQTLs") whose allelic variation modulates transcription factor activity. It integrates parallel genotyping and expression profiling data with quantitative prior information about the DNA binding specificity of transcription factors. Post-translational regulatory mechanisms are fully accounted for. Application of our method to segregants from a genetic cross between two haploid yeast strains shows a dramatic increase in the statistical power to detect locus-TF associations, while most previously discovered associations are recovered. Novel findings include functional regulatory polymorphisms in the transcription factor genes STB5 and RFX1, and antagonistic modulation of four cell-cycle related transcription factors by the cyclin-dependent kinase Cdc28. Our method is strictly causal, computationally efficient, and should be widely applicable.

Andrea Califano, Columbia University

Title: Interactome Analysis reveals Master Regulators of human malignancy signatures

The identification of genes acting synergistically as master regulators of physiologic and pathologic cellular phenotypes is a key open problem in systems biology. Here we use a molecular interaction based approach to identify the repertoire of transcription factors and signaling proteins controlling physiologic and tumor-specific phenotypes. Specifically, we analyzed molecular interaction networks inferred by reverse engineering algorithms to identify master regulators of Germinal Center formation and of initiation and maintenance of the mesenchymal phenotype of Glioblastoma Multiforme (GBM), previously associated with poor prognosis. We will discuss the methodology to reconstruct and biochemically validate the context-specific molecular interaction networks, as well as to infer and validate Master Regulator genes. For instance, starting from the unbiased analysis of all TFs, we identify a highly interconnected module of six TFs jointly regulating >75% of the genes in the signature. Two TFs (Stat3 and C/EBPb), in particular, display features of initiators and master regulators of module activity. Ectopic co-expression of these master TFs in murine neural stem cells activates expression of the mesenchymal signature and is highly tumorigenic in vivo. Conversely, co-silencing of the TFs in human glioma cells leads to collapse of the mesenchymal signature and reduction of tumor aggressiveness.

Kevin Chen, NYU and Rutgers University

Title: Population Genomics of MicroRNA and Transcription Factor Binding Sites

Understanding the patterns of mutation and selection in cis-regulatory sites is important for studying human medical genetics, bioinformatic predictions of cis-regulatory sites and evolution. I will present population genomic studies of microRNAs and their binding sites using large-scale SNP genotype data in humans and whole-genome resequencing data in Drosophila simulans. We show that predicted microRNA binding sites are under strong selective constraint, making them good candidates for causal variants of human disease. We also give predictions of non-conserved microRNA binding sites in genes expressed in the same tissue as the microRNA and show how our approach can be used to compare different miRNA target prediction programs. I will then present ongoing work on correlating changes in predicted transcription factor binding sites with changes in gene expression in Saccharomyces cerevisiae and validating these causal variants experimentally.

Ian Ehrenreich, Princeton University

Title: Defining the genetic complexity of phenotypic variation using segregating yeast populations

Identifying the loci that cause complex trait variation, including human disease, has proven to be a challenge because most phenotypes are caused by multiple genetic factors that interact both with each other and with the environment. Recent research suggests commonalities exist between humans and yeast in the genetic architecture of complex traits. Saccharomyces cerevisiae thus can provide insights into the genetic basis of phenotypic variation that may be difficult or impossible to discover in humans. However, even in S. cerevisiae, mapping studies regularly miss most of the causal loci underlying complex traits, implying that these studies are underpowered. We are developing a new approach that should have the power to map a greater fraction of the loci that are responsible for a phenotypic difference between two yeast strains. The key innovations in our approach are i) the easy generation of segregating populations of millions of S. cerevisiae MATa haploids from any yeast cross and ii) the streamlining of genetic mapping by conducting genotyping and phenotyping on entire segregating populations rather than on individual segregants. I will present recent progress in our research and will discuss the importance of understanding the complex genetic basis of trait differences among individuals.

Ben Greenbaum, IAS, Princeton

Title: Innate Immunity and RNA Viruses

The innate immune response provides a first line of defense against pathogens by targeting generic differential features that are present in foreign organisms but not in the host. These innate responses generate selection forces acting on both pathogens and hosts that further determine their co-evolution. I will describe an analysis of the nucleic acid sequence fingerprints of these selection forces acting in parallel on both host innate immune genes and ssRNA viral genomes. This is done by identifying dinucleotide biases in the coding regions of innate immune response genes in plasmacytoid dendritic cells, and then using this signal to identify other significant host innate immune genes. The persistence of these biases in the orthologous groups of genes in humans and chickens is also examined. I will compare the significant motifs in highly expressed genes of the innate immune system to those in ssRNA viruses and study the evolution of these motifs in the H1N1 influenza genome.

Jody Hey, Rutgers University

Title: Likelihood and Bayesian methods for developing detailed portraits of demographic history.

In this talk I will describe how the underlying model of demography plays a role in identifying genetic/phenotype associations.

Arnold J Levine, IAS, Princeton

Title: Uncovering Genes Associated with Autism

This talk will discuss new results identifying the genetic underpinnings of autism.

Bud Mishra, NYU

Title: Human Population Genomics: Man, Woman, Birth, Death, Infinity, Plus Altruism, Cheap Talks, Bad Behavior, Money, God and Diversity on Steroids

Our ancestors became almost extinct twice, the most recent being about 40,000 to 60,000 years ago. At one point, the population had shrunk to as few as 4,000 individuals, but expanded rapidly as humans migrated to other parts of the world and learned to farm and domesticate animals. The genomes of the current human population record this history as it has been molded by mutations (polymorphisms), migration, genetic drift and selection. The statistical distributions of genes and other genomic elements are hard to decipher since they mix a huge amount of diversity fueled by genetic drift, resulting from small populations and non-random mating, with significant differences that contribute to each individual's overall traits. However, as we prepare to usher in the age of individualized medicine, we have to attack the underlying statistical analysis problem on several fronts: (1) Technology, (2) Systems Biology and Genetics, (3) Statistical Algorithms, and (4) Large-Scale System Building. My group has been engaged in developing a single-molecule sequencing technology (SMASH) and sequence assembly algorithms (SUTTA) to collect very high-quality haplotypic sequencing data from a large number of individuals. Using this data, we aim to catalog and understand how different polymorphisms (SNP, CNV, segmental rearrangements and possibly many others) originate and diffuse through the population. This will then lead to various novel non-parametric algorithms to model the stochastic processes that are modulated by population sizes, migration and mating patterns. This integrated technology can then be used to discover and exploit groups of genetic markers to drive the core recommender engine of individualized medicine. I will discuss various open problems related to this strategy and their possible solutions.

Alexander Morozov, Rutgers University

Title: Evolutionary origin of pairwise and higher-order correlations among amino acid mutations.

Protein structures are determined by their amino acid sequences. As proteins evolve, patterns of amino acid mutations reveal those positions that contribute most to the protein stability and function. Specifically, pairwise and higher-order correlations between amino acid mutations are ubiquitous in protein families. Here we demonstrate that modeling correlations is crucial for understanding the emergence of complex mutational patterns that confer drug resistance to HIV-1 subtype B protease. Using a probabilistic approach designed to treat pair-level and higher-order correlations within the same framework, we demonstrate that including pair interactions is essential for qualitative agreement with mutational data. Furthermore, triplet and higher-order interactions have a significant effect on the predicted frequencies of sequences with large numbers of mutations. Such sequences are of special interest as they are more prevalent after multi-drug therapy. It is a challenging problem to understand the origin of the observed correlations. The correlations may arise through physical proximity of mutated residues, or appear due to the non-linear dependence of protein stability on the folding free energy. In the latter case, a destabilizing mutation is compensated by a stabilizing one, which restores protein function to wild-type levels even if mutated sites are not in close spatial proximity. This compensatory mechanism may explain why attempts to identify amino acid contacts in the structure from mutational data have only met with limited success so far. We have developed a biophysical framework that explicitly takes both of these mechanisms into account and makes an interaction energy prediction for each amino acid pair. Understanding the correlation in terms of both physical and compensatory interactions will guide design of future algorithms that identify amino acid contacts from protein sequence alignments.

Richard Neher, KITP UCSB

Title: Alleles versus Genotypes

Competition between Epistasis and Recombination can cause a Transition between two Regimes of Selection. Biochemical and regulatory interactions, known to be central to biological networks, are expected to cause extensive genetic interactions or epistasis. Yet, the inference of epistasis from the observed phenotype-genotype correlation is impeded by statistical difficulties, while our theoretical understanding of the effects of epistasis on population dynamics remains limited, which in turn limits our ability to interpret data. Of particular interest is the situation of numerous interacting genetic loci with small individual contributions to fitness. Using a computational model, we demonstrate that interacting loci can, despite frequent recombination, exhibit cooperative behavior that locks alleles into favorable genotypes leading to a population consisting of a set of competing clones. As the recombination rate exceeds a certain critical value that depends on the strength of epistasis, this "genotype selection" regime disappears in an abrupt transition giving way to "allele selection" - the regime where different loci are only weakly correlated as expected in sexually reproducing populations. Interestingly, large populations attain highest fitness at a recombination rate just below critical. Clustering of interacting sets of genes on a chromosome leads to the emergence of an intermediate regime, where blocks of cooperating alleles lock into genetic modules. These haplotype blocks disappear in a second transition to pure allele selection.

Itsik Pe'er, Columbia University

Title: Shared genetic segments within and across populations

The availability of cost-effective, high throughput technologies to genotype common alleles has yielded an unprecedented wealth of genomewide data on human variation, deeply sampled within and across populations. We have developed a rapid method that facilitates extensive evaluation of shared genetic segments across millions of sample-pairs. We report analysis of such sharing to improve understanding of recent genetic history of samples, both genomewide as well as for specific loci. We show extensive hidden relatedness between individuals within populations that provides estimates of demographic parameters. Specifically, for Ashkenazi Jewish populations we demonstrate a severe bottleneck 20-25 generations before present. We show genetic sharing to be focused at regions that suggest a causal mechanism for ancient sharing rather than recent relatedness, such as the HLA and the commonly polymorphic inversion of 5Mbp on chromosome 8p23.1. Finally, we filter out sharing that is non-informative because it is too recent or causal and show clustering of populations based on genetic sharing.

Gustavo Stolovitzky, IBM Research

Title: Learning to read: DNA sequencing technologies and the $1000 genome

The quest for faster and cheaper ways to sequence full human genomes has triggered much technological innovation. In this talk I will review the early history of DNA sequencing, and discuss some of the very creative solutions inspired by the challenge to reach the dream of sequencing a human genome for $1000 in a day.

Masashi Tanaka, TMIG, Tokyo, Japan

Title: Mitochondrial genome haplogroup D4a is enriched in Japanese semi-supercentenarians.

Mitochondrial genome polymorphisms contribute to longevity and susceptibility to various age-related diseases. We previously reported that mitochondrial haplogroup D (mt5178C>A) is enriched in Japanese centenarians. We have now extended our research to semi-supercentenarians (SSC, aged above 105 years). [Materials and Methods] We analyzed complete mitochondrial DNA (mtDNA) sequences from 112 Japanese SSC, and compared them with previously published data from 96 subjects in each of three non-disease phenotypes (centenarians, healthy non-obese males, obese young males) and four disease phenotypes (diabetics with and without angiopathy, and Alzheimer's and Parkinson's disease patients). [Results and Conclusion] We confirmed the correlations observed in a previous study showing enrichment of a hierarchy of haplogroups in the D clade for longevity. For the extreme longevity phenotype we detected enrichment of haplogroup D4a in centenarians and SSC. Haplogroup D4a is characterized by a non-synonymous mutation 14979C>T causing Ile78Thr in cytochrome b of ubiquinol-cytochrome c oxidoreductase. Haplogroup D4a is also associated with a synonymous mutation 8473T>C, which disrupts the 1st 13-bp direct repeat (8470-8482, ACCTCCCTCACCA to ACCCCCCTCACCA) flanking the 4977-bp common mtDNA deletion. Further study is needed to determine whether these mutations 14979C>T and 8473T>C confer resistance against oxidative stress and suppress age-associated accumulation of deletions in the mitochondrial genome.

Saeed Tavazoie, Princeton University

Title: A cognitive framework for understanding cellular behavior?

Through a combination of physiological observations, in silico simulations, and laboratory experimental evolution, we provide evidence that intracellular regulatory networks are capable of predictive behavior in a fashion similar to metazoan nervous systems. Our observations challenge the dominant homeostatic framework and reveal 'psychological' constraints on the evolution of intracellular regulatory networks.

Alexei Vazquez, IAS, Princeton

Title: Human genetic variants of PP2A subunits associate with altered cellular stress responses, evolutionary selection and cancer risk.

Protein phosphatase 2A regulates key components of human cellular stress response pathways. Here we investigate the association between genetic variants of the PP2A regulatory subunits and cellular stress response, evolutionary selection and cancer. Using data reported for the NCI60 tumor-derived cell lines we uncover genetic variants showing a significant correlation between their genotype and the response to several chemotherapeutic agents. These genetic variants tag haplotypes with unexpectedly high frequencies in the Caucasian population, indicating that the associated genomic regions are under natural selection. Finally, we provide clinical evidence showing a significant association between two of these identified genetic variants and clinical variables.

Document last modified on June 9, 2009.