Title: Cryptogenes and RNA Editing
RNA editing is a collective term referring to enzymatic processes that change RNA sequence apart from splicing, 5' capping or 3' extension. Our research focuses on uridine insertion/deletion mRNA editing in mitochondria of kinetoplastid protists. This type of editing corrects frameshifts, introduces start and stop codons, and often adds much of the coding sequence to create translatable mRNAs from cryptogenic transcripts. The mitochondrial genome of trypanosomatids is composed of ~50 maxicircles encoding ribosomal RNAs and proteins, and thousands of minicircles. To produce functional mRNAs, a multitude of nuclear-encoded factors mediate interactions of maxicircle-encoded pre-mRNAs with a vast repertoire of minicircle-encoded guide RNAs. Editing reactions of mRNA cleavage, U-insertions or U-deletions, and ligation are catalyzed by the RNA editing core complex (RECC, the 20S editosome) while each step of this enzymatic cascade is directed by guide RNAs. These 50-60 nucleotide (nt) molecules are 3' uridylated by RET1 TUTase and stabilized via association with the gRNA binding complex (GRBC). Remarkably, the information transfer between maxicircle and minicircle transcripts does not rely on template-dependent polymerization of nucleic acids. Instead, intrinsic substrate specificities of key enzymes are largely responsible for the fidelity of editing. Conversely, the efficiency of editing is enhanced by assembling enzymes and RNA binding proteins into stable multiprotein complexes. In addition to editing, recent studies unraveled a highly integrated network of pre- and post-editing 3' mRNA processing steps required for mRNA stabilization and translation, respectively. Based on physical interactions and functional links between mRNA editing and 3' modification complexes and ribosomes, a model that integrates RNA editing into the mitochondrial gene expression pathway will be discussed.
Title: Generation of variation in trypanosomes
Enhanced mutation contributes in several fundamental ways to the fitness of African trypanosomes. In the process known as antigenic variation, these parasites mount a continuously varying surface coat phenotype that facilitates evasion of mammalian immune systems. The coat is specified by an archive of 2000 silent variant surface glycoprotein (VSG) genes and pseudogenes, which are switched on differentially. Enhanced mutation contributes to trypanosome antigenic variation at different levels: switching occurs by gene conversion; novel variants are generated by combinatorial conversions between genes and pseudogenes during switching; the silent archive hypermutates by indels, base substitutions and intragenic conversions. How is this sophisticated system coordinated in evolutionary terms? Interestingly, the archive and its expression are organized in ways that appear to enhance antigenic variation. One way is that all VSG are in chromosomal subtelomeres. Unlike chromosome cores, subtelomeres undergo relatively rapid mutational processes that are ideal for the generation of multigene families frequently associated with phenotypic diversity in eukaryotes. We are dissecting antigenic variation in a systems approach, relating phenomena and events at different scales, from molecules to host populations.
Title: Environmental Regulation of Switching Rates in Phase Variable Pathogens and Commensal Bacteria
Many surface structures of bacterial pathogens and commensals are subject to high frequency, reversible changes in expression. This type of phenotypic variation is termed phase variation and is driven by a range of mechanisms such as differential methylation of promoter elements, site-specific recombination or mutations in simple sequence repeat (SSR) tracts. These mechanisms exhibit significant differences in their responsiveness to environmental signals and to evolvability of mutation rates. Mutations in SSRs located in either the reading frame or promoter elements drive phase variation of bacterial genes. Multiple genes are subject to phase variation in Haemophilus influenzae, Neisseria meningitidis and Campylobacter jejuni. I will present a comparison of the cis- and trans-acting factors influencing the mutation rates of SSRs in these species and discuss how the mutational patterns of the SSRs may influence evolution of phase variation in these species. Changes in the proportions of phase variants for multiple genes have been examined in bacterial populations from both experimental models and natural infections. Theoretical models have been used to investigate the influence of high mutation rates, selection and environmental switching patterns on the structures of phase variable populations. I will discuss how these novel data sets and models are influencing our understanding of the contributions of phase variation to bacterial adaptation and pathogenesis and the impact these environmental forces may have on evolution of SSR-mediated phase variation.
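As a rough illustration of how mutations in an SSR tract located in a reading frame can switch a gene on and off, the toy sketch below changes the number of repeat units and checks whether the downstream sequence stays in frame. The gene layout, the `CAAT` repeat unit, the downstream sequence and the function names are all assumptions made for this example and are not taken from the abstract.

```python
# Toy model of SSR-mediated phase variation (illustrative assumptions only).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_gene(n_repeats: int, unit: str = "CAAT") -> str:
    """Hypothetical gene: start codon, SSR tract, then a short coding tail
    that ends with a stop codon when read in the correct frame."""
    return "ATG" + unit * n_repeats + "GCTGCAGGTTAA"


def is_on(seq: str) -> bool:
    """Treat the gene as ON only if, reading codons from position 0,
    the first stop codon met is the one at the very end of the sequence."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3 == len(seq)
    return False  # frame shifted: the terminal stop codon is never read


if __name__ == "__main__":
    for n in range(2, 7):
        state = "ON" if is_on(build_gene(n)) else "OFF"
        print(f"{n} repeats -> {state}")
```

Because the repeat unit is four nucleotides long, only repeat numbers that are multiples of three keep the tail in frame in this toy gene, so gaining or losing a single unit reversibly toggles expression.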
Title: Genome organization and evolution in vertebrates
Thirty-five years ago we discovered that the vertebrate genome is compositionally compartmentalized. Indeed, it is a mosaic of isochores, megabase-size DNA stretches that cover a broad compositional spectrum and can be assigned to a small number of families (five in the human genome). Isochores correspond to the highest-resolution chromosomal bands, about 3200 of them in the human genome. Gene density (as well as other properties) allowed us to identify two genome spaces, a small, gene-rich, GC-rich genome core only representing 15% of the genome, and a large, gene-poor, GC-poor genome desert corresponding to the remaining 85% of the genome. A number of properties that distinguish the two genome spaces will be described.
The isochore structure of the genome also allowed us to discover a genomic code, the collective definition of a set of strong compositional correlations that basically link (i) the GC levels of coding sequences with those of their extended non-coding flanking sequence; (ii) the GC level of coding sequences with the secondary stretches (helix, coil, aperiodic), the hydropathy and the thermodynamic stability of proteins; and (iii) the GC levels of non-coding and regulatory sequences with their short nucleotide sequences, i.e. with nucleosome positioning and transcription factor binding. The existence of a genomic code demonstrates that the genome is an integrated ensemble which encodes not only amino acid sequences (according to the genetic code) and the secondary structure of proteins, but also the regulation of gene expression.
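The compositional compartmentalization described above rests on measuring GC levels along a sequence. A minimal sketch of that measurement, using fixed non-overlapping windows, is given below. The function names and the tiny window size are assumptions for illustration; real isochores are megabase-size, and this is not the authors' segmentation method.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def window_gc(seq: str, window: int) -> list[float]:
    """GC fraction of consecutive, non-overlapping windows.
    A trailing partial window is ignored."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


if __name__ == "__main__":
    # Toy sequence: a GC-poor stretch followed by a GC-rich stretch.
    toy = "AT" * 10 + "GC" * 10
    print(window_gc(toy, 10))  # [0.0, 0.0, 1.0, 1.0]
```

Grouping adjacent windows with similar GC fractions would be the next step toward a compositional map, but the grouping rule is left out here because the abstract does not specify one.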
Title: Transposable elements in human germline genetic variation and cancer
Mobile DNAs hugely impact genome content in varying manners across taxa. High-throughput approaches for detecting interspersed repeats in human genomes are beginning to reveal how much mobile DNAs contribute to genetic diversity. Our laboratories recently reported a ligation mediated PCR strategy termed transposon insertion profiling (TIP) which can be coupled with tiling microarray or next generation sequencing analyses (TIP-chip and TIP-seq) for mapping families of human retrotransposons. We will describe how this method can be used to detect L1(Ta) LINEs and AluYa and AluYb SINEs, applications which have identified numerous novel insertion polymorphisms with a wide range of allele frequencies. Our data show activity of these 'copy and paste' transposons is higher than previous estimates, and that the resulting repeats are major underascertained sources of genomic structural variation. Finally, we will review roles transposons can play in cancer development, both as somatic mutagens and as heritable, polymorphic indels that may underlie predispositions to cancers and other common diseases and traits. We expect technologies targeted specifically for discovery of interspersed repeat insertions will prove useful to researchers studying human genetic variation and instability of genomes in oncogenesis.
Title: Sites of genetic instability in mitosis and cancer
Fragile sites are correlated with chromosomal alterations in cancer cells, but it is not fully understood how and why these sites generate instability. Human common fragile sites are large 100-500 kb regions, with breaks occurring throughout the region. It has been proposed that breaks in human fragile site regions are stimulated by DNA "flexibility peaks," which are AT-rich regions with high variation in twist angle. Using yeast cells with a YAC carrying a 1330kb insert from human fragile site FRA3B, we have investigated the location of breaks in the YAC. Our yeast contain mutations that increase the efficiency of telomere "capping" of breaks, so that we can map the location of telomere-capped YAC breaks. Our data indicate that there are three "hotspots" for breaks in FRA3B, and these hotspots are near clusters of flexibility peaks. In a second project, we are investigating whether instability at fragile sites stimulates mitotic recombination and crossovers. Mitotic recombination can result in loss of heterozygosity (LOH), and thus can contribute to generation of cells homozygous for mutant tumor suppressor genes. We are using a system that allows us to recover and map the location of mitotic crossovers on yeast chromosome III, which contains a naturally-occurring fragile site. I will discuss the recombination events we have analyzed on this chromosome.
Title: Evolution of active deamination of DNA and RNA
Organisms have evolved complex machineries to reduce the risks of genomic damage; the existence of a class of enzymes that target nucleic acids to insert mutations is thus surprising. The AID/APOBECs arose at the beginning of the vertebrate radiation with activation-induced deaminase (AID), critical for the somatic diversification of the antibody genes, and APOBEC2, involved in muscle development. A complex history of gene duplications in tetrapods led to APOBEC1, APOBEC5, and the APOBEC3s. Versatility seems to be the paradigm for this gene family, with each member taking on novel roles and characteristics. Thus APOBEC1, while being able to target DNA, is part of a specific pathway of C>U RNA editing in mammals. On the other hand, the APOBEC3s have undergone profound diversification and are part of an innate pathway against foreign and mobile DNA/RNA elements, including HIV. These enzymes are powerful tools in their respective pathways, but their ability to edit nucleic acids represents a double-edged sword, and failure in their regulation can lead to aberrant function and cancer.
Title: Dissecting homologous recombination outcomes: A critical role for noncrossovers in genomic diversity
Developing germ cells induce a large number of DNA double-strand breaks (DSBs) at hotspots throughout the genome to initiate meiotic recombination. In contrast to mitosis, DSB repair between homologs is favored in meiosis, allowing recombination outcomes to be mapped at fine scale and providing insight into DNA repair mechanisms and their influence on genomic architecture. Repair of these DSBs can produce crossovers (COs) between homologs that ensure accurate chromosome segregation. However, only a small fraction of DSBs are repaired as COs. To analyze interhomolog noncrossovers (NCOs), we performed high-resolution mapping in hybrids derived from inbred mouse strains. We provide direct evidence that the vast majority of repair events are interhomolog NCOs, consistent with models in which frequent interhomolog interactions promote accurate chromosome pairing. Transmission distortion was observed in one hybrid, with NCOs providing a more significant contribution than COs. Thus, NCO recombination events play a substantial role in genome evolution (1). To further delineate mammalian recombination mechanisms, results from a new method to map multiple products from individual recombination events, also known as "mouse tetrads", are presented. We provide the first direct evidence in mammalian meiosis of meiotic-dependent CO and NCO gene conversion. Furthermore, NCOs result from unidirectional transfer of information to the recipient strand and are likely formed via synthesis-dependent strand annealing rather than through the canonical double Holliday junction resolution model.
(1) Cole F, Keeney S, Jasin M. 2010. Comprehensive, fine-scale dissection of homologous recombination outcomes at a hotspot in mouse meiosis. Molecular Cell 39: 700-710.
Title: Knowing where transposons go and why
Transposable elements are present in virtually all genomes. Indeed, they are the major components of many genomes; for example, they comprise about 45% of the human genome and about 85% of the maize genome. Transposable elements can influence genome structure, function and evolution. The transposable element population of any genome is influenced by the elements' target site selection and also by how they may be removed from the genome, for example, by excision. Our research is focused on DNA cut-and-paste transposons that move by excision from a donor site and integration into a new target site.
We have studied the de novo insertion of the hAT element Hermes and the piggyBac element into the S. cerevisiae genome using next-gen sequencing to characterize large numbers of insertion sites. By comparing Hermes insertion into the genome in vivo and also into naked yeast DNA in vitro, we have found that chromatin structure profoundly influences integration: Hermes insertion in vivo is highly biased towards nucleosome-depleted regions. A similar pattern is observed for piggyBac.
Title: Control and Dynamical Systems, BioEng, EEng Caltech
This talk will review recent progress on developing a "unified" theory for complex networks involving three elements: hard limits on achievable robust performance (tradeoffs, misnamed "laws"), the organizing principles that succeed or fail in achieving them (architectures and protocols), and the resulting high variability data and "robust yet fragile" behavior observed in real systems and case studies (behavior, data). Insights into what universal laws, architecture, and organizational principles might look like can be drawn from three converging research themes. First, the organizational principles of organisms and evolution are becoming increasingly apparent. Richly detailed mechanistic explanations of biological complexity, robustness, and evolvability point to universal principles and architectures. Second, while the components differ and the system processes are far less integrated, advanced technology's complexity is now approaching biology's and there are striking convergences at the level of organization and architecture. Determining what is essential about this convergence and what is merely historical accident requires a deeper understanding of architecture - the most universal, high-level, persistent elements of organization - and protocols. Protocols define how diverse modules interact, and architecture defines how sets of protocols are organized. Third, new mathematical frameworks for the study of complex networks suggest that this apparent network-level evolutionary convergence within/between biology/technology is not accidental, but follows necessarily from their universal system requirements to be fast, efficient, adaptive, evolvable, and most importantly, robust to perturbations in their environment and component parts.
The universal hard limits on systems and their components have until recently been studied separately in fragmented domains of physics, chemistry, biology, communications, computation, and control, but a unified theory is needed and appears feasible. We have the beginnings of the underlying mathematical framework and also a series of case studies in classical problems in complexity from statistical mechanics, turbulence, cell biology, human physiology and medicine, neuroscience, wildfire ecology, earthquakes, economics, the Internet, and smartgrid.
Alderson DL, Doyle JC (2010) Contrasting views of complexity and their implications for network-centric infrastructures. IEEE Trans Systems Man Cybernetics-Part A: Syst Humans 40:839-852. Chandra F, Buzi G, Doyle JC (2011) Glycolytic oscillations and limits on robust efficiency. Science 333:187-192. Gayme DF, McKeon BJ, Bamieh B, Papachristodoulou P, Doyle JC (2011) Amplification and nonlinear mechanisms in plane Couette flow. Physics of Fluids, in press (published online 17 June 2011). Doyle JC, Csete MC (2011) Architecture, constraints, and behavior. PNAS, in press.
Title: Balance between error-free and error-prone DNA repair in the immunoglobulin loci
Somatic hypermutation and class switch recombination of immunoglobulin genes are initiated by activation-induced deaminase (AID), which deaminates cytosine to uracil. Uracils can also be repaired by base excision repair, raising the question of whether this pathway competes with the mutagenic pathway for uracils. To unravel the interplay between repair and mutagenesis, we manipulated the levels of repair proteins in B cells, and observed a surge of hypermutation and DNA breaks. The data support an active role for base excision repair in the immunoglobulin loci, and favor the interpretation that the plethora of AID-induced damage allows some uracils to escape faithful repair and become substrates for mutagenesis.
Title: Integrons: Frameworks for information sharing
Bacterial genomes evolve by mutation, by exchange of homologous segments with close relatives and by the acquisition of exogenous DNA. The last two of these processes lead far more rapidly to divergence within a lineage than the mutational route, but some of the most common tools used to study divergence within specific bacterial clones, e.g. MLST and SNP analysis and even comparisons of unfinished genome sequences, do not take them into account. The study of mobile genetic elements has the potential to greatly enhance our understanding of the enormous diversity that can be found in a single species. Plasmids, integrating elements, transposons, insertion sequences, integrons and gene cassettes, CR elements, and others less well understood all contribute, but their identification and analysis are largely neglected. How we can harvest the information contained in this accessory genome and utilize it to further our understanding of genome diversification will be addressed with particular reference to the role of integrons and the gene cassettes they harbour.
Title: Distribution and Mechanisms of Meiotic Recombination
Recombination in meiosis is initiated by programmed DNA double-stranded breaks (DSBs). Each mouse spermatocyte typically makes 200-250 DSBs. Meiotic DSBs are crucial for chromosome segregation and ultimately for fertility because without them, homologous chromosomes fail to locate each other in the nucleus and to pair. Not all chromosomes are on a level playing field with regard to the number of DSBs they receive. Pairing of the largely non-homologous sex chromosomes in males (the X and Y) can only be mediated by DSBs in a small region of homology, the pseudoautosomal region (PAR). Furthermore, small autosomal chromosomes enjoy fewer DSBs than large ones. In this talk, I will review mechanisms we recently discovered that ensure successful X-Y recombination in mice - these include an unusual higher-order chromatin structure in the PAR, as well as temporal and genetic control that is distinct from "bulk" autosomal recombination (Science 331, 916-20). I will also summarize ongoing work on the numerical DSB requirements for the pairing of small versus large chromosomes. Our data from a transgenic mouse model indicate that when overall DSB levels are reduced, small chromosomes are more prone to aberrant pairing. This indicates that the seemingly excessive number of DSBs (>200) in wild-type meiosis serves an important function: it ensures that even the smallest chromosomes receive enough DSBs to find their homologous pairing partner.
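The numerical argument above can be made concrete with a deliberately simple null model. Assume, purely for illustration, that each DSB lands independently on a given chromosome with probability equal to that chromosome's share of the genome. The probability that the chromosome receives no DSB at all is then (1 - f)^N for N total DSBs. The independence assumption, the 2% genome share and the function name are hypothetical and are not taken from the talk.

```python
def prob_no_dsb(total_dsbs: int, length_fraction: float) -> float:
    """Probability that a chromosome covering `length_fraction` of the
    genome receives zero DSBs when `total_dsbs` breaks are placed
    independently and in proportion to length (a null model only)."""
    if not 0.0 <= length_fraction <= 1.0:
        raise ValueError("length_fraction must be between 0 and 1")
    return (1.0 - length_fraction) ** total_dsbs


if __name__ == "__main__":
    small_chromosome = 0.02  # hypothetical 2% share of the genome
    for n in (50, 100, 200):
        p = prob_no_dsb(n, small_chromosome)
        print(f"{n} DSBs: P(no DSB on small chromosome) = {p:.3f}")
```

Under these assumptions, reducing the total from 200 to 50 DSBs raises the chance that the small chromosome is missed from about 2% to about 36%, which is the kind of effect the abstract attributes to reduced overall DSB levels.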
Title: Indirect selection of local mutation rates
[ALTERNATIVE TITLE: Indirect selection of implicit protocols for mutation]
Suggestions that evolvability could be a selectable trait are routinely dismissed by many evolutionary biologists, who presume that "mutation rate" can be meaningfully summarized as a single statistic and that natural selection must minimize this rate because "the vast majority of mutations with observable effects are deleterious" (1). However, such simplistic analysis ignores those numerous mutational mechanisms whose site- or sequence-specificity creates opportunities for indirect selection to shape evolutionarily advantageous protocols for variation (2). Several common genomic patterns, including transposable elements (TEs) and simple sequence repeats (SSRs), display a striking propensity toward particular styles of mutation. Significantly, these patterns generate mutant alleles which retain their propensity to mutate. Hence natural selection that favors a beneficial mutant must also, indirectly, favor its mutational style. Since alleles representing localized genetic patterns can vary in their implicit propensity for mutation as well as their explicit effects on phenotype, indirect selection should plausibly shape such variation into genetic patterns, or "implicit protocols for mutation," that can enhance the probability of adaptively advantageous variation. This DIMACS Conference highlights a number of such putative protocols. For example, replication slippage of SSRs provides a "tuning-knob protocol" that enables readily-reversible adjustment for practically any genetic function (3). The "copy-and-paste protocol" of TEs can introduce duplicate genes or regulatory elements, thereby providing opportunities for adaptive innovation (4). Curiously, a deeper protocol appears to unite TEs and SSRs, with TE activity generating new SSRs while SSRs provide sites for further TE insertion.
The creative power of direct natural selection is evident in diverse adaptive phenotypes; we should not be surprised that indirect selection wields similar power to create implicit protocols for variation.
1. Baer CF et al. (2007) Nature Rev. Genet. 8, doi:10.1038/nrg2158. 2. Kashi Y, King DG (2007) Nature Rev. Genet. 8, doi:10.1038/nrg2158-c1. 3. Kashi Y, King DG (2006) Trends Genet. 22, doi:10.1016/j.tig.2006.03.005. 4. Oliver KR, Green WK (2009) BioEssays 31, doi:10.1002/bies.200800219.
Title: Balancing eukaryotic replication asymmetry with replication fidelity
Coordinated replication of duplex DNA is asymmetric, with continuous leading strand replication preceding discontinuous lagging strand replication. We are investigating whether this replication asymmetry is relevant to genome stability in budding yeast by performing biochemical and genetic studies of mutator derivatives of Pols alpha, delta and epsilon, the three major DNA polymerases that duplicate the nuclear genome. This talk will briefly consider observations published to date that are consistent with a model wherein Pol epsilon is the primary leading strand replicase and Pol delta primarily participates in replicating the lagging strand template. We are also using the three mutator polymerases to determine if the efficiency of mismatch repair (MMR) of replication errors in vivo varies depending on which polymerase generates the mismatch, the identity of the mismatch, the DNA strand containing the mismatch and the local sequence context. An example will be shown wherein Msh2-dependent MMR efficiencies for mismatches made by Pol alpha in vivo are higher than efficiencies for the same mismatches when made by Pol delta. Thus, replication and MMR share a special reciprocal relationship, wherein errors made by the least accurate replicase (proofreading-deficient Pol alpha) are those that are most efficiently repaired by MMR. Given the role of Pol alpha in initiating Okazaki fragments, this observation is consistent with the close proximity and possible use of 5' ends of Okazaki fragments for strand discrimination, which could increase the probability of Msh2-dependent MMR by 5' excision, by a Msh2-dependent strand displacement mechanism, or both. Finally, evidence will be presented on a second replication fidelity issue, discrimination against rNTP incorporation into DNA.
Biochemical and genetic studies indicate that yeast Pols alpha, delta and epsilon incorporate a large number of ribonucleotides into the yeast nuclear genome during each round of replication, that these are removed by RNaseH2-dependent repair, and that defective repair results in cellular stress, including strand-specific genome instability initiated when topoisomerase 1 nicks DNA containing ribonucleotides. Because structural studies indicate that ribonucleotides in DNA alter helix parameters, we are also exploring possible signaling functions for ribonucleotides in the nuclear genome.
Title: RNA-mediated epigenetic inheritance in Oxytricha
RNA, normally thought of as a conduit in gene expression, has a novel mode of action in ciliated protozoa. This opportunity for RNA-mediated epigenetic inheritance is profound in the ciliate Oxytricha, which deletes 95% of its germline genome through global DNA rearrangements. These events shatter its germline chromosomes and then sort and reorder the hundreds of thousands of pieces remaining. Maternally-inherited long, non-coding RNAs provide 1) instructions for sequence reordering, 2) a template for RNA-guided DNA repair that can transmit somatic mutations to the next generation (Nature 2008, 451:153-8), and 3) information to regulate DNA and chromosome copy number (PNAS 2010, 107:22140-4). The mechanism for all of these actions bypasses the traditional mode of inheritance via DNA, hinting at the power of RNA molecules to sculpt genomic information. This suggests that Oxytricha's somatic genome is truly an epigenome, formed through templates and signals arising from the previous generation, and offering a mechanism for the stable inheritance of acquired, spontaneous somatic substitutions, without altering the germline.
Title: DNA structure and regulated region-directed recombination
DNA has considerable potential to form structures other than canonical Watson-Crick duplexes, with profound consequences for human health and disease. This is dramatically illustrated by the contribution of triplet repeat expansions to neurodegenerative disease. Other simple sequence motifs prone to form structures are guanine runs (G-runs), which spontaneously assemble in vitro into structures stabilized by G-quartets, referred to as G4 DNA. G4 motifs are abundant in the immunoglobulin switch regions, where programmed genomic instability must occur rapidly in the course of the immune response. The factors and mechanisms that promote instability at G4 motifs in the immunoglobulin genes are not B cell specific but ubiquitous, and they can promote instability elsewhere in the genome, although on a less rapid time scale than at the immunoglobulin genes. G4 motifs are also abundant elsewhere in the genome, including the telomeres, the ribosomal DNA, specific regions of genes (particularly promoters) and specific classes of genes (most notably oncogenes). Transcription especially puts these regions at risk, as it induces formation of unusual structures, G-loops, which are mutagenic and recombinogenic. Our recent experiments show that core factors of the transcription apparatus specifically recognize G4 structures and resolve them. Thus the genomic structure of G-rich genes and promoters enables their rapid evolution, while the transcription apparatus promotes their stability.
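To show what scanning a sequence for G4 motifs can look like in practice, here is a minimal sketch based on a commonly used consensus pattern: four runs of at least three guanines separated by loops of one to seven nucleotides. The abstract does not define the motif, so the pattern, the loop-length limits and the function name are assumptions. Only the given strand is scanned; a C-rich strand would need its reverse complement checked separately.

```python
import re

# Assumed consensus: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (not from the abstract).
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(seq: str) -> list[tuple[int, int]]:
    """Return (start, end) positions of non-overlapping candidate G4
    motifs on the given strand, using 0-based, end-exclusive indexing."""
    return [(m.start(), m.end()) for m in G4_PATTERN.finditer(seq.upper())]


if __name__ == "__main__":
    telomeric = "TTAGGGTTAGGGTTAGGGTTAGGG"  # four human telomeric repeats
    print(find_g4_motifs(telomeric))  # [(3, 24)]
```

A match only marks sequence potential; whether a G4 structure actually forms in vivo depends on context such as transcription, as the abstract notes.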
Title: Rapid Generation of Diversity in Bacterial Contingency Genes:
Bacterial pathogens face stringent challenges to their survival because of the many unpredictable, often precipitate, and dynamic changes that occur in the host environment or in the process of transmission from one host to another. Bacterial adaptation to their hosts involves mechanisms for sensing and responding to external changes and/or the selection of variants that arise through mutation. I will review how bacterial pathogens exploit localized hypermutation, through polymerase slippage of simple sequence repeats (SSRs), to generate phenotypic variation and enhanced fitness. These SSRs are located within the reading frame or in the promoter region of a subset of genes, often termed contingency genes, whose functions are usually involved in direct interactions with host molecules. The evolution within bacterial genomes of such contingency genes with high mutation rates facilitates the efficient exploration of phenotypic solutions to unpredictable aspects of the host environment while minimizing deleterious effects on fitness.
Title: Exogenes: accelerated evolution of venom peptides from cone snails and other venomous molluscs
There are >10,000 species of venomous molluscs, the best known being the cone snails. The latter comprise ~500 species, and constitute only a minor fraction of the total biodiversity. It is now well established that most biologically-active components of snail venoms are small disulfide-rich peptides; every venomous mollusc has a large complement (>100) of such peptides, typically with 2-4 disulfide bonds.
A large majority of cone snail venom peptides are encoded by only a few gene superfamilies that are subject to accelerated evolution; consequently, each Conus species has its own distinct complement of venom peptides. There is remarkable signal sequence conservation within a gene superfamily, as well as of cysteine residues in the mature peptide region. Hypermutation of all amino acids between cysteine residues is typically observed in cone snail gene superfamilies (Olivera et al., 1999). Thus, as species diverge from each other, there is essentially no overlap at the molecular level in the complement of venom peptides from one cone snail species to the next, but the structural framework of venom peptides in the same superfamily is conserved. We previously referred to genes encoding the venom peptides as examples of "exogenes", with the gene product not acting endogenously, but on a different organism. A biological rationale for why exogenes are subject to accelerated evolution was previously discussed (Olivera, 2006).
There are at least 6 major clades of other venomous molluscs that exceed Conus in terms of the number of species. Although in most cases, the gene superfamilies in these clades have diverged considerably from those found in cone snails, gene superfamilies with many representatives are found in these venoms as well. However, in at least one major branch of venomous snails (e.g., the clathurellids), there appear to be a far greater number of gene superfamilies, with each gene family encoding only one peptide in a given species (in a Conus species, there are typically dozens of peptides with different sequences in the same gene superfamily in a single venom). The factors that may determine whether a few gene superfamilies dominate the complement of venom peptides in a particular clade (or whether many different gene families, each with just a few representatives, predominate) will be discussed.
Olivera, B.M., 2006. Conus peptides: biodiversity-based discovery and exogenomics, Journal of Biological Chemistry 281(42):31173-7. Olivera, B.M., Walker, C., Cartier, G.E., Hooper, D., Santos, A.D., Schoenfeld, R., Shetty, R., Watkins, M., Bandyopadhyay, P., and Hillyard, D.R., 1999. Speciation of cone snails and interspecific hyperdivergence of their venom peptides. Potential evolutionary significance of introns, Ann. N.Y. Acad. Sci. 870:223-237.
Title: A diverse immune gene family in the purple sea urchin. Is diversity driven by controlled genome instability?
The purple sea urchin, Strongylocentrotus purpuratus, is a long-lived echinoderm with a complex and sophisticated innate immune system. There are several large gene families that function in immunity in this species, including the Sp185/333 genes. These genes show intriguing sequence diversity and encode a broad array of diverse yet similar proteins that show bacterial binding and agglutination activity. The genes are <2 kb with two exons, of which the second encodes the mature protein including repeats and blocks of recognizable sequence called elements. The variable presence/absence of elements generates mosaics of sequence that, with many SNPs, result in sequence diversity among the genes, yet also maintain similarity among the shared elements. Although the sea urchin genome assembly predicts the presence of only six Sp185/333 genes, estimates suggest a family of 50 (±10) members, indicating assembly difficulties for region(s) of the genome harboring these genes. To clarify the family structure, a cluster of six Sp185/333 genes was assembled from a sequenced BAC insert, which required bioinformatic and molecular analysis to generate a single insert sequence rather than unconnected fragments, and to verify that no genes had been collapsed or duplicated. The assembly was challenging for a number of reasons. The genes are tightly clustered, with five positioned within 20 kb and the sixth located 14 kb away. The outer flanking genes are oriented in the same direction while the four inner genes are oriented in the opposite direction. Each gene is flanked by GA microsatellites, and three segmental duplications of 99.7% identity, which include three almost identical genes, are surrounded by GAT microsatellites. The sequences between the GA microsatellites that include the genes and their flanking regions are much more similar to each other than are the sequences outside the microsatellites.
It is noteworthy that the sequence diversity of the genes within the cluster is similar to the diversity among genes with unknown genomic organization that were randomly cloned and sequenced from individual sea urchins. This genomic region harboring the cluster of Sp185/333 genes suggests significant levels of gene conversion, recombination, local and ectopic duplication, and general genomic instability. Yet, no pseudogenes with altered reading frame have been identified. The results present questions about possible mechanisms for promoting gene diversification while blocking both the formation of pseudogenes and the homogenization of the entire region from gene conversion. It is not clear whether the microsatellites are involved in maintaining the structure and sequence diversity of the region. Future work will require an evaluation of the entire gene family to begin to understand how the diversity of this immune response gene family is maintained and promoted within the context of the arms race with pathogens.
Title: Metabolite-dependent alteration of RNA folding, splicing and function
Riboswitches are structural regions of certain mRNAs that undergo alternate folding upon sensing their cognate metabolite and thereby regulate gene expression. More than twenty different classes of riboswitches that have distinct target ligands have been identified so far. These RNA sensors are widely prevalent in bacterial genomes, and one class has been characterized in fungal and plant species.
I will discuss some of the recent discoveries on the distribution, diversity and functions of riboswitches with a particular focus on the identification of an allosteric ribozyme that is controlled by the bacterial second messenger c-di-GMP in Clostridium difficile.
Title: Multiple levels of meaning in DNA sequences, and one more
There are about a dozen known codes in biological sequences: sequence patterns responsible for specific biomolecular functions. The codes overlap due to their degeneracy and thus interact. One example is the interaction between the triplet code, the gene splicing code, the code for amphipathic alpha-helices, and the chromatin code. Nucleosomes are preferentially located on exon sequences, especially at the ends of exons, thus protecting the splice junctions. Moreover, the orientation of the guanines of the GT and AG junctions on the surface of the nucleosomes is such that the vulnerable N9 positions are located closest to the histones. In protein sequences one finds numerous traces of tandem repeats, originally formed by triplet expansions. This constitutes the genome inflation code. Life, which is believed to have originally emerged as simple tandem repetition of certain aggressive triplets, apparently continued to emerge here and there along genomes through such expansions. Owing to accumulated mutations, these repeats are mostly no longer recognizable as such, yet they leave readily detectable traces.
Title: Control and function of translesion DNA polymerases
Translesion DNA synthesis (TLS) carried out by specialized DNA polymerases is an important mechanism of DNA damage tolerance in all domains of life and is responsible for most of the mutations that result from DNA damage. In both bacteria and eukaryotes, the location and timing of TLS DNA polymerase function are controlled by complex sets of protein-protein interactions. I will summarize the regulation of the bacterial TLS polymerases encoded by dinB (DNA pol IV) and umuDC (DNA pol V) and discuss new insights we have gained into why DinB is lethal when overproduced and their implications for the action of bactericidal antibiotics. The mutagenic branch of translesion DNA synthesis in eukaryotes is carried out by Rev1 and DNA pol zeta (Rev3/Rev7) and is controlled in part through interactions between the Rev1, Rev3, and Rev7 proteins. I will discuss evidence that interfering with the functioning of the Rev1/3/7 system could potentially improve cancer therapy.
Title: The mechanisms underlying mutagenesis in E. coli, the p53 tumor suppressor gene, and somatic hypermutation
Most researchers focus on in vitro studies in which, for example, enzyme concentration is rate-limiting and determines the rate of a reaction. However, in vivo analyses indicate that substrate availability may be rate-limiting. This possibility is considered with respect to the mechanism of mutagenesis in both prokaryotes and eukaryotes.
Title: Adaptive Evolution via Repeated Deletions in a Vertebrate Enhancer
Stickleback fish have undergone one of the most recent and widespread evolutionary radiations on earth, providing a powerful system for identifying genetic and genomic mechanisms that underlie repeated evolutionary change in vertebrates. At the end of the last ice age, migratory marine sticklebacks colonized thousands of new streams and lakes generated in formerly ice-covered regions. Newly established populations have subsequently evolved dramatic morphological, physiological, and behavioral changes in 10,000 generations or less. Many interesting phenotypes have evolved repeatedly in response to similar ecological conditions, including repeated loss of pelvic hindfins in multiple populations from Alaska, British Columbia, Scotland, and Iceland. Genetic crosses between marine and freshwater populations suggest a relatively simple genetic basis for this dramatic change in skeletal anatomy, based on recurrent mutations at a major developmental control gene called Pitx1. Molecular studies suggest that most pelvic loss mutations consist of small deletions of a few hundred to a few thousand base pairs. These mutations remove a tissue-specific hindlimb regulatory enhancer from the Pitx1 locus, making it possible to produce a dramatic morphological change at a specific anatomical site in the body while preserving overall viability and fitness. We are currently testing whether the unusual spectrum of recurrent deletion mutations observed for the adaptive alleles in this system is influenced by inherent DNA fragility at the Pitx1 locus. With the base pairs responsible for recurrent pelvic evolution now identified, it should soon be possible to measure the absolute rate of similar deletion mutations arising in the germ line of natural populations, and to test whether mutation rates differ in different environments, or among populations that have or have not evolved similar phenotypes in response to similar ecological conditions.