DIMACS Working Group on Phylogenetic Trees and Rapidly Evolving Diseases

First Meeting: The DIMACS Symposium on Phylogenetics and Rapidly Evolving Pathogens
Date: September 7 - 8, 2004
Location: Aotea Centre, Auckland, New Zealand
Organizers:
Allen Rodrigo, University of Auckland, a.rodrigo@auckland.ac.nz
Mike Steel, University of Canterbury, M.Steel@math.canterbury.ac.nz
Presented under the auspices of the Special Focus on Computational and Mathematical Epidemiology.

Information on the International Conference on Bioinformatics 2004 can be found at: www.incob.org.

Alexei Drummond, Department of Zoology, University of Oxford, South Parks Road, Oxford, UK
e-mail: alexei.drummond@zoology.oxford.ac.uk

Title: Modeling Virus Evolution and Population Dynamics

Modeling population dynamics and molecular evolution of viruses is of central importance to understanding diseases caused by pathogens such as HIV-1, human Influenza A and Hepatitis C. Furthermore, virus evolution is an excellent framework for understanding the interplay between basic evolutionary processes such as mutation, drift, selection and recombination and population dynamics. In this talk I will outline some recent advances in the modeling of virus evolution as well as some tests of the adequacy of these models. I will also outline the nascent field of "phylodynamics" in which virus genetic variation and virus population dynamics are considered within a single coherent framework. Finally I will outline some open questions and problems in the field of virus evolutionary dynamics.

Greg Ewing(1,2)*, Geoff Nicholls(1,3) and Allen Rodrigo(1,2)
*e-mail: gewi001@ec.auckland.ac.nz
  1. Bioinformatics Institute, School of Biological Sciences, University of Auckland, Private Bag 92019, Auckland, New Zealand
  2. Alan Wilson Centre, Massey University, Palmerston North, New Zealand
  3. Department of Mathematics, University of Auckland, Private Bag 92019, Auckland, New Zealand

Title: The Structured Coalescent With Serially-Sampled Sequences

We present a Bayesian approach for simultaneously estimating mutation rate, population sizes and migration rates in an island structured population, using temporally and spatially disjoint sequence data. We estimate migration history and rates from the DNA sequences taken at different times and from different tissues. We fit a model of the joint genealogy and migration process using MCMC over a space of trees labeled with migration events. Since the number and timing of migration events are unknown, the MCMC must satisfy detailed balance between states in spaces of unequal dimension. A real HIV DNA sequence dataset with 2 demes, semen and blood, is used as an example to demonstrate the method by fitting asymmetric migration rates and different population sizes. This dataset exhibits a bimodal joint posterior distribution, with modes favouring different preferred migration directions.

T Leitner(1)*, F Salvatori(2), C Ripamonti(2), C-S Tung(1), E Halapi(3), M Jansson(3), G Scarlatti(2)
*e-mail: tkl@lanl.gov
  1. Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM, USA
  2. Unit of Immunobiology of HIV, DIBIT, San Raffaele Scientific Institute, Milan, Italy
  3. MTC, Karolinska Institute, Stockholm, Sweden

Title: Recombination, 3D Network Structure, Multiple Transmission and Subpopulation Frequency Shifts in a Mother-to-child Transmission Case

Mother-to-child transmission is a special case of transfer of an infectious agent. The passage of the agent can take several routes into its new host, and it may do so during a long period of time. Here, we report on a case where at least four different HIV-1 variants were transmitted from mother to child. We investigated the evolution of the virus from its start in the mother during pregnancy, through transmission to the child, and followed it to the end of the disease progression of the child using phylogenetic and biological analyses. The natural history of the child was followed by clinical analyses. Standard tree analysis was not able to describe the mother's virus population. A new 3D network approach, however, could reveal the complex structure of this population. It was evident that multiple intrapopulation recombination events had taken place between predicted R5 and X4 genotypes. The mother then transmitted at least four different virus variants to her child. Immediately after birth a monophyletic R5 tropic population was detected, but already after one month, an independent X4R5 dual tropic population emerged and established a second subpopulation. Between months 2 and 20 the frequencies of the subpopulations in the child fluctuated, and the X4R5 population gradually became more dominant. Towards the end of the disease progression, at 33 months, subpopulation two completely took over and became genetically homogeneous. At 38 months the child died. Hence, the evolution in the child seemed to be mainly driven by intergroup selection, rather than recombination as in the mother. The resulting genetic race may have caused the virus to rapidly adapt into a more pathogenic form that apparently worsened the disease progression.

Geoff Nicholls, Department of Mathematics, University of Auckland, Private Bag 92019, Auckland, New Zealand
e-mail: gk.nicholls@auckland.ac.nz

Title: Measurably Evolving Memes: New Models and Inference Tools for Meme-trait Data

We were inspired by some data (Gray & Atkinson 2003, Nature v426 pp435-439) measuring the evolution of human language. The data are essentially binary trait data in which languages play the role of individuals and the descriptive traits are homologous words. The data record the values of around 2500 traits in 87 Indo-European languages, and include time-stamped fossil languages (such as Hittite). We set up a model of trait evolution. The model can be thought of as a stochastic variant of Dollo parsimony in which language histories are trees. They are modelled via a branching process of sets. Set elements are trait labels, created (uniquely) at constant rate and destroyed at constant per capita rate. The birth times of traits recorded in sampled languages are biased towards the tree leaves. Because the traits evolve measurably, historical records of language change can be used to determine model rates. We fit the model to data using our own MCMC code.
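
The trait process described in this abstract (unique trait births at a constant rate, trait deaths at a constant per capita rate, and daughter lineages inheriting the parent's trait set) can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' MCMC code: the function name `evolve_branch`, the rate values and the two-leaf tree are all hypothetical choices made for illustration.

```python
import itertools
import random


def evolve_branch(traits, birth_rate, death_rate, duration, labels, rng):
    """Evolve a set of trait labels along one branch (Gillespie-style).

    New traits appear at constant rate `birth_rate`; each existing trait is
    lost independently at rate `death_rate`. `labels` is a shared iterator,
    so every trait is born exactly once over the whole tree (Dollo-like).
    """
    traits = set(traits)  # copy, so the parent's set is left untouched
    t = 0.0
    while True:
        total_rate = birth_rate + death_rate * len(traits)
        if total_rate <= 0.0:
            return traits  # nothing can happen on this branch
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > duration:
            return traits
        if not traits or rng.random() * total_rate < birth_rate:
            traits.add(next(labels))  # birth of a brand-new trait
        else:
            traits.remove(rng.choice(sorted(traits)))  # loss of one trait


# Hypothetical two-leaf "language tree": one ancestor, two daughters.
rng = random.Random(1)
labels = itertools.count()
ancestor = evolve_branch(set(), 2.0, 0.5, 10.0, labels, rng)
left = evolve_branch(ancestor, 2.0, 0.5, 1.0, labels, rng)
right = evolve_branch(ancestor, 2.0, 0.5, 1.0, labels, rng)
print(len(ancestor), len(left), len(right), len(left & right))
```

Because labels are never reused, any trait shared by the two daughters must already have been present in their common ancestor, which is the Dollo-style constraint the abstract refers to.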

Roderic D. M. Page, Institute of Biomedical and Life Sciences, University of Glasgow, Glasgow G12 8QQ, United Kingdom
e-mail: r.page@bio.gla.ac.uk

Title: Viruses, Clocks, and Cospeciation

Fully understanding the evolutionary history of a virus-host interaction requires reconstructing the history of that association. Much of the methodology for doing this has been developed in the context of animal parasites (such as the celebrated gopher-louse example). Here I review recent developments in comparing host and parasite phylogenies, with an emphasis on the role estimates of evolutionary time can play in reconstructing host-parasite evolutionary history. I then outline some methodological problems facing this research, such as the assumption that host and parasite evolutionary history is tree-like, the lack of adequate models of cospeciation, and the reliance on a single estimate of host and parasite tree.

Bruce Rannala, Department of Medical Genetics, University of Alberta, 839 Medical Sciences Building Edmonton, Alberta T6G 2H7, Canada
e-mail: brannala@ualberta.ca

Title: Imperfect Molecular Clocks

The original "molecular clock hypothesis" proposed that amino acid substitutions (and by extension nucleic acid substitutions) occur according to a stochastic process with a relatively constant rate per year across species. This led to the development of methods for estimating species divergence times, etc., using molecular data and one or more independently dated fossil "calibration points." Statistical tests of the molecular clock indicate that it is often violated. Substitution rates can vary across lineages due to natural selection, generation time effects, population structure, etc. Accounting for variation in substitution rates is especially important when considering rapidly evolving species such as pathogenic bacteria or viruses that are under constant selective pressure from the host immune system and occur in large fragmented populations. In recent years, several statistical methods have been proposed for estimation of divergence times using an imperfect molecular clock for which the rate of substitution is allowed to vary across lineages according to a stochastic model. Models of across-lineage substitution rate variation are described and contrasted, highlighting their potential strengths and weaknesses.
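
As a rough illustration of what "rate allowed to vary across lineages according to a stochastic model" can mean, the sketch below draws an independent lognormal rate for each branch and converts branch durations into expected substitutions per site. This is one simple possibility (an uncorrelated lognormal model), chosen here as an assumption; the abstract does not say which models the talk covers, and the function names and parameter values are hypothetical.

```python
import math
import random


def branch_rates_uncorrelated(n_branches, mean_rate, sigma, rng):
    """Draw one substitution rate per branch from a lognormal distribution.

    The log-scale location is shifted by -sigma^2 / 2 so that the mean of
    the lognormal equals `mean_rate`. With sigma = 0 every branch gets
    exactly `mean_rate`, i.e. a strict molecular clock.
    """
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def expected_substitutions(rates, durations):
    """Expected substitutions per site on each branch: rate times time."""
    return [r * t for r, t in zip(rates, durations)]


rng = random.Random(0)
durations = [5.0, 5.0, 12.0, 3.0]  # hypothetical branch durations in years
strict = branch_rates_uncorrelated(len(durations), 0.002, 0.0, rng)
relaxed = branch_rates_uncorrelated(len(durations), 0.002, 0.5, rng)
print(expected_substitutions(strict, durations))
print(expected_substitutions(relaxed, durations))
```

Under the strict clock, two branches of equal duration carry the same expected number of substitutions; under the relaxed draw they generally do not, which is why divergence times can no longer be read directly from branch lengths.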

Benjamin D. Redelings* and Marc A. Suchard, Department of Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1766
*e-mail: bredelin@ucla.edu

Title: Incorporating Indel Information Into Phylogeny Estimation for Rapidly Emerging Diseases

Phylogenies of rapidly evolving pathogens can be difficult to resolve because of the small number of substitutions that accumulate in the short times since divergence. In order to improve resolution of such phylogenies we propose using indel information in addition to substitution information by simultaneously estimating the alignment and phylogeny. We accomplish this using a joint reconstruction model in a Bayesian framework and draw inference using Markov chain Monte Carlo (MCMC). We introduce a novel Markov chain transition kernel that improves computational efficiency by proposing non-local topology rearrangements and by block sampling alignment and topology parameters. We demonstrate the relevance of indel information in examples drawn from HBV, HIV, and SIV and discuss the importance of taking alignment uncertainty into account when using such information. We also develop a non-codon-based technique that avoids alignments with frame-shift mutations when analyzing coding sequences. We compare the performance of this non-codon-based technique with codon-based models in handling molecular sequences from rapidly evolving diseases.

Allen Rodrigo, Bioinformatics Institute, University of Auckland, Auckland, New Zealand
e-mail: a.rodrigo@auckland.ac.nz

Title: Measurably Evolving Populations: Where to Next?

A Measurably Evolving Population (MEP) is defined as any population for which sequences sampled at different times show a statistically significant increase in substitutions. In this talk, I explore the consequences of this definition, and review the advances made in the development of methods for analysing the evolutionary genetics of MEPs. I conclude by examining what remains to be done, and the extent to which these goals are achievable.
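
One informal way to probe the definition above is to ask whether genetic distance from a reference sequence increases with sampling time more than chance would allow. The sketch below uses a least-squares slope and a permutation test on the sampling times. It is an illustrative heuristic under assumed inputs, not the test used in the talk; the function names and the toy data are hypothetical.

```python
import random


def slope(times, distances):
    """Least-squares slope of distance against sampling time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all sampling times are identical")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    return sxy / sxx


def permutation_p_value(times, distances, n_perm, rng):
    """One-sided p-value for a positive slope, by shuffling sampling times."""
    observed = slope(times, distances)
    shuffled = list(times)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if slope(shuffled, distances) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: sampling times (years) and distances from a reference sequence.
times = [0, 1, 2, 3, 4, 5]
distances = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
print(slope(times, distances))
print(permutation_p_value(times, distances, 999, random.Random(0)))
```

A small p-value suggests that later samples carry more substitutions than earlier ones. Note that this simple version treats the samples as independent, which sequences related by a shared genealogy are not, so it should be read as a first diagnostic only.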

Mike Steel, Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand
e-mail: m.steel@math.canterbury.ac.nz

Title: Phylogenetic Networks and Biodiversity

Phylogenetic trees are central to evolutionary molecular biology, and associated areas such as molecular epidemiology. In this talk I describe some recent and new theory concerning the use of sequence-based and character-based data for reconstructing and analyzing phylogenetic trees and networks, and indicate how they may be useful for future applications.

Marc A. Suchard, Department of Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095
e-mail: msuchard@ucla.edu

Title: Resolving the Intra-host Evolution of Rapidly Evolving Pathogens: Common Patterns and Shared Indel Information Under a Bayesian Framework

Intra-host phylogenies of rapidly evolving pathogens can be difficult to resolve because of the small number of substitutions that accumulate in the short time since infection. Nonetheless, gaining an understanding of the intra-host evolution has implications for the development of novel medical therapies. I will discuss Bayesian hierarchical phylogenetic models (HPMs) as one approach to improving resolution by analyzing pathogen evolution across multiple infected subjects simultaneously. The hierarchical framework both pools information across subjects to improve intra-host estimate precision and permits estimation and hypothesis testing of across-subject-level parameters while taking appropriate consideration of subject variability and uncertainty. The HPM assumes standard hierarchical priors on continuous evolutionary parameters and a multinomial prior structure on topologies across subjects. I will illustrate the utility of the HPM with a question regarding HIV-1 compartmentalization by examining subjects whose viral populations exhibit a shift in coreceptor utilization in response to antiretroviral therapy. CXCR4 (X4) strains emerge in all patients, but are suppressed following initiation of new regimens, so that CCR5 (R5) strains predominate. A Bayes factor approach that tests for common patterns across subjects finds that re-emergent R5 virus detectable after therapy is more closely related to predecessor R5 virus than to the X4 strains. If time permits, I will outline a second approach to improving the resolution of intra-host phylogenies. This approach employs shared indel information in addition to substitution information between sequences by simultaneously estimating the alignment and phylogeny. I accomplish this using a joint Bayesian reconstruction model that provides measurements of uncertainty on the alignment, the topology and all other evolutionary parameters.

Jeff Thorne, Bioinformatics Research Center, Box 7566, North Carolina State University, Raleigh, NC 27695-7566
e-mail: thorne@statgen.ncsu.edu

Title: Making Evolutionary Inferences When There is Dependent Change Among Sequence Positions, With Emphasis on Protein Tertiary Structure and Viral Antigenicity

The relationship between genotype and phenotype is central to evolution. Unfortunately, probabilistic models of sequence evolution typically ignore phenotype when describing how genotypes change. Two general difficulties are largely responsible for this shortcoming of widely used models. The first problem is that the relationship between genotype and phenotype tends to be difficult to quantify. Ideally, the set of possible genotypes could be numerically ranked according to the fitness of the phenotype that they specify. If this could be accomplished, rates of sequence changes that improve fitness could be modelled as being high whereas rates of changes that decrease fitness would be low. The second problem is that the biologically plausible incorporation of fitness into models of sequence evolution would likely produce dependence among nucleotide substitutions that occur at different sequence sites. A consequence of this dependence is the computational infeasibility of conventional methods for making evolutionary inferences. We have been developing procedures for statistical inference when dependent change among sites is built into evolutionary models. We illustrate these techniques with results from our studies on the evolutionary relationship between protein tertiary structure (phenotype) and protein-coding DNA sequences (genotype). We then speculate on the potential for similar statistical techniques to illuminate the study of viral evolution. Specifically, we are beginning to explore the potential for these techniques to quantify the effects of the host immune system on shaping viral genomes.

Document last modified on November 23, 2004.