DIMACS Seminar on Math and CS in Biology


A computational method for inferring evolutionary trees and an application to the Indo-European family of languages


Tandy Warnow
Department of Computer and Information Science
University of Pennsylvania


11:00 AM
Tuesday, February 20, 1996


The determination of evolutionary trees is a major endeavor within biology and historical linguistics, but current techniques for generating trees are limited either by computational problems or by their reliance on methods that lose information present in the primary data.

In this talk we will present a method for efficiently inferring evolutionary trees in historical linguistics that avoids the difficulties that have made this analysis intractable. We use primary data, and show that an appropriate optimization problem (based solidly upon traditional historical linguistic methodology) can be solved exactly for these data.

We have applied this method to the problem of inferring the evolutionary history of the Indo-European (IE) family of languages, and have made several surprising and strongly supported findings. We analyzed the IE data with particular interest in determining whether our new methodology could lay to rest the debate on two longstanding conjectures: the Indo-Hittite hypothesis and the Italo-Celtic hypothesis.

Our analysis indicates significant support for the Indo-Hittite hypothesis and preliminary (albeit weak) support for the Italo-Celtic hypothesis. It also proposes a reasonable explanation for the surprising dual allegiance of Germanic. Most importantly, it provides a firm methodology by which linguists can test the consistency of their judgements, thus enabling further research into problematic data.

This is joint work with linguists Donald Ringe and Ann Taylor of the University of Pennsylvania.

