A New Method for Shotgun Sequence Assembly

Ramana M. Idury and Michael Waterman

Department of Mathematics, DRB-155, University of Southern California, Los Angeles, CA 90089-1113

Since the advent of rapid DNA sequencing methods in 1976, scientists have faced the problem of inferring contiguous DNA sequences from sequenced fragments. Today's sequencing projects must process from a few hundred to over 1000 fragments. Here we take a collection of sequence fragments as given. The basic outline of Staden's initial program (Staden, Nucleic Acids Research, 6, 2601-2610 (1979)) is still followed, and his program is in use in many laboratories. A good example of an assembly program is CAP (Huang, Genomics, 14, 18-25 (1992)), in which modern computer science techniques are used to speed up various components of the Staden method. Using a different strategy, we have developed a program that solves the basic computational problem very rapidly and accurately.

For comparison, we give relative timings on a Sun SPARCstation 4/670 for two data sets obtained from different sequencing projects. The first set has 165 fragments from an underlying sequence of length 11,781 bp. The time is 8.59 min. for CAP and 5.7 sec. for our program, a speedup of 90.42. The second set has 1905 fragments from an underlying sequence of length 44,180 bp. The time is 3.95 hrs. for CAP and 1 min. for our program, a speedup of 237.18.
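The speedup figures are the ratio of CAP's running time to the new program's running time, with both converted to the same unit. The short Python sketch below redoes that arithmetic from the rounded timings quoted above; the function and variable names are illustrative and not taken from either program. Using the rounded "1 min." for the second data set gives a ratio of about 237.0 rather than the quoted 237.18, presumably because the quoted figure was computed from an unrounded timing (an assumption, not something the abstract states).

```python
def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new program is than the baseline."""
    return baseline_seconds / new_seconds

# First data set: 165 fragments, 11,781 bp.
cap_small = 8.59 * 60       # CAP: 8.59 minutes, in seconds
ours_small = 5.7            # new program: 5.7 seconds

# Second data set: 1905 fragments, 44,180 bp.
cap_large = 3.95 * 3600     # CAP: 3.95 hours, in seconds
ours_large = 60.0           # new program: "1 min." as quoted (rounded)

small = speedup(cap_small, ours_small)
large = speedup(cap_large, ours_large)

print(f"first set speedup:  {small:.2f}")
print(f"second set speedup: {large:.2f}")
```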

Document last modified on March 28, 2000.