A Survey of Algorithmic Problems in DNA Physical Mapping

Richard Karp

University of California at Berkeley, Department of Computer Science, 591 Evans Hall, Berkeley, CA 94720

The goal of physical mapping is to determine the locations of a set of landmarks along a DNA strand such as a chromosome. These landmarks can then be used to home in on regions of interest such as genes. The landmarks may be Sequence Tagged Sites - locations uniquely identified by the occurrence of a particular sequence of bases - or clones from a clone library. The two most common physical mapping methods are STS mapping and mapping based on restriction fragment fingerprinting. In STS mapping, PCR techniques are used to determine the incidences between clones and probes, where each probe is a DNA sequence presumed to occur at a unique location. This data, which will inevitably be noisy, is then used to determine the order of occurrence of the probes along the DNA. In restriction fragment mapping, each clone in a clone library is digested with restriction enzymes, and the resulting set of restriction fragment sizes determines a "fingerprint" of the clone. On the basis of these fingerprints the overlap structure of the set of clones is determined. In situ hybridization data and genetic linkage data are often used in conjunction with the above techniques. Other methods include mapping based on hybridization of clones to short oligonucleotides, radiation hybrid mapping, deletion mapping and mapping based on clone-clone hybridization. In all cases the data is noisy and the reconstruction of the physical map from the data involves a complex combinatorial puzzle that must be tackled by heuristic or approximate methods.

In the first part of the talk we survey the most useful algorithmic techniques for physical mapping, emphasizing the Hamming distance traveling-salesman problem as a widely applicable approach. We then focus on STS mapping and describe a battery of programs being developed at Berkeley. Finally, we discuss some probabilistic models that can serve as a guide in the design of physical mapping experiments.

DIMACS Homepage
Contacting the Center
Document last modified on March 28, 2000.