The goal of physical mapping is to identify the locations of landmarks along a DNA molecule such as a chromosome. Physical mapping usually entails the construction of a clone library: a collection of overlapping fragments, or clones, which cover the molecule several times. These clones are much too large to be sequenced directly, but they can be partially characterized experimentally. Given such a partial characterization, the problem is to mathematically reassemble the ordering and arrangement of the clones and of other associated landmarks.
The most common experimental techniques are multiple complete digestion and STS mapping. In multiple complete digestion a clone is digested with two or more restriction enzymes and the sizes of the restriction fragments are measured by gel electrophoresis. In STS mapping the polymerase chain reaction is used to test whether certain DNA sequences called probes occur on the clone. Each of these probes is assumed to occur exactly once on the DNA molecule, and its site of occurrence is called a Sequence tagged Site (STS)
We shall describe a battery of programs for STS mapping constructed by Farid Alizadeh, Deborah Weisser, Geoffrey Zweig and the speaker. Given a matrix of incidences between probes and clones these programs seek to determine the order of the probes along the DNA. The problem is easily solved in the absence of measurement errors. In the presence of errors the problem is more difficult, but we have attacked it successfully by combining local search with preprocessing techniques for screening out some of the errors in the data.
In closing, we shall describe some preliminary approaches to algorithms for multiple-complete-digest mapping.