Center for Human Genome Studies, Los Alamos National Lab, Los Alamos, NM
*School of Mathematical Sciences, University of London, London, E1 4NS, UK
Constructing physical maps of a genome requires screening large libraries of clones to determine the clones which contain each of a collection of sites. Screening each clone individually is not feasible for many of the libraries in use. More efficient screening strategies involve constructing pools of clones and testing each pool for the presence of a clone which contains a site. This is an application of group testing.
Unlike many other applications of group testing, the costs involved in constructing pools and in repeatedly testing pools encourage the use of non-adaptive screening strategies. If the number of clones containing any site is expected to be small, non-adaptive strategies that require nearly optimal numbers of pools and tests are available.
One-stage screening strategies involve testing all the constructed pools in parallel and determining the "positive" clones from the results. Such strategies can be designed to "detect" up to r positive clones, even in the presence of up to e errors. The requirements for constructing sets of pools with this property are well understood. However, how small a collection of pools suffices is related to difficult questions in coding theory. Transversal or geometric designs ("rows and columns") can often be used as relatively effective, although not optimal, designs for one-stage screening.
Trivial two-stage screening strategies involve testing all the constructed pools in parallel and then confirming some or all clones which could be positive given the results. This is one of the primary strategies used for screening libraries. With suitable statistical assumptions on the distribution of positive clones, essentially optimal two-stage strategies can be obtained. Our approach to constructing such strategies is based on random k-set designs. In such designs each clone is added to k randomly selected pools. In practice, the pooling strategy is improved by using a combination of heuristic and derandomization methods. The main advantage of random designs is that they are easy to generate and to adapt to the parameters of a given library.
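A random k-set design followed by trivial two-stage screening can be sketched as follows. This is a simplified simulation under stated assumptions: tests are error-free, the heuristic and derandomization improvements mentioned above are not included, and the names random_k_set_design and two_stage_screen, as well as the library size and parameter values, are hypothetical.

```python
import random


def random_k_set_design(n_clones, n_pools, k, seed=0):
    """Assign each clone to k distinct pools chosen uniformly at random."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n_pools), k)) for _ in range(n_clones)]


def two_stage_screen(design, positives, n_pools):
    """Simulate a trivial two-stage screening with error-free tests.

    Stage 1 tests every pool in parallel. Stage 2 individually confirms
    every clone all of whose pools tested positive.
    Returns (confirmed positives, total number of tests).
    """
    # Stage 1: a pool is positive if it contains at least one positive clone.
    positive_pools = set()
    for clone in positives:
        positive_pools |= design[clone]

    # Candidates are clones that appear in no negative pool.
    candidates = [c for c, pools in enumerate(design) if pools <= positive_pools]

    # Stage 2: one confirmatory test per candidate.
    confirmed = {c for c in candidates if c in positives}
    return confirmed, n_pools + len(candidates)


design = random_k_set_design(n_clones=1000, n_pools=60, k=5, seed=1)
confirmed, n_tests = two_stage_screen(design, positives={3, 500}, n_pools=60)
print(sorted(confirmed), n_tests)
```

The total test count is the number of pools plus the number of candidates, so the choice of k and of the number of pools trades first-stage cost against the expected number of false candidates to confirm.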
Hierarchical screening strategies are often used as a compromise between adaptive and non-adaptive methods. These strategies involve a small number of stages (usually three), the last of which consists of confirming individual candidate positive clones. The efficiency and optimality of these designs are not well understood. There are two types of hierarchical strategies: tree-like and interleaved. Interleaved methods have been based on transversal designs and have the advantage of requiring the construction of relatively few pools.
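One possible reading of a tree-like hierarchical strategy with three stages is sketched below. It assumes error-free tests and a simple partition of the library into superpools, each split into subpools; these choices, the function name tree_screen, and the parameter values are illustrative assumptions rather than a design taken from the text. Interleaved strategies are not shown.

```python
def tree_screen(n_clones, super_size, sub_size, positives):
    """Simulate a three-stage tree-like screening and count the tests used.

    Stage 1 tests every superpool, stage 2 tests the subpools of positive
    superpools, and stage 3 confirms individual clones in positive subpools.
    The nested loops are only a convenient way to count; within a stage the
    tests would be run in parallel.
    """
    tests = 0
    confirmed = set()
    for s in range(0, n_clones, super_size):
        super_pool = range(s, min(s + super_size, n_clones))
        tests += 1                                   # stage 1 test
        if not any(c in positives for c in super_pool):
            continue
        for t in range(super_pool.start, super_pool.stop, sub_size):
            sub_pool = range(t, min(t + sub_size, super_pool.stop))
            tests += 1                               # stage 2 test
            if not any(c in positives for c in sub_pool):
                continue
            for c in sub_pool:
                tests += 1                           # stage 3 confirmation
                if c in positives:
                    confirmed.add(c)
    return confirmed, tests


confirmed, n_tests = tree_screen(1000, super_size=100, sub_size=10,
                                 positives={3, 500})
print(sorted(confirmed), n_tests)
```

In this example the 1000 clones are screened with 10 superpool tests, 20 subpool tests and 20 individual confirmations, at the price of three sequential rounds of testing.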
Currently used screening strategies are suitable for libraries which have not been mapped. The overall mapping goal can, however, be achieved without obtaining complete information from each screening. Some of the information can later be recovered at little cost, while other information may be discarded. This suggests that screening strategies with relatively small numbers of pools and tests can be used. We have made some progress in evaluating strategies on this basis.
Once a library has been characterized, the screening strategy should be redesigned to take all information into account. Although adaptive methods for ideal clones are well understood, how to obtain good non-adaptive strategies for the maps which are available today is not as well understood.