Parallel Processing in Genome Mapping and Sequencing

Charles R. Cantor

Center for Advanced Biotechnology, College of Engineering, Boston University, Boston MA 02215


Dramatically faster speeds for DNA sequencing will be needed if areas like clinical diagnostic sequencing, surveys of human diversity, and ecological and environmental surveys are ever to become realistic pursuits. It does not seem likely that information storage or data analysis will ever be a limiting feature in such studies. At present, the limitation is the actual rate at which DNA sequence data can be obtained experimentally. We are exploring faster ways of DNA sequencing by examining methods like sequencing by hybridization (SBH), which can be executed in a highly parallel fashion so that many samples or probes are analyzed simultaneously. A key feature in optimizing such strategies is likely to be data quality. Extracting the maximum amount of data from a large sample array requires multiplexed strategies in which many probes are used at once. This is likely to require considerably reducing the cross talk, or cross hybridization, between different samples and probes. We will demonstrate how supplementing physical hybridization with enzymatic steps can increase the rate of data acquisition and also greatly improve the ability to discriminate against mismatches.
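The idea behind SBH can be illustrated with a minimal, idealized sketch: a target is characterized by the set of short probes that match it. The function name `spectrum`, the probe length of 4, and the example target are assumptions chosen for illustration, not details from the abstract. Probes are written here as the target-strand sequence they detect, and hybridization is treated as perfect, with no cross hybridization.

```python
def spectrum(target: str, k: int) -> set[str]:
    """Return every length-k substring of target.

    In an idealized SBH experiment (an assumption of this sketch),
    these are exactly the probes that give a positive signal.
    """
    return {target[i:i + k] for i in range(len(target) - k + 1)}


# Hypothetical 8-base target read with hypothetical 4-base probes.
target = "ATGCGTAC"
hits = spectrum(target, 4)

# A probe differing by a single base is absent from the ideal spectrum;
# in a real experiment it may still bind weakly, which is the
# cross-hybridization problem the text describes.
mismatch_probe = "ATGA"
print(sorted(hits))
print(mismatch_probe in hits)
```

In practice the boundary between matched and mismatched probes is not this sharp, which is why the text stresses improved mismatch discrimination.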

While conceived as a direct DNA sequencing procedure, the SBH format we use may be even more useful as a device for the rapid preparation of DNA samples for fast serial methods like capillary arrays or mass spectrometry. For example, an array of only 1024 probes could capture and then generate sequence ladders from any arbitrary DNA sequence. Some of the ways in which this sort of capture device might be used in DNA sequencing will be illustrated. With fast serial methods, the rate of production of arrays of samples may become a limiting step. Some of the probe designs we are using would allow direct production of probe arrays by replication of a master array and transfer to a new surface. This would allow the preparation of large numbers of probe arrays, and thus it would facilitate the parallel preparation of many sample arrays.
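The figure of 1024 probes is consistent with one probe for every possible 5-base sequence, since 4^5 = 1024. The abstract does not state the probe length, so the 5-base variable region below is an inference, and the helper names `all_probes` and `capture_index` are hypothetical. Under that assumption, a rough sketch shows why such an array could capture any target: whatever the last five bases of a fragment are, exactly one probe in the array matches them.

```python
from itertools import product

BASES = "ACGT"


def all_probes(k: int) -> list[str]:
    """Enumerate every possible k-base sequence (4**k of them)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def capture_index(target: str, probes: list[str]) -> int:
    """Return the array position whose probe matches the target's last k bases.

    Capture by the fragment end is an assumption of this sketch.
    """
    k = len(probes[0])
    return probes.index(target[-k:])


probes = all_probes(5)  # assumed 5-base variable region
example_target = "GATTACAGGT"  # hypothetical fragment
position = capture_index(example_target, probes)
print(len(probes), probes[position])
```

Because the enumeration is exhaustive, no prior knowledge of the target sequence is needed to guarantee a matching position on the array.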

As sequencing speeds increase, the rate limiting step is likely to shift from the production of DNA sequence data to the acquisition of samples worth sequencing. In genomic DNA sequencing, the samples needed usually derive from physical maps consisting of contiguous arrays of clones suitable for direct sequence determination. Most past construction of physical maps has involved very tedious top down or bottom up strategies. We will show how more efficient strategies can be developed by using complex mixtures of probes that allow very efficient ordering of overlapping clones.


Document last modified on March 28, 2000.