Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, 142292, Russia
In standard sequencing by hybridization (SBH), reconstruction of a DNA sequence relies on short-range information provided by the sequences of the k-mers contained in the DNA strand. As a result, the presence of non-unique (k-1)-long overlaps between the k-mers leads to reconstruction ambiguities. We recently proposed a method for obtaining long-range sequence information that eliminates most of the reconstruction problems inherent in SBH [A.B.Chetverin and F.R.Kramer, BioSystems 30, 215-231 (1993); U.S. Patent Application 07/838,607 (1992)].
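The ambiguity caused by repeated (k-1)-long overlaps can be illustrated with a small sketch. The example sequence, the value k = 3, and the function names `spectrum` and `reconstructions` are assumptions chosen for illustration; the brute-force search is not part of the method described here and only serves to enumerate every sequence consistent with an idealized, error-free k-mer set.

```python
def spectrum(seq, k):
    """Idealized SBH result: the sorted list of k-mers contained in seq."""
    return sorted(seq[i:i + k] for i in range(len(seq) - k + 1))


def reconstructions(kmer_list):
    """Return every sequence that uses each k-mer exactly once,
    chaining consecutive k-mers by their (k-1)-long overlaps."""
    k = len(kmer_list[0])
    results = set()

    def extend(seq, remaining):
        if not remaining:
            results.add(seq)
            return
        suffix = seq[-(k - 1):]
        for i, kmer in enumerate(remaining):
            if kmer[:-1] == suffix:
                extend(seq + kmer[-1], remaining[:i] + remaining[i + 1:])

    # Try every k-mer as a possible starting point.
    for i, kmer in enumerate(kmer_list):
        extend(kmer, kmer_list[:i] + kmer_list[i + 1:])
    return sorted(results)


# The overlap "AG" occurs three times, so two different sequences
# share exactly the same set of 3-mers.
print(reconstructions(spectrum("TAGCAGGAGT", 3)))
```

In this toy case the short-range data alone cannot decide between the two candidate sequences, although every 3-mer occurs only once.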
Unlike standard SBH, where the data are collected in one step by hybridizing a DNA fragment to all possible k-mers, the new method (Sequencing by Nested Strand Hybridization, or SNSH) consists of two steps. In the first step, all possible nested strands of a DNA fragment are generated in separate wells of a preparative oligonucleotide array, where the W-truncated strands are sorted by their W-terminal k-mers. In the second step, the strands present in each well are hybridized to an analytical oligonucleotide array. This reveals the k-mers that precede each particular k-mer in the DNA sequence, and thus allows the k-mers to be put in their correct order even if many of them are repeated in the sequence.
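The two steps can be sketched in a simplified, idealized form. The sketch below assumes error-free data and, for brevity, that every k-mer occurs only once in the fragment (handling repeated k-mers needs more bookkeeping than is shown). The names `snsh_wells` and `reconstruct` are hypothetical. Each well is modeled as the set of k-mers detected in the truncated strands that end with a given k-mer, so the size of that set gives the position of the k-mer in the sequence.

```python
def kmers(seq, k):
    """All k-mers of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def snsh_wells(seq, k):
    """Idealized two-step data: map each terminal k-mer to the set of
    k-mers found in the truncated strands that end with it."""
    wells = {}
    for end in range(k, len(seq) + 1):
        strand = seq[:end]        # one nested (truncated) strand
        terminal = strand[-k:]    # the k-mer that selects its well
        wells.setdefault(terminal, set()).update(kmers(strand, k))
    return wells


def reconstruct(wells):
    """Order k-mers by how many k-mers their well contains.
    Assumes each k-mer occurs once, so the well of the k-mer at
    position i holds exactly i + 1 distinct k-mers."""
    order = sorted(wells, key=lambda terminal: len(wells[terminal]))
    return order[0] + "".join(kmer[-1] for kmer in order[1:])


# These two sequences have identical 3-mer sets, yet the long-range
# data in the wells orders the 3-mers of each one correctly.
for seq in ("TAGCAGGAGT", "TAGGAGCAGT"):
    print(seq, reconstruct(snsh_wells(seq, 3)))
```

Both test sequences contain the same 3-mers and differ only in their order, so they are indistinguishable from the k-mer set alone; the precedence information recorded per well is what separates them.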
Furthermore, the collected data provide for sequence reconstruction even if the analyzed sample contains a mixture of different DNA fragments rather than one individual fragment. In this case, W-truncated strands that terminate with the same k-mer all occur in one well of the preparative array, whether they originate from one DNA fragment or from different fragments. The subsequent hybridization of the contents of each well with an analytical array reveals the k-mers that precede each particular k-mer in all fragments where it occurs. The k-mers belonging to different fragments can be distinguished by identifying the sets of k-mers that are all connected by virtue of preceding or following each other. Even if a fragment possesses only a single k-mer that does not occur in other fragments in the mixture, its constituent k-mers can be identified unambiguously, because this single k-mer is connected with all of them and only with them. The sequence of each individual fragment is then reconstructed by using the long-range information on the relative order of its k-mers.
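The separation of a mixture into fragments can be sketched as follows. This is a minimal illustration under stated assumptions: error-free data, two made-up fragments that share one 4-mer, and at least one k-mer per fragment that occurs in no other fragment. The function names and the rule of keeping the smallest connected sets are assumptions made for this sketch, not a description of the authors' implementation. A k-mer is treated as connected to every k-mer that precedes or follows it.

```python
def kmers(seq, k):
    """All k-mers of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mixture_wells(fragments, k):
    """Idealized wells for a mixture: strands ending with the same
    k-mer share a well regardless of the fragment they come from."""
    wells = {}
    for seq in fragments:
        for end in range(k, len(seq) + 1):
            strand = seq[:end]
            wells.setdefault(strand[-k:], set()).update(kmers(strand, k))
    return wells


def fragment_kmer_sets(wells):
    """Group k-mers into fragments using precede/follow connections."""
    # Connections of a k-mer: the k-mers preceding it (its own well)...
    connected = {t: set(preceding) for t, preceding in wells.items()}
    # ...plus the k-mers following it (wells in which it is detected).
    for terminal, preceding in wells.items():
        for p in preceding:
            connected[p].add(terminal)
    distinct = {frozenset(s) for s in connected.values()}
    # A k-mer unique to one fragment is connected to that fragment's
    # k-mers only; shared k-mers give larger unions, dropped here.
    return [s for s in distinct if not any(other < s for other in distinct)]


fragments = ["ACGTTGCA", "GGACGTCC"]   # both contain the 4-mer ACGT
for kmer_set in fragment_kmer_sets(mixture_wells(fragments, 4)):
    print(sorted(kmer_set))
```

The shared 4-mer ACGT is connected to the k-mers of both fragments, whereas a k-mer such as CGTT is connected only to the k-mers of the first fragment, which is what allows the two sets to be told apart.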
Unlike standard SBH, SNSH does not require quantitative measurements of hybridization signals to resolve repeated sequence segments. Since in SNSH the k-mers are identified repeatedly in different wells of the preparative array, many occasional hybridization errors can be filtered out algorithmically. In contrast to SBH, ambiguities occurring in SNSH do not prevent sequence reconstruction. They appear as small islands within an otherwise unambiguously reconstructed sequence, and the presence of such an ambiguity does not interfere with subsequent fragment ordering. Because it can sequence strand mixtures, SNSH does not require cloning of fragments. Pools of DNA fragments suitable for direct sequencing can be obtained by sorting a restriction digest of DNA on a preparative oligonucleotide array. Finally, because it can resolve strands that differ in only a single k-mer, SNSH allows allelic fragments of a diploid genome to be sequenced in one pool, without the need to separate them. Ultimately, SNSH can provide for completely automatic sequencing of all fragments of a large diploid genome.
This work was supported in part by DOE Grant OR00033-93CIS016.