Traditionally, gene recognition programs try to minimize some error measure that depends on both the number of false positive and the number of false negative predictions. Their performance is therefore characterized by the average correlation between predicted and actual genes, or by similar measures. However, a predicted complete gene that is approximately 80% correct (the current state of the art) is of only limited value to an experimental biologist. A more useful result would be a less ambitious but much more reliable prediction. We are developing two algorithms aimed at such predictions. The first performs gene recognition when a related protein is known. It is based on the recently developed spliced alignment technique and currently provides at least 98% accuracy in the recognition of human genes given mammalian relatives. If a related protein is unavailable, exact resolution of the exon-intron structure by purely statistical methods is unattainable. However, it is possible to make reliable partial predictions that can be used immediately for experimental verification of gene structure. In this vein, we propose a highly specific algorithm for the construction of oligonucleotide probes and PCR primers. It uses only very simple statistical parameters and thus can be applied not only to the analysis of mammalian genomes, but to much less studied genomes as well.
This is joint work with A. Mironov, P. Pevzner, and M. Roytberg.