Kevin Karplus, University of California - Santa Cruz
"Estimating Statistical Significance for Reverse-sequence Null Models"

Since 1998, we have been using a somewhat unusual null model with our hidden Markov models: the reverse-sequence null model. This null model uses the same computation as the stochastic model itself, but applies it to the reversed sequence:

        P(x | NULL) = P(reversal(x) | M)
This reverse-sequence null model has been very effective at cancelling "noise" signals such as compositional bias and helicity in our fold-recognition tests.
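A small Python sketch of this null model follows. It assumes a toy first-order Markov chain as a stand-in for the hidden Markov model M; all function names and probability tables are hypothetical and only illustrate the idea. The score is the log-odds log P(x | M) - log P(reversal(x) | M). A composition-only (zero-order) model gives a score of zero for every sequence, because reversal preserves residue counts. This is the sense in which the null model cancels compositional bias. A model that depends on residue order, like the Markov chain, can still produce a nonzero score.

```python
import math

def log_prob_composition(seq, freqs):
    """log P(seq | M) for a hypothetical composition-only (zero-order) model."""
    return sum(math.log(freqs[c]) for c in seq)

def log_prob_markov(seq, start, trans):
    """log P(seq | M) for a hypothetical first-order Markov chain (HMM stand-in)."""
    logp = math.log(start[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(trans[prev][cur])
    return logp

def reverse_null_score(seq, log_prob):
    """Log-odds score with the reverse-sequence null:
    log P(x | M) - log P(reversal(x) | M), using the same model for both terms."""
    return log_prob(seq) - log_prob(seq[::-1])

# Illustrative parameters (made up for this sketch).
freqs = {"A": 0.7, "B": 0.3}
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"A": 0.1, "B": 0.9}, "B": {"A": 0.5, "B": 0.5}}

comp = lambda s: log_prob_composition(s, freqs)
markov = lambda s: log_prob_markov(s, start, trans)

comp_score = reverse_null_score("AAAB", comp)      # zero up to rounding
markov_score = reverse_null_score("ABAB", markov)  # log(1.8), about 0.588
print(comp_score, markov_score)
```

The same definition also makes the score antisymmetric: the score of reversal(x) is exactly the negative of the score of x.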

To use a stochastic model effectively in database search, we need a way to compute the statistical significance of a score, usually expressed as an E-value: the expected number of hits scoring at least that well by chance. In this talk I'll derive a distribution for the E-values of reverse-sequence null models, and show how to fit the parameters of the distribution.
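The E-value of a score s is the number of sequences searched multiplied by the probability that a null sequence scores at least s. The sketch below shows this arithmetic. It estimates the tail probability from a sample of null scores. This is a hypothetical empirical stand-in, not the fitted distribution derived in the talk. The function name and the example numbers are assumptions.

```python
def empirical_evalue(score, null_scores, db_size):
    """E-value = db_size * P(null score >= score).

    The tail probability is estimated empirically from a sample of null
    scores (a stand-in for a fitted parametric distribution).
    """
    if not null_scores:
        raise ValueError("need at least one null score")
    tail = sum(1 for s in null_scores if s >= score) / len(null_scores)
    return db_size * tail

null_sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(empirical_evalue(1.0, null_sample, 1000))  # 2 of 5 null scores >= 1.0
print(empirical_evalue(3.0, null_sample, 1000))  # beyond the sample: 0.0
```

The second call shows the weakness of a purely empirical tail. Any score beyond the sampled range gets an E-value of zero. Fitting the parameters of a distribution is what allows extrapolation to the high scores that matter in database search.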

Results on recent fold-recognition tests will be shown.
