[ssml] Finding Matches using N-term & C-term sequences
Dan Bolser
dmb at mrc-dunn.cam.ac.uk
Wed Dec 10 05:08:30 EST 2003
Hello
++ Kevin Karplus--
>
> You don't want to compute E-values for
> the 4 small searches separately---almost nothing will come up as significant. You
> want to combine the scores from the separate searches. You can get a rough
> approximation by
> saying that the p-value for finding the same sequence from query A and query B is
> roughly the product of the p-values (this isn't quite
> right, but is probably close enough for your purposes).
Since you mention this technique, can it be applied to multiple hits from a profile
method as the profile is being iteratively developed?
I know that traditionally this technique was used for family identification.
Basically: the more hits you find to your sequence within one family, the more
probable it is that your sequence belongs to that family.
The product of the p-values (the 'family probability') has to be corrected for the
sequence diversity of the family (to down-weight identical hits from identical
sequences). Has this been done at the profile-profile level?
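
To make the arithmetic concrete, here is a minimal Python sketch of the combination being discussed. It is an illustration only, not anything from Kevin's tools or from an existing package: the function names are made up for this example, and the per-hit weights are a hypothetical stand-in for whatever sequence-diversity weighting is used for the family. The first function is the plain product of p-values. The second is Fisher's method, which turns that product into a proper p-value if the searches are independent (an assumption that probably does not hold exactly here). The third is a weighted sum of log p-values, so that redundant hits count for less.

```python
import math


def product_p(pvalues):
    """Rough approximation: the plain product of the individual p-values."""
    return math.prod(pvalues)


def fisher_combined_p(pvalues):
    """Fisher's method for k p-values, ASSUMING the searches are independent.

    Under the null, -2 * sum(ln p) is chi-square with 2k degrees of freedom.
    For an even number of degrees of freedom the tail has a closed form:
        P = exp(-t) * sum_{i=0}^{k-1} t**i / i!,   where t = -ln(product).
    """
    k = len(pvalues)
    t = -sum(math.log(p) for p in pvalues)
    return math.exp(-t) * sum(t**i / math.factorial(i) for i in range(k))


def weighted_log_score(pvalues, weights):
    """Weighted sum of ln(p); more negative means stronger combined evidence.

    The weights are hypothetical: e.g. k identical family members could each
    get weight 1/k so that together they count as a single hit.
    """
    return sum(w * math.log(p) for p, w in zip(pvalues, weights))


if __name__ == "__main__":
    hits = [0.01, 0.01]
    print("product:", product_p(hits))
    print("Fisher :", fisher_combined_p(hits))
    print("weighted log score:", weighted_log_score(hits, [0.5, 0.5]))
```

For two hits the Fisher value is x * (1 - ln x) with x the product, so it is always somewhat larger than the raw product, which fits the remark that the product "isn't quite right". Note that the weighted score is not itself a p-value; it would still need calibrating.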
Cheers
Dan.