[ssml] Finding Matches using N-term & C-term sequences

Wed Dec 10 05:08:30 EST 2003

Hello

Kevin Karplus wrote:
>
> You don't want to compute E-values for the 4 small searches
> separately---almost nothing will come up as significant.  You want to
> combine the scores from the separate searches.  You can get a rough
> approximation by saying that the p-value for finding the same sequence
> from query A and query B is roughly the product of the p-values (this
> isn't quite right, but is probably close enough for your purposes).
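A minimal Python sketch of the combination Kevin describes, assuming the per-search p-values are independent. It computes the plain product and, for comparison, Fisher's method (a standard technique swapped in here, not something from the quoted message), which converts the product into a calibrated p-value using a chi-square distribution with 2k degrees of freedom. The product is never larger than Fisher's value, which is one reason it "isn't quite right". The function names are hypothetical.

```python
import math


def combined_pvalue_product(pvalues):
    """Rough combined p-value: the product of independent per-search p-values."""
    # Sum logs instead of multiplying directly to avoid underflow.
    return math.exp(sum(math.log(p) for p in pvalues))


def combined_pvalue_fisher(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df under H0.

    For even df = 2k the chi-square survival function has a closed form:
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is x/2
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, k):
        term *= half_x / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    ps = [0.01, 0.02]  # e.g. p-values from an N-term and a C-term search
    print(combined_pvalue_product(ps))
    print(combined_pvalue_fisher(ps))
```

With a single p-value both functions return that p-value unchanged; with more searches Fisher's value grows relative to the raw product, correcting for the many ways independent scores can multiply to the same total.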

Since you mention this technique: can it also be applied to multiple hits from
a profile method while the profile is being iteratively developed?

I know that this technique has traditionally been used for family
identification. Basically, the more hits you find to your sequence within one
family, the more probable it is that your sequence belongs to that family.

The product of the p-values (the 'family probability') has to be corrected for
the sequence diversity of the family, to down-weight redundant hits from
identical sequences. Has this been done at the profile-profile level?
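A sketch of one simple diversity correction, assumed here for illustration and not taken from any specific tool: each hit's log p-value is weighted by 1/(number of hits with an identical sequence), so a group of duplicates contributes the same as a single copy. Real pipelines usually use finer sequence-weighting schemes (e.g. position-based weights), and the weighted product is a ranking score rather than a calibrated p-value. The function names and the (sequence, p-value) input format are hypothetical.

```python
import math
from collections import Counter


def family_log_score(hits):
    """Weighted sum of ln(p) over all hits to one family.

    hits: list of (sequence, pvalue) pairs, one per family member hit.
    Identical sequences share a total weight of 1, so a family padded
    with duplicate sequences does not look more significant than it is.
    """
    counts = Counter(seq for seq, _ in hits)
    # Weight 1/(copies of this sequence): duplicates split one unit of weight.
    return sum(math.log(p) / counts[seq] for seq, p in hits)


def family_score(hits):
    """Weighted product of p-values, prod(p_i ** w_i); smaller means stronger."""
    return math.exp(family_log_score(hits))


if __name__ == "__main__":
    hits = [("ACDEFG", 0.01), ("ACDEFG", 0.01), ("KLMNPQ", 0.02)]
    print(family_score(hits))  # the duplicate ACDEFG hit is not double-counted
```

Exact identity is the crudest possible redundancy test; clustering at some percent-identity threshold before assigning weights would be a natural extension.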

Cheers
Dan.