[Biococoa-dev] Weighted sequence score
biococoa at bioworxx.com
Tue Mar 15 11:16:37 EST 2005
On 15.03.2005 at 16:54, John Timmer wrote:
> One of the things the alignment work has gotten me thinking about
> implementing is a weighted sequence score. This is for situations like
> splice sites or transcription factor binding sites, where you don't
> tend to have absolute sequences, but often have situations like "80% of
> the time the first base is an A, and when it's not, 15% of the time it's
> a G". The best you can do is evaluate how close a given sequence is to
> the ideal sequence, i.e., the best score you can get at position 1 in
> the example above is only 80%, not 100%.
This sounds like sequence profiles, am I right?
You want to know how well a sequence fits a profile built from other
sequences, for example one derived from an alignment?
That's a very good thing; I'd like to have this as well. It could be used
for sequence searching or phylogenetics.
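To make the profile idea concrete, here is a minimal sketch in Python of scoring a strict (unambiguous) sequence against a per-position frequency table. The table layout, the function names, and the second position's values are assumptions made up for illustration; only the 80% A / 15% G first position comes from the example above. The score is simply the mean of the per-position frequencies, which is one of several reasonable choices.

```python
# Hypothetical profile: one dict per position, mapping base -> frequency.
PROFILE = [
    {"A": 0.80, "G": 0.15},  # position 1 from the example: 80% A, 15% G
    {"C": 0.60, "T": 0.40},  # position 2: invented values for illustration
]


def position_scores(seq, profile):
    """Return the frequency of each observed base at its position (0.0 if unlisted)."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return [column.get(base, 0.0) for base, column in zip(seq.upper(), profile)]


def weighted_score(seq, profile):
    """Mean per-position frequency of the sequence under the profile."""
    return sum(position_scores(seq, profile)) / len(profile)


def best_possible_score(profile):
    """Score of the ideal sequence: the most frequent base at every position."""
    return sum(max(column.values()) for column in profile) / len(profile)


if __name__ == "__main__":
    print(weighted_score("AC", PROFILE), best_possible_score(PROFILE))
    print(weighted_score("GT", PROFILE))
```

Dividing `weighted_score` by `best_possible_score` would give a value relative to the ideal sequence, if a 0-to-1 scale is preferred.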
> The actual implementation of this doesn't seem that hard, but the
> details are driving me nuts. Three in particular:
> How to provide the user a way to set up the scoring table. My best
> guess would be to require a formatted string, like this:
> Does this sound good?
> The second is ambiguity. I could just require that the queried
> sequence be strict, but that seems pretty limiting. The question then
> becomes how to evaluate a situation where the first base in the example
> above is matched to a purine. It shouldn't score as well as matching A,
> but it shouldn't be penalized as much as matching to an N. I could just
> require the user to supply a value for purines, but that may become a
> real pain for fairly ambiguous sequences.
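One possible way to handle ambiguity without asking the user for extra values is to score an ambiguous symbol as the average of the frequencies of the bases it can stand for. This is only a sketch of that assumption, not something settled in this thread; the function name and table layout are hypothetical, while the symbol table is the standard IUPAC nucleotide code.

```python
# Standard IUPAC nucleotide codes, expanded to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def ambiguous_score(symbol, column):
    """Average the column frequencies over every base the symbol may represent.

    Assumption: each candidate base is treated as equally likely.
    Bases missing from the column count as 0.0.
    """
    candidates = IUPAC[symbol.upper()]
    return sum(column.get(base, 0.0) for base in candidates) / len(candidates)


if __name__ == "__main__":
    first_position = {"A": 0.80, "G": 0.15}  # the example from the message
    for symbol in "ARN":
        print(symbol, ambiguous_score(symbol, first_position))
```

With the example column, a purine (R) lands between a strict A and an N, which is the ordering asked for above.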
> Non-100% value totals. What if the user, for base 1, doesn't supply a
> value, meaning that 5% of the time it could be anything? I could just
> score it as 5%. The problem with that is how to score a position where
> there are 100% defined symbols, but it's compared with an N. My gut
> response would be to give a 25% score, but then that's penalized less
> than a base that gets the 5% score, which seems odd.
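One way to make the two cases consistent, offered only as a sketch, is to spread the unassigned remainder evenly over the bases the user did not list, so that every column sums to 100% before scoring. Under that assumption an N averaged over all four bases always scores 25%, whether or not the column was fully specified. The function names here are hypothetical.

```python
BASES = "ACGT"


def fill_column(column):
    """Return a copy of the column with the unassigned remainder spread
    evenly over the bases that have no value (an assumption, not a rule)."""
    filled = dict(column)
    unlisted = [base for base in BASES if base not in filled]
    remainder = max(0.0, 1.0 - sum(filled.values()))
    for base in unlisted:
        filled[base] = remainder / len(unlisted)
    return filled


def n_score(column):
    """Score of an N: the mean frequency over all four bases."""
    return sum(column.get(base, 0.0) for base in BASES) / len(BASES)


if __name__ == "__main__":
    partial = fill_column({"A": 0.80, "G": 0.15})  # 5% left unassigned
    full = fill_column({"A": 0.80, "G": 0.20})     # nothing left over
    print(partial, n_score(partial))
    print(full, n_score(full))
```

A specific unlisted base (C or T, at 2.5% each in the example) still scores below an N. That may be acceptable, since an N could be hiding the preferred base while a C cannot, but it is a design choice rather than a given.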
> Anyway, ideas or suggestions would be welcome. In the meantime, I'm
> probably going to try to dig through BioJava and see what they do.
> This mind intentionally left blank