[Biococoa-dev] Weighted sequence score

Philipp Seibel biococoa at bioworxx.com
Tue Mar 15 17:06:02 EST 2005


On 15.03.2005, at 21:03, Alexander Griekspoor wrote:

> On 15-Mar-05, at 16:54, John Timmer wrote:
>
>> One of the things the alignment work has gotten me thinking about
>> implementing is a weighted sequence score.

It's more of a weighted base score, isn't it?

>> This is for situations like splice sites or transcription factor
>> binding sites, where you don't tend to have absolute sequences, but
>> often have situations like "80% of the time, the first base is an A,
>> and when it's not, 15% of the time it's a G".  The best you can do is
>> evaluate how close a given sequence is to the ideal sequence - i.e.,
>> the best score you can get at position 1 in the example above is
>> only 80%, not 100%.
> Nice idea indeed, perfect for finding consensus sequences in your
> sequence.
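
To make the scoring John describes concrete, here is a minimal sketch in Python rather than Objective-C. It is not BioCocoa code. The names `PROFILE`, `raw_score` and `relative_score` are made up for illustration, and summing the per-position weights is only one possible way to combine them. The two columns are the ones from John's formatted-string example.

```python
# Hypothetical profile: one dict per position, mapping base -> percentage.
PROFILE = [
    {"A": 80, "G": 15, "C": 5},
    {"T": 60, "C": 40},
]

def raw_score(profile, seq):
    """Sum the weight of each observed base; bases not listed score 0."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(column.get(base, 0) for column, base in zip(profile, seq))

def relative_score(profile, seq):
    """Raw score divided by the score of the ideal (consensus) sequence."""
    best = sum(max(column.values()) for column in profile)
    return raw_score(profile, seq) / best

print(raw_score(PROFILE, "AT"))       # the ideal sequence: 80 + 60
print(relative_score(PROFILE, "GC"))  # (15 + 40) / (80 + 60)
```

Dividing by the best achievable score is what expresses "how close to the ideal sequence": the ideal sequence gets 1.0 even though its first position only carries 80%.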

A profile is nothing more than a set of sequences represented by a
"probabilistic" model. So if you read it as "80% of my sequences have
an A at a specific position and 15% of them have a G", that leads to a
convenience method like:

+ (BCSequenceProfile *)profileWithSequenceArray:(NSArray *)array;

>>
>> The actual implementation of this doesn't seem that hard, but the
>> details are driving me nuts.  Three in particular:
>>
>> How to provide the user a way to set up the scoring table.  My best
>> idea would be to require a formatted string, like this:
>> A:80,G:15,C:5
>> T:60,C:40
>> Etc.
>> Does this sound good?
> Hmm, not really, but I don't have a good alternative either, perhaps 
> some "consensus site object".
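
For comparison, if the formatted-string route were taken, reading it would not be much work. A rough Python sketch, assuming one line per position and comma-separated `base:weight` entries exactly as in John's example (`parse_scoring_table` is a made-up name, not an existing API):

```python
def parse_scoring_table(text):
    """Parse lines like 'A:80,G:15,C:5' into a list of {base: weight} dicts."""
    table = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        column = {}
        for entry in line.split(","):
            base, _, weight = entry.partition(":")
            # float() raises ValueError if the weight is missing or malformed
            column[base.strip().upper()] = float(weight)
        table.append(column)
    return table

table = parse_scoring_table("A:80,G:15,C:5\nT:60,C:40")
print(table)
```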

I don't think we will need it, because you can construct a profile like
this:

sequenceA: AAAATATAGC
sequenceB: AAATATATAT
sequenceC: AAATTATATT

With the previously described method, this gives:

A: 100
A: 100
A: 100
A: 33 T: 66
A: 33 T: 66
....
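
The column counts above can be reproduced with a few lines of code. This is only a sketch in Python of what such a constructor might compute, not the BioCocoa implementation; `profile_from_sequences` is a hypothetical name, and it assumes the sequences are already aligned and of equal length.

```python
from collections import Counter

def profile_from_sequences(sequences):
    """Return one {base: percentage} dict per column of equal-length sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and all the same length")
    profile = []
    for column in zip(*sequences):
        counts = Counter(column)
        profile.append({base: 100 * n / len(column) for base, n in counts.items()})
    return profile

profile = profile_from_sequences(["AAAATATAGC", "AAATATATAT", "AAATTATATT"])
for column in profile[:5]:
    print(column)
```

The percentages come out unrounded here (one third and two thirds), where the listing above shows them as 33 and 66.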

Of course, profiles could also have a convenience method like this:

+ (BCSequenceProfile *)profileWithAlignment:(BCSequenceAlignment *)alignment;

Phil
>>
>> The second is ambiguity.  I could just require that the queried
>> sequence be strict, but that seems pretty limiting.
> Absolutely, because that's the idea of the thing, right! If I'm not
> allowed to input W:100, I will just input A:50, T:50, right ;-) In fact,
> that is how you might solve the problem...
> I'll think about the other problems John...
> Alex
>
>
>> The question then becomes how to evaluate a situation where the
>> first base in the example above is compared to a purine?  It
>> shouldn't score as well as matching A, but it shouldn't be penalized
>> as much as matching to an N.  I could just require the user to
>> supply a value for purines, but that may become a real pain for
>> fairly ambiguous sequences.
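
One possible answer, sketched in Python: score an ambiguous symbol as the average of the column weights of the bases it can stand for. The averaging rule is an assumption on my side, not something the thread settles on, and the `IUPAC` table below is only a subset of the codes. `symbol_score` is a made-up name.

```python
# Subset of the IUPAC nucleotide codes, mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",  # purine
    "Y": "CT",  # pyrimidine
    "W": "AT",
    "S": "CG",
    "N": "ACGT",
}

def symbol_score(column, symbol):
    """Average the column weights over every base the symbol can represent."""
    bases = IUPAC[symbol.upper()]
    return sum(column.get(b, 0) for b in bases) / len(bases)

column = {"A": 80, "G": 15, "C": 5}
for symbol in "ARN":
    print(symbol, symbol_score(column, symbol))
```

For the example column, A scores 80, a purine scores (80 + 15) / 2 = 47.5 and N scores (80 + 15 + 5 + 0) / 4 = 25, so the purine lands between the exact match and the N without the user supplying an extra value.
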
>>
>> Non-100% value totals.  What if the user, for base 1, doesn't supply
>> a C value, meaning that 5% of the time it could be anything?  I
>> could just score it as 5%.  The problem with that is how to score a
>> position where there's 100% defined symbols, but it's compared with
>> an N?  My gut response there would be to give a 25% score, but then
>> that's penalized less than a known base that gets the 5% score,
>> which seems odd.
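
One option for the missing 5%, again only as a Python sketch under my own assumptions: spread the unassigned remainder evenly over the bases the user did not list, so that every column adds up to 100 before scoring. `fill_missing` and `BASES` are hypothetical names.

```python
BASES = "ACGT"

def fill_missing(column):
    """Spread the unassigned percentage evenly over the bases not listed."""
    missing = [b for b in BASES if b not in column]
    remainder = 100 - sum(column.values())
    if remainder < 0:
        raise ValueError("weights add up to more than 100")
    filled = dict(column)
    for base in missing:
        filled[base] = remainder / len(missing)
    return filled

filled = fill_missing({"A": 80, "G": 15})
print(filled)                             # C and T share the remaining 5
print(sum(filled.values()) / len(BASES))  # what an N would score if averaged
```

This does not remove the oddity John describes: an N averaged over all four bases still scores 25, more than a known C at 2.5. It only makes the columns comparable with each other.
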
>>
>> Anyway, ideas or suggestions would be welcome.  In the meantime, I'm
>> probably going to try to dig through BioJava and see what they do.
>>
>> JT
>>
>> _______________________________________________
>> This mind intentionally left blank
>>
>>
>> _______________________________________________
>> Biococoa-dev mailing list
>> Biococoa-dev at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/biococoa-dev
>>
>>
> *********************************************************
>                     ** Alexander Griekspoor **
> *********************************************************
>              The Netherlands Cancer Institute
>              Department of Tumorbiology (H4)
>         Plesmanlaan 121, 1066 CX, Amsterdam
>                    Tel:  + 31 20 - 512 2023
>                    Fax:  + 31 20 - 512 2029
>                   AIM: mekentosj at mac.com
>                    E-mail: a.griekspoor at nki.nl
>                Web: http://www.mekentosj.com
>
> 	Claiming that the Macintosh is inferior to Windows
> 	because most people use Windows, is like saying
> 	that all other restaurants serve food that is
> 	inferior to McDonalds
>
> *********************************************************
>