[Biococoa-dev] BCSequenceCluster

Peter Schols peter.schols at bio.kuleuven.be
Thu Oct 6 06:08:04 EDT 2005


Dear Koen and Charles,

I think we have two options. The first is to use the gap symbol to
represent gaps. In that case there is no need for an offset at all,
since we can simply use the gap symbol both at the beginning of the
sequence and in the middle. This is definitely the easiest way. We
could then have a BCSequenceGroup that consists of BCSequences of
equal length containing gaps.
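
To make this first option concrete, here is a rough sketch. It is
written in Python purely for illustration, not in Objective-C, and
every name in it (GappedSequenceGroup, column, ungapped, the "-" gap
symbol) is my own assumption rather than anything that exists in
BioCocoa. It only shows the idea of a group of equal-length sequences
that already contain gap symbols.

```python
GAP = "-"  # assumed gap symbol; the framework may use a different one


class GappedSequenceGroup:
    """Hypothetical group of equal-length sequences that contain gap symbols."""

    def __init__(self, sequences):
        # sequences: dict mapping a sequence name to its gapped string
        lengths = {len(seq) for seq in sequences.values()}
        if len(lengths) > 1:
            raise ValueError("all gapped sequences must have the same length")
        self.sequences = dict(sequences)

    def column(self, index):
        # One alignment column: the symbol of every sequence at this position.
        return {name: seq[index] for name, seq in self.sequences.items()}

    def ungapped(self, name):
        # The "real world" sequence, with the gap symbols stripped out.
        return self.sequences[name].replace(GAP, "")
```

No offset is needed here: a leading gap and an internal gap are stored
in exactly the same way, which is why this is the easy option.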

The only problem I see with this option is that the design does not
correspond to the real world: individual sequences don't have gaps or
indels. Indels only exist when comparing (aligning) sequences. So
from a framework-design point of view, gaps should be a property of
BCSequenceGroups, not a property of individual sequences. That brings
us to the second option:

Have BCSequenceGroup store an array of offsets for every sequence.
This option would correspond with the interface Koen has written.
However, I don't see why we would need the representativeSequence.
Unless you want this class to actually align a set of sequences
itself, I think the representativeSequence is not necessary. I
suppose that the BCSequenceGroup class will receive its already
aligned sequences either from the I/O class or from a BCAligner class
(which could be just a wrapper for existing CLI alignment apps). In
this case, removing any sequence from the BCSequenceGroup should not
affect the alignment, so there is no need for a representativeSequence.
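
A rough sketch of this second option, again in Python for illustration
only. The class and method names are hypothetical, and the layout of
the offsets is my own assumption: each gap is stored as a pair
(position, length), meaning "insert length gap symbols before the
residue at position in the ungapped sequence". Koen's actual interface
may store offsets differently. The offsets stay private, and the group
hands out sequences with gaps generated from them, as Charles
suggested.

```python
GAP = "-"  # assumed gap symbol


class OffsetSequenceGroup:
    """Hypothetical group that stores ungapped sequences plus private gap offsets."""

    def __init__(self):
        self._sequences = {}  # name -> ungapped residues
        self._gaps = {}       # name -> sorted list of (position, length) pairs

    def add(self, name, residues, gaps=()):
        # The sequences arrive already aligned, e.g. from an I/O or aligner class.
        self._sequences[name] = residues
        self._gaps[name] = sorted(gaps)

    def remove(self, name):
        # Each sequence carries its own offsets, so the others are untouched.
        del self._sequences[name]
        del self._gaps[name]

    def aligned(self, name):
        # Rebuild the gapped sequence from the private offsets.
        residues = self._sequences[name]
        parts = []
        previous = 0
        for position, length in self._gaps[name]:
            parts.append(residues[previous:position])
            parts.append(GAP * length)
            previous = position
        parts.append(residues[previous:])
        return "".join(parts)
```

Because every sequence keeps its own offsets, removing one sequence
leaves the aligned form of the others unchanged, which is the reason I
don't see a need for a representativeSequence in this sketch.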

This is just my point of view, from a phylogenetics background. Maybe  
we could also have a look at how other Bio frameworks solve this  
problem.

Cheers,

Peter




On 06 Oct 2005, at 02:16, Koen van der Drift wrote:


>
> On Oct 5, 2005, at 7:51 PM, Charles Parnot wrote:
>
>
>
>> I think that returning the array of offsets to the user is like  
>> giving the user the Bible not as a string, but as a list of  
>> positions of all the letters of the alphabet!!
>>
>> The offset array should remain private, I think. We should return  
>> sequences with gaps (either generate them using the offsetArray,  
>> or use gap symbols from the start).
>>
>>
>>
>
> Making the offsetArray private sounds like a good idea. I also like  
> the use of the gap symbol.  However, what I am starting to sense  
> now is that we are actually making a BCAlignment class in disguise  
> and I am not sure if we should do that in this case. It's not clear  
> to me from Peter's original request if that's what he meant. What  
> type of alignment is used, for instance, for phylogenetic sequence  
> clusters? I assume that's what he is talking about.
>
> - Koen.






