[Biococoa-dev] BCSequenceCluster

Koen van der Drift kvddrift at earthlink.net
Thu Oct 6 06:38:44 EDT 2005

On Oct 6, 2005, at 6:08 AM, Peter Schols wrote:

> Dear Koen and Charles,
> I think we have two options. The first is to use the gap symbol to  
> represent gaps. In that case, there is no need for an offset at  
> all, since we can simply use the gap symbol, both at the beginning  
> of the sequence and in the middle. This is definitely the easiest  
> way. We could then have a BCSequenceGroup that consists of  
> BCSequences of equal length containing gaps.
> The only problem I see with this option is that this design does  
> not correspond to the real world: individual sequences don't have  
> gaps / indels. Indels only exist when comparing (aligning)  
> sequences. So from a framework-design point of view, gaps should be  
> a property of BCSequenceGroups not a property of individual  
> sequences. That brings us to the second option:
> Use BCSequenceGroups to add an array of offsets for every sequence.  
> This option would correspond with the interface Koen has written.  
> However, I don't see why we would need the representativeSequence.  
> Unless you want this class to actually align a set of sequences  
> itself, I think this representativeSequence is not necessary. I  
> suppose that the BCSequenceGroup class will receive its already  
> aligned sequences either from the I/O class or from a BCAligner  
> class (that could be just a wrapper for existing CLI alignment  
> apps). In this case, removing any sequence from the BCSequenceGroup  
> should not affect the alignment so there is no need for a  
> representativeSequence.
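
To make the two options above concrete, here is a rough sketch in Python
rather than Objective-C. It is not BioCocoa code: the names GAP,
SequenceGroup, gaps_of and aligned_row are all hypothetical, and it
assumes "offsets" means a per-sequence list of (position, length) gap
runs that is stored in the group, not in the sequence itself. Option 1
is simply the list of equal-length gapped rows. Option 2 keeps the
sequences ungapped and lets the group rebuild the gapped rows on demand.

```python
GAP = "-"  # assumed gap symbol; the real framework may use another one


def gaps_of(row):
    """Return [(index_in_ungapped_sequence, run_length), ...] for a gapped row."""
    gaps = []
    residues = 0  # number of non-gap symbols seen so far
    run = 0       # length of the gap run currently being read
    for ch in row:
        if ch == GAP:
            run += 1
        else:
            if run:
                gaps.append((residues, run))
                run = 0
            residues += 1
    if run:  # trailing gap run at the end of the row
        gaps.append((residues, run))
    return gaps


class SequenceGroup:
    """Option 2: ungapped sequences plus gap information owned by the group."""

    def __init__(self):
        self.sequences = []  # ungapped sequences, as individual sequences exist
        self.gaps = []       # one list of (position, length) runs per sequence

    def add_aligned_row(self, row):
        # Split an already aligned row into its sequence and its gap runs.
        self.sequences.append(row.replace(GAP, ""))
        self.gaps.append(gaps_of(row))

    def remove(self, index):
        # Removing one sequence leaves the gap data of the others untouched.
        del self.sequences[index]
        del self.gaps[index]

    def aligned_row(self, index):
        # Rebuild the gapped row (the option 1 representation) on demand.
        seq = self.sequences[index]
        parts = []
        prev = 0
        for pos, length in self.gaps[index]:
            parts.append(seq[prev:pos])
            parts.append(GAP * length)
            prev = pos
        parts.append(seq[prev:])
        return "".join(parts)


# Option 1: equal-length rows that carry their own gap symbols.
rows = ["--ACGT", "TTAC-T"]

# Option 2: the same alignment, with gaps held by the group.
group = SequenceGroup()
for row in rows:
    group.add_aligned_row(row)
```

In this sketch, calling group.remove(0) leaves the remaining row
unchanged, which matches the point that no representativeSequence is
needed when the sequences arrive already aligned.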

In the case where the input sequences are already aligned, we're  
ready to go, I guess. Also note we already have a BCAlignment class  
which we could use. However, I have not yet adapted that to use the  
NSData structure. I didn't write the alignment code, and I don't want  
to mess up someone else's work ;-)

I added the representative sequence after looking up some info about  
sequence clusters (see e.g.
http://en.wikipedia.org/wiki/Sequence_clustering).
But if you suggest it is not needed in this  
case, I'll be happy to remove it!

- Koen.
