[Biococoa-dev] BCSequenceCluster

Peter Schols peter.schols at bio.kuleuven.be
Sat Oct 8 09:32:16 EDT 2005

> Thanks, Peter, for the code example; this is the kind of stuff I was
> thinking about. For the sake of simplicity, shouldn't we have a
> single class for such related concepts as "BCSequenceGroup",
> "BCSequenceCluster", and "BCSequenceAlignment", or even, to
> throw in one more, "BCSequenceContig"? In all of these cases, we are
> just talking about a bunch of sequences positioned in a specific way
> with respect to each other. I just want to make sure I am not missing
> something.
> The conceptual differences are quite small, and I think that for the
> framework to really taste like Cocoa, it has to have a simple
> interface. In fact, this is really the essence of what I was
> wondering about. How many classes do we need for all these related
> concepts? Is one enough?

I think this is a very good suggestion, and I totally agree: one class
should fit them all. The only potential "problem" is that in a contig the
overlap between sequences is quite small, so the BCSequences will contain
plenty of gaps, whereas with the other "formats" the sequences will be
much more similar. In fact, that's not really a problem at all... just a
thought. For entire genomes it might become one, though, as every
sequence could become very long.
This class could have a -(BOOL)isAligned method or something similar
to easily test whether the sequences are aligned (and thus have equal
length), because I can imagine I/O formats that return a group of
non-homologous sequences with different lengths.
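
A minimal sketch of the equal-length test described above, written in
Python purely for illustration. The function name is_aligned and the "-"
gap character are assumptions made for this example; they are not part of
the BioCocoa API, where the check would presumably live on the proposed
class as -(BOOL)isAligned.

```python
GAP = "-"  # assumed gap character, used only in this sketch


def is_aligned(sequences):
    """Return True when all sequences have the same length.

    An empty group or a single sequence counts as aligned here;
    that is a choice made for this sketch, not a BioCocoa rule.
    """
    lengths = {len(seq) for seq in sequences}
    return len(lengths) <= 1


# A gapped alignment: every row has the same length.
alignment = ["ACG-TA", "A-GGTA", "ACGGT-"]

# A group of unrelated sequences, as some I/O formats might return.
unrelated = ["ACGT", "TTGACCA"]

aligned_result = is_aligned(alignment)
unrelated_result = is_aligned(unrelated)
```

Note that equal length is only a necessary condition for a real
alignment; the sketch tests exactly what is described above and nothing
more.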

> Regarding the implementation, I would tend to prefer using gaps,
> because we already have the BCSymbol, we can easily generate
> sequences without them (good suggestion, Peter), and it is easy to
> use for display, comparisons, ... On the other hand, using offsets
> makes some tasks, like "get the symbols at position 827 of all the
> sequences", a bit hard (basically trading simplicity and speed
> for data compression). But in the end, I would be happy with any
> implementation, as long as it works, particularly because I am
> probably not going to be doing much of it!!

I'd also prefer the gap approach over the offset approach. It will
keep things much simpler, make I/O much simpler, and let us easily
switch between a gapped and a non-gapped sequence.
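
To make the trade-off concrete, here is a rough sketch in Python (used
only for illustration; none of these names exist in BioCocoa). It
assumes "-" as the gap character and, for the offset variant, a
hypothetical (offset, sequence) pair that can only express a leading
shift, not internal gaps.

```python
GAP = "-"  # assumed gap character, used only in this sketch


def ungapped(seq):
    """Gap approach: the non-gapped sequence is one step away."""
    return seq.replace(GAP, "")


def column_gapped(sequences, position):
    """Gap approach: a column is a plain index into every row."""
    return [seq[position] for seq in sequences]


def column_offset(records, position):
    """Offset approach: each row needs its own index arithmetic.

    records is a list of (offset, sequence) pairs (hypothetical layout).
    Positions outside a sequence are reported as gaps.
    """
    column = []
    for offset, seq in records:
        index = position - offset
        column.append(seq[index] if 0 <= index < len(seq) else GAP)
    return column


# The same two overlapping reads in both representations.
gapped = ["ACGT--", "--GTAC"]
offsets = [(0, "ACGT"), (2, "GTAC")]

col_from_gaps = column_gapped(gapped, 2)
col_from_offsets = column_offset(offsets, 2)
stripped = ungapped(gapped[1])
```

Both representations give the same column, but the gapped one does it
with a plain index, and the non-gapped sequence is a single replace
away. The offset variant stores less data, at the cost of extra
bookkeeping on every access.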

