[Biococoa-dev] BCSequence implementation

Wed Feb 23 08:39:57 EST 2005

> You are completely right. It was my fault. I think it's nice to have
> some categories in one header file, but not for performance reasons
> (you are right).
No problem; indeed, it's a good idea to organize the code in categories
inside a single header file, as it nicely groups all the related code.

> I took a look at BCAbstractSequence and noticed that the object
> stores the sequence in an NSArray of BCSymbols. That's not really
> good, I think. Imagine handling complete genome sequences or other
> large data. I think we need to store it in an NSString or even a
> simple char array. There could of course be accessor methods for
> BCSymbols... but we really need to care about memory and performance,
> especially in the Foundation framework.
Again, this discussion predates your arrival on the project; perhaps
you can take a look in the archives...
The basic idea here is that we made a very careful decision to go for
our own BCSequence and BCSymbol classes (although the design has
recently changed quite dramatically with the arrival of Charles ;-).
The reason is pretty simple: despite the many similarities, an NSString
is not the same as a sequence. The characters are different, and many
features are different.
Of course we could go for char arrays, but that would throw away all
the benefits Cocoa (and object-oriented design) has to offer us from
the start, and thus kill the reason to build the framework in the
first place.

By having our own BCSymbol and BCSequence we think we have an
object-oriented design mimicking NSString (but better) which is far
more powerful than basic C arrays can ever be. Well, I can hear you
thinking, "that's nice, but my computer will never cope with a genome
of objects (memory- and speed-wise)!" That's correct, which is why we
use a simple trick. All symbols are so-called shared instances: only a
single instance of each symbol is allocated and kept in memory, and a
sequence array consists of nothing but pointers to those instances.
Yes, this takes up more memory than a char (around 4 times, since a
pointer on a 32-bit machine is 4 bytes versus 1 byte for a char), but
that's more than worth the benefits Cocoa will give us. And to reassure
you further: yes, there are char and NSString accessors that will give
you the desired representation if you need it. But just to be clear,
it's a deliberate choice that ALL internal representations of sequences
should be in the form of our own BCSequence objects wherever possible.
Ideally this includes alignments. I don't think it's good for a method
to convert a BCSequence to a string, do the manipulation, and convert
the string back to a BCSequence. All this should be done natively in
the BCSequence format, and if that causes trouble, we should rethink or
extend the BCSequence class.
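To make the shared-instance trick concrete, here is a minimal sketch in plain C rather than Objective-C, assuming a DNA-only alphabet and ignoring everything Cocoa adds (retain/release, NSArray overhead). The `Symbol` struct stands in for BCSymbol and the `Sequence` struct for BCSequence's internal array; these types and all functions below are hypothetical illustrations, not BioCocoa API. It shows that every residue is only a pointer to one of four static instances, that a reverse complement can be done natively on those pointers, and that a char string is produced only on request.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BCSymbol: one immutable record per base. */
typedef struct Symbol {
    char code;                        /* one-letter code, e.g. 'A' */
    const char *name;                 /* full name, e.g. "adenine" */
    const struct Symbol *complement;  /* points at another shared instance */
} Symbol;

/* The shared instances: each symbol exists exactly once in memory. */
static const Symbol kSymbols[4] = {
    { 'A', "adenine",  &kSymbols[3] },
    { 'C', "cytosine", &kSymbols[2] },
    { 'G', "guanine",  &kSymbols[1] },
    { 'T', "thymine",  &kSymbols[0] },
};

/* Return the shared instance for a one-letter code, or NULL if unknown. */
static const Symbol *symbol_for_char(char c) {
    switch (c) {
    case 'A': case 'a': return &kSymbols[0];
    case 'C': case 'c': return &kSymbols[1];
    case 'G': case 'g': return &kSymbols[2];
    case 'T': case 't': return &kSymbols[3];
    default:            return NULL;
    }
}

/* Hypothetical stand-in for BCSequence's storage: an array of pointers
   into kSymbols. Each entry costs one pointer (4 bytes on a 32-bit
   machine, 8 on 64-bit) instead of 1 byte for a char. */
typedef struct {
    size_t length;
    const Symbol **symbols;
} Sequence;

/* Build a sequence from text; returns 0 on success, -1 on an unknown
   character or allocation failure (symbols is then NULL). */
static int sequence_init(Sequence *seq, const char *text) {
    size_t n = strlen(text);
    seq->length = 0;
    seq->symbols = malloc((n > 0 ? n : 1) * sizeof *seq->symbols);
    if (seq->symbols == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        const Symbol *s = symbol_for_char(text[i]);
        if (s == NULL) {              /* unknown character: undo and fail */
            free(seq->symbols);
            seq->symbols = NULL;
            return -1;
        }
        seq->symbols[i] = s;          /* store a pointer, never a copy */
    }
    seq->length = n;
    return 0;
}

/* Reverse-complement in place using only the shared pointers,
   with no round trip through a string. */
static void sequence_reverse_complement(Sequence *seq) {
    assert(seq != NULL && (seq->symbols != NULL || seq->length == 0));
    size_t i = 0, j = seq->length;
    while (i < j) {
        j--;
        const Symbol *left = seq->symbols[i]->complement;
        seq->symbols[i] = seq->symbols[j]->complement;
        seq->symbols[j] = left;
        i++;
    }
}

/* The "char accessor": build a NUL-terminated copy on request.
   The caller frees the result; returns NULL if allocation fails. */
static char *sequence_copy_string(const Sequence *seq) {
    char *out = malloc(seq->length + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < seq->length; i++)
        out[i] = seq->symbols[i]->code;
    out[seq->length] = '\0';
    return out;
}

/* Release the pointer array; the shared symbols themselves are static. */
static void sequence_free(Sequence *seq) {
    free(seq->symbols);
    seq->symbols = NULL;
    seq->length = 0;
}
```

For example, initializing a sequence with "GATTACA", reverse-complementing it and copying it out yields "TGTAATC", and all three A residues point at the very same `kSymbols[0]`. The design choice mirrors the point above: the per-residue cost is one pointer, while the symbol data (name, complement) is stored only once.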

Now, I do realize that as more people arrive, they will naturally ask
themselves and the list (no offense, please do!) the same questions we
asked ourselves during the initial design of the setup we are now
implementing. Therefore, I think that once the basic BCSequence system
is up and running (BCSequence et al., annotations & features, and
SeqIO), documentation will become the number one priority. As I want to
spend a few more (PR) words on BioCocoa on our website anyway, to
generate more awareness and traffic, I'll see if I can combine that
with some more explanation of the basic architecture of the BCSequence
setup. If people agree with the idea, it can become the temporary
(developer) homepage of the new BioCocoa framework until the 1.0
release, leaving the current one intact as long as we're still in the
beta (or alpha ;-) phase. Peter, any thoughts on this one?

> Another Question:
>
> What about a BCMutableSequence... I'd like to implement one for the
> Alignment classes.
At the moment we've decided to go for a class that is mutable from the
start, for both performance and technical reasons. Perhaps Charles and
John can say a bit more about this; I remember a discussion about this
issue, so there must be a thread in the archives... It would be nice to
have an optimized immutable version in the future, but again, Charles
can probably explain better why that's not so easy with the current
implementation, since he designed the class cluster approach.

So all in all, please don't be offended by the answers; all your
comments are more than welcome, and the lack of documentation doesn't
really help newcomers. Feel free to ask us to explain the rationale
behind different design choices; for one, it will help with writing
the documentation and FAQs!
Cheers,
Alex

P.S. Finally, I don't want to sound too motherly, but as a general
"rule" please first let the list know what you plan to do before
committing anything to CVS (for instance the BCAnnotableSequence.h/m
files I noticed). This applies especially when you would like to add
folders and/or files (simply outline your proposed work in a post);
then at least we know not to remove them ;-) Right now my focus is the
BCAnnotation/Feature part and its implementation in BCSequence and
BCSequenceReader/Writer.

*********************************************************
                       ** Alexander Griekspoor **
*********************************************************
                 The Netherlands Cancer Institute
                 Department of Tumorbiology (H4)
           Plesmanlaan 121, 1066 CX, Amsterdam
                     Tel:  + 31 20 - 512 2023
                     Fax:  + 31 20 - 512 2029
                    AIM: mekentosj at mac.com
                     E-mail: a.griekspoor at nki.nl
                 Web: http://www.mekentosj.com

           LabAssistant - Get your life organized!
           http://www.mekentosj.com/labassistant

*********************************************************