[Biococoa-dev] BCSequence implementation
Alexander Griekspoor
mek at mekentosj.com
Wed Feb 23 08:39:57 EST 2005
> you are completely right. It was my fault. I think it's nice to have
> some categories in one header file, but not due to performance
> issues.. (you are right).
No problem. It is indeed a good idea to organize the code in categories
inside a single header file, as it nicely groups all the related code.
> I took a look into the BCAbstract Sequence and recognized that the
> Object stores the Sequence in a NSArray of BCSymbols. That's not really
> good I think. Imagine handling complete genome sequences or other
> stuff. I think we need to store it in a NSString or even simple char
> array. There could be of course accessor methods for BCSymbols ....
> but we really need to care about memory and performance issues.
> Especially in the Foundation framework.
Again this discussion also predates your arrival at the framework,
perhaps you can take a look in the archives...
The basic thought here is that we have made the decision very carefully
to go for our own BCSequence and BCSymbol class (although the design
has recently changed quite dramatically with the arrival of Charles
;-). The reason is pretty simple: despite many similarities, an
NSString is not the same as a sequence. The characters are different,
and many features are different.
Of course we could go for char arrays but that will basically get rid
of all the benefits Cocoa (and object oriented design) has to offer us
from the start (and thus basically kill the reason to build the
framework in the first place).
By having our own BCSymbol and BCSequence we think we have an
object-oriented design mimicking NSString (but better), which is way more
powerful than basic C arrays can ever be. Well, I hear you think, "that's
nice, but my computer will never work with a genome of objects (memory
and speedwise)!" That's correct, which is why we are using a simple trick.
All symbols are so-called shared instances, meaning that only a single
instance of each symbol is allocated and kept in memory, and all a
sequence array consists of are pointers to these shared instances. Yes,
this will take up more
memory than a char (around 4 times?) but that's more than worth the
benefits Cocoa will give us. And to relieve you further, yes there are
char and NSString accessors that will give you the desired variable if
you need them. But just to make sure, it's a deliberate choice that ALL
internal representations of sequences should be in the form of our own
BCSequence objects wherever possible. Ideally this includes alignments.
I don't think it's a good thing if a method consists of converting
a BCSequence to a string, doing the manipulation, and reconverting the
string to a BCSequence. All this should be done natively in the BCSequence
format, and if that gives one trouble, we should rethink/extend the
BCSequence class.
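To make the shared-instance idea above concrete, here is a minimal sketch, written in Python rather than Objective-C for brevity. It is not the BioCocoa implementation: the names Symbol, Sequence, for_char and reverse_complement are hypothetical stand-ins for BCSymbol and BCSequence, and only the four DNA bases are modelled. It shows a sequence holding nothing but references to a handful of shared symbol objects, a manipulation done directly on those symbols, and a string accessor for callers that want plain text.

```python
# Illustrative sketch only: Symbol and Sequence are hypothetical names,
# not the actual BCSymbol/BCSequence classes.

class Symbol:
    """A nucleotide symbol. Exactly one instance exists per character."""

    _shared = {}  # character -> the single shared Symbol instance

    def __init__(self, char, complement_char):
        self.char = char
        self._complement_char = complement_char

    @classmethod
    def for_char(cls, char):
        """Return the shared instance for a character (never a new object)."""
        return cls._shared[char.upper()]

    @property
    def complement(self):
        """The shared instance of the complementary base."""
        return Symbol._shared[self._complement_char]


# Allocate the four DNA symbols once, up front.
for _char, _comp in (("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")):
    Symbol._shared[_char] = Symbol(_char, _comp)


class Sequence:
    """A sequence stored as a list of references to shared Symbol objects."""

    def __init__(self, text=""):
        self.symbols = [Symbol.for_char(ch) for ch in text]

    def reverse_complement(self):
        """Operate on symbols directly, with no round trip through a string."""
        result = Sequence()
        result.symbols = [s.complement for s in reversed(self.symbols)]
        return result

    def to_string(self):
        """String accessor for callers that need plain text."""
        return "".join(s.char for s in self.symbols)


if __name__ == "__main__":
    genome = Sequence("ACGT" * 1000)
    print(len(genome.symbols), "references,",
          len({id(s) for s in genome.symbols}), "distinct objects")
    print(Sequence("AACG").reverse_complement().to_string())
```

However long the sequence gets, the number of symbol objects stays fixed; the per-position cost is one reference, which is the trade-off against a one-byte char described above.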
Now, I do realize that with the arrival of more people, it's obvious
that they are gonna ask themselves and the list (no offense, please
do!) the questions that we asked ourselves as well during the initial
design of the setup we now implement. Therefore, I think that once the
basic BCSequence system is up and running (BCSequence et al,
annotations & features, and SeqIO) documentation will become the number
one priority. As I want to spend a few more (PR) words on BioCocoa
on our website anyway, to generate more knowledge and traffic, I'll see
if I can combine that with some more explanation of the basic
architecture of the BCSequence setup. Until the 1.0 release of BioCocoa,
and if anyone agrees with the idea, it can become the temporary
(developer) homepage of the new BioCocoa framework, leaving the current
one intact as long as we're still in beta (or alpha ;-) phase. Peter,
any thoughts on this one?
> Another Question:
>
> What about a BCMutableSequence ... I'd like to implement one for the
> Alignment classes
At the moment we've decided to go for a class that's mutable from the
beginning, mainly for both performance and technical reasons. Perhaps
Charles and John can talk a bit more about this, and I remember a
discussion about this issue, so there must be a thread in the
archives... It would be nice to have an optimized immutable version in
the future, but again Charles can better explain to you why that's not
so easy with the current implementation, since he designed the class
cluster approach.
So all in all, please don't feel offended by the answers. All your
comments are more than welcome, and the lack of documentation doesn't
really help newcomers. Feel free to ask us to explain the rationale
behind different design choices; it will help with writing the
documentation and FAQs, for one!
Cheers,
Alex
PS. Finally, I don't want to sound too motherly, but as a general "rule"
please let the list know what you plan to do before committing anything
to CVS (for instance the BCAnnotableSequence.h/m files I noticed). This
applies especially when you would like to see folders and/or files
added (simply outline your proposed work in a post); then at least we
know not to remove them ;-)
Right now my current focus is the BCAnnotation/Feature part, the
implementation of them in BCSequence and BCSequenceReader/Writer.
*********************************************************
** Alexander Griekspoor **
*********************************************************
The Netherlands Cancer Institute
Department of Tumorbiology (H4)
Plesmanlaan 121, 1066 CX, Amsterdam
Tel: + 31 20 - 512 2023
Fax: + 31 20 - 512 2029
AIM: mekentosj at mac.com
E-mail: a.griekspoor at nki.nl
Web: http://www.mekentosj.com
LabAssistant - Get your life organized!
http://www.mekentosj.com/labassistant
*********************************************************