[Biococoa-dev] BCSequence implementation

Wed Feb 23 16:30:59 EST 2005

> I took a look into the BCAbstract Sequence and recognized that the
> object stores the sequence in an NSArray of BCSymbols. That's not really
> good, I think. Imagine handling complete genome sequences or other
> stuff. I think we need to store it in an NSString or even a simple char
> array. There could of course be accessor methods for BCSymbols ....
> but we really need to care about memory and performance issues.
> Especially in the Foundation framework.

As one of the people that Alex converted to this idea, let me provide a bit
more of an explanation here.  There's no question that a raw C array of
unichar would be faster for some things than our implementation, and more in
line with what other frameworks are doing.  In order to do anything with
that sort of sequence, however, you have to interpret it, which means
handing it off to other objects, finding the appropriate information, using
lookup tables, etc.  A lot of the basic efficiency of the storage will be
lost when you actually try to learn something about the sequence.
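
To make the trade-off concrete, here is a minimal sketch in plain C (not
BioCocoa code) of what a raw character sequence needs before it can answer
even a simple question. The function names complement_of and
reverse_complement are hypothetical, and the lookup covers only a handful of
IUPAC codes; anything else is mapped to 'N' as an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table, written as a switch: the complement of one
   nucleotide code. Codes not listed here fall back to 'N' (an assumption). */
static char complement_of(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'R': return 'Y';   /* R = A or G, Y = C or T */
    case 'Y': return 'R';
    default:  return 'N';
    }
}

/* Reverse-complement a NUL-terminated sequence in place. */
static void reverse_complement(char *seq)
{
    assert(seq != NULL);
    size_t i = 0;
    size_t j = strlen(seq);

    while (i < j) {
        j--;
        if (i == j) {
            /* Middle element of an odd-length sequence. */
            seq[i] = complement_of(seq[i]);
            break;
        }
        char tmp = complement_of(seq[i]);
        seq[i] = complement_of(seq[j]);
        seq[j] = tmp;
        i++;
    }
}
```

The storage is compact, but every new question (ambiguity codes, weights,
and so on) needs another table or another helper like complement_of.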

The symbol objects we have provide for a richer experience.  Want to
complement a nucleotide?  Ask it what its complement is.  Want to know what
nucleotides are represented by "Y"?  Just ask it.  How much does threonine
weigh?  Etc.  Basically, they cut down on the intervening objects/methods
you need in order to interpret the information in a sequence.  They also
make methods like reverse-complementation a three-line bit of code.
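
As a rough illustration of the idea, here is a sketch in plain C rather than
Objective-C. The Symbol struct, the symbols table, and both functions are
hypothetical stand-ins, not the BCSymbol API; only six nucleotide codes are
modelled. Each symbol carries its own complement and the bases it
represents, so the sequence code just asks the symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a symbol object. */
typedef struct Symbol {
    char code;                        /* one-letter code, e.g. 'Y'       */
    const char *represents;           /* concrete bases the code covers  */
    const struct Symbol *complement;  /* the symbol's complement         */
} Symbol;

enum { SYM_A, SYM_C, SYM_G, SYM_T, SYM_R, SYM_Y, SYM_COUNT };

/* One shared instance per symbol; sequences hold pointers to these. */
static const Symbol symbols[SYM_COUNT] = {
    [SYM_A] = { 'A', "A",  &symbols[SYM_T] },
    [SYM_C] = { 'C', "C",  &symbols[SYM_G] },
    [SYM_G] = { 'G', "G",  &symbols[SYM_C] },
    [SYM_T] = { 'T', "T",  &symbols[SYM_A] },
    [SYM_R] = { 'R', "AG", &symbols[SYM_Y] },
    [SYM_Y] = { 'Y', "CT", &symbols[SYM_R] },
};

/* "Ask" a symbol whether it represents a given concrete base. */
static int symbol_represents(const Symbol *s, char base)
{
    assert(s != NULL);
    return base != '\0' && strchr(s->represents, base) != NULL;
}

/* Reverse-complement an array of symbol pointers in place. */
static void reverse_complement(const Symbol **seq, size_t n)
{
    assert(seq != NULL || n == 0);
    for (size_t i = 0; i < n / 2; i++) {
        const Symbol *tmp = seq[i]->complement;
        seq[i] = seq[n - 1 - i]->complement;
        seq[n - 1 - i] = tmp;
    }
    if (n % 2 == 1)   /* middle element of an odd-length sequence */
        seq[n / 2] = seq[n / 2]->complement;
}
```

Because the sequence stores pointers to shared symbols, the per-element cost
is one pointer, and no separate lookup table is consulted when complementing.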

Anyway, if you dig through the sequences a bit, you'll also see that we can
still get speed boosts by targeting bottlenecks.  In some cases, it's much
faster to use the underlying CoreFoundation array structures, which are
closer in speed to basic C-structs, but provide a lot of the flexibility of
Cocoa. 

JT

_______________________________________________
This mind intentionally left blank