[Biococoa-dev] Optimizations

John Timmer jtimmer at bellatlantic.net
Tue Mar 15 19:29:26 EST 2005

> On Mar 15, 2005, at 4:34 PM, John Timmer wrote:
>> In answer to my own question, NSSet seems to be more efficient than an
>> array for this use.  Any thoughts on using one?  We could either use an
>> internal, private ivar only for tests such as this, or change the array
>> to a set.  Arrays and sets seem pretty readily convertible, and all this
>> is in the BCSymbol class, so this shouldn't be a big deal.
> Where do you want to use an NSSet, as a return value for findSequence?
> The advantage of the array is that the found sequences are in the
> 'right order'.

Sorry for my lack of clarity.  Shark says that over 30% of the execution
time in the "findSequence" method is spent checking whether one symbol
represents another.  Currently, that's done by checking whether the
submitted symbol occurs in the array of represented symbols.  According to
the docs, making the represented symbols a set instead of an array will
speed this up significantly.
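
To illustrate the difference, here is a minimal sketch, written in Python
rather than Objective-C so it can be run anywhere.  The list stands in for
an NSArray and the frozenset for an NSSet; the class "Symbol" and the
"represents_via_*" methods are hypothetical names for illustration, not the
actual BCSymbol API.  A membership test on a list scans the elements one by
one, while a hash-based set looks the element up directly.

```python
class Symbol:
    """Hypothetical stand-in for BCSymbol; not the real BioCocoa class."""

    def __init__(self, name, represented=()):
        self.name = name
        # Same data kept two ways so the two lookups can be compared.
        self.represented_list = list(represented)      # like an NSArray
        self.represented_set = frozenset(represented)  # like an NSSet

    def represents_via_list(self, other):
        # Linear scan: cost grows with the number of represented symbols.
        return other in self.represented_list

    def represents_via_set(self, other):
        # Hash lookup: roughly constant cost regardless of size.
        return other in self.represented_set


A, C, G, T = (Symbol(n) for n in "ACGT")
# IUPAC-style ambiguity symbols: R = purine (A or G), N = any base.
R = Symbol("R", [A, G])
N = Symbol("N", [A, C, G, T])
```

Both methods give the same answer (for example, R represents A but not C);
only the lookup cost differs.  With just a handful of represented symbols
per set the gain per call may be modest, so it would be worth re-profiling
after the change.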

Returning an array from the method doesn't enter into this issue, and
definitely should not be changed.

I may be obsessing about this, but my tests earlier today showed that the
non-strict version of the code takes 4-5X as long to execute as the strict
one.  In a 1.2 kb sequence, it's the difference between barely perceptible
and wondering whether something's broken.


PS - Once symbol sets are done, a quick test for the symbol set used would
also allow us to set the strict flag, even if the user hasn't done so, and
speed up many cases, so let me know when it's done and in use.
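
A minimal sketch of that check, again in Python and under stated
assumptions: a symbol set is modeled as a mapping from each symbol name to
the names it represents, and a set counts as unambiguous when no symbol
represents anything other than itself.  "can_force_strict",
"effective_strict", and both example sets are hypothetical, not part of
BioCocoa.

```python
def can_force_strict(symbol_set):
    """Return True if no symbol in the set represents any other symbol.

    symbol_set maps each symbol name to the names it represents
    (a hypothetical model, not BioCocoa's data structure).
    """
    return all(set(represented) <= {name}
               for name, represented in symbol_set.items())


# Hypothetical example sets.
strict_dna = {"A": "A", "C": "C", "G": "G", "T": "T"}
ambiguous_dna = dict(strict_dna, R="AG", N="ACGT")


def effective_strict(user_strict, symbol_set):
    # Honor the user's flag, but also go strict when the symbol set
    # contains no ambiguous symbols (assumed not to change the matches).
    return user_strict or can_force_strict(symbol_set)
```

With these definitions, a search over the unambiguous set would run in
strict mode even if the user left the flag off, while the ambiguous set
would still follow the user's choice.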

This mind intentionally left blank
