[Biococoa-dev] Optimizations

John Timmer jtimmer at bellatlantic.net
Sat Mar 26 16:56:01 EST 2005

Okay, I changed the represents/representedBy collections from arrays to sets
and fixed all the related calls to adjust to that.  It definitely makes
a difference - cuts the time about in half.  On a dual 1.8 GHz G5, I got the
following results using a 6-mer searching a 1.2KB DNA sequence 50 times:

2005-03-26 16:38:51.902 Translation[8375] ambiguous finding took -0.821547
2005-03-26 16:38:52.775 Translation[8375] ambiguous old finding took -0.873676 seconds
2005-03-26 16:38:53.034 Translation[8375] strict finding took -0.258846
2005-03-26 16:38:53.466 Translation[8375] strict old finding took -0.431471

This is after catching two bugs, one in the old and one in the new method.
It was pretty funny - the old version kept coming in faster, so I knew there
had to be something wrong ;).
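
For anyone who wants to see why sets help here, a minimal sketch in Python (not the actual BioCocoa Objective-C code). It assumes each symbol carries the set of strict bases it represents, using the standard IUPAC codes, and that a pattern symbol matches a sequence symbol when it covers all of that symbol's bases. The names REPRESENTS, matches and find_ambiguous are made up for the illustration.

```python
# Hypothetical illustration, not BioCocoa code: each IUPAC symbol maps to
# the set of strict bases it represents.
REPRESENTS = {
    "A": frozenset("A"), "C": frozenset("C"),
    "G": frozenset("G"), "T": frozenset("T"),
    "R": frozenset("AG"), "Y": frozenset("CT"),
    "S": frozenset("CG"), "W": frozenset("AT"),
    "K": frozenset("GT"), "M": frozenset("AC"),
    "B": frozenset("CGT"), "D": frozenset("AGT"),
    "H": frozenset("ACT"), "V": frozenset("ACG"),
    "N": frozenset("ACGT"),
}


def matches(pattern_symbol, sequence_symbol):
    """True if the pattern symbol covers every base of the sequence symbol."""
    # A subset test on sets; with plain arrays this becomes a nested scan.
    return REPRESENTS[sequence_symbol] <= REPRESENTS[pattern_symbol]


def find_ambiguous(pattern, sequence):
    """Return every start offset where the pattern matches the sequence."""
    width = len(pattern)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(matches(p, s) for p, s in zip(pattern, window)):
            hits.append(start)
    return hits
```

With this sketch, find_ambiguous("GANTC", "AAGAATCGGACTC") finds the pattern at offsets 2 and 8.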

For the curious, extrapolating from this single data point indicates that
the ambiguous search is faster than searching for each of its possible
strict sequences as soon as the ambiguity can't be resolved into <4 strict
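
To make the comparison concrete: the number of strict sequences an ambiguous pattern resolves into is the product of the number of bases each of its symbols represents. A small Python sketch (again not BioCocoa code; IUPAC_BASES and strict_expansions are hypothetical names) that lists them:

```python
from itertools import product

# Standard IUPAC nucleotide codes, written as the strict bases each covers.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def strict_expansions(pattern):
    """List every strict sequence an ambiguous pattern can stand for."""
    choices = (IUPAC_BASES[symbol] for symbol in pattern)
    # Cartesian product: one base picked per position of the pattern.
    return ["".join(bases) for bases in product(*choices)]
```

A single N in a 5-mer such as GANTC gives 4 strict sequences, and each additional ambiguous symbol multiplies that count, so the "search each strict sequence" approach grows quickly.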

Given the big boosts, I'm going to do the same for complements now - I
expect that will significantly boost translation speeds.

This mind intentionally left blank
