[Biococoa-dev] Optimizations

Sat Mar 26 16:56:01 EST 2005

Okay, I changed the represents/representedBy collections from arrays to sets
and fixed all the related calls to adjust to that.  It definitely makes
a difference - cuts the time about in half.  On a dual 1.8 G5, I got the
following results using a 6-mer searching a 1.2KB DNA sequence 50 times:

2005-03-26 16:38:51.902 Translation[8375] ambiguous finding took -0.821547
seconds
2005-03-26 16:38:52.775 Translation[8375] ambiguous old finding took
-0.873676 seconds
2005-03-26 16:38:53.034 Translation[8375] strict finding took -0.258846
seconds
2005-03-26 16:38:53.466 Translation[8375] strict old finding took -0.431471
seconds

This is after catching two bugs, one in the old and one in the new method.
It was pretty funny - the old version kept coming in faster, so I knew there
had to be something wrong ;).
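
To make the array-to-set change concrete, here is a minimal sketch in Python rather than the actual BioCocoa Objective-C code. The table names, the tiny symbol subset, and find_ambiguous() are all hypothetical stand-ins for the represents/representedBy collections; the only point is that the per-base membership test goes from a linear scan of an array to a hashed lookup in a set, with identical results.

```python
# Hypothetical stand-ins for each symbol's "represents" collection.
# Only a few IUPAC symbols are listed to keep the sketch short.
REPRESENTS_AS_ARRAYS = {
    "A": ["A"], "C": ["C"], "G": ["G"], "T": ["T"],
    "R": ["A", "G"],            # purine
    "Y": ["C", "T"],            # pyrimidine
    "N": ["A", "C", "G", "T"],  # any base
}

# Same data, stored as sets: membership becomes a hash lookup
# instead of a scan through the array.
REPRESENTS_AS_SETS = {
    symbol: frozenset(bases) for symbol, bases in REPRESENTS_AS_ARRAYS.items()
}


def find_ambiguous(pattern, sequence, table):
    """Return every start index where the ambiguous pattern matches.

    A pattern symbol matches a sequence base when the base is a member
    of that symbol's "represents" collection in the given table.
    """
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        if all(sequence[start + i] in table[symbol]
               for i, symbol in enumerate(pattern)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    sequence = "ACGTACGAAC"
    print(find_ambiguous("ACR", sequence, REPRESENTS_AS_ARRAYS))
    print(find_ambiguous("ACR", sequence, REPRESENTS_AS_SETS))
```

Both calls return the same hits; only the cost of each membership test differs. How much that saves depends on the collection sizes and the runtime, so the timings above are the real measure, not this sketch.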

For the curious, extrapolating from this single data point (about 0.82 s
for the ambiguous search vs. 0.26 s for the strict one, a ratio of roughly
3.2) indicates that the ambiguous search is faster than searching for each
of its possible strict sequences as soon as the ambiguity resolves into 4
or more strict sequences.
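
The break-even estimate can be worked through as a small Python sketch. It uses the magnitudes of the two new-method timings logged above and assumes, as a simplification, that the ambiguous search costs the same regardless of how ambiguous the pattern is and that N strict searches cost N times one strict search. The function names are hypothetical; the degeneracy table is the standard IUPAC nucleotide code.

```python
from math import prod

# Magnitudes of the timings logged above (50 searches each).
AMBIGUOUS_SECONDS = 0.821547
STRICT_SECONDS = 0.258846

# Number of strict bases each IUPAC nucleotide symbol can stand for.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def strict_sequence_count(pattern):
    """How many strict sequences an ambiguous pattern expands into."""
    return prod(DEGENERACY[symbol] for symbol in pattern.upper())


def ambiguous_search_is_faster(pattern):
    """Compare one ambiguous search against one strict search per expansion.

    Assumes strict cost scales linearly with the number of expansions,
    which is an extrapolation from a single measurement.
    """
    return strict_sequence_count(pattern) * STRICT_SECONDS > AMBIGUOUS_SECONDS


if __name__ == "__main__":
    for pattern in ("ACGTAC", "ACRTAC", "ACBTAC", "ACNTAC"):
        print(pattern,
              strict_sequence_count(pattern),
              ambiguous_search_is_faster(pattern))
```

Under these assumptions a pattern with a single B (3 expansions) is still cheaper to search strictly, while a single N or two two-fold symbols (4 expansions) tips the balance toward the ambiguous search.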

Given the big boosts, I'm going to do the same for complements now - I
expect that will significantly boost translation speeds.

JT
_______________________________________________
This mind intentionally left blank