schristley at mac.com
Mon Sep 24 12:39:30 EDT 2007
Along these lines, the current BCSequenceReader is somewhat memory
inefficient for medium to large sequences. For example, attempting
to load a 120 Mbp fasta file containing a few thousand sequences,
I ran out of memory (and my machine has 6GB). The main issue was the
way fasta files were parsed, which created lots of temporary strings;
I have some code currently enabled which optimizes this, but there
could be more improvement.
One definite improvement is not to automatically read in the whole
file as a string. The string tends to be stored as Unicode, so it
roughly doubles the size of the file in memory. It would be better, I
think, to rework some of the readers to read directly from the file
and construct the NSData on the fly.
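
To make the idea concrete, here is a rough sketch of reading records
straight from the file as bytes instead of loading the whole file into
one string first. It is written in Python only for brevity and is not
BioCocoa code; the function name read_fasta and the byte buffer that
stands in for NSData are assumptions made for illustration.

```python
import io


def read_fasta(stream):
    """Yield (header, sequence) byte pairs from a binary FASTA stream.

    Hypothetical helper: only one line and the current record's
    buffer are held in memory at a time, never the whole file.
    """
    header = None
    sequence = bytearray()  # grows in place, stands in for NSData
    for line in stream:  # a binary stream yields one line at a time
        line = line.rstrip(b"\r\n")
        if line.startswith(b">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, bytes(sequence)
            header = line[1:]
            sequence = bytearray()
        elif header is not None:
            sequence += line  # append raw bytes, no text decoding
    if header is not None:
        yield header, bytes(sequence)


# Small in-memory example; a real caller would pass open(path, "rb").
demo = io.BytesIO(b">seq1 first\nACGT\nAC\n>seq2\nGGCC\n")
records = list(read_fasta(demo))
```

Because the data stays as raw bytes, each base costs one byte rather
than the two or more a Unicode string may use, and a caller that
handles records one at a time never needs the full file in memory.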
On Sep 22, 2007, at 11:36 AM, Scott Christley wrote:
> On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote:
>> Thanks for adding these files, they seem very useful. I was
>> thinking based on how you factored out the BCCachedFastaFile
>> class, maybe we should do the same for BCSequenceReader as well?
>> This makes it maybe a little easier to maintain and add other
>> formats. Just a thought.
> Yes, that is a good idea. Makes the interface simple and clean.
> One disadvantage is that it creates a lot of classes, but I guess
> that doesn't really matter. The same idea could also be applied to
> BCSequenceWriter, though it looks like only fasta output is
> supported now; there is no reason more formats couldn't be added in
> the future.