[Biococoa-dev] BCCachedSequenceFile

Mon Sep 24 12:39:30 EDT 2007

Along these lines, the current BCSequenceReader is somewhat memory
inefficient for medium to large sequences.  For example, when I tried
to load a 120 Mbp fasta file containing a few thousand sequences,
I ran out of memory (and my machine has 6GB).  The main issue was the
way fasta files were parsed, which creates lots of temporary strings;
I have some code currently enabled which optimizes this, but there
could be more improvement.

One definite improvement is not to automatically read the whole
file into a string.  The string tends to be stored as Unicode
automatically, which doubles the size of the file in memory.  It would
be better, I think, to rework some of the readers to read directly from
the file and construct the NSData on the fly.
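
To make the idea concrete, here is a rough sketch in plain C rather
than the actual BioCocoa code.  It reads the file in fixed-size chunks
and appends residues to a growable byte buffer, so the whole file is
never held as a string.  Everything named here (ByteBuffer,
RecordHandler, stream_fasta, add_length) is made up for illustration;
the buffer only plays the role an NSMutableData would play in a
reader.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer, a stand-in for NSMutableData: one byte per residue. */
typedef struct { char *bytes; size_t length, capacity; } ByteBuffer;

static int buffer_append(ByteBuffer *buf, const char *src, size_t n)
{
    if (buf->length + n > buf->capacity) {
        size_t cap = buf->capacity ? buf->capacity : 4096;
        while (cap < buf->length + n)
            cap *= 2;
        char *grown = (char *)realloc(buf->bytes, cap);
        if (grown == NULL)
            return -1;
        buf->bytes = grown;
        buf->capacity = cap;
    }
    memcpy(buf->bytes + buf->length, src, n);
    buf->length += n;
    return 0;
}

/* Hypothetical callback, invoked once per record.  seq is NOT
   NUL-terminated and is only valid for the duration of the call. */
typedef void (*RecordHandler)(const char *header, const char *seq,
                              size_t length, void *ctx);

/* Returns the number of records read, or -1 if memory ran out. */
static long stream_fasta(FILE *fp, RecordHandler handler, void *ctx)
{
    char chunk[4096], header[4096] = "";
    ByteBuffer seq = { NULL, 0, 0 };
    long count = 0;
    int in_record = 0, in_header = 0, at_line_start = 1;

    assert(fp != NULL && handler != NULL);
    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strcspn(chunk, "\r\n");   /* length without line ending */
        int ends_line = (chunk[n] != '\0');  /* 0 if the line continues */

        if (at_line_start && chunk[0] == '>') {
            if (in_record) {                 /* previous record is complete */
                handler(header, seq.bytes ? seq.bytes : "", seq.length, ctx);
                count++;
            }
            memcpy(header, chunk + 1, n - 1);
            header[n - 1] = '\0';
            seq.length = 0;                  /* reuse the same allocation */
            in_record = in_header = 1;
        } else if (in_header) {
            /* tail of an over-long header line: dropped in this sketch */
        } else if (in_record && n > 0) {
            if (buffer_append(&seq, chunk, n) != 0) {
                free(seq.bytes);
                return -1;
            }
        }
        if (ends_line)
            in_header = 0;
        at_line_start = ends_line;
    }
    if (in_record) {                         /* flush the final record */
        handler(header, seq.bytes ? seq.bytes : "", seq.length, ctx);
        count++;
    }
    free(seq.bytes);
    return count;
}

/* Example handler: adds each record's residue count to a size_t in ctx. */
static void add_length(const char *header, const char *seq,
                       size_t length, void *ctx)
{
    (void)header;
    (void)seq;
    *(size_t *)ctx += length;
}
```

Peak memory here is one chunk plus the longest single sequence, at
one byte per residue.  In a real reader the handler would presumably
copy the bytes into an NSData for each sequence object instead of
just counting them.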

cheers
Scott

On Sep 22, 2007, at 11:36 AM, Scott Christley wrote:

>
> On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote:
>
>> Thanks for adding these files, they seem very useful. I was  
>> thinking based on how you factored out the BCCachedFastaFile  
>> class, maybe we should do the same for BCSequenceReader as well?  
>> This makes it maybe a little easier to maintain and add other  
>> formats.  Just a thought.
>
> Yes, that is a good idea.  Makes the interface simple and clean.   
> One disadvantage is that it creates a lot of classes, but I guess  
> that doesn't really matter.  The same idea could also be applied to  
> BCSequenceWriter, though it looks like only fasta output is
> supported now; there is no reason more formats couldn't be added
> in the future.
>