[Biococoa-dev] BCCachedSequenceFile

Koen van der Drift kvddrift at earthlink.net
Mon Sep 24 12:52:15 EDT 2007

The current BCSequenceReader (and BCSequenceWriter) code is largely based on the design of the original BioCocoa framework that Peter wrote a couple of years ago. We updated it to work with BCSequence, but I don't think it has ever been tested with large files, so any improvement that makes reading files more efficient is more than welcome.

- Koen.

-----Original Message-----
>From: Scott Christley <schristley at mac.com>
>Sent: Sep 24, 2007 12:39 PM
>To: biococoa-dev at bioinformatics.org
>Subject: Re: [Biococoa-dev] BCCachedSequenceFile
>Along these lines, the current BCSequenceReader is somewhat memory  
>inefficient for medium to large sequences.  For example, attempting  
>to load a 120 Mbp fasta file containing a few thousand sequences,  
>I ran out of memory (and my machine has 6GB).  The main issue was the  
>way fasta files were parsed, which creates lots of temporary strings;  
>I have some code currently enabled which optimizes this, but there  
>could be more improvement.
>One definite improvement is not to automatically read in the whole  
>file as a string.  The string tends to be stored as Unicode  
>automatically, which doubles the size of the file in memory.  It  
>would be better, I think, to rework some of the readers to read  
>directly from the file and construct the NSData on the fly.
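
To make the suggestion above concrete, here is a minimal sketch of streaming a fasta file and building the sequence bytes on the fly. It is plain C rather than BioCocoa code: the names SeqBuffer, seqbuf_append, fasta_stream, RecordHandler and add_length are all hypothetical, and the growable byte buffer only stands in for what an NSMutableData would do in the framework. It assumes one byte per residue and does not strip whitespace inside sequence lines.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer: one byte per residue, no Unicode expansion.
   (Hypothetical stand-in for an NSMutableData.) */
typedef struct {
    unsigned char *bytes;
    size_t length;
    size_t capacity;
} SeqBuffer;

/* Append n bytes, doubling the capacity when needed. Returns 0 or -1. */
static int seqbuf_append(SeqBuffer *b, const char *src, size_t n) {
    if (b->length + n > b->capacity) {
        size_t cap = b->capacity ? b->capacity : 4096;
        while (cap < b->length + n) cap *= 2;
        unsigned char *p = realloc(b->bytes, cap);
        if (p == NULL) return -1;
        b->bytes = p;
        b->capacity = cap;
    }
    memcpy(b->bytes + b->length, src, n);
    b->length += n;
    return 0;
}

/* Called once per record; seq is only valid during the call. */
typedef void (*RecordHandler)(const char *header, const unsigned char *seq,
                              size_t len, void *ctx);

/* Read fp in fixed-size chunks and hand each record to the handler.
   Only one record is held in memory at a time.
   Returns the number of records, or -1 if memory runs out. */
static long fasta_stream(FILE *fp, RecordHandler handler, void *ctx) {
    char chunk[8192];
    char header[256] = "";
    SeqBuffer buf = {NULL, 0, 0};
    long records = 0;
    int in_record = 0;      /* at least one '>' line has been seen */
    int in_header = 0;      /* still inside an over-long header line */
    int at_line_start = 1;  /* next chunk begins a new line */

    assert(fp != NULL && handler != NULL);

    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strlen(chunk);
        int ends_line = (n > 0 && chunk[n - 1] == '\n');
        /* Drop the trailing LF or CRLF. */
        while (n > 0 && (chunk[n - 1] == '\n' || chunk[n - 1] == '\r')) n--;

        if (at_line_start && chunk[0] == '>') {
            if (in_record) {            /* finish the previous record */
                handler(header, buf.bytes, buf.length, ctx);
                records++;
            }
            buf.length = 0;             /* reuse the same allocation */
            size_t h = n - 1;
            if (h > sizeof header - 1) h = sizeof header - 1;
            memcpy(header, chunk + 1, h);
            header[h] = '\0';
            in_record = 1;
            in_header = !ends_line;
        } else if (in_header) {
            in_header = !ends_line;     /* discard the rest of the header */
        } else if (in_record && n > 0) {
            if (seqbuf_append(&buf, chunk, n) != 0) {
                free(buf.bytes);
                return -1;
            }
        }
        at_line_start = ends_line;
    }
    if (in_record) {                    /* last record in the file */
        handler(header, buf.bytes, buf.length, ctx);
        records++;
    }
    free(buf.bytes);
    return records;
}

/* Example handler: adds each record's length to a size_t total. */
static void add_length(const char *header, const unsigned char *seq,
                       size_t len, void *ctx) {
    (void)header;
    (void)seq;
    *(size_t *)ctx += len;
}
```

A caller would open the file with fopen, pass the FILE pointer together with a handler such as add_length and a pointer to a size_t total, and close the file afterwards. Because the buffer is reused between records, peak memory follows the longest single sequence rather than the size of the whole file.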
>On Sep 22, 2007, at 11:36 AM, Scott Christley wrote:
>> On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote:
>>> Thanks for adding these files, they seem very useful. I was  
>>> thinking, based on how you factored out the BCCachedFastaFile  
>>> class, maybe we should do the same for BCSequenceReader as well?  
>>> That might make it a little easier to maintain and to add other  
>>> formats.  Just a thought.
>> Yes, that is a good idea.  It makes the interface simple and clean.   
>> One disadvantage is that it creates a lot of classes, but I guess  
>> that doesn't really matter.  The same idea could also be applied to  
>> BCSequenceWriter; it looks like only fasta output is supported now,  
>> but there is no reason more formats couldn't be added in the future.