[Biococoa-dev] BCCachedSequenceFile
Koen van der Drift
kvddrift at earthlink.net
Mon Sep 24 12:52:15 EDT 2007
The current BCSequenceReader (and BCSequenceWriter) code is largely based on the design of the original BioCocoa framework that Peter wrote a couple of years ago. We updated it to work with BCSequence, but I don't think it has ever been tested with large files. So any improvement that reads files more efficiently is more than welcome.
- Koen.
-----Original Message-----
>From: Scott Christley <schristley at mac.com>
>Sent: Sep 24, 2007 12:39 PM
>To: biococoa-dev at bioinformatics.org
>Subject: Re: [Biococoa-dev] BCCachedSequenceFile
>
>
>Along these lines, the current BCSequenceReader is somewhat memory
>inefficient for medium to large sequences. For example, attempting
>to load a 120 Mbp fasta file containing a few thousand sequences,
>I ran out of memory (and my machine has 6 GB). The main issue was the
>way fasta files were parsed, which creates lots of temporary strings;
>I have some code currently enabled which optimizes this, but there
>could be more improvement.
>
>One definite improvement is not to automatically read in the whole
>file as a string. The string tends to be stored as Unicode, which
>doubles the size of the file in memory. It would be better, I think,
>to rework some of the readers to read directly from the file and
>construct the NSData on the fly.
>
>cheers
>Scott
>
>On Sep 22, 2007, at 11:36 AM, Scott Christley wrote:
>
>>
>> On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote:
>>
>>> Thanks for adding these files, they seem very useful. I was
>>> thinking, based on how you factored out the BCCachedFastaFile
>>> class, maybe we should do the same for BCSequenceReader as well?
>>> That might make it a little easier to maintain and to add other
>>> formats. Just a thought.
>>
>> Yes, that is a good idea. Makes the interface simple and clean.
>> One disadvantage is that it creates a lot of classes, but I guess
>> that doesn't really matter. The same idea could also be applied to
>> BCSequenceWriter. It looks like only fasta output is supported
>> now, but there is no reason more formats couldn't be added later.
>>
>
>_______________________________________________
>Biococoa-dev mailing list
>Biococoa-dev at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/biococoa-dev
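
The suggestion quoted above (read directly from the file and build the sequence bytes on the fly, rather than loading the whole file as one string) can be sketched roughly as follows. This is only an illustration in Python, not BioCocoa code: the function name iter_fasta is hypothetical, a bytearray stands in for the NSData buffer, and the parsing assumes plain fasta with ">" header lines.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_fasta(handle: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (header, sequence_bytes) one record at a time.

    Hypothetical helper: reads the stream line by line as raw bytes,
    so only the current record is held in memory and the file is never
    converted to one large Unicode string.
    """
    header = None
    seq = bytearray()  # growable byte buffer, standing in for NSData
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(b">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, bytes(seq)
            header = line[1:].decode("ascii", errors="replace")
            seq = bytearray()
        elif header is not None:
            seq += line  # append sequence bytes without decoding them
    if header is not None:
        yield header, bytes(seq)


if __name__ == "__main__":
    # In-memory stand-in for a file opened with open(path, "rb").
    sample = io.BytesIO(b">seq1 example\nACGT\nTTAA\n>seq2\nGG\n")
    for name, sequence in iter_fasta(sample):
        print(name, len(sequence))
```

Because records are yielded one at a time, a caller that only needs some of the sequences never has to hold the rest in memory, which is the property the original suggestion is after.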