[Biococoa-dev] BCCachedSequenceFile
Thomas Keller
kellert at ohsu.edu
Mon Sep 24 12:49:30 EDT 2007
Do you need any ab1 files (direct output from the ABI 3130xl)?
regards,
Tom K
Thomas J Keller PhD
kellert at ohsu.edu
4-2442
On Sep 24, 2007, at 9:39 AM, Scott Christley wrote:
>
> Along these lines, the current BCSequenceReader is somewhat memory
> inefficient for medium to large sequences. For example, attempting
> to load a 120 Mbp fasta file containing a few thousand
> sequences, I ran out of memory (and my machine has 6GB). The main
> issue was the way fasta files were parsed, which creates lots of
> temporary strings; I have some code currently enabled which
> optimizes this but there could be more improvement.
>
> One definite improvement is not to automatically read in the whole
> file as a string. The string tends to be stored as Unicode, which
> doubles the size of the file in memory. It would be better, I think,
> to rework some of the readers to read directly from the file and
> construct the NSData on the fly.
>
> cheers
> Scott
>
> On Sep 22, 2007, at 11:36 AM, Scott Christley wrote:
>
>>
>> On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote:
>>
>>> Thanks for adding these files, they seem very useful. I was
>>> thinking, based on how you factored out the BCCachedFastaFile
>>> class, maybe we should do the same for BCSequenceReader as well?
>>> This might make it a little easier to maintain and add other
>>> formats. Just a thought.
>>
>> Yes, that is a good idea. Makes the interface simple and clean.
>> One disadvantage is that it creates a lot of classes, but I guess
>> that doesn't really matter. The same idea could also be applied
>> to BCSequenceWriter; it looks like only fasta output is supported
>> now, but there is no reason more formats can't be added in the future.
>>
>
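
The suggestion quoted above, reading records directly from the file rather than loading the whole file as one Unicode string, could be sketched roughly as follows. This is not BioCocoa code: it is a minimal, language-neutral illustration in Python, and the function name iter_fasta is hypothetical. It assumes a plain FASTA layout (a ">" header line followed by sequence lines) and keeps sequence data as raw bytes, one byte per residue, so only the record currently being read is held in memory.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_fasta(handle: BinaryIO) -> Iterator[Tuple[str, bytearray]]:
    """Yield (header, sequence) pairs from a binary FASTA stream.

    Hypothetical helper for illustration only. The stream is consumed
    line by line, so the whole file is never held in memory at once.
    """
    header = None
    sequence = bytearray()
    for raw_line in handle:              # one line at a time
        line = raw_line.strip()
        if not line:
            continue                     # skip blank lines
        if line.startswith(b">"):
            if header is not None:
                yield header, sequence   # emit the previous record
            # Only the short header is decoded to text.
            header = line[1:].decode("ascii", errors="replace")
            sequence = bytearray()
        elif header is not None:
            sequence.extend(line)        # append raw bytes, no decoding
    if header is not None:
        yield header, sequence           # emit the final record


if __name__ == "__main__":
    sample = io.BytesIO(b">seq1 demo\nACGT\nAC\n\n>seq2\nGGCC\n")
    for name, seq in iter_fasta(sample):
        print(name, len(seq))
```

In a Cocoa reader the same shape would presumably mean reading chunks from a file handle and appending them to a mutable data buffer, but that mapping is an assumption and is not taken from the BioCocoa sources.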