[Biococoa-dev] BCCachedSequenceFile

Mon Sep 24 12:49:30 EDT 2007

Do you need any ab1 files (direct output from the ABI 3130xl)?
regards,
Tom K

Thomas J Keller PhD
kellert at ohsu.edu
4-2442

On Sep 24, 2007, at 9:39 AM, Scott Christley wrote:

>
> Along these lines, the current BCSequenceReader is somewhat memory
> inefficient for medium to large sequences.  For example, attempting
> to load in a 120 Mbp fasta file containing a few thousand
> sequences, I ran out of memory (and my machine has 6GB).  The main
> issue was the way fasta files were parsed, which creates lots of
> temporary strings; I have some code currently enabled which
> optimizes this but there could be more improvement.
>
> One definite improvement is not to automatically read in the whole
> file as a string.  The string tends to be stored as Unicode
> automatically, which doubles the size of the file in memory.  It
> would be better, I think, to rework some of the readers to read
> directly from the file and construct the NSData on the fly.
>
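
To make the idea above concrete, here is a minimal sketch of a reader that
streams a fasta file line by line and builds each sequence as raw bytes,
instead of loading the whole file into one string first. It is written in
Python rather than Objective-C purely for illustration; the function name
iter_fasta_records is hypothetical and is not part of BioCocoa, and the
bytearray only stands in for building an NSData incrementally.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_fasta_records(handle: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (header, sequence_bytes) pairs, reading one line at a time.

    Hypothetical helper for illustration; not BioCocoa API.
    """
    header = None
    residues = bytearray()  # grows in place, one byte per residue
    for line in handle:  # iterates over lines; the file is never read whole
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(b">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, bytes(residues)
            header = line[1:].decode("ascii", errors="replace")
            residues = bytearray()
        elif header is not None:
            # Sequence line: append the raw bytes without decoding them.
            residues += line
    if header is not None:
        yield header, bytes(residues)


# In-memory stand-in for a file opened in binary mode, e.g. open(path, "rb").
sample = io.BytesIO(b">seq1 demo\nACGT\nTTGA\n\n>seq2\nGGCC\n")
records = list(iter_fasta_records(sample))
```

Because records are yielded one at a time, only the record currently being
assembled needs to be held by the reader; the caller decides which sequences
to keep in memory.
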
> cheers
> Scott
>
> On Sep 22, 2007, at 11:36 AM, Scott Christley wrote:
>
>>
>> On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote:
>>
>>> Thanks for adding these files, they seem very useful. I was
>>> thinking, based on how you factored out the BCCachedFastaFile
>>> class, maybe we should do the same for BCSequenceReader as well?
>>> That might make it a little easier to maintain and to add other
>>> formats.  Just a thought.
>>
>> Yes, that is a good idea.  It makes the interface simple and clean.
>> One disadvantage is that it creates a lot of classes, but I guess
>> that doesn't really matter.  The same idea could also be applied
>> to BCSequenceWriter; it looks like only fasta output is supported
>> now, but there is no reason more formats can't be added in the
>> future.
>>
>