[Biococoa-dev] BCCachedSequenceFile

Scott Christley schristley at mac.com
Sat Sep 22 11:36:36 EDT 2007


On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote:

> Thanks for adding these files, they seem very useful. I was
> thinking based on how you factored out the BCCachedFastaFile class,
> maybe we should do the same for BCSequenceReader as well? This
> might make it a little easier to maintain and add other formats.
> Just a thought.

Yes, that is a good idea.  It makes the interface simple and clean.  One
disadvantage is that it creates a lot of classes, but I guess that
doesn't really matter.  The same idea could also be applied to
BCSequenceWriter.  It looks like only FASTA output is supported right
now, but there is no reason more formats couldn't be added in the future.
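
As a rough illustration of the one-class-per-format split (written in
Python rather than Objective-C for brevity; SequenceReader, FastaReader,
RawReader and read_sequences are hypothetical names, not BioCocoa API),
the structure might look something like this:

```python
class SequenceReader:
    """Hypothetical base class: one subclass per file format."""

    def read(self, text):
        raise NotImplementedError


class FastaReader(SequenceReader):
    """Parses FASTA text into a list of (name, residues) pairs."""

    def read(self, text):
        sequences = []
        name, parts = None, []
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    sequences.append((name, "".join(parts)))
                name, parts = line[1:], []
            elif name is not None:
                parts.append(line)
        if name is not None:
            sequences.append((name, "".join(parts)))
        return sequences


class RawReader(SequenceReader):
    """Treats the whole text as a single unnamed sequence."""

    def read(self, text):
        return [("untitled", "".join(text.split()))]


# Adding a format means adding one class and one entry here.
READERS = {"fasta": FastaReader, "raw": RawReader}


def read_sequences(text, fmt):
    # Always returns a list, even when the input holds one sequence.
    return READERS[fmt]().read(text)
```

Each format stays in its own small class, at the cost of the extra
classes mentioned above.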


> Also, the way your new class is now set up is quite different from
> BCSequenceReader, which returns a BCSequenceArray (even
> if there's only one sequence in the file). Is it possible to use a
> similar approach for BCCachedSequenceFile as well? I think we need  
> to make sure that we use a consistent approach throughout the  
> framework, not only for the developers, but also for the (possible)  
> users. Again, just a thought.


I was thinking about this when first designing the class, and I agree  
I would like to go in this direction.  The idea would be a subclass  
of BCSequence, like BCCachedSequence, that overrides methods to  
encapsulate interaction with the sequence file.  What I haven't quite  
figured out yet is how to support all of the functionality in  
BCSequence.
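
To make the subclass idea concrete, here is a minimal sketch in Python
(not Objective-C).  Sequence, CachedSequenceFile and CachedSequence are
simplified stand-ins for BCSequence, BCCachedSequenceFile and the
proposed BCCachedSequence; the offset table and the method names are
assumptions for illustration only:

```python
import io


class Sequence:
    """Stand-in for BCSequence: keeps all of its symbols in memory."""

    def __init__(self, symbols):
        self._symbols = symbols

    def length(self):
        return len(self._symbols)

    def subsequence(self, start, length):
        return self._symbols[start:start + length]


class CachedSequenceFile:
    """Stand-in for BCCachedSequenceFile: reads symbols on demand."""

    def __init__(self, handle, offsets):
        # offsets maps a sequence name to (offset, length) in the file;
        # this layout is hypothetical.
        self._handle = handle
        self._offsets = offsets

    def length_of(self, name):
        return self._offsets[name][1]

    def read(self, name, start, length):
        offset, total = self._offsets[name]
        # Clamp the request so it never runs into the next sequence.
        length = max(0, min(length, total - start))
        self._handle.seek(offset + start)
        return self._handle.read(length)


class CachedSequence(Sequence):
    """Overrides the accessors so every call goes through the file."""

    def __init__(self, cached_file, name):
        self._file = cached_file
        self._name = name

    def length(self):
        return self._file.length_of(self._name)

    def subsequence(self, start, length):
        return self._file.read(self._name, start, length)
```

Code written against Sequence can use a CachedSequence unchanged, but
every method of the base class that touches the symbols needs a
file-backed override, which is the open question above.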

There are some design issues I'm still mulling over.  For example,
should each BCCachedSequence hold meta-data about that particular
sequence (or all the sequences) in the file, or should all of its
interaction go strictly through BCCachedSequenceFile?  Currently
BCCachedSequenceFile isn't thread safe, and in the future I will want
it to be, as I expect genome-wide algorithms to take advantage of the
multi-core Macs.
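
One simple way to get thread safety, sketched in Python with a
hypothetical LockedSequenceFile wrapper (an assumption, not how
BCCachedSequenceFile works), is to guard the shared file handle with a
lock so that each seek-and-read pair runs as a unit:

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedSequenceFile:
    """Hypothetical thread-safe wrapper around a shared file handle."""

    def __init__(self, handle):
        self._handle = handle
        self._lock = threading.Lock()

    def read(self, offset, length):
        # seek() and read() share the handle's position, so they must
        # run as one atomic step or concurrent callers can interleave.
        with self._lock:
            self._handle.seek(offset)
            return self._handle.read(length)


# Four worker threads read 100-symbol chunks from the same file object.
genome = "ACGT" * 1000
shared = LockedSequenceFile(io.StringIO(genome))
offsets = range(0, len(genome), 100)
with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = list(pool.map(lambda off: shared.read(off, 100), offsets))
```

A single lock serializes all reads, so it is safe but gives no parallel
I/O; per-thread handles or per-sequence caches would be alternatives.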

Also, BCSequence is currently expensive for accessing individual
sequence data; constructing a BCSymbol just to get a character is a bit
too much.  So part of this would be to think about how to extend
BCSequence with more cache-friendly functionality.
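
The kind of extension meant here could be sketched as follows, again in
Python with stand-in classes.  Symbol plays the role of BCSymbol, and
char_at, chars and count are hypothetical accessor names that return
plain characters without building a wrapper object per position:

```python
class Symbol:
    """Stand-in for BCSymbol: a full object wrapping one character."""

    def __init__(self, char):
        self.char = char


class Sequence:
    """Stand-in for BCSequence, backed by a compact byte buffer."""

    def __init__(self, text):
        self._data = text.encode("ascii")

    def symbol_at(self, index):
        # Object-based access: builds a wrapper object on every call.
        return Symbol(chr(self._data[index]))

    def char_at(self, index):
        # Direct access: returns the character with no wrapper object.
        return chr(self._data[index])

    def chars(self, start, length):
        # Bulk access: one contiguous slice instead of a call per symbol.
        return self._data[start:start + length].decode("ascii")

    def count(self, char):
        # Scans the raw buffer directly.
        return self._data.count(char.encode("ascii"))
```

The object-based accessor stays available for code that needs symbol
properties, while tight loops can use the character-level calls.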

cheers
Scott




