[Biococoa-dev] BCCachedSequenceFile
Scott Christley
schristley at mac.com
Sat Sep 22 11:36:36 EDT 2007
On Sep 21, 2007, at 8:30 PM, Koen van der Drift wrote:
> Thanks for adding these files, they seem very useful. I was
> thinking based on how you factored out the BCCachedFastaFile class,
> maybe we should do the same for BCSequenceReader as well? This
> makes it maybe a little easier to maintain and add other formats.
> Just a thought.
Yes, that is a good idea. Makes the interface simple and clean. One
disadvantage is that it creates a lot of classes, but I guess that
doesn't really matter. The same idea could also be applied to
BCSequenceWriter. It looks like only FASTA output is supported now,
but there is no reason more formats couldn't be added in the future.
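To make the one-class-per-format idea concrete, here is a minimal sketch. It is written in Python rather than BioCocoa's Objective-C, and every name in it (SequenceReader, FastaReader, RawReader, read_sequences) is hypothetical, not part of the framework. Each reader always returns a list, even for a single sequence, in the spirit of BCSequenceReader returning a BCSequenceArray.

```python
import io


class SequenceReader:
    """Hypothetical base class: one subclass per file format."""

    def read(self, handle):
        raise NotImplementedError


class FastaReader(SequenceReader):
    """Parses FASTA text into a list of (name, residues) pairs."""

    def read(self, handle):
        sequences = []
        name, chunks = None, []
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    sequences.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            elif line and name is not None:
                chunks.append(line)
        if name is not None:
            sequences.append((name, "".join(chunks)))
        return sequences


class RawReader(SequenceReader):
    """Plain residues with no header; the whole file is one sequence."""

    def read(self, handle):
        return [("unnamed", "".join(line.strip() for line in handle))]


# Supporting a new format means adding one class and one entry here.
READERS = {"fasta": FastaReader, "raw": RawReader}


def read_sequences(handle, fmt):
    return READERS[fmt]().read(handle)


if __name__ == "__main__":
    print(read_sequences(io.StringIO(">a\nAC\nGT\n>b\nTT\n"), "fasta"))
```

The cost is the one mentioned above: more classes. The benefit is that each format's parsing stays isolated from the others.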
> Also, the way your new class is now set up is quite different from
> BCSequenceReader, which returns a BCSequenceArray (even
> if there's only one sequence in the file). Is it possible to use a
> similar approach for BCCachedSequenceFile as well? I think we need
> to make sure that we use a consistent approach throughout the
> framework, not only for the developers, but also for the (possible)
> users. Again, just a thought.
I was thinking about this when first designing the class, and I agree
I would like to go in this direction. The idea would be a subclass
of BCSequence, like BCCachedSequence, that overrides methods to
encapsulate interaction with the sequence file. What I haven't quite
figured out yet is how to support all of the functionality in
BCSequence.
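A rough sketch of the subclass idea, again in Python rather than Objective-C and with hypothetical names (Sequence and CachedSequence stand in for BCSequence and a possible BCCachedSequence). It assumes the residues sit contiguously in the file, one byte each, at a known offset, which real sequence files with line breaks would not satisfy without extra bookkeeping.

```python
class Sequence:
    """Hypothetical in-memory sequence."""

    def __init__(self, residues):
        self._residues = residues

    def length(self):
        return len(self._residues)

    def subsequence(self, start, length):
        return self._residues[start:start + length]


class CachedSequence(Sequence):
    """Same interface, but the residues stay on disk until requested.

    Assumption: residues are stored contiguously, one byte per residue,
    starting at `offset` in the file.
    """

    def __init__(self, path, offset, length):
        self._path = path
        self._offset = offset
        self._length = length

    def length(self):
        return self._length

    def subsequence(self, start, length):
        # Clamp the request so it never reads past the end of the sequence.
        length = max(0, min(length, self._length - start))
        with open(self._path, "rb") as handle:
            handle.seek(self._offset + start)
            return handle.read(length).decode("ascii")
```

Callers that only use length() and subsequence() cannot tell the two classes apart. The open question from the paragraph above remains: every other method of the base class would need a file-backed override too.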
There are some design issues I'm still mulling over. For example,
should each BCCachedSequence hold meta-data about that particular
sequence (or all the sequences) in the file, or should all of its
interaction go strictly through BCCachedSequenceFile? Currently
BCCachedSequenceFile isn't thread safe, and in the future I will want
it to be, since I expect genome-wide algorithms to take advantage of the
multi-core Macs.
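One simple way to get thread safety, sketched in Python with a hypothetical class name (LockedSequenceFile): guard the shared file handle with a lock so that a seek and its following read cannot be interleaved with another thread's. This is only an illustration of the concern, not how BCCachedSequenceFile works today.

```python
import threading


class LockedSequenceFile:
    """Hypothetical file wrapper that serializes access to one handle."""

    def __init__(self, path):
        self._handle = open(path, "rb")
        self._lock = threading.Lock()

    def read_range(self, offset, length):
        # seek + read must happen as one unit; otherwise another thread
        # could move the file position between the two calls.
        with self._lock:
            self._handle.seek(offset)
            return self._handle.read(length)

    def close(self):
        with self._lock:
            self._handle.close()
```

A single lock is the simplest option but serializes all readers. Giving each thread its own handle would avoid that contention at the cost of more open files.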
Also, BCSequence is currently expensive for accessing single sequence
data; constructing a BCSymbol just to get a character is a bit too
much. So part of this would be to think about how to extend BCSequence
with more cache-friendly functionality.
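To illustrate the difference in access cost, here is a small Python sketch with hypothetical names (Symbol and PackedSequence are stand-ins, not the BioCocoa classes). The object-returning accessor allocates a wrapper on every call, while the character and range accessors read straight from the underlying byte buffer.

```python
class Symbol:
    """Hypothetical wrapper object around one residue character."""

    def __init__(self, character):
        self.character = character


class PackedSequence:
    """Hypothetical sequence that stores residues as raw bytes."""

    def __init__(self, residues):
        self._data = residues.encode("ascii")

    def symbol_at(self, index):
        # Allocates a new object on every call.
        return Symbol(chr(self._data[index]))

    def character_at(self, index):
        # Returns the character directly, with no wrapper object.
        return chr(self._data[index])

    def characters_in_range(self, start, length):
        # One slice for a whole run of residues, which suits block reads
        # from a file-backed cache.
        return self._data[start:start + length].decode("ascii")
```

The range accessor is the one that matters most for a cached, file-backed sequence, because it lets a caller pull a block of residues in one call instead of one object per position.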
cheers
Scott