[Biococoa-dev] reading large fasta files
Koen van der Drift
kvddrift at earthlink.net
Sun Oct 21 13:30:38 EDT 2007
On Oct 21, 2007, at 11:22 AM, Charles Parnot wrote:
> In general, it would be best to have the implementation hidden, so
> that indeed, the framework decides when to use one subclass or
> another. Just like NSString, NSData, or NSArray use different
> underlying data structures depending on the size of the data (I
> think). This is of course all hidden behind the class cluster
> design...
>
> I also don't know how things are already implemented, maybe things
> are already addressed this way?
Yes, I agree with having all that code hidden, so that there's only
one class for users to work with when reading data, whether it comes
from a path, a string, or data. Right now the class that reads large
(fasta) files is a separate class that works with a filePath, but it
is not a subclass of BCSequenceReader. So we need to think about how
to implement this. The way we use BCSequenceReader right now is as follows:
BCSequenceReader *sequenceReader = [[BCSequenceReader alloc] init];
BCSequenceArray *sequenceArray = [sequenceReader readFileUsingPath: aPath];
BCSequence *mySequence = [sequenceArray objectAtIndex: i];
We could change this (or add the possibility) to use it as follows:
BCSequenceReader *sequenceReader = [[BCSequenceReader alloc] initWithPath: aPath];
BCSequenceArray *sequenceArray = [sequenceReader readSequenceArray];
BCSequence *mySequence = [sequenceArray objectAtIndex: i];
However, to make it more complicated, BCCachedFastaFile doesn't
return an array of sequences. IIRC, it is actually a standalone
object that can be used to access regions of very large files
without reading the whole sequence. I can't think of a way right now
to combine this with BCSequenceReader. Does anyone have a suggestion?
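To make the idea of region access concrete, here is a minimal sketch in
plain C (which also compiles as Objective-C). It is not the
BCCachedFastaFile code, and the function name fasta_read_region is
hypothetical. It assumes a single-record fasta file with a fixed number
of residues per line and "\n" line endings, so the byte offset of a
residue can be computed and reached with fseek instead of reading
everything before it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of BioCocoa.
 * Copies up to `length` residues, starting at 0-based residue index
 * `start`, from the first record of the fasta file at `path` into `out`.
 * Assumptions: every residue line except the last holds exactly
 * `line_width` residues, and lines end with a single '\n'.
 * `out` must have room for length + 1 bytes.
 * Returns the number of residues copied, or -1 on error. */
long fasta_read_region(const char *path, long start, long length,
                       long line_width, char *out)
{
    if (start < 0 || length < 0 || line_width <= 0) return -1;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;

    /* Skip the ">header" line to find where the residues begin. */
    int c;
    while ((c = fgetc(fp)) != EOF && c != '\n') { }
    long data_start = ftell(fp);
    if (c == EOF || data_start < 0) { fclose(fp); return -1; }

    /* Every full residue line before `start` contributes one newline
     * byte, so the target offset can be computed without reading.
     * (For files beyond the range of long, fseeko/off_t would be needed.) */
    long offset = data_start + start + start / line_width;
    if (fseek(fp, offset, SEEK_SET) != 0) { fclose(fp); return -1; }

    /* Read only the requested region, dropping line breaks and
     * stopping at end of file or at the next record header. */
    long n = 0;
    while (n < length && (c = fgetc(fp)) != EOF && c != '>') {
        if (c != '\n' && c != '\r') out[n++] = (char)c;
    }
    out[n] = '\0';

    fclose(fp);
    return n;
}
```

A call such as fasta_read_region(aPath, 1000000, 50, 60, buffer) would
touch only the header line and the few lines covering the requested
residues. A multi-record file would additionally need an index of record
offsets, which this sketch leaves out.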
cheers,
- Koen.