[Biococoa-dev] reading large fasta files

Sun Oct 21 13:30:38 EDT 2007

On Oct 21, 2007, at 11:22 AM, Charles Parnot wrote:

> In general, it would be best to have the implementation hidden, so  
> that indeed, the framework decides when to use one subclass or  
> another. Just like NSString, NSData, or NSArray use different  
> underlying data structures depending on the size of the data (I  
> think). This is of course all hidden behind the class cluster  
> design...
>
> I also don't know how things are already implemented, maybe things  
> are already addressed this way?

Yes, I agree with keeping all of that code hidden, so that there is
only one class for users to work with when reading data, whether it
comes from a path, a string, or data. Right now the class that reads
large (fasta) files is a separate class that works with a filePath,
but it is not a subclass of BCSequenceReader, so we need to think
about how to fit it in. The way we use BCSequenceReader right now is
as follows:

	BCSequenceReader	*sequenceReader = [[BCSequenceReader alloc] init];
	BCSequenceArray	*sequenceArray = [sequenceReader readFileUsingPath: aPath];
	BCSequence		*mySequence = [sequenceArray objectAtIndex: i];

We could change this (or add the possibility) to use it as follows:

	BCSequenceReader	*sequenceReader = [[BCSequenceReader alloc] initWithPath: aPath];
	BCSequenceArray	*sequenceArray = [sequenceReader readSequenceArray];
	BCSequence		*mySequence = [sequenceArray objectAtIndex: i];
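
As a rough sketch only, the interface for this second form might be
declared along the lines below. The NSObject superclass, the filePath
ivar, and the exact return types are assumptions for illustration,
not what is in the framework today:

	#import <Foundation/Foundation.h>

	@class BCSequenceArray;

	// Hypothetical sketch of the proposed interface, not the current header.
	@interface BCSequenceReader : NSObject
	{
		NSString	*filePath;	// assumed ivar: the path given at init time
	}

	// Proposed: remember the path instead of passing it to each read call.
	- (id)initWithPath:(NSString *)aPath;

	// Proposed: read all sequences found at the stored path.
	- (BCSequenceArray *)readSequenceArray;

	@end

Keeping the path inside the reader would leave room for the framework
to decide later how the file is actually read, without changing the
calling code.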

However, to make things more complicated, BCCachedFastaFile doesn't
return an array of sequences. IIRC, it is a standalone object that
can be used to access regions of very large files without reading
the whole sequence. I can't think of a way right now to combine this
with BCSequenceReader. Does anyone have a suggestion?

cheers,

- Koen.