[Biococoa-dev] More on BCSymbolSets

Koen van der Drift kvddrift at earthlink.net
Sun Feb 27 20:05:42 EST 2005


Hi,

I was looking at the BCSymbolSet code again, to integrate it further into 
the BCSequence code. However, with the new BCSequence class structure in 
place I am not yet sure how to do this. For instance, we have the 
following method in each subclass:

- (id)initWithString:(NSString *)entry skippingUnknownSymbols:(BOOL)skipFlag;

I guess these are intended to be the designated initializers, although 
they have not been labeled as such in all classes. Now in BCSymbolSet 
we have the following (e.g. for DNA):

dnaStrictSymbolSet (for C G T A) and dnaSymbolSet (for all possible 
nucleotides, including the ambiguous ones).

Similar symbolsets are available for the other sequence types. Either 
symbolset is possible in the method above; the skipFlag is not related 
to either one. So what I can do is test for ambiguous symbols 
immediately when creating the sequence (using 
containsAmbiguousSymbols), and set the appropriate symbolset based on 
that. Or even, to avoid a double iteration, test each symbol with 
isCompoundSymbol as it is added.
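
To make the single-pass idea concrete, here is a minimal sketch. It is 
not BioCocoa code: the two character sets are hypothetical stand-ins for 
dnaStrictSymbolSet and dnaSymbolSet (I am assuming the standard IUPAC 
nucleotide codes), the function name is made up, and the check on each 
character plays the role that isCompoundSymbol would play on a real 
symbol.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical stand-in for dnaStrictSymbolSet: the four unambiguous bases.
static NSCharacterSet *StrictDNACharacters(void) {
    return [NSCharacterSet characterSetWithCharactersInString:@"ACGT"];
}

// Hypothetical stand-in for dnaSymbolSet: strict bases plus the IUPAC
// ambiguity codes. The real symbol set may contain more (e.g. a gap).
static NSCharacterSet *AllDNACharacters(void) {
    return [NSCharacterSet characterSetWithCharactersInString:@"ACGTRYSWKMBDHVN"];
}

// Walks the string once and returns the name of the symbol set the
// sequence would need. Characters outside both sets are ignored here;
// handling them is the job of the skip flag, not of this check.
static NSString *SymbolSetNameForDNAString(NSString *entry) {
    NSCharacterSet *strict = StrictDNACharacters();
    NSCharacterSet *all = AllDNACharacters();
    NSString *upper = [entry uppercaseString];

    for (NSUInteger i = 0; i < [upper length]; i++) {
        unichar c = [upper characterAtIndex:i];
        // An ambiguous symbol is one that is valid but not strict.
        if (![strict characterIsMember:c] && [all characterIsMember:c]) {
            return @"dnaSymbolSet";
        }
    }
    return @"dnaStrictSymbolSet";
}
```

In the real initializer the same test would sit inside the loop that 
already converts characters to symbols, so no second iteration is needed.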

I think this code should only go in the designated initializer, because 
that should be called by all other initializers. Would this be a 
reasonable approach?

Then of course we have the 'unknown symbols' flag. I am still not sure 
what the purpose of this is. Is it to prevent illegal characters from 
being converted to a symbol? This could happen if the string contains 
numbers, or other characters not defined as symbols. I noticed that 
the implementation of the skip flag is slightly different in the code 
for proteins versus that for DNA/RNA.

For proteins it looks like:

		if ( (skipFlag==NO) || (aminoAcid!=[BCAminoAcid undefined]) )
			[tempSequence addObject: aminoAcid];

For DNA/RNA it looks like:

		if ( aBase != [BCNucleotideDNA undefined] )
			[tempSequence addObject: aBase];
		else {
			if ( !skipFlag )
				[tempSequence addObject: [BCNucleotideDNA undefined]];
		}


The protein code adds aminoAcid whenever skipFlag is NO; the DNA/RNA 
code explicitly adds an undefined symbol in that case. I guess we 
should settle on one. Does anyone have a preference?
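
For comparison, here is a minimal sketch of one shared form. It is not 
the actual BioCocoa code: kUndefinedSymbol is a hypothetical stand-in 
for [BCAminoAcid undefined] and [BCNucleotideDNA undefined], symbols are 
modelled as plain strings, and both function names are made up. It 
assumes that an unrecognized character always maps to that one shared 
undefined symbol; if that holds, the two snippets above should end up 
appending the same objects.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the shared 'undefined' symbol object.
static NSString * const kUndefinedSymbol = @"?";

// Appends the symbol unless it is undefined and skipFlag is YES.
// This is the protein form of the test; the DNA/RNA form reaches the
// same result when 'undefined' is a single shared object.
static void AppendSymbol(NSMutableArray *sequence, NSString *symbol, BOOL skipFlag) {
    if (!skipFlag || ![symbol isEqualToString:kUndefinedSymbol]) {
        [sequence addObject:symbol];
    }
}

// Converts a string to an array of one-character symbols. Anything
// other than A, C, G or T becomes the undefined symbol (an assumption
// made to keep the sketch short).
static NSArray *SymbolsFromString(NSString *entry, BOOL skipFlag) {
    NSCharacterSet *known = [NSCharacterSet characterSetWithCharactersInString:@"ACGT"];
    NSMutableArray *sequence = [NSMutableArray array];

    for (NSUInteger i = 0; i < [entry length]; i++) {
        unichar c = [entry characterAtIndex:i];
        NSString *symbol = [known characterIsMember:c]
            ? [entry substringWithRange:NSMakeRange(i, 1)]
            : kUndefinedSymbol;
        AppendSymbol(sequence, symbol, skipFlag);
    }
    return sequence;
}
```

With skipFlag set to YES the string "AC1G" gives three symbols; with NO 
it gives four, the third being the undefined symbol.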


thanks,

- Koen.



