[Biococoa-dev] More on BCSymbolSets

John Timmer jtimmer at bellatlantic.net
Sun Feb 27 22:26:17 EST 2005

Okay, looking at things, we definitely have to make the init methods a bit
more consistent.  I think the inconsistencies are mostly holdovers from
before amino acids had ambiguous members.

I think the idea of using a symbol set to limit the possible options for
initializing a sequence is a good one, provided we make the process very
streamlined.  The class itself looks to be good in that regard.

I've got a couple of ideas on the implementation.  One idea that suggests
itself is to have a BCSequenceType variable for the symbol set - that way the
sequence being initialized could pick up its sequence type from the set it
gets passed during initialization.

The other thing I'd do is take the methods like "baseForSymbol" and
"aaForSymbol" and formalize them to be a single selector, like
"symbolForCharacter".  That way, you could call the same selector on any
class.  This would allow you to make code that looked something like this
(given passedString and passedSet as the arguments):
    Class theClass = [[passedSet anyObject] class];
    unichar theChar = [passedString characterAtIndex: loopCounter];
    id theSymbol = [theClass symbolForCharacter: theChar];
    if ( [passedSet containsObject: theSymbol] )
        // add it to our sequence
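
Fleshed out a little, the loop above might look like the following. This is
only a minimal sketch under stated assumptions: SketchNucleotide and
SketchSymbolsFromString are hypothetical stand-ins invented for illustration,
not BioCocoa classes or functions, and a plain NSSet stands in for the symbol
set. The only thing taken from the proposal is the shared class selector
symbolForCharacter:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for a symbol class (NOT the real BCNucleotideDNA).
@interface SketchNucleotide : NSObject {
    unichar character;
}
// The single shared selector proposed above; every symbol class would offer it.
+ (id)symbolForCharacter:(unichar)aChar;
- (unichar)character;
@end

@implementation SketchNucleotide

+ (id)symbolForCharacter:(unichar)aChar {
    // Hand out one shared instance per character, so that
    // -containsObject: on a set of symbols finds the same object again.
    static NSMutableDictionary *cache = nil;
    if (cache == nil) {
        cache = [[NSMutableDictionary alloc] init];
    }
    NSNumber *key = [NSNumber numberWithUnsignedShort:aChar];
    SketchNucleotide *symbol = [cache objectForKey:key];
    if (symbol == nil) {
        symbol = [[self alloc] init];
        symbol->character = aChar;
        [cache setObject:symbol forKey:key];
    }
    return symbol;
}

- (unichar)character {
    return character;
}

@end

// Hypothetical helper: keep only the characters whose symbol is in passedSet.
static NSArray *SketchSymbolsFromString(NSString *passedString, NSSet *passedSet) {
    NSMutableArray *sequence = [NSMutableArray array];
    // Ask any member of the set for its class; the set decides the symbol type.
    Class theClass = [[passedSet anyObject] class];
    NSUInteger loopCounter;
    for (loopCounter = 0; loopCounter < [passedString length]; loopCounter++) {
        unichar theChar = [passedString characterAtIndex:loopCounter];
        id theSymbol = [theClass symbolForCharacter:theChar];
        if ([passedSet containsObject:theSymbol]) {
            [sequence addObject:theSymbol];  // add it to our sequence
        }
    }
    return sequence;
}
```

In this sketch the caller never names the symbol class; the set alone decides
both which class gets asked and which symbols are accepted, so the same loop
would serve DNA, RNA and protein, provided each class answers the selector.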

If we're making symbol sets this central to sequence creation, though, I'd
make a lot of combinations, rather than the two we have for each type.  We
don't want any of the commonly used sets more than a single call away.
Basically, I'd do strict, ambiguous, those with gap, those with undefined,
those with both, etc.  We may also want to make the standard ones

Does this sound like the sort of thing you were looking to do?

> Again I was looking at the BCSymbolSet code to implement it more in the
> BCSequence code. However with the new BCSequence class structure in
> place I am not so sure yet how to do this. For instance, we have the
> following method in each subclass:
> - (id) initWithString:(NSString *)entry skippingUnknownSymbols:(BOOL)skipFlag;
> I guess these are intended to be the designated initializers, although
> they have not been labeled as such in all classes. Now in BCSymbolSet
> we have the following (eg for DNA):
> dnaStrictSymbolSet (for C G T A) and dnaSymbolSet (for all possible
> nucleotides, including the ambiguous ones).
> Similar symbolsets are available for the other sequence types. Both
> symbolsets are possible in the method above; the skipFlag is not
> related to either symbolset. So what I can do is test
> immediately for ambiguous symbols when creating the sequence (using
> containsAmbiguousSymbols), and based on that set the appropriate
> symbolset. Or even, to avoid a double iteration, test immediately for
> isCompoundSymbol when each symbol is added.
> I think this code should only go in the designated initializer, because
> that should be called by all other initializers. Would this be a
> reasonable approach?
> Then of course we have the 'unknown symbols' flag. I still am not sure
> what the purpose of this is. Is it to prevent illegal characters from
> being converted to a symbol? This could happen if the string contains
> numbers, or other characters not defined to be symbols. I noticed that
> the implementation for the skip flag is slightly different in the code
> for proteins versus that for DNA/RNA.
> For proteins it looks like:
>     if ( (skipFlag==NO) || (aminoAcid!=[BCAminoAcid undefined]) )
>         [tempSequence addObject: aminoAcid];
> For DNA/RNA it looks like:
>     if ( aBase != [BCNucleotideDNA undefined] )
>         [tempSequence addObject: aBase];
>     else {
>         if ( !skipFlag )
>             [tempSequence addObject: [BCNucleotideDNA undefined]];
>     }
> The protein adds the aminoAcid if skipFlag is NO, the DNA/RNA adds an
> undefined symbol. I guess we should settle on one; does anyone have a
> preference?
> thanks,
> - Koen.

This mind intentionally left blank
