[Biococoa-dev] More on BCSymbolSets

Koen van der Drift kvddrift at earthlink.net
Thu Mar 3 21:56:05 EST 2005

On Mar 3, 2005, at 9:47 PM, Charles PARNOT wrote:

>> I think it would be a good idea if we allow the user to pass a 
>> symbolset, defining the type of sequence. In fact you not only make a 
>> filter for whatever string or array is supplied to create the 
>> sequence, but you also have immediately an identifier of the 
>> sequence.
> So we would need to provide an initializer with a symbolSet argument, 
> e.g. 'initWithSymbolArray:symbolSet:'. OK, we agree :-)
> What do you mean an identifier?

I mean the sequence type.
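
To illustrate the "filter plus identifier" idea, here is a minimal sketch in plain C (which also compiles as Objective-C). All names in it (SymbolSet, kStrictDNASet, filter_symbols) are hypothetical and are not BioCocoa's actual API. It only shows the principle: the set decides which characters survive, and the same set tells you what kind of sequence you ended up with.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of this is BioCocoa's actual API. */
typedef struct {
    const char *name;     /* doubles as the sequence-type identifier */
    const char *symbols;  /* the characters this set accepts */
} SymbolSet;

/* Assumed example set: strict DNA, no ambiguity codes, no gaps. */
static const SymbolSet kStrictDNASet = { "DNA", "ACGT" };

/* Copy into out only those characters of input that the set contains.
   out must have room for strlen(input) + 1 bytes.
   Returns the number of characters kept. */
size_t filter_symbols(const SymbolSet *set, const char *input, char *out)
{
    size_t kept = 0;
    for (const char *p = input; *p != '\0'; ++p) {
        if (strchr(set->symbols, *p) != NULL) {
            out[kept++] = *p;
        }
    }
    out[kept] = '\0';
    return kept;
}
```

An initializer taking a symbolSet argument could run the supplied string or array through a filter of this kind and then keep a reference to the set, so that the sequence type can be read back from the set later.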

> OK, we basically agree that sequence type as it is now is not super 
> useful, except as a shortcut for the sequence class.
> I still think that a BCSequenceType has a use. A symbolSet should not 
> be allowed to hold symbols of different types/classes. So symbolSet 
> would have a type.

This will be taken care of when the symbolset is created, see the 
BCSymbol class. The dnaSymbolSet only holds nucleotides, the 
proteinSymbolSet holds only amino acids.

> And a symbolSet should be allowed to be associated with a sequence 
> only if it is of the right type.
> Instead of checking the class all the time, it is probably better to 
> use an enum like BCSequenceType.

This won't happen that much, maybe only during creation, so I don't 
think there will be much slowdown by calling the class instead of the 
enum.
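
For reference, the enum-based check being discussed could look roughly like the sketch below, again in plain C that also compiles as Objective-C. The enum constants use the BCSequenceTypeDNA naming suggested in this thread. The structs and the function are assumptions made up for illustration, not BioCocoa code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Constant names follow the naming suggested in this thread; the structs
   and the function below are hypothetical, not BioCocoa's definitions. */
typedef enum {
    BCSequenceTypeDNA,
    BCSequenceTypeProtein
} BCSequenceType;

typedef struct {
    BCSequenceType type;   /* every symbol in the set is of this type */
    const char *symbols;
} TypedSymbolSet;

typedef struct {
    BCSequenceType type;
    const TypedSymbolSet *symbolSet;
} TypedSequence;

/* Associate set with seq only when their types match.
   The check is a single integer comparison.
   Returns true on success and leaves seq untouched on a mismatch. */
bool sequence_set_symbol_set(TypedSequence *seq, const TypedSymbolSet *set)
{
    if (set->type != seq->type) {
        return false;
    }
    seq->symbolSet = set;
    return true;
}
```

Whether this is worth having next to a class check depends on how often the association happens. If it is only done at creation time, either approach should be cheap enough.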

>>> * Will all instances of one given sequence class always have the same 
>>> sequenceType? e.g. all instances of BCDNASequence will be of type 
>>> 'BCDNASequence'.
>> Probably not. A BCSequenceDNA can have ambiguous symbols, but can 
>> also be strict. It can allow for gaps in an alignment, etc. 
>> Assigning it a sequence type still doesn't tell us anything about the 
>> possible symbols. Therefore a symbolset will be much more useful. 
>> Another thing that bugs me is that the sequence is BCSequenceDNA but 
>> the type is BCDNASequence. Very confusing :)
> I agree that symbolSets will be different for each instance. But the 
> sequenceType, if we keep it in addition to the symbolSet (for the 
> reason above), will always be the same for all instances of a 
> class.
> Regarding the naming conventions, BCSequenceDNA for the class, 
> BCDNASequence for the type, it is indeed quite confusing; how about 
> BCSequenceTypeDNA et al.?

*If* we decide to keep it, that would indeed be better, yes.

- Koen.
