[Biococoa-dev] More on BCSymbolSets

Koen van der Drift kvddrift at earthlink.net
Mon Feb 28 20:03:40 EST 2005

On Feb 28, 2005, at 6:34 PM, Charles PARNOT wrote:

> Anyway, the question at this point is: what do we want to do with 
> symbolSet? If they are just a way to provide a refinement on the 
> sequenceType, then we may not need a full class, but just an enum. And 
> if we don't enforce the sequence contents to be consistent with the 
> symbolSet, then it is useless.

The idea of the symbolSet originates from a similar approach in BioPerl 
and BioJava, where they are called 'Alphabets'. Basically, they can be 
used when a user adds a 'T' to a sequence and wants to be sure it is a 
thymidine in DNA or a threonine in a protein. See also 
http://www.biojava.org/tutorials/chap1.html for more background. 
Although we all agree that the BioJava approach is cumbersome, I still 
like the idea of using a symbolSet to define which symbols are allowed 
in a sequence. So it is not necessarily a sequence identifier, but more 
a filter that defines which symbols are allowed in a specific type of 
sequence. Another possible reason at that time was that the symbolSet 
could act as a sequence identifier, thereby removing the need to 
subclass BCSequence. But that idea was not much appreciated here ;-)
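
To make the filter idea concrete, here is a minimal sketch. It is written 
in Python only to keep it short, and it is not the BCSymbolSet interface: 
the class names SymbolSet and Sequence, the methods contains, filter and 
append, and the two small alphabets are all assumptions made up for 
illustration. The point is just that the same 'T' is accepted by both 
sets, while the set attached to the sequence decides what else is allowed.

```python
class SymbolSet:
    """Hypothetical stand-in for a symbol set: a named set of one-letter codes."""

    def __init__(self, name, symbols):
        self.name = name
        self.symbols = frozenset(symbols.upper())

    def contains(self, symbol):
        """True if the (case-insensitive) symbol is allowed by this set."""
        return symbol.upper() in self.symbols

    def filter(self, text):
        """Return the upper-cased text with all disallowed symbols dropped."""
        return "".join(c for c in text.upper() if c in self.symbols)


# Illustrative alphabets only (assumption): unambiguous codes, no gaps or
# ambiguity symbols. A "very broad" set would simply list more symbols.
DNA = SymbolSet("DNA", "ACGT")
PROTEIN = SymbolSet("protein", "ACDEFGHIKLMNPQRSTVWY")


class Sequence:
    """Hypothetical sequence whose contents are restricted by its symbol set."""

    def __init__(self, symbol_set, text=""):
        self.symbol_set = symbol_set
        # The symbol set acts as a filter on the initial contents.
        self.symbols = symbol_set.filter(text)

    def append(self, symbol):
        """Append one symbol if the set allows it; report whether it was added."""
        if not self.symbol_set.contains(symbol):
            return False
        self.symbols += symbol.upper()
        return True


dna = Sequence(DNA, "ACGTXE")    # 'X' and 'E' are filtered out
protein = Sequence(PROTEIN, "MTE")

dna.append("T")       # allowed in DNA
dna.append("E")       # rejected: not a DNA symbol
protein.append("E")   # allowed in a protein
```

Whether a rejected symbol should be silently dropped, as here, or reported 
as an error is a design choice this sketch does not try to settle.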

> So, what do you think symbolSet should be used for? The way I see it 
> now is as a filter to restrict the symbols used in a given sequence. 
> In fact, the more I think about it, the more 'filtering' seems like 
> what it should do. And if we don't want any restriction, then one can 
> always create very broad symbol sets.

Exactly, see my point above.

> I don't know what Koen had in mind when creating the symbol set class, 
> because I see a 'complementSet' method there.

This was actually introduced by Alex. He wrote the interface, and I filled 
in some of the implementation. And indeed I had no idea what to do with 
complementSet and a few other methods, so I have left those empty :-)

- Koen.

PS: you guys are going *way* too fast with all those emails. I only have 
limited time each day to read them, understand them, and possibly 
reply to them. Sorry if I don't address all issues :(
