[Biococoa-dev] I am watching you

Sat Dec 11 13:58:45 EST 2004

Hi Charles,

Welcome to the world of BioCocoa. I almost marked your mail as junk, 
because of the subject ;-)

Right now BioCocoa only has a few developers, so we can use all the 
help we can get. I guess developing for an open source project is 
similar to setting up an Xgrid project: not all developers are working 
full time on the project, only when they have some CPU cycles left.

Peter Schols started BioCocoa a while ago as a framework to read and 
write various sequence formats, with an emphasis on phylogenetic 
formats, which is his field. I joined his project early this year and 
added some methods to read various protein formats. This is still the 
version that you can download from the website. Then in the summer John 
Timmer and Alex Griekspoor (mek from mekentosj) joined and the project 
started from scratch in the current setup. Peter was really busy, so it 
was basically the three of us who coded what is now in CVS.

> As a starter, I am humbly asking one of you, whenever he/she has time, 
> to summarize the different design options you had in the past or are 
> still considering for the BCSequence object (from the archives, I 
> could only grab part of the debate).

There are two different opinions about the use of BCSequence. My own 
idea is that we should have only one BCSequence class that takes care 
of managing the BCSymbols in it. To identify the sequence, I proposed 
we should have a symbol set member, e.g. dnaSymbolSet or 
proteinStrictSymbolSet. These are similar to the Alphabets you find in 
BioPerl and BioJava. This way you only have to keep the sequence 
related code in one class, instead of every possible subclass with 
small variations. The other idea, which is favored by John and Alex, is 
to subclass BCSequence, and have only code that is sensible for the 
specific subclass in that class. E.g. a protein would never need to 
calculate the GC content, and a DNA sequence doesn't need an isoelectric 
point calculator. Both designs have their advantages and disadvantages; 
for now we have settled on a compromise: we subclass BCSequence, but the 
subclasses only contain convenience methods that call wrapper objects 
(BCTools) to perform a specific action for that subclass.
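
To make the two options and the compromise a bit more concrete, here is a 
rough sketch. It is written in Python only for brevity; it is not the 
BioCocoa code (which is Objective-C), and all names in it (Sequence, 
GCContentTool, gc_content, the symbol set constants) are made up for 
illustration.

```python
# Hypothetical sketch; names and symbol sets are assumptions, not BioCocoa API.

DNA_SYMBOLS = frozenset("ACGT")                      # stand-in for dnaSymbolSet
PROTEIN_SYMBOLS = frozenset("ACDEFGHIKLMNPQRSTVWY")  # stand-in for proteinStrictSymbolSet


class Sequence:
    """Single-class idea: the symbol set identifies what kind of sequence this is."""

    def __init__(self, symbols, symbol_set):
        symbols = symbols.upper()
        invalid = set(symbols) - symbol_set
        if invalid:
            raise ValueError(f"symbols not in symbol set: {sorted(invalid)}")
        self.symbols = symbols
        self.symbol_set = symbol_set


class GCContentTool:
    """Stand-in for a BCTools-style wrapper object that does the actual work."""

    def __init__(self, sequence):
        self.sequence = sequence

    def gc_fraction(self):
        s = self.sequence.symbols
        if not s:
            return 0.0
        return (s.count("G") + s.count("C")) / len(s)


class DNASequence(Sequence):
    """Compromise: a thin subclass whose methods only delegate to tool objects."""

    def __init__(self, symbols):
        super().__init__(symbols, DNA_SYMBOLS)

    def gc_content(self):
        # Convenience method: no sequence logic lives in the subclass itself.
        return GCContentTool(self).gc_fraction()


class ProteinSequence(Sequence):
    """No gc_content here, because it makes no sense for a protein."""

    def __init__(self, symbols):
        super().__init__(symbols, PROTEIN_SYMBOLS)
```

In this sketch the shared sequence handling stays in one class, the 
subclasses only expose what is sensible for their type, and the 
calculation itself sits in the tool object.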

>  I know this is quite a big question, but I don't ask for too many 
> details, just a quick overview of the different options and I think I 
> can fill in the blanks. Then a related question is: why do you need a 
> BCSequenceFactory, and not just use factory methods defined in the 
> BCSequence superclass (when unknown sequence type) or subclasses (when 
> known types). I should add that I have no intention to question any of 
> the design decisions ;-) , and don't want to revive any past debate, I 
> just want to be brought up to speed...

The idea of a factory class is to have all code that creates sequences 
in one central location, instead of spread out through various 
subclasses of BCSequence. It's just a way of factoring out code into 
smaller modules. The advantage is that when something is changed or 
added in the way a sequence is created, this only has to be done in one 
class (the factory class). This is also a well-established design 
pattern that is used in many projects.
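
As a rough illustration of the idea (again a Python sketch with made-up 
names, not the real BCSequenceFactory interface), all the "which kind of 
sequence is this?" logic lives in one place. The guessing rule below is a 
deliberately naive assumption: anything made up only of A, C, G, T or N is 
treated as DNA.

```python
# Hypothetical sketch of a sequence factory; not the BioCocoa implementation.

class Sequence:
    def __init__(self, symbols):
        self.symbols = symbols


class DNASequence(Sequence):
    pass


class ProteinSequence(Sequence):
    pass


class SequenceFactory:
    """The one central place that decides which sequence class to create."""

    DNA_SYMBOLS = frozenset("ACGTN")  # assumed, simplified DNA alphabet

    @classmethod
    def sequence_from_string(cls, text):
        # Strip all whitespace and normalise case before guessing the type.
        symbols = "".join(text.split()).upper()
        if not symbols:
            return Sequence(symbols)          # nothing to guess from
        if set(symbols) <= cls.DNA_SYMBOLS:
            return DNASequence(symbols)       # naive rule, see note above
        return ProteinSequence(symbols)
```

If the guessing rule changes, or a new sequence type is added, only 
SequenceFactory has to be touched; the sequence classes stay as they are.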

>
> Thanks to whoever answers those questions, and again, the BioCocoa 
> project is a great initiative, and it looks really promising :-)

I hope I answered your questions. Feel free to ask more, and hopefully 
you can add code in a short while.

cheers,

- Koen.