[Biococoa-dev] I am watching you
Koen van der Drift
kvddrift at earthlink.net
Sat Dec 11 13:58:45 EST 2004
Welcome to the world of BioCocoa. I almost marked your mail as junk,
because of the subject ;-)
Right now BioCocoa only has a few developers, so we can use all the
help we can get. I guess developing for an open source project is
similar to setting up an Xgrid project. Not all developers are working
full time on the project, only when they have some CPU cycles left.
Peter Schols started BioCocoa a while ago as a framework to read and
write various sequence formats, with an emphasis on phylogenetic
formats, which is his field. I joined his project early this year and
added some methods to read various protein formats. This is still the
version that you can download from the website. Then in the summer John
Timmer and Alex Griekspoor (mek from mekentosj) joined and the project
started again from scratch in the current setup. Peter was really busy,
so it was basically the three of us who coded what is now in CVS.
> As a starter, I am humbly asking one of you, whenever he/she has time,
> to summarize the different design options you had in the past or are
> still considering for the BCSequence object (from the archives, I
> could only grab part of the debate).
There are two different opinions about the use of BCSequence. My own
idea is that we should have only one BCSequence class that takes care
of managing the BCSymbols in it. To identify the sequence, I proposed
we should have a symbolset member, e.g. dnaSymbolSet or
proteinStrictSymbolSet. These are similar to the Alphabets you find in
BioPerl and BioJava. This way you only have to keep the sequence
related code in one class, instead of every possible subclass with
small variations. The other idea, which is favored by John and Alex, is
to subclass BCSequence, and have only code that is sensible for the
specific subclass in that class. E.g. a protein would never need to
calculate the GC content, and a DNA sequence doesn't need an isoelectric
point calculator. Both designs have their advantages and disadvantages;
for now we have settled on a compromise: we subclass BCSequence, but the
subclasses only contain convenience methods that call wrapper objects
(BCTools) to perform a specific action for that subclass.
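
As a rough illustration of the compromise described above, here is a
minimal sketch in Python (BioCocoa itself is Objective-C). All names in
it (SymbolSet, Sequence, GCContentTool, DNASequence, ProteinSequence)
are hypothetical stand-ins chosen for the example, not the actual
BioCocoa API. The base class holds the symbols and a symbol set, and the
subclass only adds a convenience method that hands the real work to a
tool object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolSet:
    """Hypothetical stand-in for a symbol set such as dnaSymbolSet."""
    name: str
    symbols: frozenset


# Assumed alphabets for the example: unambiguous DNA and the 20 standard amino acids.
DNA_SYMBOLS = SymbolSet("dna", frozenset("ACGT"))
PROTEIN_STRICT_SYMBOLS = SymbolSet("proteinStrict", frozenset("ACDEFGHIKLMNPQRSTVWY"))


class Sequence:
    """Generic sequence: all symbol management lives here, in one class."""

    def __init__(self, text, symbol_set):
        text = text.upper()
        invalid = set(text) - symbol_set.symbols
        if invalid:
            raise ValueError(f"symbols {sorted(invalid)} not in {symbol_set.name} set")
        self.text = text
        self.symbol_set = symbol_set

    def __len__(self):
        return len(self.text)


class GCContentTool:
    """Hypothetical tool object that holds the actual calculation."""

    def __init__(self, sequence):
        self.sequence = sequence

    def gc_fraction(self):
        if len(self.sequence) == 0:
            return 0.0
        gc_count = sum(1 for symbol in self.sequence.text if symbol in "GC")
        return gc_count / len(self.sequence)


class DNASequence(Sequence):
    """Subclass with convenience methods only; the logic stays in the tool."""

    def __init__(self, text):
        super().__init__(text, DNA_SYMBOLS)

    def gc_content(self):
        return GCContentTool(self).gc_fraction()


class ProteinSequence(Sequence):
    """A protein gets no gc_content method, since it would make no sense."""

    def __init__(self, text):
        super().__init__(text, PROTEIN_STRICT_SYMBOLS)
```

In this sketch DNASequence("ATGC").gc_content() delegates to the tool,
while a ProteinSequence simply does not offer that method.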
> I know this is quite a big question, but I don't ask for too many
> details, just a quick overview of the different options and I think I
> can fill in the blanks. Then a related question is: why do you need a
> BCSequenceFactory, and not just use factory methods defined in the
> BCSequence superclass (when unknown sequence type) or subclasses (when
> known types). I should add that I have no intention to question any of
> the design decisions ;-) , and don't want to revive any past debate, I
> just want to be brought up to speed...
The idea of a factory class is to have all code that creates sequences
in one central location, instead of spread out through various
subclasses of BCSequence. It's just a way of factoring out code into
smaller modules. The advantage is that when something is changed or
added in the way a sequence is created, this only has to be done in one
class (the factory class). This is also a well-established design
pattern, and is used in many projects.
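
The factory idea can be sketched as follows, again in Python and again
with hypothetical names (SequenceFactory, sequence_from_string) rather
than the real BCSequenceFactory interface. The type-guessing rule is a
deliberately naive assumption made for the example: a string made up
only of A, C, G and T is treated as DNA, anything else as protein.

```python
class Sequence:
    """Minimal base class for the example."""

    def __init__(self, text):
        self.text = text.upper()


class DNASequence(Sequence):
    pass


class ProteinSequence(Sequence):
    pass


class SequenceFactory:
    """All sequence-creation logic sits here, not in the subclasses."""

    # Hypothetical registry mapping a type name to the class to instantiate.
    _classes = {"dna": DNASequence, "protein": ProteinSequence}

    @classmethod
    def guess_kind(cls, text):
        # Naive assumption: only A, C, G, T means DNA, otherwise protein.
        return "dna" if set(text.upper()) <= set("ACGT") else "protein"

    @classmethod
    def sequence_from_string(cls, text, kind=None):
        # An explicit kind overrides the guess, e.g. for a short peptide
        # that happens to contain only A, C, G and T.
        if kind is None:
            kind = cls.guess_kind(text)
        return cls._classes[kind](text)
```

If the guessing rule changes, or a new sequence type is added, only the
factory needs to be touched; callers keep using sequence_from_string.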
> Thanks to whoever answers those questions, and again, the BioCocoa
> project is a great initiative, and it looks really promising :-)
I hope I answered your questions. Feel free to ask more, and hopefully
you will be adding code in a short while.