[Biococoa-dev] I am watching you

Alexander Griekspoor mek at mekentosj.com
Sun Dec 12 18:55:33 EST 2004

I promised Charles to give a state-of-the-union on the list, but Koen 
did a very nice job in summarizing the current status!
I just wanted to add the current focus and problems. As you have read 
much of the archives of the last two months, you will have seen the 
discussion about whether or not to subclass BCSequence, nicely 
summarized below. This will probably stay an issue of debate for a 
while, and is something to keep in the back of our heads; we simply 
have to see which option fits our framework best in the end.

At the moment the main focus is on the sequence IO: reading the 
different file formats into BCSequence objects and writing them back 
out. Koen has done quite some work on this and the basics are working 
really well. More formats will come easily now, and Peter has also 
added a number of new formats to the 1.6 (original) version, which he 
will port to the new framework as well. What we have to do now is 
design the annotations/features part of the sequences, as well as the 
grouping of sequences into bundles.
The basic idea would be to have a hierarchy like this:

BCSequence - the basic sequence object

BCAnnotatedSequence - either a wrapper object containing a BCSequence, 
a dictionary of BCFeatures and a dictionary of BCAnnotations, or (the 
second possibility) a subclass of BCSequence

BCAnnotatedSequenceBundle - a bundle of BCAnnotatedSequences, including 

The question now is how to implement this system... All ideas, comments 
and suggestions are more than welcome!
You see, lots of work to do ;-)
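
To make the wrapper option a bit more concrete, here is a rough sketch. It is written in Python rather than Objective-C purely for brevity, and it is only one possible reading of the hierarchy above. The fields of BCFeature, the feature_symbols method and the example data are assumptions made up for illustration, not anything that exists in the framework.

```python
from dataclasses import dataclass, field


@dataclass
class BCSequence:
    """The basic sequence object; here simply a string of symbols."""
    symbols: str


@dataclass
class BCFeature:
    """Hypothetical feature: a described region (0-based, end-exclusive)."""
    start: int
    end: int
    description: str


@dataclass
class BCAnnotatedSequence:
    """Wrapper option: a BCSequence plus dictionaries of features/annotations."""
    sequence: BCSequence
    features: dict = field(default_factory=dict)     # name -> BCFeature
    annotations: dict = field(default_factory=dict)  # key -> free-form value

    def feature_symbols(self, name):
        """Return the symbols covered by the named feature (hypothetical helper)."""
        feature = self.features[name]
        return self.sequence.symbols[feature.start:feature.end]


@dataclass
class BCAnnotatedSequenceBundle:
    """A bundle of BCAnnotatedSequences; anything else it holds is left open."""
    sequences: list = field(default_factory=list)


seq = BCAnnotatedSequence(BCSequence("ATGGCCTAA"))
seq.annotations["organism"] = "made-up example"
seq.features["start_codon"] = BCFeature(0, 3, "start codon")
bundle = BCAnnotatedSequenceBundle([seq])
print(seq.feature_symbols("start_codon"))  # prints ATG
```

With the subclass option, BCAnnotatedSequence would instead inherit from BCSequence and carry the two dictionaries itself; the sketch leaves that choice open.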

Koen did a really nice
On 11 Dec 2004, at 19:58, Koen van der Drift wrote:

> Hi Charles,
> Welcome to the world of BioCocoa. I almost marked your mail as junk, 
> because of the subject ;-)
> Right now BioCocoa only has a few developers, so we can use all the 
> help we can get. I guess developing for an open source project is 
> similar to setting up an Xgrid project. Not all developers are working 
> full time on the project, only when they have some cpu cycles left.
> Peter Schols started BioCocoa a while ago as a framework to read and 
> write various sequence formats, with an emphasis on phylogenetic 
> formats, which is his field. I joined his project early this year and 
> added some methods to read various protein formats. This is still the 
> version that you can download from the website. Then in the summer 
> John Timmer and Alex Griekspoor (mek from mekentosj) joined and the 
> project started from scratch in the current setup. Peter was really 
> busy, so it was basically the three of us who coded what is now in 
> CVS.
>> As a starter, I am humbly asking one of you, whenever he/she has 
>> time, to summarize the different design options you had in the past 
>> or are still considering for the BCSequence object (from the 
>> archives, I could only grab part of the debate).
> There are two different opinions about the use of BCSequence. My own 
> idea is that we should have only one BCSequence class that takes care 
> of managing the BCSymbols in it. To identify the sequence, I proposed 
> we should have a symbolset member, eg dnaSymbolSet, 
> proteinStrictSymbolSet. These are similar to the Alphabets you find in 
> BioPerl and BioJava. This way you only have to keep the sequence 
> related code in one class, instead of every possible subclass with 
> small variations. The other idea, which is favored by John and Alex, 
> is to subclass BCSequence, and have only code that is sensible for the 
> specific subclass in that class. Eg a protein would never need to 
> calculate the GC content, and a DNA sequence doesn't need an 
> isoelectric point calculator. Both designs have their advantages and 
> disadvantages. Right now we have come up with a compromise: we 
> subclass BCSequence, but 
> the subclasses only contain convenience methods that call wrapper 
> objects (BCTools) to perform a specific action for that subclass.
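
As a rough illustration of that compromise, sketched in Python rather than Objective-C: the subclass holds only a thin convenience method, and the actual calculation lives in a separate tool object. The names BCSequenceDNA, BCSequenceProtein and BCToolGCContent are hypothetical stand-ins for this sketch, not necessarily what is in CVS.

```python
class BCSequence:
    """Base class: manages the symbols, no type-specific calculations."""

    def __init__(self, symbols):
        self.symbols = symbols.upper()

    def __len__(self):
        return len(self.symbols)


class BCToolGCContent:
    """Hypothetical tool (wrapper) object that performs the calculation."""

    def __init__(self, sequence):
        self.sequence = sequence

    def gc_content(self):
        symbols = self.sequence.symbols
        if not symbols:
            return 0.0
        return (symbols.count("G") + symbols.count("C")) / len(symbols)


class BCSequenceDNA(BCSequence):
    """Hypothetical subclass: only a convenience method that delegates."""

    def gc_content(self):
        return BCToolGCContent(self).gc_content()


class BCSequenceProtein(BCSequence):
    """Hypothetical subclass: deliberately has no gc_content method."""


dna = BCSequenceDNA("atgc")
print(dna.gc_content())  # prints 0.5
print(hasattr(BCSequenceProtein("MKV"), "gc_content"))  # prints False
```

The point of the split is that the tool can change without touching the sequence classes, while callers still get a method that only exists where it makes sense.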
>>  I know this is quite a big question, but I don't ask for too many 
>> details, just a quick overview of the different options and I think I 
>> can fill in the blanks. Then a related question is: why do you need a 
>> BCSequenceFactory, and not just use factory methods defined in the 
>> BCSequence superclass (when unknown sequence type) or subclasses 
>> (when known types). I should add that I have no intention to question 
>> any of the design decisions ;-) , and don't want to revive any past 
>> debate, I just want to be brought up to speed...
> The idea of a factory class is to have all code that creates sequences 
> in one central location, instead of spread out through various 
> subclasses of BCSequence. It's just a way of factoring out code into 
> smaller modules. The advantage is that when something is changed or 
> added in the way a sequence is created, this only has to be done in 
> one class 
> (the factory class). This is also a well established design pattern, 
> and used in many projects.
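
A minimal sketch of the factory idea, again in Python for brevity. The method name sequence_from_string and the symbol-guessing rule are assumptions for illustration only; the rule is deliberately naive (a short peptide made up only of A, C, G, T or N would be misread as DNA).

```python
class BCSequence:
    def __init__(self, symbols):
        self.symbols = symbols


class BCSequenceDNA(BCSequence):
    """Hypothetical DNA subclass."""


class BCSequenceProtein(BCSequence):
    """Hypothetical protein subclass."""


class BCSequenceFactory:
    """Keeps all sequence-creation logic in one central place."""

    DNA_SYMBOLS = set("ACGTN")  # simplified assumption, not a real symbol set

    @classmethod
    def sequence_from_string(cls, text):
        """Guess the sequence type from its symbols and build the subclass."""
        symbols = text.upper()
        if symbols and set(symbols) <= cls.DNA_SYMBOLS:
            return BCSequenceDNA(symbols)
        return BCSequenceProtein(symbols)


print(type(BCSequenceFactory.sequence_from_string("atgc")).__name__)
# prints BCSequenceDNA
print(type(BCSequenceFactory.sequence_from_string("MKVL")).__name__)
# prints BCSequenceProtein
```

If the guessing rule needs to change, or a new sequence type is added, only the factory has to be edited; the readers for the file formats can keep calling the same method.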
>> Thanks to whoever answers those questions, and again, the BioCocoa 
>> project is a great initiative, and it looks really promising :-)
> I hope I answered your questions, feel free to ask more and hopefully 
> add code in a short while.
> cheers,
> - Koen.
                     ** Alexander Griekspoor **
               The Netherlands Cancer Institute
               Department of Tumorbiology (H4)
          Plesmanlaan 121, 1066 CX, Amsterdam
                     Tel:  + 31 20 - 512 2023
                     Fax:  + 31 20 - 512 2029
                     AIM: mekentosj at mac.com
                     E-mail: a.griekspoor at nki.nl
                 Web: http://www.mekentosj.com

Windows is a 32-bit patch to a 16-bit shell for an 8-bit
operating system, written for a 4-bit processor by a 2-
bit company without 1 bit of sense.

