[Biophp-dev] Genephp and Biophp

Serge Gregorio biophp-dev@bioinformatics.org
Tue, 29 Apr 2003 16:20:46 +0800

Hello all!

>SC: It just seems we have two different approaches going on at the same time - 
>with GenePHP it has more of a "Windows" type approach of tying things together 
>into larger sets of easy-to-use segments ... while I have been going for more 
>of a "Unix" type of approach of lots of small, independent modules ... 

I think the analogy is not accurate, as Windows is more than just large sets of easy-to-use segments. It is a CLOSED system. GenePHP and its code are very much "open". Having said that, I suggest we avoid making broad, sweeping comments like the above and be more specific, as below. Using labels like "monolithic" is acceptable, but controversial labels like "Windows" (synonymous with "Nazi" for some) must be avoided. We don't want to drive away Windows developers from the project, do we?

Finally, "easy-to-use" shouldn't be a bad thing.  PHP, Mandrake Linux, and a host of other open source products take pride in being "easy-to-use". 

My guess is you're getting at the issue of "granularity" of modules.  You find GenePHP to have a "large granularity" (like rocks) instead of a "small/fine granularity" (like sand).  Did I read you right?

>SC: If, instead of automatically forcing the data directly into a seq object, the parser returned an array ("label"=>"Some_Sequence", "sequence" => "AAGGCCTT" -
>or whatever "standard" keys we decide on), and the seq object instead added a method to IMPORT its data via this array format, there wouldn't need to be much change, but a
>parser could STILL stand all on its own, rather than also needing to be designed to know about the Parser object and the Seq object.

From what I understand, Sean seems to be suggesting this:

      outputs                          outputs
 parser ---> string, array ---> importer ---> Seq object, 
                                              other GenePHP objects

There is value to that scheme from a team development point of view (i.e. Sean need not understand or use GenePHP objects).
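To make the first scheme concrete, here is a minimal sketch, written in Python rather than PHP purely for illustration. The names parse_fasta, Seq and Seq.from_array are hypothetical (not existing GenePHP or BioPHP code), and FASTA is assumed as the input format; the "label"/"sequence" keys are the example keys from Sean's message.

```python
from dataclasses import dataclass


def parse_fasta(text):
    """Hypothetical stand-alone parser: returns plain dicts and knows nothing about Seq."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if label is not None:
                records.append({"label": label, "sequence": "".join(chunks)})
            label, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if label is not None:
        records.append({"label": label, "sequence": "".join(chunks)})
    return records


@dataclass
class Seq:
    label: str
    sequence: str

    @classmethod
    def from_array(cls, record):
        """Hypothetical importer: builds a Seq from the 'standard' array (dict) format."""
        return cls(label=record["label"], sequence=record["sequence"])


records = parse_fasta(">Some_Sequence\nAAGG\nCCTT\n")
seqs = [Seq.from_array(r) for r in records]
print(records)  # [{'label': 'Some_Sequence', 'sequence': 'AAGGCCTT'}]
```

The parser can be written and tested by someone who never touches Seq; only the importer needs to know both sides.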

However, the alternative scheme (shown below) which Sean labels as "tight-coupling" is not inherently bad or flawed. 

 parser ---> object 

If it were, then BioPerl and other OO BioXXX are flawed because their parsers return objects, instead of strings or arrays.  For instance, BioPerl's SeqIO parser/writer returns a data stream object.
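For comparison, here is a rough Python sketch of the "parser ---> object" scheme. SeqReader and Seq are hypothetical names, and the stream-that-yields-objects style is only loosely inspired by BioPerl's SeqIO; it is not its actual API. FASTA input is again assumed.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Seq:
    label: str
    sequence: str


class SeqReader:
    """Hypothetical stream-style parser that yields Seq objects directly."""

    def __init__(self, text: str):
        self._lines = text.splitlines()

    def __iter__(self) -> Iterator[Seq]:
        label, chunks = None, []
        for line in self._lines:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Emit the previous record as a Seq before starting a new one.
                if label is not None:
                    yield Seq(label, "".join(chunks))
                label, chunks = line[1:].strip(), []
            else:
                chunks.append(line)
        if label is not None:
            yield Seq(label, "".join(chunks))


seqs = list(SeqReader(">a\nAAGG\n>b\nCCTT\n"))
print([s.label for s in seqs])  # ['a', 'b']
```

Here the parser must know how to construct Seq itself, which is the coupling Sean is wary of; in return, callers get ready-made objects straight from the parser.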

If, for argument's sake, we were to follow Sean's logic to the extreme, and limit our parsers (and other functions) to return ONLY strings or arrays, then we would end up with a procedural and not an object-oriented BioPHP.  That is not bad by itself, but such a construct should not claim to be OO.

And so the issue boils down to whether people agree to the GenePHP class framework in its present form, or they would like to revise it, or propose another one altogether. I hope it's not because "someone had an irrational hatred of the GenePHP seq class" or a fear that "everything one has done disappears anyway".

> paragraph.  Aren't 'standard arrays' and Genephp objects the same?
> I mean the classes that Serge proposes are open to discussion (I
> hope)

Yes, they are, you have my word on that.  =)  

> NS: and classes are nothing more than arrays to which you can add 
> functions.  So, aren't we talking about the same thing?

Well, yes, that's one way of looking at it.

>SC: I'm certain a "middle ground" can be worked out, but there isn't
>much point in immediately "throwing together" both sets of code as they are now - the two sets of code are CURRENTLY approaching from different (somewhat incompatible) philosophies.

I guess no one is opposed to the idea of merging the code base.  The disagreement lies in WHEN this should take place.  Nico wants it early, Sean wants it later. I have no problems with either, though I'm slightly leaning towards one camp.

In my previous post, I clarified the interim "supporting" role of the SF CVS vis-a-vis the bioinformatics.org CVS, so there shouldn't be any confusion about that.

>SC: The solution/"middle ground" would mean GenePHP being somewhat redesigned to allow for more "independent" modules to be a part of it (e.g. to pass data more often between modules/classes as standard arrays rather than GenePHP objects, so as to make it easier to "drop in" new components) but I'm reluctant to ask that.  Not because I don't think it'd be a good idea but because I hate the idea of making demands that other independent volunteer programmers rewrite their code to accommodate me. Maybe I'm just insecure that way :-)

Well, you shouldn't be reluctant to ask.  And if nobody listens, you can always rewrite the code yourself.  That's why it's called open source. =)

> SC: Bear in mind also that my impression comes mostly from the current design of the GenePHP parser system, which is CURRENTLY very co-dependent on three different sets of "custom" files (seq class, parser class, and the sets of single-function parsers) - I looked at these for comparison

I'm not sure I get what you meant by the word "custom".  The Seq class, Parser class are "custom" for bioinformatics. I certainly wouldn't use them for space flight simulations or anything like that. =)

But if "custom" means "it can't be used by other bioinformatics developers or applications", well, that certainly isn't the intention.

> SC: And if I just sound "cranky", well, it's been a very stressful month, but
> I reiterate my permission in this case to make snide comments at me until
> I cheer up...

Oh, that explains it.  I've attached some stuff here to cheer you up.  =)

> NS: It will be easiest if we all decide on the underlying datastructures 
> we are going to use (and Serge is doing a great job there, even though I
> don't agree with everything he suggests).  

Thanks for that.  I don't expect everyone to agree with everything I suggest. The IO class isn't one of those "to-die-for" suggestions.  In fact, there is an alternative to it (BioPerl's), which I will describe/discuss more fully at the SF site. 

> SC: maybe the NEXT question is, aside from the parser design 
> discussion, what should we consider to be the level and type of 
> functionality that needs to be working for GenePHP/BioPHP's first 
> formal Alpha release? :-)

Oh, all right. Just so we're not confused, here's my proposal:

When talking about code, I'll use the term BiogenePHP (or BgPHP) for whatever common code base we can come up with in the future. In this context, BioPHP and GenePHP are "branches" or "flavors" or "distributions" of BgPHP.

As an analogy, BiogenePHP is Linux, BioPHP is RedHat Linux, while GenePHP is Mandrake Linux. The success of one should be the success of all. (See Fig. 1 way below.)

Technically, GenePHP 1.0 is a beta release.  However, since I've prematurely announced it as 1.0 without the beta, the next release will be called 1.1, and will be out in two weeks' time.  

(May I suggest that your code be integrated into GenePHP in its 2.0 release? I've downloaded your code "manually", as I'm not registered as a developer here yet. Sean, I think I've registered as dgregorio or d.gregorio, but never mind that. I'll scrap it and re-enter as flipmozart, okay?)

As projects, BioPHP and GenePHP are separate, with two separate websites, hosts (SF and bioinfo.org), organizations, etc., and may be run in whatever manner their members see fit.

(One can be a "democratic state", another can be a "fascist state", etc. LOL)

However, for the sake of unity and practicality, I am putting the CVS and mailing lists of GenePHP under the auspices of the BioPHP project. Membership may overlap, which isn't unheard of and isn't necessarily a bad thing.

If you think the term BiogenePHP sucks, well, let me know if you can think of a catchier name. As I explained to Nico earlier, I avoided using the term BioPHP in deference to the Norway group that owns www.biophp.org.

But if it's all right with Sean, we can call the entire thing (the common code base) BioPHP, and Sean's branch, umm... SeanPHP?  LOL  =)  (See Fig. 2.)

Should the merger take place, then we'll fold everything up under one banner. (See figure 3.)

Figure 1. 
        BiogenePHP *
     |               |
  BioPHP         GenePHP

  * Or whatever alternative name you can think of.

Figure 2.
          BioPHP
     |               |
  SeanPHP*        GenePHP
  * Or whatever alternative name you can think of.

Figure 3. 

          BiogenePHP or BioPHP **

  ** With headquarters located at One BiogenePHP Way, 
     Shuttle, Worseshington.  =)

Again, this is *STILL* a proposal I've come up with to reflect current realities.  

Having said that, coming up with some common code base is a shared objective, and should be a priority.


