[Biophp-dev] The humongous IO class etc

Serge Gregorio biophp-dev@bioinformatics.org
Mon, 28 Apr 2003 14:37:29 +0800


Wonderful to hear you're making good progress!  I've been
doing a lot of thinking about the design and organization 
of Gene/BioPHP and here's what I've come up with:


I've also attached a code snippet from GenePHP 2.0's IO
class, in which the "parser classes" you mentioned 
earlier become its methods.  The general idea here is that
people use ONE AND ONLY ONE class to read/write 
data from WHATEVER SOURCE (console, variable, file, SQL

Also, with this, I'd like to push your idea of auto-
detection of file format one step further.  Not only 
will the IO class (through its autodetect_IO() function)
detect the format used (e.g. GenBank, Swissprot), but
it will also detect WHAT KIND OF DATA (e.g. genephp 
class) it contains or represents.
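
To make the two-level detection above concrete, here is a rough sketch.
It is NOT the actual autodetect_IO() code: the function names, the
return values, and the "first non-blank line" heuristic are all my own
assumptions, and it is written for a current PHP version.  The
format-to-class mapping simply follows the Seq/Xna/Protein scheme
described in this message.

```php
<?php
// Hypothetical sketch, not GenePHP's autodetect_IO().
// Step 1: guess the flat-file format from the first non-blank line.
function sketch_detect_format(string $data): string
{
    $first = '';
    foreach (preg_split('/\R/', $data) as $line) {
        if (trim($line) !== '') {
            $first = $line;
            break;
        }
    }
    if (strncmp($first, '>', 1) === 0) {
        return 'FASTA';      // FASTA records start with '>'
    }
    if (strncmp($first, 'LOCUS', 5) === 0) {
        return 'GenBank';    // GenBank records start with a LOCUS line
    }
    if (preg_match('/^ID\s{3}/', $first) === 1) {
        return 'Swissprot';  // Swissprot entries start with an "ID   " line
    }
    return 'unknown';
}

// Step 2: guess which kind of object the data represents.
// The class names here are only labels; no class needs to exist.
function sketch_detect_class(string $format): string
{
    $map = [
        'FASTA'     => 'Seq',
        'GenBank'   => 'Xna',
        'Swissprot' => 'Protein',
    ];
    return $map[$format] ?? 'unknown';
}
```

A real detector would need more than one line of evidence.  For
example, EMBL entries also begin with an "ID   " line, so this sketch
would mislabel them as Swissprot.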

>It should actually not be very hard to write these 
>things back to the appropriate file formats.

The IO class (or DataIO class if you prefer) should be
able to handle this also.  Just pass "W" (for "WRITE")
as the third argument.  
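
As an illustration only, a single entry point with a read/write switch
in the third position could look like the sketch below.  Everything
except the "W" flag is assumed: the function name, the meaning of the
first two arguments, the "R" default, and the array layout of a record.
Only FASTA is handled to keep it short, and reading assumes a single
record.

```php
<?php
// Hypothetical sketch; the real IO class may look quite different.
function sketch_io($data, string $format, string $mode = 'R')
{
    if ($format !== 'FASTA') {
        throw new InvalidArgumentException("Unsupported format: $format");
    }
    if ($mode === 'W') {
        // $data is an array with 'id' and 'sequence'; return FASTA text,
        // wrapping the sequence at 60 characters per line.
        return '>' . $data['id'] . "\n"
            . wordwrap($data['sequence'], 60, "\n", true) . "\n";
    }
    if ($mode === 'R') {
        // $data is FASTA text holding one record; return it as an array.
        $lines = preg_split('/\R/', trim($data));
        $header = array_shift($lines);
        return [
            'id'       => ltrim($header, '>'),
            'sequence' => implode('', array_map('trim', $lines)),
        ];
    }
    throw new InvalidArgumentException("Unknown mode: $mode");
}
```

With this shape, sketch_io($record, 'FASTA', 'W') returns text that
sketch_io($text, 'FASTA') can read back into the same record.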

I've also renamed the Seq class to the XNA (or Xna) class,
which represents a DNA, RNA, cDNA, mRNA sequence, etc.
The Seq class will then become a parent class of both Xna 
and Protein, and will contain minimal attributes and
methods common to both kinds of sequence (e.g. id, seqlen(), etc.).
In practical use, a Seq object should map to a sequence in 
FASTA format, while an Xna object maps to a GenBank nucleic
acid sequence, a Protein object maps to a Swissprot or PDB 
protein sequence, etc.
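
A minimal sketch of that hierarchy, just to show the shape.  Only the
class names, id, and seqlen() come from the description above; the
constructor, the sequence and moltype properties, and the current-PHP
syntax are assumptions made for illustration.

```php
<?php
// Hypothetical sketch of the Seq / Xna / Protein hierarchy.
class Seq
{
    public $id;
    public $sequence;

    public function __construct(string $id, string $sequence)
    {
        $this->id = $id;
        $this->sequence = $sequence;
    }

    // Number of residues in the sequence.
    public function seqlen(): int
    {
        return strlen($this->sequence);
    }
}

// Nucleic acid sequence (DNA, RNA, cDNA, mRNA, ...).
class Xna extends Seq
{
    public $moltype;

    public function __construct(string $id, string $sequence, string $moltype = 'DNA')
    {
        parent::__construct($id, $sequence);
        $this->moltype = $moltype;
    }
}

// Protein sequence; adds nothing of its own in this sketch.
class Protein extends Seq
{
}
```

Anything written against Seq (for example, code that only needs id and
seqlen()) then works unchanged on Xna and Protein objects.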

I'll also be introducing new classes like Enzyme, Ligand,
etc. to correspond to the CONCEPT MAP and BIO DATABASES
shown in the diagram at:


What do you guys think?  =)



On Sun, 27 Apr 2003 21:41:04, nicos wrote:
>Just finished uploading the code I had from Serge, with my additions, to
>CVS.  I put it in a directory called 'genephp' until we all agree on what
>should be in the 'real' biophp distribution.  I also cleaned up quite a
>few empty directories which I assumed were put there by accident (???).  
>The parse class worked nicely for me.  There are now four parsers: for
>genbank, swissprot (both from Serge), and for Fasta and pDRAW files
>(those were simple...).  There are two subdirectories in genephp
>containing testdata and testscripts.  The scripts should run if you place 
>the whole structure under your webserver.
>The idea is to first create a parse object:
>$myParser=new parse($data);
>where data can be a file or a string containing the relevant data
>yields the first seqobject.
>moves to the next (if available)
>goes one back
>And test for 
>$myParser->eof (true when there are no more records)
>$myParser->bof (true when at start of data)
>See the script testfiles/test.php for the thing in action.
>It should actually not be very hard to write these things back to the
>appropriate file formats.
