[Biophp-dev] seqdb and seq classes in genephp

biophp-dev@bioinformatics.org
Thu, 24 Apr 2003 21:06:44 PST

I have been looking in a bit more detail at Serge's seqdb and seq classes,
and this seems a good place to report on that (hope you don't mind,
Serge!). There is a lot of good stuff there, but I think it needs a bit
of restructuring to make it more widely useful.

First, I don't understand the seqdb class very well. I think that the
idea is to make a local, flat-file database (as well as SQL?) that holds
all the sequence data. That seems a good idea, but it will be important to
split the functionality into small chunks so that those smaller units
can be reused by others for purposes we cannot even begin to imagine.

Specifically, it is a big hurdle for me that a seq object currently can
only be created with the seqdb->fetch() function (which does the parsing
of a sequence file). It seems more logical to me to have a seq
constructor that returns a seq object and takes a file (or maybe a
string), possibly with pointers to the beginning and end of the desired
sequence, and initially with something indicating what type of sequence
file this is (which eventually should be replaced by 'autodiscovery').
If I am not mistaken, such a constructor could be used by seqdb to do
exactly what it is doing now.
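
To make the idea concrete, here is a rough sketch of such a constructor.
It is not taken from genephp: the class name Seq, its properties, and the
parameters $source, $format, $start and $end are all assumptions made for
illustration, and only FASTA is handled.

```php
<?php
// Hypothetical sketch: none of these names come from genephp.
class Seq
{
    public $id;
    public $description;
    public $sequence;

    // $source: raw text of a sequence file (or any string).
    // $format: type of sequence file; only 'fasta' is handled here.
    // $start, $end: optional byte offsets delimiting one record in $source.
    public function __construct($source, $format = 'fasta', $start = 0, $end = null)
    {
        $length = ($end === null) ? strlen($source) - $start : $end - $start;
        $record = (string) substr($source, $start, $length);

        if ($format !== 'fasta') {
            throw new InvalidArgumentException("Unsupported format: $format");
        }
        $this->parseFasta($record);
    }

    private function parseFasta($record)
    {
        $lines = preg_split('/\r\n|\r|\n/', trim($record));
        $header = array_shift($lines);
        if ($header === null || $header === '' || $header[0] !== '>') {
            throw new InvalidArgumentException('FASTA record must start with ">"');
        }
        // The header is ">id optional description".
        $parts = preg_split('/\s+/', substr($header, 1), 2);
        $this->id = $parts[0];
        $this->description = isset($parts[1]) ? $parts[1] : '';
        // Join the remaining lines and drop any embedded whitespace.
        $this->sequence = preg_replace('/\s+/', '', implode('', $lines));
    }
}

// Usage: pull two records out of one string by offset.
$text   = ">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n";
$split  = strpos($text, '>seq2');
$first  = new Seq($text, 'fasta', 0, $split);
$second = new Seq($text, 'fasta', $split);
```

With something along these lines, seqdb would only need to work out the
offsets of each record and hand them to the constructor, so the parsing
would live in one place.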

Also, the function writing the seq object to a SQL database should not be
part of the parser function (I want to parse, but write to a SQL database
with a completely different structure!). Writing (and reading) seq objects
could be functions of the seq class, or - preferably - be in a class of
their own. In any case this functionality should be separated into
smaller chunks.
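
As a sketch of what a separate storage class might look like, assuming
nothing about the existing code: SimpleSeq is a stand-in for the real seq
class, and SeqStorage and FlatFileSeqStorage are hypothetical names. A
flat file is used as the backend only to keep the example self-contained.

```php
<?php
// Hypothetical names throughout; SimpleSeq stands in for the real seq class.
class SimpleSeq
{
    public $id;
    public $sequence;

    public function __construct($id, $sequence)
    {
        $this->id = $id;
        $this->sequence = $sequence;
    }
}

// Anything that can persist seq objects implements this interface.
interface SeqStorage
{
    public function write(SimpleSeq $seq);
    public function read($id);
}

// One possible backend: a tab-delimited flat file, one sequence per line.
class FlatFileSeqStorage implements SeqStorage
{
    private $path;

    public function __construct($path)
    {
        $this->path = $path;
    }

    // Append the sequence as "id<TAB>sequence".
    public function write(SimpleSeq $seq)
    {
        $line = $seq->id . "\t" . $seq->sequence . "\n";
        file_put_contents($this->path, $line, FILE_APPEND);
    }

    // Return the first stored sequence with this id, or null if none.
    public function read($id)
    {
        if (!is_file($this->path)) {
            return null;
        }
        foreach (file($this->path, FILE_IGNORE_NEW_LINES) as $line) {
            $fields = explode("\t", $line, 2);
            if (count($fields) === 2 && $fields[0] === $id) {
                return new SimpleSeq($fields[0], $fields[1]);
            }
        }
        return null;
    }
}
```

A SQL-backed class with whatever table layout one prefers could implement
the same interface, and the parser would not need to know about it.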

Looking at the code, I am not completely sure why there is a parse_int
file in the main body and a similar one within the seqdb class.  Is the
first one a remnant?

Serge, does it make sense to move the parsers to the seq class?