[Biophp-dev] Serge's questions and comments

S Clark biophp-dev@bioinformatics.org
Wed, 30 Apr 2003 11:05:24 -0600


On Tuesday 29 April 2003 11:46 pm, Serge Gregorio wrote:
> Sean,
>
> I'm now registered as user "flipmozart" at bioinformatics.org.
> Kindly add me to the list of developers.

Done.

> On the Parser class issue, I haven't really waded deep into
> the code.  However, by just reading the discussion so far,
> it *SEEMS* the Parser class isn't far off from the IO class,
> and may only differ in name and in scope/granularity.
>
> May I suggest renaming it from "Parser class" to some other
> name that is more "data-centric"?  I'll explain why later.

Perhaps "file_parser", or maybe "fileIO", since it's focused on reading data
formats that are commonly found in files (though the parser may actually
be reading a "stream", a network socket, or strings, the techniques
involved are all basically the same).  Reading from database servers
should probably be a different module, as the techniques are somewhat
different for that (though, like the file parser module, it could be
structured with lower-level modules specific to reading from MySQL,
PostgreSQL, AceDB, etc.).
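To make the "the source doesn't matter" point concrete, here is a minimal
sketch in Python (GenePHP itself is PHP; the language, the FASTA format and
the name `parse_fasta` are all assumptions for illustration only). The
parser accepts any iterable of text lines and never opens anything itself,
so the caller decides whether those lines come from a file, a socket or a
string.

```python
import io
from typing import Iterable, Iterator, List, Optional, Tuple

def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from any iterable of text lines.

    Hypothetical example parser: FASTA is used only as a familiar format.
    Sequence lines that appear before the first '>' header are ignored.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # finish the last record

# The same parser, fed from an in-memory string instead of a file:
data = io.StringIO(">seq1 demo\nGAVL\nIFYW\n>seq2\nMKT\n")
records = list(parse_fasta(data))
```

An open file object, or a socket wrapped with `sock.makefile("r")`, can be
passed in exactly the same way; only the code that *obtains* the lines
changes, which is why a separate database module makes more sense than a
separate file/socket/string module.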

> Btw, I have a question for Sean.  I've written an Amateur Gene
> Finder demo script at:
>
>    http://genephp.sourceforge.net/genefind_par.html
>
> I'd like to make the protein sequences (e.g. GAVLIFYW) "clickable",
> so that clicking one forwards the string as a query to protein database
> sites like PROSITE, gets info on it, and displays this info in another
> PHP page WITHOUT ever leaving the SF site.
>
> My practical question is: can your eSearch/eUtils do this (or be
> easily modified to do this)?

With Prosite SPECIFICALLY, the answer is "yes and no".

The EUtils are specifically the Entrez database interfaces at
NCBI.  Prosite doesn't seem to offer an XML data format, so
a new parsing module would need to be written for its format
(not that big a deal, and it'll be handy to have as a module
for the regular GenePHP parser as well), along with a module
to handle the specific format of Prosite queries.  BUT that
shouldn't be too difficult to arrange.

(Short answer: EUtils is limited to NCBI's databases, but writing
a new module that does the same thing for Prosite [and other sites]
is planned and shouldn't be too difficult.)
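As a rough idea of how small such a parsing module could be, here is a
Python sketch. It assumes Prosite's flat-file data uses the EMBL-style
layout of a two-letter line code (ID, AC, DE, PA, ...), padding, data from
column 6 on, and "//" closing each record; the function name
`parse_line_coded` is hypothetical, and the sample record is modeled on
the N-glycosylation site entry.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, List[str]]

def parse_line_coded(lines: Iterable[str]) -> Iterator[Record]:
    """Group two-letter-coded lines into records terminated by '//'.

    Assumed layout: a 2-character line code, padding, then the data
    starting at column 6.  Repeated codes (e.g. several PA lines) are
    collected into a list in the order they appear.
    """
    record: Record = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            if record:
                yield record  # end of one entry
            record = {}
            continue
        if len(line) < 2:
            continue  # ignore blank or malformed lines
        code, value = line[:2], line[5:].strip()
        record.setdefault(code, []).append(value)
    if record:
        yield record  # last entry had no trailing '//'

sample = """ID   ASN_GLYCOSYLATION; PATTERN.
AC   PS00001;
DE   N-glycosylation site.
PA   N-{P}-[ST]-{P}.
//
"""
entries = list(parse_line_coded(sample.splitlines()))
```

Keeping the parser ignorant of where the lines came from means the same
module could serve both the Prosite query module and the regular GenePHP
file parser.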

I haven't actually tried sending a protein sequence or sequence fragment yet
as a query to the "protein" database available through EUtils, but I suspect
there is a way to make it work (specifying field=sequence, or some equivalent,
in the query, perhaps?  Looking at the "fetch" record for proteins, the field
may be named "sequence" or "GBSeq_sequence").
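If such a field exists, the query would just be an ESearch URL with the
sequence tagged by field name in Entrez's `term[field]` syntax.  The Python
sketch below only *builds* that URL and makes no network call; the endpoint
shown is the current ESearch address, `esearch_url` is a hypothetical
helper, and the default field name "sequence" is exactly the untested guess
from the paragraph above, not a confirmed NCBI field.

```python
from urllib.parse import urlencode

# Current ESearch endpoint; treat it as an assumption for this sketch.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(sequence: str, field: str = "sequence", db: str = "protein") -> str:
    """Build an ESearch URL that restricts the search term to one field.

    The field name is a guess; whether NCBI indexes protein residues under
    any searchable field is NOT confirmed here.
    """
    term = f"{sequence}[{field}]"  # Entrez field-tag syntax: term[field]
    # urlencode percent-escapes the brackets, e.g. '[' -> '%5B'
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term})

url = esearch_url("GAVLIFYW")
```

A GenePHP page could generate the same URL, fetch it server-side and render
the result, which is what would let the user stay on the SF site.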

> However, I see nothing wrong with the project getting known as
> giving special emphasis on DNA and proteins, and being under the umbrella
> of a larger BioPHP project, hosted/administered here by you.

Of course, none of that is mandatory by any means; it just seemed
like a natural way to classify everything.  Nothing says we can't just
call the whole thing "BioGenePHP" to refer to all of the development
done by us (while "BioPHP" collectively would include work done by
other groups, e.g. the ones at BioPHP.org, if/when they get around to putting
their project online).

> Lastly, to show that this is a team effort, I've revised the SF
> site to make greater use of the word "PROPOSED" as in "Proposed GenePHP
> Bioinformatics Concept Map".  I do not want to convey the impression that
> they are FINAL or CLOSED to discussion.

Good thinking; it's still early enough that it's hard to predict how much
development will go where, or how wide it may spread.  (Depending on my
educational near-future and new job prospects, I could imagine myself doing
some work on "BioGISPHP", combining GIS mapping with sequence data for a
particular microorganism to trace and predict its geographical spread...
after a bit of study first, though.)  If I or someone else did something
like that, we'd have to figure out where to fit it in with the rest of the
scheme.

We should also decide where ESearch and the related utilities fit in,
and then move them over into the main development tree.