On Tuesday 29 April 2003 11:46 pm, Serge Gregorio wrote:
> Sean,
>
> I'm now registered as user "flipmozart" at bioinformatics.org.
> Kindly add me to the list of developers.

I do'd it.

> On the Parser class issue, I haven't really waded deep into
> the code. However, by just reading the discussion so far,
> it *SEEMS* the Parser class isn't far off from the IO class,
> and may only differ in name and in scope/granularity.
>
> May I suggest renaming it from "Parser class" to some other
> name that is more "data-centric"? I'll explain why later.

Perhaps "file_parser", or maybe "fileIO", since it's focused on reading data formats that are commonly found in files (though the parser may actually be reading a "stream", a network socket, or strings, the techniques involved are basically the same). Reading from database servers should probably be a different module, as the techniques are somewhat different there (but, like the file parser module, it could be structured with lower-level modules specific to reading from MySQL, PostgreSQL, acedb, etc.).

> Btw, I have a question for Sean. I've written an Amateur Gene
> Finder demo script at:
>
> http://genephp.sourceforge.net/genefind_par.html
>
> I'd like to make the protein sequences (e.g. GAVLIFYW) "clickable",
> so that it forwards the string as a query to protein database sites like
> PROSITE, get info on it, and display this info in another PHP page WITHOUT
> ever leaving the SF site.
>
> My practical question is: can your eSearch/eUtils do this (or be
> easily modified to do this)?

With Prosite SPECIFICALLY... the answer is "yes and no". The EUtils are specifically the Entrez database interfaces at NCBI.
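For readers unfamiliar with EUtils: an ESearch query is an ordinary HTTP GET against an NCBI CGI script, with the Entrez database and the search term passed as URL parameters. The Python sketch below (GenePHP itself is PHP; Python is used only for illustration) builds such a URL and sends nothing. The endpoint is NCBI's current HTTPS address, which may differ from the one in use when this was written, and `build_esearch_url` is a hypothetical helper, not part of the existing ESearch code. `db`, `term`, `field` and `retmax` are ESearch query parameters; the "field=sequence" guess in this reply remains untested.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI's current HTTPS address for ESearch.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, field=None, retmax=20):
    """Build (but do not send) an ESearch query URL for one Entrez database.

    Hypothetical helper for illustration. Whether a particular `field`
    value is accepted depends on the database being searched.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    if field is not None:
        # Restricts the whole term to one Entrez search field.
        params["field"] = field
    # urlencode escapes the term, so a clicked sequence string is safe to pass.
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_esearch_url("protein", "insulin"))
```

Calling `build_esearch_url("protein", "insulin")` returns `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=insulin&retmax=20`; a "clickable" sequence in the gene finder page would simply link to (or be fetched through) a URL of this shape.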
Prosite doesn't seem to offer an XML data format, so a new parsing module would need to be written for its format (not that big a deal, and it'll be handy to have as a module for the regular GenePHP parser as well), plus a module written to handle the specific format of Prosite queries... but that shouldn't be too difficult to arrange.

(Short answer: EUtils is limited to NCBI's databases, but writing a new module that does the same thing for Prosite [and other sites] is planned and shouldn't be too difficult.)

I haven't actually tried sending a protein sequence or sequence fragment as a query to the "protein" database available through EUtils yet, but I suspect there is a way to make it work (perhaps by specifying field=sequence, or some equivalent, in the query? Looking at the "fetch" record for proteins, the field may be named "sequence" or "GBSeq_sequence").

> However, I see nothing wrong with the project getting known as
> giving special emphasis on DNA and proteins, and being under the umbrella
> of a larger BioPHP project, hosted/administered here by you.

Of course, none of that is mandatory by any means; it just seemed like a natural way to classify everything. Nothing says we can't just call the whole thing "BioGenePHP" to refer to all of the development done by us (while "BioPHP" collectively would include work done by other groups, e.g. the ones at BioPHP.org, if/when they get around to putting their project online).

> Lastly, to show that this is a team effort, I've revised the SF
> site to make greater use of the word "PROPOSED" as in "Proposed GenePHP
> Bioinformatics Concept Map". I do not want to convey the impression that
> they are FINAL or CLOSED to discussion.
Good thinking. We're still early enough in the project that it's hard to predict how much development will go where, or how wide it may spread. (Depending on my educational near-future and new job prospects, I could imagine myself doing some work on "BioGISPHP", combining GIS mapping with sequence data for a particular microorganism to trace and predict its geographical spread... after a bit of study first, though.) If I or someone else did something like that, we'd have to figure out where to fit it in with the rest of the scheme.

We should also decide where ESearch and the related utilities fit in, and go ahead and move them over into the main development tree there.
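To make the "file_parser" idea from earlier in this message a bit more concrete, here is a minimal Python sketch (illustration only; GenePHP is PHP) of a format parser that does not care where its lines come from: an open file, a string wrapped in `io.StringIO`, or a socket wrapped with `socket.makefile("r")` all look the same to it. It assumes Prosite's line-coded flat-file layout (a two-letter line code, the content starting at column 6, and "//" closing each entry). The function `parse_prosite_records` is hypothetical and not part of the existing code.

```python
import io
from typing import Dict, Iterable, Iterator, List

# One entry: maps each two-letter line code (ID, AC, DE, PA, ...) to its values.
Record = Dict[str, List[str]]


def parse_prosite_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one dict per entry from any iterable of text lines.

    Assumes the Prosite flat-file convention: a two-letter code in
    columns 1-2, content from column 6, and "//" ending each entry.
    """
    record: Record = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):      # end-of-entry marker
            if record:
                yield record
            record = {}
            continue
        if not line.strip():           # skip blank lines
            continue
        code, value = line[:2], line[5:].strip()
        record.setdefault(code, []).append(value)
    if record:                         # tolerate a missing final "//"
        yield record


SAMPLE = """ID   ASN_GLYCOSYLATION; PATTERN.
AC   PS00001;
DE   N-glycosylation site.
PA   N-{P}-[ST]-{P}.
//
"""

if __name__ == "__main__":
    for entry in parse_prosite_records(io.StringIO(SAMPLE)):
        print(entry["AC"][0], entry["PA"][0])
```

Run as a script, this prints `PS00001; N-{P}-[ST]-{P}.`. Swapping `io.StringIO(SAMPLE)` for `open(path)` or a socket's `makefile("r")` needs no change to the parser, which is the point of keeping stream handling separate from format handling. If a real data file opens with a comment block ended by "//", that block comes out as a record of its own.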