[Biophp-dev] Quick update - reply to Greg, etc.

biophp-dev@bioinformatics.org biophp-dev@bioinformatics.org
Sat, 7 Jun 2003 22:31:06 -0600

I'm quite a long way from my system on a borrowed computer at the moment, but wanted to reply briefly:

Greg: I've got no complaints if you want to further "PEAR-ify" the existing EUtils code.  I've worked a little on that but haven't gotten around to the last of the necessary tweaks.

Handling of error messages is a "to-be-added" feature.  I figured I'd get the modules to a "usable" level first and then go back and add functionality a bit at a time.  If the new EUtils site you mention has decent documentation on the potential errors and such, it shouldn't take too much to add handling for them.

I haven't started on EFetch yet, since it appears that each type of record will require a different parser (unlike ESummary and ESearch, whose reply formats can reasonably be handled by a single parser).  PubMed is at the top of my list, along with nucleotide and protein sequences, for the first EFetch record types to write support for.  At the moment the NCBI interface that I'm working on is for NCBI's online BLAST server.
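To make the "one parser versus one parser per record type" distinction concrete, here is a minimal sketch. It is written in Python purely for illustration and is not existing BioPHP code: the function names, the parser table, and the sample reply (a made-up XML snippet in the general shape of a search result) are all assumptions.

```python
import xml.etree.ElementTree as ET


def parse_generic_reply(xml_text):
    """One parser for simple replies: return (tag, text) for every leaf element."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for el in root.iter() if len(el) == 0]


# Hypothetical per-record-type parsers; each would need its own logic,
# so they are left as stubs here.
def parse_pubmed_record(text):
    raise NotImplementedError("PubMed record parser not written yet")


def parse_nucleotide_record(text):
    raise NotImplementedError("nucleotide record parser not written yet")


def parse_protein_record(text):
    raise NotImplementedError("protein record parser not written yet")


# Record type -> parser. Supporting a new type means adding one entry.
EFETCH_PARSERS = {
    "pubmed": parse_pubmed_record,
    "nucleotide": parse_nucleotide_record,
    "protein": parse_protein_record,
}


def parse_efetch(record_type, text):
    """Pick the parser for this record type, or fail clearly if there is none."""
    if record_type not in EFETCH_PARSERS:
        raise ValueError("no parser for record type: " + record_type)
    return EFETCH_PARSERS[record_type](text)


# Made-up sample reply, not real NCBI output.
sample = "<Result><Count>2</Count><IdList><Id>123</Id><Id>456</Id></IdList></Result>"
leaves = parse_generic_reply(sample)
```

The point of the table is only that the summary/search side can share one routine, while each fetched record type plugs in separately as it gets written.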

The "GenePHP/BioPHP" split is a coincidence - it seems Serge had independently started a "Bioinformatics for PHP" project about a week or two before I got some code online myself, and at first neither of us knew the other had a project going.  Merging is ongoing - the parsers that I had written myself have been re-done and fitted into the data import interface that Nico designed.  Once the GenePHP "seq" object has interface methods, I can adjust my original version of the "nuc_sequence" object to be compatible with it and we can merge the features of the two.  Similarly, I imagine we will be merging the functionality of my own "seq_list" object with Serge's seqalign object at some point.

P.S. Lucky bum, I wish I could get to BOSC, but there's just no way I could afford it at the moment.  By all means mention BioPHP - in my opinion it would be great.  In addition to bringing more input to the project, I've got a selfish desire for constructive criticism of my own code so I can improve my skills...

On Zend accelerator, etc.:
My PERSONAL opinion on incorporation of more computationally intensive capabilities into BioPHP is that it would be best done as either frontends to existing (freely available) compiled programs (e.g. clustalx) or as Java modules accessed via PHP's optional-but-standard Java interface capabilities, but I'm certainly not going to discourage anyone from looking into other options as well.  I just think those two options are the most "portable".

One of the things I'm going to try to get some work done on while I'm on the road is the export module.  My design premise at the moment is to have the exporter keep a "stack" of arrays of relevant data (sequence name, sequence, etc.), plus functions for exporting whatever sequences are in the stack to the relevant formats.  I'm going to try to stick to the "load the specific-file-format modules only as needed" method that Nico came up with for the importer (Parser object).  The exporter will accept either an array of sequence data (e.g. "id"=>"AB12345", "sequence"=>"AAGGCCTT") or an actual seq object from which it will extract the data.  If nobody objects, I'd like to implement some sort of "as_array()" method in the seq object to export this sort of data.
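As a rough sketch of that exporter design (in Python rather than PHP, just for illustration): a stack of plain records, input accepted either as an array or as an object with "as_array()", and format writers set up only the first time a format is requested. The class names, the loader functions, and the "tab" format are hypothetical, and the loaders stand in for what would be separate per-format modules.

```python
class Seq:
    """Stand-in for the seq object; as_array() is the proposed method."""

    def __init__(self, seq_id, sequence):
        self.id = seq_id
        self.sequence = sequence

    def as_array(self):
        return {"id": self.id, "sequence": self.sequence}


def _load_fasta_writer():
    # In a real layout this would pull in a separate FASTA module on demand.
    def write_fasta(record):
        return ">{id}\n{sequence}\n".format(**record)
    return write_fasta


def _load_tab_writer():
    # Hypothetical second format: one tab-separated line per sequence.
    def write_tab(record):
        return "{id}\t{sequence}\n".format(**record)
    return write_tab


class Exporter:
    # Format name -> loader; a loader runs only when its format is first used.
    _loaders = {"fasta": _load_fasta_writer, "tab": _load_tab_writer}

    def __init__(self):
        self.stack = []      # records waiting to be exported
        self._writers = {}   # writers loaded so far

    def add(self, item):
        """Accept either a plain record or an object offering as_array()."""
        record = item.as_array() if hasattr(item, "as_array") else dict(item)
        self.stack.append(record)

    def export(self, fmt):
        """Write every record on the stack in the requested format."""
        if fmt not in self._writers:
            if fmt not in self._loaders:
                raise ValueError("unsupported export format: " + fmt)
            self._writers[fmt] = self._loaders[fmt]()
        writer = self._writers[fmt]
        return "".join(writer(record) for record in self.stack)


exporter = Exporter()
exporter.add({"id": "AB12345", "sequence": "AAGGCCTT"})
exporter.add(Seq("XY00001", "ACGT"))
fasta_text = exporter.export("fasta")
```

Keeping the stack as plain records means the writers never need to know about the seq object itself, which is the main reason for wanting an "as_array()"-style method.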

Anyway, I'm on the road and after tomorrow won't likely have internet access again for another week or so, but I'll check in when I can.  Hopefully I'll have a variety of updates to commit at that point.

Thanks, everyone!