[Biophp-dev] Export module design

S Clark biophp-dev@bioinformatics.org
Thu, 10 Jul 2003 00:14:36 -0600


'seqIOexport' is now checked into CVS.  If you can't find it on sourceforge
(they seem to still be having problems getting CVS working properly) you can
get it at bioinformatics.org.

At the moment, there is only a FASTA export module, but others will follow
shortly.

Here's the way I've got it designed currently - obviously this is subject to 
change.  It's the first-draft design, so comments and suggestions are
encouraged.  It's late - so if any part of this makes no sense, that's 
what I get for posting while I'm tired.  Say something and I'll try to be more
coherent...

seqIOexport has six main public methods:

setOutput($output) - can accept a filehandle or filename/URL.  If
not passed a filehandle, it assumes what it got is supposed to be a
filename/URL, and will also attempt to automatically guess the format based on
the name (e.g. setOutput("blahblahblah.fasta"); will automatically guess fasta
format.)

setFormat($format) - passed a string indicating a format, this sets the format
and instantiates the appropriate filetype exporter if possible.  Obviously
optional in many cases, assuming a "guessable" name/URL was passed to
setOutput.

setParam($parameter,$value) - for export filetypes that have optional
parameters.  When called, this stores the parameter and setting in an array
to be passed to the filetype exporter at export time.

addSequence($sequence) - sequence can be a seq object (TODO - this WILL also
support being passed a seq_align or other 'sequence group/list' object
later), an array of sequence information ("id"=>"AF12345",
"sequence"=>"AAGGCCTT", etc.), or a simple string containing a sequence
(in which case the sequence will be given a 'default' id, with an incrementing
number at the end to keep the ids unique).  It stores this information on a
"stack" to be passed to the individual exporter at export time.

doExport() - passes any specified optional parameters to the filetype
exporter, then the stack of sequence information for the exporter to write.

clearSequences() - clears the sequence information stack (and resets the 
"number of unlabelled sequences so far" count)

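For the sequence stack, something along these lines would match the
description of addSequence() and clearSequences() above.  This is only a
sketch: the class name, the property names and the 'sequence' default id
prefix are assumptions, and seq objects are left out because their interface
isn't described here.

```php
<?php
// Hypothetical stand-in for the sequence stack inside seqIOexport.
class SequenceStackSketch
{
    public $sequences = array();  // records waiting to be exported
    public $unlabelled = 0;       // "number of unlabelled sequences so far"

    function addSequence($sequence)
    {
        if (is_array($sequence)) {
            // Already a flat record ("id" => ..., "sequence" => ..., etc.).
            $this->sequences[] = $sequence;
            return true;
        }
        if (is_string($sequence)) {
            // Bare sequence string: give it a default id with a counter.
            $this->unlabelled++;
            $this->sequences[] = array(
                'id'       => 'sequence' . $this->unlabelled, // assumed prefix
                'sequence' => $sequence,
            );
            return true;
        }
        return false; // seq objects and other types not handled in this sketch
    }

    function clearSequences()
    {
        // Empty the stack and restart the default-id numbering.
        $this->sequences = array();
        $this->unlabelled = 0;
    }
}
```

Resetting the counter in clearSequences() means default ids start again from 1
for the next batch, which is how I read the description above.
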
Export modules go in the "exporters" subdirectory, named
"export_(format).inc.php".  While they're being designed with use
in the seqIOexport class in mind, they should all be usable on
their own (e.g. if someone wants to write a simple 'convert one format
to another' script, the filetype exporters should be usable independently for
this).
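
To make the naming convention and the name-based format guessing concrete,
here is a rough sketch (not the code that's in CVS).  Both function names and
the extension table are made up for illustration; the only things taken from
the design are the "exporters" subdirectory and the "export_(format).inc.php"
pattern.  Lowercasing the format name is also an assumption.

```php
<?php
// Hypothetical helper: guess a format name from a filename or URL.
function guessFormatFromName($name)
{
    // Assumed extension table; the real list may differ.
    $known = array(
        'fasta' => 'fasta',
        'fa'    => 'fasta',
        'fas'   => 'fasta',
    );
    // Use only the path part so query strings on URLs are ignored.
    $path = parse_url($name, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return false;
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return isset($known[$ext]) ? $known[$ext] : false;
}

// Hypothetical helper: map a format name to its exporter file.
function exporterPathForFormat($format)
{
    // Accept only simple names so the result stays inside "exporters".
    if (!is_string($format) || !preg_match('/^[A-Za-z0-9_]+$/', $format)) {
        return false;
    }
    return 'exporters/export_' . strtolower($format) . '.inc.php';
}
```

With this, guessFormatFromName("blahblahblah.fasta") gives "fasta", which
exporterPathForFormat() maps to "exporters/export_fasta.inc.php".
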

Export modules need to implement 3 methods:

setOutput($output) - where "$output" is either null, a filename/URL, or an 
already-open filehandle.  Passing "null" indicates that the exporter should 
simply return text rather than writing to a file.

setParam($parameter,$value) - in MOST cases this method will simply return
"false" - this is used to set additional file parameters, where applicable
(e.g. 'fastDNAml' options in Phylip files).  Returns 'true' if it uses any
of the information passed.

writeSequences($sequencesArray) - gets passed a 'flat' array of sequence 
information (by reference, to conserve memory), e.g.
 "id"=>"AF12345","sequence"=>"AAGGCCTT","organism"=>"Bacillus natto" (etc.).
It is up to the module to recognize which of the keys it can use.
Should return a 'non-false' response if it writes successfully.  I suggest
returning the text that was generated by the exporter, since that makes
it very simple to handle 'generate data in that format but don't actually
write to a file' requests (setOutput("");).  For large amounts of data,
though, it might be better to just return 'true' or the number of records
written unless explicitly asked for the text.
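
As a sketch of what a minimal exporter could look like under this interface
(again, not the actual FASTA module in CVS): the class name, the 60-column
wrap and the choice to skip records without an id or sequence are all
assumptions on my part.

```php
<?php
// Hypothetical exporter implementing the three methods described above.
class FastaExporterSketch
{
    public $output = null;   // null/"" (return text only), filename/URL, or filehandle
    public $lineWidth = 60;  // assumed wrap width for sequence lines

    function setOutput($output)
    {
        $this->output = $output;
    }

    function setParam($parameter, $value)
    {
        // This sketch has no optional parameters, so nothing is used.
        return false;
    }

    function writeSequences(&$sequencesArray)
    {
        $text = '';
        foreach ($sequencesArray as $record) {
            // Use only the keys this format understands; ignore the rest.
            if (!isset($record['id'], $record['sequence'])) {
                continue;
            }
            $text .= '>' . $record['id'] . "\n";
            $text .= wordwrap($record['sequence'], $this->lineWidth, "\n", true) . "\n";
        }
        if (is_resource($this->output)) {
            if (fwrite($this->output, $text) === false) {
                return false;
            }
        } elseif (is_string($this->output) && $this->output !== '') {
            if (file_put_contents($this->output, $text) === false) {
                return false;
            }
        }
        // Note: an empty input yields '', so callers should test with === false.
        return $text;
    }
}

// Usage: no output target set, so the text is only returned.
$records = array(
    array('id' => 'AF12345', 'sequence' => 'AAGGCCTT', 'organism' => 'Bacillus natto'),
);
$exporter = new FastaExporterSketch();
$exporter->setOutput(null);
$fasta = $exporter->writeSequences($records);
```

Returning the text keeps the 'don't actually write to a file' case trivial, at
the cost of building the whole output in memory, which is the trade-off
mentioned above for large data sets.
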

Comments?