[Biophp-dev] Import modules structure and interrelation
Nico Stuurman
biophp-dev@bioinformatics.org
22 Mar 2004 08:11:27 -0800
Great work here!
This should go straight into
biophp/genephp/devdocs/parsers.txt
or something similar...
Best,
Nico
>
> Depending on how you count, there are either 2 or 3 'modules' that all
> go together to make up the import capabilities.
>
> seqIOimport itself
> the 'specific file type' parser module
> and (depending how you count)
> seqFactory.
>
> There's very little that needs to be done to seqIOimport (and nothing
> for seqFactory) to add a new import module.
>
> There are only a couple of requirements to fit a 'specific file type'
> module (such as the locuslink parser you are working on)
> 1.)the class needs to accept a file name, a file handle, or "text"
> (which, I suppose, could actually be binary data) as an input source.
> (This is so that we can handle data from a network socket connected to
> a server, http:// or ftp:// URLs, files on the local hard drive, or data
> already read into memory from other sources.)
>
> 2.)the class needs to accept the input source on instantiation
> (i.e. $parser = new locuslink_import($input_source) )
>
> 3.)the class SHOULD have a "setSource()" interface (which sets or
> changes the input source - seqIOimport doesn't currently use this, but
> it could in the future - i.e. for parsing multiple files in one shot).
>
> 4.)the class MUST have a fetchNext() interface, which returns an associative
> array with the next parsed sequence data (e.g.
> array('id'=>'(name of sequence)','sequence'=>'ACGTACGTACGT...') ). We're
> using this type of 'generic' associative array as a format for exchanging
> sequence information between modules, so that the individual modules are
> usable by themselves (i.e. you can use the fasta parser module all by itself
> [outside of seqIOimport] without knowing anything about the seq class format).
>
> 5.)When imported into the BioPHP framework, it goes into the 'parsers'
> section, named (filetype).inc.php (e.g. "swissprot.inc.php").
>
> That last requirement is just so that it can be found and auto-loaded
> by the seqIOimport module.
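
A minimal sketch of a parser class that meets requirements 1 to 4 might look like the following. The class name simplefasta_import, the bare-bones FASTA handling, and the choice to return false once the input is exhausted are assumptions made for illustration; none of this is taken from the actual BioPHP parsers.

```php
<?php
// Hypothetical parser, for illustration only (not a BioPHP file).
class simplefasta_import
{
    private $lines = array();   // source lines not yet consumed

    // Requirement 2: the input source is accepted on instantiation.
    public function __construct($source = null)
    {
        if ($source !== null) {
            $this->setSource($source);
        }
    }

    // Requirement 3: set or change the input source.
    // Requirement 1: accept a file handle, a file name, or raw text.
    public function setSource($source)
    {
        if (is_resource($source)) {
            $text = stream_get_contents($source);
        } elseif (is_string($source) && strpos($source, "\n") === false
                  && is_file($source)) {
            $text = file_get_contents($source);
        } else {
            $text = (string) $source;   // treat anything else as raw text
        }
        $this->lines = preg_split('/\r\n|\r|\n/', $text);
    }

    // Requirement 4: return the next record as a generic associative
    // array. Returning false at end of input is an assumption.
    public function fetchNext()
    {
        $record = false;
        while (count($this->lines) > 0) {
            $line = trim($this->lines[0]);
            if ($line !== '' && $line[0] === '>') {
                if ($record !== false) {
                    break;      // next record starts here; leave it queued
                }
                $record = array('id' => substr($line, 1), 'sequence' => '');
            } elseif ($record !== false) {
                $record['sequence'] .= $line;
            }
            array_shift($this->lines);
        }
        return $record;
    }
}
```

A caller would then loop with something like while (($rec = $parser->fetchNext()) !== false), reading $rec['id'] and $rec['sequence'] without knowing anything about seq objects.
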
>
> seqIOimport is only a 'go-between' - it handles (where possible)
> auto-detection of filetypes and calling of the appropriate parser, and
> acting as a frontend to the parsed sequence data (it can either return
> the 'raw' associative array results from the 'filetype' parsers, or it
> can pass the data to 'seqFactory', which is in charge of generating
> seq objects from the data.)
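
The 'go-between' role can be sketched roughly as below. The function open_parser(), the 'parsers' directory location, and the stub demo_import class are hypothetical; only the (filetype).inc.php naming, the (filetype)_import class name pattern, and the fetchNext() call come from the description above.

```php
<?php
// Stub parser standing in for a real 'specific file type' module.
class demo_import
{
    private $records;

    public function __construct($source)
    {
        $this->records = explode("\n", trim($source));
    }

    public function fetchNext()
    {
        $line = array_shift($this->records);
        if ($line === null || $line === '') {
            return false;           // assumption: false signals end of input
        }
        return array('id' => $line, 'sequence' => '');
    }
}

// Hypothetical dispatcher: find the parser for a file type and hand it
// the data source. It knows nothing about the parser's internals.
function open_parser($filetype, $source, array $knownTypes)
{
    if (!in_array($filetype, $knownTypes, true)) {
        throw new InvalidArgumentException("Unknown filetype: $filetype");
    }
    $class = $filetype . '_import';
    if (!class_exists($class)) {
        // Assumed layout: parsers/(filetype).inc.php next to this script.
        $file = __DIR__ . '/parsers/' . $filetype . '.inc.php';
        if (is_file($file)) {
            require_once $file;
        }
    }
    if (!class_exists($class)) {
        throw new RuntimeException("No parser class for: $filetype");
    }
    return new $class($source);
}

// Collect the 'raw' associative arrays from whichever parser was loaded.
$parser  = open_parser('demo', "seqA\nseqB", array('demo'));
$records = array();
while (($rec = $parser->fetchNext()) !== false) {
    $records[] = $rec;
}
```
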
>
> Adding a new filetype parser to seqIOimport takes only one to three additional
> steps:
>
> 1.)REQUIRED - add the name of the filetype (e.g. 'locuslink') to
> the list of recognized filetypes.
> ( $this->seqfiletypes=array('fasta','clustal','lasergene','pdraw','genbank','swissprot'); )
>
> 2.)OPTIONAL (but desirable) - add the 'file extension' to the 'detect filetype
> by filename' feature, if applicable (the typeByName($name) method)
>
> 3.)OPTIONAL (but desirable) - add a pattern matching the first line of data,
> by which seqIOimport can recognize the type of data (the autodetect() method)
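
The three steps could look something like this in a stand-in class. Only the $seqfiletypes list and the method names typeByName() and autodetect() come from the text above; the class name, the autodetect() argument, the extension map (including '.ll'), and the first-line patterns (including the '>>' pattern for locuslink) are assumptions, not the real seqIOimport code.

```php
<?php
// Hypothetical stand-in for the relevant parts of seqIOimport.
class seqio_registry_sketch
{
    // Step 1 (required): add the new type to the recognized list.
    public $seqfiletypes = array('fasta', 'clustal', 'lasergene', 'pdraw',
                                 'genbank', 'swissprot', 'locuslink');

    // Step 2 (optional): map file extensions to types. Assumed values.
    private $extensions = array(
        'fasta' => 'fasta',
        'fa'    => 'fasta',
        'gb'    => 'genbank',
        'll'    => 'locuslink',
    );

    // Step 3 (optional): first-line patterns, checked in order. The
    // locuslink pattern is an assumption and is listed before fasta
    // because both would start with '>'.
    private $signatures = array(
        'locuslink' => '/^>>\d+/',
        'fasta'     => '/^>/',
        'genbank'   => '/^LOCUS\s/',
        'swissprot' => '/^ID\s+\S+/',
    );

    // Detect the type from the file name, or return false.
    public function typeByName($name)
    {
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        return isset($this->extensions[$ext]) ? $this->extensions[$ext] : false;
    }

    // Detect the type from the first line of data, or return false.
    public function autodetect($firstLine)
    {
        foreach ($this->signatures as $type => $pattern) {
            if (preg_match($pattern, $firstLine)) {
                return $type;
            }
        }
        return false;
    }
}
```
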
>
> So far, everything has been designed so that each individual component
> needs to know only the barest minimum about the other components.
> seqIOimport only needs to know 'call the filetype parser with the data
> source' and 'call fetchNext() to get the next sequence' (and to call
> seqFactory to generate sequence objects), and that's it. The filetype
> parser only needs to know that it's getting a data source on instantiation,
> and that it needs to respond to 'fetchNext()' with the next parsed sequence's
> information. seqFactory only needs to know that it's getting an associative
> array (and what common terms will be in the array) and how to feed that info
> to the seq object.
>
> It's hoped that this will make it very easy for people to pop in and
> contribute (in this case) import modules, since you don't need to 'learn' the
> rest of the modules to do so.
>
> Does any of this help?....
>
> P.S. To answer your SPECIFIC question - $flines is the data read from the
> source passed to the swissprot parser. The swissprot parser has no
> knowledge at all of the existence of the seqIOimport module that loads
> it (and, indeed, might conceivably be called directly in a script rather
> than through the seqIOimport 'wrapper'). I note that the version of
> the parser that I'm looking at reads:
>
> while ( list($no, $linestr) = each($sourcelines) ) {
>
> so you probably do have a slightly older version.
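
To illustrate the point about each(): it walks an array, so a file handle has to be read into an array of lines first. The helper source_to_lines() below is hypothetical and is not the actual BioPHP code. Note also that each() was deprecated in PHP 7.2 and removed in PHP 8.0, so on current PHP the same loop is written with foreach.

```php
<?php
// Hypothetical helper: turn a file handle or raw text into an array of lines.
function source_to_lines($source)
{
    if (is_resource($source)) {
        $lines = array();
        while (($line = fgets($source)) !== false) {
            $lines[] = rtrim($line, "\r\n");   // drop the line terminator
        }
        return $lines;
    }
    return preg_split('/\r\n|\r|\n/', (string) $source);
}

// An in-memory stream stands in for a real file handle here.
$fp = fopen('php://memory', 'w+');
fwrite($fp, "ID   TEST_HUMAN\nAC   P00000;\n");
rewind($fp);

// foreach equivalent of: while (list($no, $linestr) = each($sourcelines))
$seen = array();
foreach (source_to_lines($fp) as $no => $linestr) {
    $seen[] = $no . ': ' . $linestr;
}
fclose($fp);
```
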
>
> I've probably mangled this whole explanation, so please feel free
> to ask me what the heck I mean :-)
>
> Sean
>
> On Friday 19 March 2004 03:24 am, Frederic.Fleche@aventis.com wrote:
> > Hello all,
> >
> > I am planning to do a locuslink-file parser.
> > So I read the swissprot parser in order to get some good ideas.
> > Since my knowledge in php is not as good as yours I have a newbie question
> > concerning the following line of the function parse_swissprot
> >
> > while (list($no, $linestr) = each($flines))
> >
> > If $flines is from $seqIOimport->flines, I understand, because it is an array.
> >
> > If $flines is from $seqIOimport->fp, I don't understand, because it is a file
> > handle. Or does it work in the same way?