Great work here! This should go straight into
biophp/genephp/devdocs/parsers.txt or something similar...

Best,
Nico

> Depending on how you count, there are either 2 or 3 'modules' that all
> go together to make up the import capabilities:
>
>   seqIOimport itself
>   the 'specific file type' parser module
>   and (depending on how you count) seqFactory.
>
> There's very little that needs to be done to seqIOimport (and nothing
> for seqFactory) to add a new import module.
>
> There are only a few requirements for a 'specific file type'
> module (such as the locuslink parser you are working on):
>
> 1.) The class needs to be able to accept a file name, a file handle, or
> "text" (which, I suppose, could actually be binary data) as an input source.
> (This is so that we can handle data from a network socket connected to
> a server, http:// or ftp:// URLs, files on the local hard drive, or data
> already read into memory from other sources.)
>
> 2.) The class needs to accept the input source on instantiation
> (e.g. $parser = new locuslink_import($input_source); ).
>
> 3.) The class SHOULD have a "setSource()" interface, which sets or
> changes the input source. seqIOimport doesn't currently use this, but
> it could in the future (e.g. for parsing multiple files in one shot).
>
> 4.) The class MUST have a fetchNext() interface, which returns an associative
> array with the next parsed sequence data (e.g. 'id'=>'(name of
> sequence)', 'sequence'=>'ACGTACGTACGT...'). We're using this type of
> 'generic' associative array as a format for exchanging sequence information
> between modules so as to make the individual modules usable by themselves
> (i.e. you can use the fasta parser module all by itself [outside of
> seqIOimport] without knowing anything about the seq class format...).
>
> 5.) When imported into the BioPHP framework, it goes into the 'parsers'
> section, named (filetype).inc.php (e.g. "swissprot.inc.php").
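
For illustration, a minimal sketch of a parser that would meet requirements 1 to 4 might look like the following. Everything in it is an assumption made for the example rather than BioPHP code: the class name example_import, the made-up tab-separated "id<TAB>sequence" input format, returning false once no records remain, and treating a newline-free string that names an existing file as a file name. It is written for current PHP (5 and later).

```php
<?php
// Hypothetical parser for a made-up format: one "id<TAB>sequence" record per line.
// The class name and the format are illustrative only, not part of BioPHP.
class example_import
{
    private $lines = array(); // lines read from the current source
    private $pos = 0;         // index of the next line to parse

    // Requirement 2: accept the input source on instantiation.
    public function __construct($source = null)
    {
        if ($source !== null) {
            $this->setSource($source);
        }
    }

    // Requirements 1 and 3: accept a file handle, a file name, or raw text,
    // and allow the source to be changed later.
    public function setSource($source)
    {
        if (is_resource($source)) {
            // An already-open handle (local file, socket, URL stream, ...).
            $text = (string) stream_get_contents($source);
        } elseif (is_string($source) && strpos($source, "\n") === false && is_file($source)) {
            // Assumption: a one-line string naming an existing file is a file name.
            $text = (string) file_get_contents($source);
        } else {
            // Anything else is treated as data already in memory.
            $text = (string) $source;
        }
        $this->lines = preg_split('/\r\n|\r|\n/', $text);
        $this->pos = 0;
    }

    // Requirement 4: return the next record as a generic associative array.
    // Returning false at the end of the data is an assumption of this sketch.
    public function fetchNext()
    {
        while ($this->pos < count($this->lines)) {
            $line = trim($this->lines[$this->pos++]);
            if ($line === '') {
                continue; // skip blank lines
            }
            $parts = explode("\t", $line, 2);
            return array(
                'id'       => $parts[0],
                'sequence' => isset($parts[1]) ? $parts[1] : '',
            );
        }
        return false;
    }
}

// Usage: the parser works on its own, without seqIOimport or the seq class.
$parser = new example_import("seq1\tACGT\nseq2\tTTGA\n");
while (($record = $parser->fetchNext()) !== false) {
    echo $record['id'], ': ', $record['sequence'], "\n";
}
```

Reading the whole source into an array of lines up front keeps the sketch short; a real parser for large files would more likely read from the handle one record at a time.
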
>
> That last requirement is just so that it can be found and auto-loaded
> by the seqIOimport module.
>
> seqIOimport is only a 'go-between': it handles (where possible)
> auto-detection of filetypes, calls the appropriate parser, and
> acts as a frontend to the parsed sequence data (it can either return
> the 'raw' associative array results from the 'filetype' parsers, or it
> can pass the data to 'seqFactory', which is in charge of generating
> seq objects from the data).
>
> Adding a new filetype parser to seqIOimport takes only one to three additional
> steps:
>
> 1.) REQUIRED - add the name of the filetype (e.g. 'locuslink') to
> the list of recognized filetypes:
>   $this->seqfiletypes=array('fasta','clustal','lasergene','pdraw','genbank','swissprot');
>
> 2.) OPTIONAL (but desirable) - add the 'file extension' to the 'detect filetype
> by filename' feature, if applicable (the typeByName($name) method).
>
> 3.) OPTIONAL (but desirable) - add a pattern for the first line of data by
> which seqIOimport can recognize the type of data (the autodetect() method).
>
> Everything's been designed, as much as possible so far, such that each
> individual component needs to know only the barest minimum about the
> other components. seqIOimport only needs to know 'call the filetype parser
> with the data source' and 'call fetchNext() to get the next sequence' (and to
> call seqFactory to generate sequence objects), and that's it. The filetype
> parser only needs to know it's getting a data source on instantiation, and
> that it needs to respond to 'fetchNext()' with the next parsed sequence's
> information. seqFactory only needs to know that it's getting an associative
> array (and what common terms will be in the array) and how to feed that info
> to the seq object.
>
> It's hoped that this will make it very easy for people to pop in and
> contribute (in this case) import modules, since you don't need to 'learn' the
> rest of the modules to do so.
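
The three registration steps above could be sketched roughly as follows. This is a stand-alone illustration, not the actual seqIOimport source: the class name example_seqio, the parserFile() helper, the 'parsers/' directory prefix, the '.ll' extension, and the '>>' first-line pattern for locuslink are all placeholders assumed for the example, and autodetect() takes the first line as an argument here only to keep the sketch self-contained.

```php
<?php
// Illustrative stand-in for the go-between; NOT the real seqIOimport code.
class example_seqio
{
    // Step 1 (required): 'locuslink' added to the list of recognized filetypes.
    public $seqfiletypes = array('fasta', 'clustal', 'lasergene', 'pdraw',
                                 'genbank', 'swissprot', 'locuslink');

    // Step 2 (optional): detect the filetype from the file extension.
    // The extension-to-type table is assumed; '.ll' is a made-up placeholder.
    public function typeByName($name)
    {
        $map = array(
            'fasta' => 'fasta',
            'fa'    => 'fasta',
            'gb'    => 'genbank',
            'll'    => 'locuslink',
        );
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        return isset($map[$ext]) ? $map[$ext] : false;
    }

    // Step 3 (optional): detect the filetype from the first line of data.
    // More specific patterns come first, because a line starting with '>>'
    // would also match the FASTA '>' pattern.
    public function autodetect($firstLine)
    {
        $patterns = array(
            'locuslink' => '/^>>\d+/',    // assumed marker; check against real files
            'fasta'     => '/^>/',
            'genbank'   => '/^LOCUS\s/',
            'swissprot' => '/^ID\s/',
        );
        foreach ($patterns as $type => $pattern) {
            if (preg_match($pattern, $firstLine)) {
                return $type;
            }
        }
        return false;
    }

    // Requirement 5 is what makes auto-loading possible: the parser file name
    // follows from the filetype. The 'parsers/' prefix is an assumed layout.
    public function parserFile($type)
    {
        if (!in_array($type, $this->seqfiletypes, true)) {
            return false;
        }
        return 'parsers/' . $type . '.inc.php';
    }
}

// Usage: resolve a parser file from either a file name or the first line of data.
$io = new example_seqio();
echo $io->parserFile($io->typeByName('genes.ll')), "\n";
echo $io->parserFile($io->autodetect('>seq1 example')), "\n";
```

Keeping detection as plain lookup tables means that adding a filetype touches only data, which fits the goal of contributors not having to learn the rest of the modules.
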
>
> Does any of this help?....
>
> P.S. To answer your SPECIFIC question - $flines is the data read from the
> source passed to the swissprot parser. The swissprot parser has no
> knowledge at all of the existence of the seqIOimport module that loads
> it (and, indeed, might conceivably be called directly in a script rather
> than through the seqIOimport 'wrapper'). I note that the version of
> the parser that I'm looking at reads:
>
>   while ( list($no, $linestr) = each($sourcelines) ) {
>
> so you probably do have a slightly older version.
>
> I've probably mangled this whole explanation, so please feel free
> to ask me what the heck I mean :-)
>
> Sean
>
> On Friday 19 March 2004 03:24 am, Frederic.Fleche@aventis.com wrote:
> > Hello all,
> >
> > I am planning to write a locuslink file parser,
> > so I read the swissprot parser in order to get some good ideas.
> > Since my knowledge of PHP is not as good as yours, I have a newbie question
> > concerning the following line of the function parse_swissprot:
> >
> >   while (list($no, $linestr) = each($flines))
> >
> > If $flines is from $seqIOimport->flines, I understand, because it is an array.
> >
> > If $flines is from $seqIOimport->fp, I don't understand, because it is a file
> > handle. Or does it work in the same way?
> _______________________________________________
> Biophp-dev mailing list
> Biophp-dev@bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biophp-dev