On Sunday 11 May 2003 11:52 am, nicos@itsa.ucsf.edu wrote:
[...]
> > Clustalw. Have a look at the code in cvs.

Which, incidentally, I'd accidentally broken on a previous commit - I
fixed it again yesterday. Should be working now. I also committed the
initial implementation of the "handle common synonyms" layer
(_convertTerms()) for translating common terms to the terms used in the
seq object.

> Actually, I do think that names and naming conventions are going to be
> important in the long run. How well we choose the names, naming
> conventions and how well we stick to them will determine how easily
> biophp can be used.

Some combination of "write" (or "export") and "seq" seems appropriate to
me for this particular section. I kind of like "export" just because it
doesn't imply that the destination of the data going out is a file (or
printout :-) ), but that's pure semantic niggling and doesn't really
matter...

> But first a structure for the IOwrite class. I would go for a constructor
> that takes an argument specifying the type of output desired (string,
> array, file, filehandle?, or simply always return a string?), and the
> type of sequence file desired (fasta, swissprot, genbank, etc.). There
> should be an IO->write->add($seq) function that calls seq_factory, which
> should translate the items of object $seq into items that can be directly
> incorporated in the output. The actual 'write' methods could almost be
> just a template where php's variable interpolation can do the work.

Hmmm, how's this:

1) Add a "getAsArray()" method to the seq object, which returns an array
containing all of the 'set' attributes and their values (key=attribute
["sequence", "id", etc.], value=value of that attribute). This will also
serve as a "wrapper" for all of the other interface methods at once
(i.e. so the user doesn't have to do "getId(); getSequence();" (etc...)
if they want all of the seq object's data.)

2) The IOwrite (or IOWriteSeq?)
should include methods to set the destination (as you describe above -
string, array, file, handle...) and type. (This way the user can use the
same instance of the writer object to produce multiple files if
desired.)

3) The IOwrite object can have a "stack" where the extracted attributes
get stored as "generic arrays" (this way someone can write a file
converter [e.g. genbank to fasta, or clustal to phylip] without the
extra baggage of creating seq objects [which are only going to be read
back out of and destroyed anyway in that case] - the 'fetchRawRecord()'
method of the Parse object is for this sort of thing).

4) If given (to an "add()" method) an "array" of attributes, IOwrite
just shoves them on the stack. If passed a seq object, IOwrite calls its
"getAsArray()" method and shoves the results of that on the stack. (The
"stack" is necessary when export is to interleaved file formats.) We
MIGHT include a "write()" (or some similar name) method to allow
bypassing the "stack" and writing immediately for non-interleaved
formats (returns false if called while set to an interleaved format).

5) Perhaps I should move the "translation" layer back out of seq_factory
and into a separate class. The "Translate" class wouldn't need to be
instantiated, but it would make a variety of minor "correction"
functions available everywhere as, e.g. "Translate::toSeq()". More an
"ease of re-use" issue than anything technical, though. There's no
reason I can't make "_convertTerms()" into a public method and have
people call it from outside as "seq_factory::convertTerms();". If I DID
make a separate "Translate" class to be used like this, it might also
include things like "Translate::NCBIDeflineExtract($field)" which one
could use to get, e.g., just the accession number out of an NCBI
Defline.

It might also be worth the trouble to move a lot of the "common"
functions that are currently in the class files but not part of the
classes (e.g.
the "complement()" function in seq.inc.php) to somewhere they can be
accessed by other objects (or so the file can be used on its own by
other projects). (I think doing that will also make the actual seq
objects [and others] take up fewer resources since there'll only be one
copy of the "common" methods rather than a copy in each instance of the
classes.)

I'd strongly advocate getting interface methods implemented in the seq
object soon - as I read up on Object Oriented design I keep seeing it
said that you're "supposed to" use them instead of setting variables
directly (even for public variables, it would seem), and I'm beginning
to see why - when you have people using an interface method to set
variables, you can do things like validity checking, error correction,
and transparently handling internal changes (e.g. changing variable
names [e.g. to meet PEAR standards on naming], "splitting" variables,
moving variables into an array for easier handling, etc.) without
breaking other objects, etc.

For example, right now everyone is expected to set $seq->sequence and
$seq->moltype directly, which means I can easily, and accidentally, do

    $seq->sequence='ZXKUQYB';
    $seq->moltype='DNA';

whereas if people are able to use a "setSequence()" method, we can add
auto-detection of the type whenever the sequence is set (and
"setMolType()" can check the existing sequence to see if it's valid for
that type...)

I was thinking about editing my old sequence class to make it "seq
compatible" and dropping it in as "alt_seq.inc.php", where we can
compare them side-by-side and merge the useful features of each.

Thoughts?

I'm thinking I should quit stalling and get back to finishing the NCBI
Blast query handler first, though...
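
To make points 1) through 4) and the setter argument a bit more
concrete, here is a rough sketch of how they could fit together. It is
not the code in cvs. Only getAsArray(), add(), setSequence() and
setMolType() come from the discussion above; the class name SeqWriter,
the toFasta() method, the setId() method, the simplified alphabets, and
the use of PHP 7+ syntax are all assumptions made for illustration, and
destination handling (string, file, handle...) is left out entirely.

```php
<?php
// Hypothetical sketch only: names other than getAsArray(), add(),
// setSequence() and setMolType() are invented for this example.

class Seq
{
    private $id = '';
    private $sequence = '';
    private $moltype = '';

    // Assumed, simplified alphabets; real code would need the full
    // IUPAC ambiguity codes. Order matters: the first match wins.
    private static $alphabets = [
        'DNA'     => 'ACGTN',
        'RNA'     => 'ACGUN',
        'PROTEIN' => 'ACDEFGHIKLMNPQRSTVWY',
    ];

    // True if every character of $sequence is in the alphabet of $moltype.
    private static function fits($sequence, $moltype)
    {
        return isset(self::$alphabets[$moltype])
            && strspn($sequence, self::$alphabets[$moltype]) === strlen($sequence);
    }

    public function setId($id)
    {
        $this->id = $id;
    }

    // Validates the sequence and auto-detects the molecule type.
    // Returns false (leaving the object unchanged) if nothing matches.
    public function setSequence($sequence)
    {
        $sequence = strtoupper(trim($sequence));
        if ($sequence === '') {
            return false;
        }
        foreach (array_keys(self::$alphabets) as $type) {
            if (self::fits($sequence, $type)) {
                $this->sequence = $sequence;
                $this->moltype = $type;
                return true;
            }
        }
        return false;
    }

    // Refuses a type that contradicts the sequence already stored.
    public function setMolType($moltype)
    {
        $moltype = strtoupper($moltype);
        if (!self::fits($this->sequence, $moltype)) {
            return false;
        }
        $this->moltype = $moltype;
        return true;
    }

    // Returns only the attributes that have actually been set.
    public function getAsArray()
    {
        $all = [
            'id'       => $this->id,
            'sequence' => $this->sequence,
            'moltype'  => $this->moltype,
        ];
        return array_filter($all, function ($value) {
            return $value !== '';
        });
    }
}

class SeqWriter
{
    private $stack = [];

    // Accepts either a Seq object or a plain attribute array.
    public function add($item)
    {
        $this->stack[] = ($item instanceof Seq) ? $item->getAsArray() : $item;
    }

    // Renders everything on the stack as FASTA, 60 residues per line.
    public function toFasta()
    {
        $out = '';
        foreach ($this->stack as $record) {
            $out .= '>' . ($record['id'] ?? '') . "\n"
                  . wordwrap($record['sequence'] ?? '', 60, "\n", true) . "\n";
        }
        return $out;
    }
}
```

With something like this, setSequence('ZXKUQYB') simply returns false
instead of leaving a bogus "DNA" object lying around, and a converter
can call add() with raw arrays without ever building a seq object. Note
that a sequence such as "ACGT" is also a legal protein string under
these alphabets; the sketch just takes the first match, which is one of
the things a real auto-detect would need to handle more carefully.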