[Biophp-dev] export/write object

Wed, 14 May 2003 18:02:26 -0600

On Wednesday 14 May 2003 03:21 pm, Nico Stuurman wrote:
[...]
> > 1) Add a "getAsArray()" method to the seq object, which returns an
> > array containing all of the 'set' attributes and their values
> > (key=attribute ["sequence", "id", etc.], value=value of that
> > attribute).  This will also substitute as a "wrapper" for all of the
> > other interface methods at once (i.e. so the user doesn't have to do
> > "getId(); getSequence();" (etc...) if they want all of the seq
> > object's data.)
>
> Isn't this functionality supposed to be part of seq_factory()?  Maybe I
> still don't get the concepts behind this structure.

seq_factory's ONLY purpose as designed is to generate seq objects, rather
than "disassembling" them.  'course, there are two different viewpoints
here:

On one hand, adjusting seq_factory's design goals to make it an
"interconverter" or "translator" rather than a seq object creator
wouldn't be THAT big of a change.

On the other hand, getting "generic" data back out of the seq object
seems to fit naturally into it - currently this is done by direct
access ("$sequence=$seqobj->sequence;") and hopefully in the near future
through the more "object-orientationally-correct" use of interface
methods in the object ("$sequence=$seqobj->getSequence();").  I'm not sure
moving retrieval of the data from the seq object out into another
layer would necessarily be helpful in this case.
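
As a rough sketch of keeping retrieval inside the seq object (not the actual
BioPHP code: the attribute list, the null-means-unset convention and the
"moltype" attribute are assumptions made up for illustration), interface
methods and a "getAsArray()" could sit side by side like this:

```php
<?php
// Hypothetical sketch only; the real seq class may look quite different.
class seq
{
    // Assumption: an attribute counts as "set" when it is not null.
    public $id = null;
    public $sequence = null;
    public $moltype = null;   // made-up extra attribute for the example

    // Plain interface methods wrapping the attributes.
    public function getId()
    {
        return $this->id;
    }

    public function getSequence()
    {
        return $this->sequence;
    }

    // Return every attribute that has been set, keyed by attribute name.
    public function getAsArray()
    {
        $set = array();
        foreach (get_object_vars($this) as $name => $value) {
            if ($value !== null) {
                $set[$name] = $value;
            }
        }
        return $set;
    }
}

$seqobj = new seq();
$seqobj->id = 'demo1';        // direct access still works
$seqobj->sequence = 'ACGT';
$data = $seqobj->getAsArray();
```

Here $data holds only the "id" and "sequence" entries, since "moltype" was
never set.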

(Additionally, for what it's worth, the way "blahblahblah_factory" objects
seem to be used elsewhere [C++, Java, etc.] is where I got the notion
of a standalone object-generation-dedicated class.  I don't know if
that's necessarily "correct" design, but it does seem to be common).

I'm not strongly opposed to broadening seq_factory's purpose a bit if we
want to, though ('course, we'll want to rename it if we do.)

> > 4) If given (to an "add()" method) an "array" of attributes, IOWrite
> > just shoves them on the stack.  If passed a seq object, IOWrite calls
> > its "getAsArray()" method and shoves the results of that on the stack.
> > (The "stack" is necessary when export is to interleaved file formats).
> > We MIGHT include a "write()" (or some similar name) method to allow
> > bypassing the "stack" and writing immediately for non-interleaved
> > formats (returns false if called while set to an interleaved format).
>
> How important are interleaved formats going to be?  They complicate
> matters quite a bit, and if we can do without....  I would be all for a
> 'write' method.  Also, how is an interleaved format going to be
> 'written'?  By calling the 'write' method?

Well, for MY purposes, converting from clustal to phylip is one thing I could
see myself doing fairly often (both interleaved formats), as well as at some
point reading clustal data in to 'cull' badly-aligning sequences from the list
and writing back out.  (Not something that needs to be done often when you're
selecting the sequences to align by hand, but in an automated system that
takes a not-human-reviewed list of sequences and aligns them, a future module
to evaluate the quality of the individual alignments and cull bad ones for
phylogenetic analysis could be handy).

And, yeah, I figured the "write" method would be analogous to PHP's "flush()"
- basically signalling the exporter to write whatever it's got saved up in
its stack out to the destination (be it a variable of text, a file, or
whatever).  This MIGHT be done in some cases for non-interleaved formats, too, 
for data being sent over the 'net (for the small speed benefit of sending
the data all at once rather than send a bit, read a bit, send a bit, read a
bit...also beneficial if saving to media that "wears out", like compact flash
cards, though I don't imagine that will be a really frequent concern.)
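
To make the "flush()" analogy concrete, here is a minimal sketch of a
stack-based exporter.  Only the names IOWrite, add(), write() and
getAsArray() come from this thread; the string destination, the getOutput()
helper, the seq_stub class and the FASTA-like output are assumptions standing
in for real (interleaved) format code:

```php
<?php
// Hypothetical sketch; real format handling is deliberately left out.
class IOWrite
{
    private $stack = array();   // records saved up until write() is called
    private $out = '';          // assumption: the destination is a string

    // Accept either an array of attributes or an object with getAsArray().
    public function add($item)
    {
        if (is_object($item)) {
            $item = $item->getAsArray();
        }
        $this->stack[] = $item;
    }

    // Like flush(): send everything on the stack to the destination at once.
    public function write()
    {
        foreach ($this->stack as $attrs) {
            // Placeholder formatting, one record after another.
            $this->out .= '>' . $attrs['id'] . "\n" . $attrs['sequence'] . "\n";
        }
        $this->stack = array();
        return true;
    }

    public function getOutput()
    {
        return $this->out;
    }
}

// Tiny stand-in for a seq object, just enough to feed add().
class seq_stub
{
    public function getAsArray()
    {
        return array('id' => 'b', 'sequence' => 'TTGA');
    }
}

$writer = new IOWrite();
$writer->add(array('id' => 'a', 'sequence' => 'ACGT'));
$writer->add(new seq_stub());
$before = $writer->getOutput();   // nothing has been written yet
$writer->write();
$after = $writer->getOutput();
```

An interleaved format would do its block-by-block layout inside write(),
which is exactly why all of the records have to be on the stack first.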

[...]
> > If I DID make a separate "Translate" class to be used like this, it
> > might also include things like "Translate::NCBIDeflineExtract($field)"
> > which one could use to get, e.g., just the accession number out of an
> > NCBI Defline.
>
> I can't oversee the advantages/disadvantages completely here.

The main advantages as I see it are easier code re-use and lower resource
usage for objects.  In other words, currently every single seq object would
contain a full copy of the code for some method, whereas if the "common"
methods were moved to a separate class, they would only contain a "wrapper"
which points to a single copy.  (In the case of "complement()", this is
already the case - just moving it into a separate file/class makes it easier
to find and re-use in other modules, potentially.)  Inside the seq object we
could implement a "getComplement()" method which simply does:
return (sequence_Common::complement($this->seq));

Having said that - I have to admit I've never actually TRIED this before, 
so I don't know how easy it is to use or how well it works, but it SEEMS
like it would be good for some things.
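
For reference, a minimal sketch of that delegation.  The body of
complement() is an assumption (a plain, non-reversed DNA complement done with
strtr()), not a copy of the existing function in seq.inc.php, and the static
method syntax assumes a PHP version that supports it:

```php
<?php
// Hypothetical sketch of moving a "common" function into its own class.
class sequence_Common
{
    // Assumption: plain DNA complement, not reversed, other characters kept.
    public static function complement($sequence)
    {
        return strtr($sequence, 'ACGTacgt', 'TGCAtgca');
    }
}

class seq
{
    public $seq;

    public function __construct($seq)
    {
        $this->seq = $seq;
    }

    // Thin wrapper: the real work lives in sequence_Common.
    public function getComplement()
    {
        return sequence_Common::complement($this->seq);
    }
}

$s = new seq('AACGT');
$viaObject = $s->getComplement();
$viaClass  = sequence_Common::complement('AACGT');
```

Both calls give the same result, and other modules can call
sequence_Common::complement() without needing a seq object at all.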

My opinion at this point is that it's something we ought to CONSIDER, and
probably something we'll eventually want to do, but that it's not something
that we have a genuine NEED for yet, so we can pretty much drop it if nobody
else thinks the idea is useful at the moment.

> > It might also be worth the trouble to move a lot of the "common"
> > functions that are currently in the class files but not part of the
> > classes (e.g. the "complement()" function in seq.inc.php) where they
> > can be accessed by other objects (or have the file be utilized by
> > itself by other projects).
[...]
> Hmm.  Doesn't it make more sense to make them part of the seq objects?

In the specific case of complement (and others) they do seem a natural fit
inside the seq object - in those cases really the only benefit the concept
gives is reduced resource usage for each seq object (and the ability to get
to a "complement()" function outside of the seq class file - though that's
not much of a benefit by itself since someone can just as easily call
seq::complement($sequence) as sequence_Common::complement($sequence)).

> Right, although it [error checking in interface methods] all adds overhead.

Definitely true, though I think the ability to have checking and correction
and so on is worth the trouble (not that every - or even "most" - methods
need to include all of that.  Most of mine are generally just wrappers 
for returning or setting the internal attributes.)

The main reason I bring it up is that it seems like anyone who comes in
from an OO background is going to be expecting to work through interface
methods rather than accessing variables directly, so in addition to "possible"
benefits that may or may not be used in an individual method, it also will
accommodate people used to OO design (without harming anyone who still wants
to deal with the attribute variables in the object directly).
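
As a small illustration of the kind of checking and correction an interface
method can do (the specific rules here, stripping whitespace, upper-casing
and the allowed-character set, are assumptions chosen for the example rather
than anything already decided):

```php
<?php
// Hypothetical sketch: a setter that checks and corrects its input.
class seq
{
    public $sequence = '';   // still public, so direct access keeps working

    public function getSequence()
    {
        return $this->sequence;
    }

    // Returns true and stores the cleaned-up value, or false if it is invalid.
    public function setSequence($sequence)
    {
        // Correction: drop whitespace and normalise to upper case.
        $clean = strtoupper(preg_replace('/\s+/', '', $sequence));

        // Checking: assumed alphabet of letters plus gap (-) and stop (*).
        if ($clean === '' || preg_match('/[^A-Z*-]/', $clean)) {
            return false;
        }

        $this->sequence = $clean;
        return true;
    }
}

$seqobj = new seq();
$ok  = $seqobj->setSequence("ac gt\n");   // cleaned up and accepted
$bad = $seqobj->setSequence('AC9T');      // rejected, old value is kept
```

The getter stays a one-line wrapper, so the overhead only shows up in the
methods that actually need the checking.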

> > I was thinking about editing my old sequence class [...]
> Good plan.

I'll add that to my list, then... Once I get the NCBI-Blast to a
minimally-complete point I'll get to work on that - I don't think
it'll take long.