[Biophp-dev] New Parse object code, and FASTA parser

S Clark biophp-dev@bioinformatics.org
Tue, 6 May 2003 13:11:44 -0600


On Monday 05 May 2003 11:13 pm, nicos@itsa.ucsf.edu wrote:
> The parse class now accepts filehandles, filenames, arrays and text.
> The autodetect function will open a file (if there is a filehandle), read
> the first line and rewind the filepointer.  Users can avoid this by
> specifying a seqfiletype. There might be some junk left in the parse
> class (and we should rename the thing, Serge, what should it be called?).
>
> Sean, could you rework the fasta parser so that it will also work with
> files/streams directly (instead of reading them into memory)?  Once you
> have done that I'll have a look at the other parsers.

Will do - should have that working today (nothing else on today's
schedule to keep me from working on it.)

I have an urge to worry about relying on "rewinding" the file 
pointer (since that fails for http/ftp/fsockopen handles)...but I'm going to
tell myself in this case to shut the heck up, because:

1) It makes the filetype parsers less labor intensive to write (not
having to worry about extra data along with the filehandle, etc.).

2) It's really no different from MY original notion that users would
specify a parser directly when calling for online streams

and

3) It's easily solved by noting in the documentation somewhere that
"users SHOULD always specify the filetype when possible". :-)

The new code looks good to me - looks like the filetype parser objects now
"MUST accept an array of lines OR a filehandle resource (and SHOULD accept
a filename or text - but don't NEED to, since neither of those gets passed by
the upper-level Parse object [nor do I think they need to be], so those two
options will only be useful when using the filetype parsers outside the
context of the Parse object)".
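
To make that contract concrete, here is a rough sketch of a reader that
takes either an array of lines or an open filehandle and pulls lines from
the handle one at a time instead of slurping the whole file first. The
function name read_fasta_records() and the returned id => sequence array are
made up for illustration, not taken from the actual BioPHP code, and the
closures assume a PHP version that supports them.

```php
<?php
// Hypothetical sketch, not the BioPHP FASTA parser: accept either an array
// of lines or an open filehandle and return the records found.
function read_fasta_records($source)
{
    if (is_resource($source)) {
        // Pull one line at a time from the handle.
        $next = function () use ($source) {
            return fgets($source);   // false at end of stream
        };
    } elseif (is_array($source)) {
        $lines = array_values($source);
        $i = 0;
        $next = function () use (&$lines, &$i) {
            return $i < count($lines) ? $lines[$i++] : false;
        };
    } else {
        return false; // filenames and raw text are left to the caller
    }

    $records = array();
    $id = null;
    while (($line = $next()) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue;                 // skip blank lines
        }
        if ($line[0] === '>') {
            $id = substr($line, 1);   // header line starts a new record
            $records[$id] = '';
        } elseif ($id !== null) {
            $records[$id] .= $line;   // sequence line, append to current record
        }
    }
    return $records;
}

// Same data supplied both ways (php://memory stands in for a real file).
$from_array = read_fasta_records(array(">seq1", "ACGT", "TTGA"));

$fp = fopen('php://memory', 'w+');
fwrite($fp, ">seq1\nACGT\nTTGA\n");
rewind($fp);
$from_handle = read_fasta_records($fp);
fclose($fp);
```

Both calls should produce the same records, which is the point of keeping
the two input types behind one "next line" step.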

One quick suggestion on the new code - it might be worth exchanging
the "fseek($this->fp,0)" with just "rewind($this->fp)" - it doesn't
really make a functional difference, but "rewind()" behavior looks
a little more consistent with the rest of the system (i.e. it returns false
if it can't rewind and true otherwise, whereas fseek returns 0, which
evaluates as false, if it IS successful, and -1, which evaluates as true
because it is nonzero, if unsuccessful.)  If we later add some sort of
detection of non-rewindable streams to the auto-detection routine it'll
make the code slightly easier to follow.
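
For reference, a small sketch of the difference, using a php://memory
stream as a stand-in for a real file. The variable names are just for
illustration, and the stream_get_meta_data() check at the end is only one
possible way to detect a non-rewindable stream, not something the current
code does.

```php
<?php
$fp = fopen('php://memory', 'w+');
fwrite($fp, ">seq1\nACGT\n");

// fseek() reports success as int 0 and failure as int -1,
// so it has to be compared against 0 explicitly.
$seek_result = fseek($fp, 0);
$seek_ok = ($seek_result === 0);

// rewind() reports success as bool true and failure as bool false,
// so it can be used directly in a condition.
$rewind_ok = rewind($fp);

// The pitfall with an unchecked fseek(): -1 is truthy in PHP.
$failure_looks_true = (bool) -1;

// One possible up-front check for streams that cannot be rewound.
$meta = stream_get_meta_data($fp);
$seekable = !empty($meta['seekable']);

fclose($fp);
```

With rewind() the auto-detection could simply bail out on a false return,
while the fseek() version needs the explicit "=== 0" comparison to avoid
treating a failure as success.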

There, got my USRDA of nitpicky suggestions out of the way.  Back to work
for me on the FASTA parser - should have it posted up a little later today.