[Biophp-dev] Parser object - streams and files

S Clark biophp-dev@bioinformatics.org
Wed, 30 Apr 2003 11:21:02 -0600


Okay, I think I've figured out how to handle it such that both
streams and files behave exactly the same way from the end-user's
perspective.

I was ORIGINALLY going to move all of the data storage and
manipulation down to the individual filetype parsers, while 
the upper-level Parser object would be little more than a "wrapper".

Instead, how about this: the Parser object would now acquire a
"maximum history" attribute and a "stack" to keep the returned
data arrays on.  The maximum history attribute tells the upper-level
parser how many records to keep track of before it starts discarding
the earlier ones.  I move the "next/previous" functions back up
to the upper-level parser where they were in the first place (hey, I'm
learning).  The filetype parser then goes back to being much simpler
and more portable: it only needs to open and parse the data and
return the records one at a time when asked, rather than also
tracking them as I was going to try to do.
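
As a rough sketch of how small such a filetype parser could be, here
is a record-at-a-time reader.  Everything in it is an assumption made
for illustration: the class name LineRecordParser, the method name
nextRecord(), and the one-record-per-line "id<TAB>sequence" format are
hypothetical and not existing BioPHP code.

```php
<?php
// Hypothetical filetype parser: it only opens the source and hands back
// one record per call. It keeps no history of its own.
class LineRecordParser
{
    private $handle;

    // $source may be a file path or a stream URL such as "php://stdin";
    // fopen() treats both the same way.
    public function __construct($source)
    {
        $this->handle = fopen($source, 'r');
        if ($this->handle === false) {
            throw new RuntimeException("Cannot open $source");
        }
    }

    // Returns the next record as a data array, or null at end of input.
    public function nextRecord()
    {
        while (($line = fgets($this->handle)) !== false) {
            $line = rtrim($line, "\r\n");
            if ($line === '') {
                continue; // skip blank lines
            }
            // Assumed format: "id<TAB>sequence" on a single line.
            $parts = explode("\t", $line, 2);
            return array(
                'id'       => $parts[0],
                'sequence' => isset($parts[1]) ? $parts[1] : '',
            );
        }
        return null;
    }
}
```

Because the parser only ever reads forward, the same code works for a
regular file and for a stream that cannot be rewound.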

"fetch()" then returns the sequence object (and/or perhaps depending
on a passed parameter, the data array itself) derived from the
upper-level parser's current position in its stack.

"moveNext()", then:
1) checks to see if it's on the last record of the stack
2) if it is, it calls the "gimme the next record" function of the
   filetype parser and appends the resulting array to the stack
   (unless it's at eof).  If it is not on the last record, skip to
   step 4
3) if the size of the stack is now larger than maximum history,
   array_shift the first record into oblivion
4) advances the pointer to the next record
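
The steps above could be sketched roughly as follows.  This is only
one possible reading of the idea, not a finished design: the class name
HistoryParser, the use of a callable as the "gimme the next record"
function, and the true/false return values are all assumptions.  One
detail the list does not spell out is that after array_shift the
remaining records move down by one index, so the pointer has to move
down with them.

```php
<?php
// Sketch only: names and return conventions are hypothetical.
class HistoryParser
{
    private $source;          // callable: returns next record array, or null at eof
    private $maxHistory;      // how many records to keep before discarding
    private $stack = array(); // the buffered data arrays
    private $pos = -1;        // index into $stack; -1 means nothing read yet

    public function __construct(callable $source, $maxHistory = 1000)
    {
        $this->source = $source;
        $this->maxHistory = $maxHistory;
    }

    public function moveNext()
    {
        // Step 1: on the last buffered record (or nothing buffered yet)?
        if ($this->pos === count($this->stack) - 1) {
            // Step 2: ask the filetype parser for one more record.
            $record = call_user_func($this->source);
            if ($record === null) {
                return false; // eof: the pointer stays where it is
            }
            $this->stack[] = $record;
            // Step 3: drop the oldest record once past maximum history.
            if (count($this->stack) > $this->maxHistory) {
                array_shift($this->stack);
                $this->pos--; // remaining records shifted down by one
            }
        }
        // Step 4: advance the pointer.
        $this->pos++;
        return true;
    }

    public function movePrevious()
    {
        if ($this->pos <= 0) {
            return false; // nothing older is still buffered
        }
        $this->pos--;
        return true;
    }

    // Returns the data array at the current position, or null before
    // the first moveNext(). Building a sequence object is left out.
    public function fetch()
    {
        return $this->pos >= 0 ? $this->stack[$this->pos] : null;
    }
}
```

With a maximum history of 2 and records a, b, c, three calls to
moveNext() leave the pointer on c with only b and c buffered, so one
movePrevious() succeeds and a second one returns false.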

There's still the limitation that you can't "movePrevious" any further
back than (maximum history) records, but I figure with a good default
size it won't matter much.  You can then move both forwards and
backwards in the data even when parsing a stream, without fear of
ending up with 2GB of data stored in memory.  (I was thinking 1000
records as the default maximum history.)

What do you all think?