>> So far, autodetection relied only on the first line. Even for a stream,
>> reading a single line is not a high price to pay. Simply rewrite the
>> autodetect code to read a single line instead of the whole file.
>
> My only worry there is that I think SOME formats might not be detectable
> that way (some XML records, or if someone ever adds a parser for HTML
> output from a site, or something of the sort), though the basic idea is
> still quite feasible - we could have it buffer a certain number of lines
> (just enough to ensure that it'll reach some identifiable characteristic
> in just about any format).
>
> It'll take a LITTLE bit of extra code then, in the event that someone
> wants to parse a stream (the filetype parser will need to be able to
> accept "some header text AND a file resource" and know to do them in
> sequence, but I think I can see how to deal with that without too much
> trouble.)

What about coding lazily and simply having the parser re-open the stream?
Unless opening a stream is expensive, there will not be much of a
performance penalty, and it will make the code look prettier.

>> Will it be possible to do the actual parsing from an array (in memory)?
>> For a fasta parser it is not much work writing two parsers (one for
>> in-memory parsing, the other for streams); for the genbank, swissprot
>> and others it is going to be a pain. Having the parsers read a record
>> into memory and then parse the whole thing would make it easier.
>
> Hmmm, perhaps having the filetype parsers reading one line at a time into
> a "temporary record" variable, then parsing that variable when it hits
> the "end of record" marker for the format, you mean?

Something like a method readRecordinArray() which simply fills an array
with lines until it finds the end-of-record mark (or whatever clues there
are that it is at the end of a record). The array is then passed to the
'real' parser that only works with arrays of lines.
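A minimal sketch of the readRecordinArray() idea, written in Python rather than the project's PHP. Everything named here is an assumption for illustration: the function names, the GenBank-style `//` end-of-record mark, and the toy parser standing in for the 'real' one. None of it is the actual parse.inc.php code.

```python
import io
from typing import Callable, Dict, Iterator, List, TextIO


def read_record_in_array(stream: TextIO,
                         is_end_of_record: Callable[[str], bool]) -> List[str]:
    """Fill a list with lines until the end-of-record mark (or EOF) is seen."""
    lines: List[str] = []
    for raw in stream:
        line = raw.rstrip("\n")
        if not lines and not line.strip():
            continue  # skip blank lines before a record starts
        lines.append(line)
        if is_end_of_record(line):
            break  # stop here; the stream stays positioned at the next record
    return lines


def parse_lines(lines: List[str]) -> Dict[str, object]:
    """Hypothetical stand-in for the 'real' parser: it only ever sees a list of lines."""
    return {"first_line": lines[0], "line_count": len(lines)}


def iter_records(stream: TextIO,
                 is_end_of_record: Callable[[str], bool]) -> Iterator[Dict[str, object]]:
    """Read one record at a time into memory and hand it to the line-based parser."""
    while True:
        lines = read_record_in_array(stream, is_end_of_record)
        if not lines:
            return  # nothing left in the stream
        yield parse_lines(lines)


if __name__ == "__main__":
    # An in-memory source; an open file or socket wrapper would work the same way.
    sample = io.StringIO("LOCUS A1\nORIGIN\n//\nLOCUS B2\n//\n")
    for record in iter_records(sample, lambda l: l.strip() == "//"):
        print(record)
```

Because the format-specific parser only receives a list of lines, the same parser serves both in-memory data and streams; only the small reader function touches the input source.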
> The "set how big of a buffer (of parsed records) you will need" method
> I currently have in there is just the best compromise I've come up
> with so far, so there could easily be a better solution waiting just
> outside of my brain...
>
> Hmmm...under what circumstances will people need to move back and
> re-fetch a previous record? That may get me thinking a little clearer
> on the buffer issue...

Right, who would ever want to go back. It just seems good design to allow
for it....

> I have to run back out again (yesterday and today's schedule, in
> particular, is a horrendous mess on my end) but I will be back later
> this evening, and I'll put them up then. Shall I go ahead and rename
> the file I have as "parse2.inc.php" back to "parse.inc.php"?

Yes.

Nico