[Biophp-dev] Fasta filetype parser updated

S Clark biophp-dev@bioinformatics.org
Wed, 7 May 2003 10:48:42 -0600


On Wednesday 07 May 2003 09:49 am, Nico Stuurman wrote:

> Good idea.  I guess that seqfactory should call a seq->seqlength (if I
> am not mistaken, it is already there) to set seqlength.  Only exception
> is when the sequence length is indicated in the file (and the only
> problem arises when the two do not match....)

Hmmm, actually, that's a good point - if there's a case where the sequence
length is indicated but doesn't match the "actual" sequence length, it
may be a good idea to allow the length to be overridden, just in case they
did it for a reason (though I can't think what that reason would be...)

The interface method for "getSeqLength()" (or whatever it gets called) can do

if ($this->seqlength) { 
	return $this->seqlength; 
} else { 
	return (strlen($this->sequence));
}

and we're back to the user not needing to worry about it :-)
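
For illustration, here is a minimal sketch of that fallback in a complete class. The class name "Seq" and the property names are assumptions made for this example, not necessarily what the BioPHP code uses.

```php
<?php
// Hypothetical Seq class; names are assumptions, not the actual BioPHP API.
class Seq
{
    public $sequence = "";
    public $seqlength = null;   // set only when the file declares a length

    public function getSeqLength()
    {
        // Prefer the declared length when one was set...
        if ($this->seqlength) {
            return $this->seqlength;
        }
        // ...otherwise fall back to counting the residues.
        return strlen($this->sequence);
    }
}

$seq = new Seq();
$seq->sequence = "ACGTACGT";
$computed = $seq->getSeqLength();   // derived from the sequence itself

$seq->seqlength = 10;               // a declared length overrides it
$declared = $seq->getSeqLength();
```

One consequence of testing the property for truthiness: a declared length of 0 is treated as "not set", so the computed length is used in that case.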

> I know, I can see you did quite a bit of perl.  I always have to look
> them up, I simply keep forgetting the syntax (and you can not accuse
> regular expressions of being very intuitive, although, I guess you
> could, but I can't).

Ironically, I have NOT done much perl...and it's all PHP's fault :-)

I started working with Perl off and on not long before I started getting
into PHP, which I started finding could do most of what Perl can, but 
more readably.  If it weren't for PHP, I might very well be a highly skilled
Perl guru by now.  As it is, I got as far as trying to understand exactly what
it is that the "bless()" function does in Perl OO, and no further.  (I never
did quite "get" bless()...but then, I didn't spend much time on it since
I had PHP to work with instead.)

It DID give me my start on regular expressions, though.  Even the basic
text editor in KDE now has support for regular expressions in its
"Find/Replace" dialogs - very handy.

I can't argue with you about them being not very intuitive, though, especially
at first.  I'm only just recently getting into using backreferences, but even
just being able to use \d, \s, \w (etc.) becomes very handy.  In my
experience, it is worth the mental anguish it takes to figure out how to
use them initially.
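
As a rough illustration of those pieces, the sketch below uses PHP's preg_* functions. The header line and the "length=" field are made up for the example; they are not a format any of the parsers here is known to use.

```php
<?php
// Illustrative only: this header line is invented for the example.
$header = ">seq1  sample sequence length=42";

$id = null;
$declaredLength = null;

// \w+ grabs the identifier, \s+ skips whitespace, \d+ captures the digits.
if (preg_match('/^>(\w+)\s+.*length=(\d+)$/', $header, $m)) {
    $id = $m[1];
    $declaredLength = (int) $m[2];
}

// A backreference (\1) matches whatever group 1 matched; here it is
// used to collapse an accidentally doubled word.
$fixed = preg_replace('/\b(\w+)\s+\1\b/', '$1', "the the sequence");
```

The patterns are single-quoted on purpose: inside a double-quoted PHP string, "\1" is read as an octal character escape before the regex engine ever sees it.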

> I'll try to move the pdraw parser (a very simple one) to this new
> approach.  I must confess that I do not find the fasta parser code easy
> to read.  If you don't mind, I'll rename some of the functions (in the
> pdraw parser) and set it all up so that I can easily understand it.

No complaints here - I know my coding style has to be somewhat idiosyncratic
after all this time of coding by myself (my joke about giving people
"parentheses poisoning" is, I must confess, based in fact :-) ).  Thankfully, 
the OO design means that other than the name of the class, file, and 
"fetchNext()" method, so long as what comes out matches what's expected, 
the guts can be written any way at all.  
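
To make that contract concrete, here is a small sketch. Apart from "fetchNext()", every name in it (the interface, the class, the record keys) is hypothetical, and the parsing is deliberately simplistic; it is not the actual fasta parser.

```php
<?php
// Hypothetical names throughout, except fetchNext(); not the BioPHP classes.
interface SeqParser
{
    // Returns the next record as an array, or false when input is exhausted.
    public function fetchNext();
}

class SimpleFastaParser implements SeqParser
{
    private $lines;
    private $pos = 0;

    public function __construct(array $lines)
    {
        $this->lines = array_values($lines);
    }

    public function fetchNext()
    {
        $count = count($this->lines);
        // Skip anything before the next header line.
        while ($this->pos < $count && substr($this->lines[$this->pos], 0, 1) !== ">") {
            $this->pos++;
        }
        if ($this->pos >= $count) {
            return false;
        }
        $record = array(
            "id" => trim(substr($this->lines[$this->pos], 1)),
            "sequence" => "",
        );
        $this->pos++;
        // Collect sequence lines until the next header or end of input.
        while ($this->pos < $count && substr($this->lines[$this->pos], 0, 1) !== ">") {
            $record["sequence"] .= trim($this->lines[$this->pos]);
            $this->pos++;
        }
        return $record;
    }
}

// The caller only relies on fetchNext() and the shape of what it returns.
$parser = new SimpleFastaParser(array(">a", "ACGT", "TT", ">b", "GG"));
$records = array();
while (($rec = $parser->fetchNext()) !== false) {
    $records[] = $rec;
}
```

Any other parser that implements the same method and returns records of the same shape could be swapped in without the calling loop changing.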

Incidentally, if you have time to comment on the parts that are hard to follow
in my code, I'd be interested - I imagine that in at least some of the cases
I may be missing more appropriate techniques (e.g. my use of double-quotes
exclusively, out of habit).  Heck, it's your "fault" that I actually used 
"each()" to iterate through an array for the first time rather than my
usual tactic of

foreach(array_keys($array) as $key) { doSomethingTo($array[$key]); } 

(I've always known that next(), prev(), current(), each() all existed, I just
never thought much about them before.  Their use in your code got me
thinking.  They work better in some cases...)
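
For comparison, here is a small sketch of the two styles side by side, with made-up data. It uses key(), current() and next() rather than each(), because each() was deprecated in PHP 7.2 and removed in PHP 8.0.

```php
<?php
// Made-up data for the example.
$lengths = array("seqA" => 4, "seqB" => 7);

// Key-based loop, in the style of the one-liner above.
$viaKeys = array();
foreach (array_keys($lengths) as $key) {
    $viaKeys[] = $key . "=" . $lengths[$key];
}

// The same walk using the array's internal pointer.
$viaPointer = array();
reset($lengths);
while (($key = key($lengths)) !== null) {   // key() is null past the end
    $viaPointer[] = $key . "=" . current($lengths);
    next($lengths);
}
```

The pointer style can be stopped partway and resumed later from the same position, which is one case where it fits better than a foreach.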