[Biophp-dev] Fasta filetype parser updated

Nico Stuurman biophp-dev@bioinformatics.org
Wed, 7 May 2003 08:49:05 -0700


>
> I was actually going to suggest that length instead be implemented inside
> the seq object - (strlen($this->sequence)) rather than having the parser
> specify - seems odd to be able to "lie" about the sequence length.  That is,
> as it currently stands, seqlength really doesn't have any "real" relation to
> the sequence...other than the fact that the parsers CLAIM the seqlength
> is a particular number.  (But if one clips or appends to the sequence later,
> the "seqlength" stays the same...)
>

Good idea.  I guess that seqfactory should call seq->seqlength (if I 
am not mistaken, it is already there) to set seqlength.  The only 
exception is when the sequence length is indicated in the file (and the 
only problem arises when the two do not match...)
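
Roughly what I have in mind, as a minimal sketch only.  The class, 
method and function names below (Seq, append, declaredLengthMatches) are 
made up for illustration and are not taken from the actual BioPHP code:

```php
<?php
// Hypothetical sketch: none of these names come from the real BioPHP classes.
class Seq
{
    private $sequence;

    public function __construct($sequence)
    {
        $this->sequence = $sequence;
    }

    // The length is always derived from the stored string,
    // so it cannot go stale after clipping or appending.
    public function seqlength()
    {
        return strlen($this->sequence);
    }

    public function append($more)
    {
        $this->sequence .= $more;
    }
}

// Returns true when the file declares no length, or when the declared
// length agrees with the length of the parsed sequence.
function declaredLengthMatches(Seq $seq, $declaredLength = null)
{
    return $declaredLength === null
        || (int) $declaredLength === $seq->seqlength();
}

$seq = new Seq('ACGT');
$seq->append('AC');
echo $seq->seqlength(), "\n";                 // 6
var_dump(declaredLengthMatches($seq, 4));     // bool(false)
```

What to do on a mismatch (warn, trust the file, or trust the sequence) 
is left open here; the sketch only detects it.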



> I left the dashes in intentionally - those are the "gap" markers in
> the sequences.  I figure that'll come in handy when loading up a seqalign
> full of seq objects. That way if you load up a bunch of sequences from
> an alignment, you keep the alignment information.
>

OK

> In my version of the nuc_sequence object, I actually just implemented a
> "removeGaps()" method to get rid of them if desired.
>

Should be implemented in the seq class (or whatever it is now) as well.
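
For instance, something along these lines.  This is only a sketch: the 
class name AlignedSeq and the getSequence() accessor are invented here, 
and I am assuming removeGaps() returns an ungapped copy rather than 
changing the stored sequence, so the alignment information is kept:

```php
<?php
// Hypothetical sketch: AlignedSeq and getSequence() are illustrative names.
class AlignedSeq
{
    private $sequence;

    public function __construct($sequence)
    {
        $this->sequence = $sequence;
    }

    // Returns the sequence as stored, gap markers included.
    public function getSequence()
    {
        return $this->sequence;
    }

    // Returns a copy with every "-" gap marker stripped;
    // the stored (aligned) sequence is left untouched.
    public function removeGaps()
    {
        return str_replace('-', '', $this->sequence);
    }
}

$aligned = new AlignedSeq('AC--GT-A');
echo $aligned->removeGaps(), "\n";    // ACGTA
echo $aligned->getSequence(), "\n";   // AC--GT-A
```

A plain str_replace() is enough for a single gap character; a regular 
expression would only be needed if several different gap symbols had to 
be handled.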

> Regular expressions are a bit cryptic to learn at first, but I've been finding
> that they are ridiculously useful when dealing with text...("Perl-Compatible
> Regular Expressions" are one of the things that make Perl so good at dealing
> with text...)
>

I know, I can see you did quite a bit of Perl.  I always have to look 
them up; I simply keep forgetting the syntax (and you cannot accuse 
regular expressions of being very intuitive, although, I guess you 
could, but I can't).

I'll try to move the pdraw parser (a very simple one) to this new 
approach.  I must confess that I do not find the fasta parser code easy 
to read.  If you don't mind, I'll rename some of the functions (in the 
pdraw parser) and set it all up so that I can easily understand it.



Nico