> I was actually going to suggest that length instead be implemented inside
> the seq object (strlen($this->sequence)) rather than having the parser
> specify it; it seems odd to be able to "lie" about the sequence length.
> That is, as it currently stands, seqlength really doesn't have any "real"
> relation to the sequence, other than the fact that the parsers CLAIM the
> seqlength is a particular number. (But if one clips or appends to the
> sequence later, the "seqlength" stays the same...)

Good idea. I guess the seqfactory should call seq->seqlength (if I am not mistaken, it is already there) to set seqlength. The only exception is when the sequence length is indicated in the file (and the only problem arises when the two do not match...).

> I left the dashes in intentionally; those are the "gap" markers in the
> sequences. I figure that'll come in handy when loading up a seqalign full
> of seq objects. That way, if you load up a bunch of sequences from an
> alignment, you keep the alignment information.

OK.

> In my version of the nuc_sequence object, I actually just implemented a
> "removeGaps()" method to get rid of them if desired.

That should be implemented in the seq class (or whatever it is called now) as well.

> Regular expressions are a bit cryptic to learn at first, but I've been
> finding that they are ridiculously useful when dealing with text.
> ("Perl-Compatible Regular Expressions" are one of the things that make
> Perl so good at dealing with text...)

I know; I can see you have done quite a bit of Perl. I always have to look them up because I simply keep forgetting the syntax (and you cannot accuse regular expressions of being very intuitive; well, I guess you could, but I can't). I'll try to move the pdraw parser (a very simple one) to this new approach. I must confess that I do not find the fasta parser code easy to read.
If you don't mind, I'll rename some of the functions (in the pdraw parser) and set it all up so that I can easily understand it.

Nico
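For illustration, the seqlength and removeGaps() ideas discussed above could look roughly like the minimal PHP sketch below. It assumes a hypothetical `Seq` class with a public `$sequence` string property (PHP 7.4+ for the typed property); it is not the project's actual seq class. `seqlength()` and `removeGaps()` reuse names from the thread, while `matchesDeclaredLength()` is a hypothetical helper for formats that state a length in the file. `removeGaps()` returns a gap-free copy and leaves the stored alignment untouched, and it assumes "-" is the only gap character.

```php
<?php
// Hypothetical minimal sequence class, for illustration only.
class Seq
{
    public string $sequence;

    public function __construct(string $sequence)
    {
        $this->sequence = $sequence;
    }

    // Length is derived from the stored string on every call, so it
    // cannot drift out of sync after clipping or appending.
    // Gap characters ("-") are counted, since they are part of the string.
    public function seqlength(): int
    {
        return strlen($this->sequence);
    }

    // Returns a copy of the sequence with alignment gap markers removed.
    // Assumes "-" is the only gap character; the stored sequence is unchanged.
    public function removeGaps(): string
    {
        return preg_replace('/-+/', '', $this->sequence);
    }

    // Hypothetical helper for parsers whose file format declares a length:
    // true only if the declared value matches the actual string length.
    public function matchesDeclaredLength(int $declared): bool
    {
        return $declared === $this->seqlength();
    }
}

$s = new Seq("ACG--TA");
echo $s->seqlength(), "\n";   // 7
echo $s->removeGaps(), "\n";  // ACGTA
$s->sequence .= "GG";         // append; the length follows automatically
echo $s->seqlength(), "\n";   // 9
```

With the length computed on demand, a parser that reads a declared length from the file only has to compare it against `seqlength()`, so a mismatch becomes an explicit check instead of a number stored alongside the sequence.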