[Biophp-dev] Swissprot and Genbank parsers done

biophp-dev@bioinformatics.org
Sun, 11 May 2003 10:52:44 PDT


> Btw, the ID field is set to the ENTRY NAME (right after LOCUS)
> but should be the primary accession code.  That's one "bug"
> that needs correcting.  Luckily, in most cases, their values
> are the same.  =)  

Could you fix that?


> >Nevertheless, five sequence file formats down!  
> 
> Five?  What's the fifth?  Genbank, Swissprot, Fasta, PDraw, and...?

Clustalw.  Have a look at the code in cvs.

> The Kegg Enzyme parser?  

????

> >Shall we call the current Parse class: IOseq->read?  (and start
> >working on IOseq->write?)
> 
> Whatever you call the class, the "write" seems the next logical

Actually, I do think that names and naming conventions are going to be
important in the long run.  How well we choose the names and naming
conventions, and how well we stick to them, will determine how easily
biophp can be used.

> thing to do.  I guess I'll write the "write" for Genbank and
> Swissprot. 


Great!  

But first, a structure for the IOwrite class. I would go for a constructor
that takes an argument specifying the type of output desired (string,
array, file, filehandle?, or simply always return a string?), and the
type of sequence file desired (fasta, swissprot, genbank, etc.).  There
should be an IO->write->add($seq) function that calls seq_factory, which
should translate the items of object $seq into items that can be directly
incorporated in the output.  The actual 'write' methods could almost be
just a template where PHP's variable interpolation can do the work.
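
To make the idea concrete, here is a rough sketch of what such a class
could look like.  It is only an illustration under several assumptions:
the Seq class and its fields (id, description, sequence) are hypothetical
stand-ins for whatever the parsers return, only fasta is handled, the
output-type argument is dropped in favour of always returning a string,
and the syntax assumes PHP 5 or later (visibility keywords, exceptions).

```php
<?php
// Hypothetical sequence object; the field names are assumptions,
// not the actual biophp data structure.
class Seq
{
    public $id;
    public $description;
    public $sequence;

    public function __construct($id, $description, $sequence)
    {
        $this->id = $id;
        $this->description = $description;
        $this->sequence = $sequence;
    }
}

class IOwrite
{
    private $format;
    private $records = array();

    // $format names the sequence file type; this sketch knows only 'fasta'.
    public function __construct($format)
    {
        if ($format !== 'fasta') {
            throw new InvalidArgumentException("Unsupported format: $format");
        }
        $this->format = $format;
    }

    // Translate the items of $seq into strings that can be placed
    // directly into the output template.
    private function seq_factory($seq)
    {
        return array(
            'id'          => trim($seq->id),
            'description' => trim($seq->description),
            // Upper-case the residues and wrap them at 60 per line.
            'sequence'    => rtrim(chunk_split(strtoupper($seq->sequence), 60, "\n")),
        );
    }

    // Queue one sequence for output.
    public function add($seq)
    {
        $this->records[] = $this->seq_factory($seq);
    }

    // The 'write' method is little more than a template filled in
    // by variable interpolation.
    public function write()
    {
        $out = '';
        foreach ($this->records as $r) {
            $out .= ">{$r['id']} {$r['description']}\n{$r['sequence']}\n";
        }
        return $out;
    }
}

// Usage: build a writer, add a sequence, print the result.
$writer = new IOwrite('fasta');
$writer->add(new Seq('P12345', 'Example protein', 'mkvlaagivg'));
echo $writer->write();
```

Keeping all the translation in seq_factory means that a writer for another
format would only need its own template in write(), plus whatever extra
items that format requires.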

 
Best,


Nico