[Biophp-dev] seqdb and seq classes in genephp

Fri, 25 Apr 2003 17:47:59 -0700

Hi Serge (and others)

> Just a quick note... yes, I got your attachment.  Will go
> through it this weekend.  As for the seq and seqdb codes,
> yes, the parser will have to be somehow attached to seq,
> if we are to do a "quick or express parse" of any data
> file.  As for the fate of seqdb, let's ask the opinion
> of others first before "axe-ing it".

I did not mean to get rid of it at all.  Rather, some of the 
functionality now in there could be moved to separate, smaller classes 
(like class parse and class sql) that can be used by seqdb as well as 
other scripts.  I do think there is a place for a flat file database 
system like the one you have in seqdb; it just should not be the only 
way to use the code.

> Where do you think we should put such code (that stores
> parsed data into a MySQL database)?  I am not that comfy
> about making it an official part of GenePHP/BioPHP as it
> reflects a particular database structure/design.  (See
> the sample MySQL database schema in the GenePHP site).

You could make a class sql and distribute it with a script that 
generates the needed tables.  It would also need to know the database 
server, database name, username and password.  These things could live 
in a biophp configuration file.
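
A minimal sketch of that split, assuming a configuration function and 
a class called Sql.  Neither exists in GenePHP/BioPHP today: the names, 
the config keys and the one-table layout are all made up for 
illustration, and nothing here connects to a real server.

```php
<?php
// Hypothetical sketch: none of these names exist in GenePHP/BioPHP.

// What a biophp configuration file might provide: connection settings only.
// All values are placeholders.
function biophp_config()
{
    return array(
        'db_server'   => 'localhost',
        'db_name'     => 'biophp',
        'db_user'     => 'biophp_user',
        'db_password' => 'change-me',
    );
}

// Class sql receives the settings once, so scripts never hard-code them.
class Sql
{
    private $config;

    public function __construct(array $config)
    {
        // Fail early if the configuration is missing a required setting.
        foreach (array('db_server', 'db_name', 'db_user', 'db_password') as $key) {
            if (!isset($config[$key])) {
                throw new InvalidArgumentException("Missing config key: $key");
            }
        }
        $this->config = $config;
    }

    // The kind of statement a table-generating setup script could run.
    // The table layout is invented for this example.
    public function createTableStatement()
    {
        return 'CREATE TABLE IF NOT EXISTS seq ('
             . 'id VARCHAR(32) PRIMARY KEY, sequence TEXT)';
    }

    public function databaseName()
    {
        return $this->config['db_name'];
    }
}

$sql = new Sql(biophp_config());
echo $sql->createTableStatement(), "\n";
```

Keeping the schema in a setup script and the credentials in the 
configuration means the particular database design stays out of the 
core classes.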

>>> function parse_ANSI ()
>>
>> (Shouldn't that be "ASN.1"?)
>
> Yes, that should be ASN.1.  Andres and I stand corrected. =)
>

Sorry, a little bit of dyslexia here...

> Lately, I've been busy writing scripts that actually do
> something useful like translating proteins in all six
> reading frames, reverse translating a protein into its
> nucleic acid counterparts, etc.  While it's admittedly
> time-consuming, I am LEARNING A LOT about what needs to
> be done with the existing code.  I've posted those demo
> scripts at http://genephp.sourceforge.net/applist.html.
>

Sounds cool.  Can this be made part of class seq?

> Kurt: Still haven't touched your code.  Been busy lately
> (see above paragraph).   I've been to the Vector NTI
> (Infomax?) site but I couldn't find any formal definition
> or specification of their molecule document format, which
> according to you, is supposed to be a superset of GenBank.
>
> My only other concern here is, given Nicos' suggestion of
> having a function that "AUTO-DETECTS" a file, how would
> we then distinguish a Vector NTI file from a GenBank file
> (given they have a lot of similarities)?

Don't know, depends on the exact file format.

It is probably best to add the file format as an (optional) parameter 
to the constructor of class parse.  That way we can postpone the 
autodetection until we have a bunch of parsers written and 
documentation for the various file formats.  If you all don't mind, I 
will put some work into coding the framework for class parse.
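
A rough sketch of how that constructor could look.  The class name 
Parse, the method parseString() and the array layout of the records are 
assumptions made for this example, not existing GenePHP code; a real 
version would presumably return Serge's seq objects.  Only a bare-bones 
FASTA reader is included.

```php
<?php
// Hypothetical sketch of class parse: names and record layout are assumptions.
class Parse
{
    private $format;

    // $format is optional; autodetection is deliberately left for later.
    public function __construct($format = null)
    {
        $this->format = ($format === null) ? null : strtolower($format);
    }

    public function parseString($text)
    {
        if ($this->format === null) {
            throw new RuntimeException('Autodetection is not implemented; pass a format.');
        }
        switch ($this->format) {
            case 'fasta':
                return $this->parseFasta($text);
            default:
                throw new InvalidArgumentException("No parser for format: {$this->format}");
        }
    }

    // Minimal FASTA reader: a '>' line starts a record, other lines are sequence.
    private function parseFasta($text)
    {
        $records = array();
        $current = -1;
        foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
            $line = trim($line);
            if ($line === '') {
                continue;
            }
            if ($line[0] === '>') {
                $records[] = array('id' => substr($line, 1), 'sequence' => '');
                $current = count($records) - 1;
            } elseif ($current >= 0) {
                // Sequence lines before the first header are ignored.
                $records[$current]['sequence'] .= $line;
            }
        }
        return $records;
    }
}

$parser  = new Parse('fasta');
$records = $parser->parseString(">seq1\nACGT\nTTAA\n>seq2\nGGCC\n");
```

Adding a parser then means adding one case and one private method, and 
autodetection can later fill in $format when none is passed.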

>>
>> I have FASTA and Clustal (.aln) parsers in the module code section
>> already, if those are helpful at all.

Sean, what do your FASTA and Clustal parsers parse to?  Serge's seq 
objects?  If not, I am not sure how we can use them.  By the way, now 
is probably a good time to look carefully at the seq object (I have not 
done that yet), since lots of future work will depend on it.

>>
>>> The SQL stuff could be made self-contained in a similar fashion.  I
>>> would strongly advise, though, to stop using the direct MySQL calls
>>> and instead immediately start using a database abstraction layer
>>> like adodb (my favorite, I can help out with this one) or PEAR
>>> (might finally be usable).
>>
>> I would personally vote for PEAR, mainly to minimize dependencies on
>> "non-default" components.  Not that I would MANDATE it, even if I
>> thought I could get away with it...

I agree with that idea; I simply have much more experience with adodb 
(and I distribute it with the phplabware project, so people 
downloading it are probably not even aware they are using it).  The 
choice is up to the person writing the sql interface.

Best,

Nico

Nico Stuurman
Vale Lab
HHMI / Dept. of Cellular and Molecular Pharmacology
University of California, San Francisco
Genentech Hall, Room N316
600 16th street

For mail:
San Francisco, CA 94143-2200

For deliveries:
San Francisco, CA 94107

email: nicos@itsa.ucsf.edu
phone: (415) 514-3927
fax: (415) 476-5233