[Biophp-dev] seqdb and seq classes in genephp
Nico Stuurman
biophp-dev@bioinformatics.org
Fri, 25 Apr 2003 17:47:59 -0700
Hi Serge (and others)
> Just a quick note... yes, I got your attachment. Will go
> through it this weekend. As for the seq and seqdb codes,
> yes, the parser will have to be somehow attached to seq,
> if we are to do a "quick or express parse" of any data
> file. As for the fate of seqdb, let's ask the opinion
> of others first before "axe-ing it".
I did not mean to get rid of it at all. Rather, some of the
functionality now in there could be moved to separate, smaller classes
(like class parse and class sql) that can be used by seqdb as well as
by other scripts. I do think there is a place for a flat-file database
system like the one you have in seqdb; it just should not be the only
way to use the code.
> Where do you think we should put such code (that stores
> parsed data into a MySQL database)? I am not that comfy
> about making it an official part of GenePHP/BioPHP as it
> reflects a particular database structure/design. (See
> the sample MySQL database schema in the GenePHP site).
You could make a class sql and distribute it with a script that
generates the needed tables. The class would also need to know the
database server, database name, username and password. These settings
could live in a biophp configuration file.
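
As a rough illustration of the configuration-file idea, here is a minimal sketch. The file layout, the section name, the key names and the function load_db_config() are all assumptions made up for this example; nothing like this exists in GenePHP/BioPHP yet. It is written in present-day PHP syntax and reads the settings from a string so that it runs without any file on disk.

```php
<?php
// Hypothetical contents of a biophp.ini file; every key name is an assumption.
$iniText = <<<INI
[database]
host = localhost
name = biophp
user = biophp_user
password = secret
INI;

/**
 * Parse the [database] section and check that all four settings
 * (server, database name, username, password) are present.
 */
function load_db_config(string $iniText): array
{
    $config = parse_ini_string($iniText, true);
    if ($config === false || !isset($config['database'])) {
        throw new RuntimeException('Missing [database] section');
    }
    foreach (['host', 'name', 'user', 'password'] as $key) {
        if (!isset($config['database'][$key])) {
            throw new RuntimeException("Missing database setting: $key");
        }
    }
    return $config['database'];
}

$db = load_db_config($iniText);
echo $db['host'], "\n";
```

A class sql could take the returned array in its constructor, so the connection details stay out of the code itself.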
>>> function parse_ANSI ()
>>
>> (Shouldn't that be "ASN.1"?)
>
> Yes, that should be ASN.1. Andres and I stand corrected. =)
>
Sorry, a little bit of dyslexia here...
> Lately, I've been busy writing scripts that actually do
> something useful like translating proteins in all six
> reading frames, reverse translating a protein into its
> nucleic acid counterparts, etc. While it's admittedly
> time-consuming, I am LEARNING A LOT about what needs to
> be done with the existing code. I've posted those demo
> scripts at http://genephp.sourceforge.net/applist.html.
>
Sounds cool. Can this be made part of class seq?
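
To make the question concrete, here is a minimal sketch of how reading-frame handling might sit inside a sequence class. This is not Serge's seq class and not his demo scripts; the class Seq and its methods are hypothetical, and the sketch stops at splitting both strands into codons (the translation step with a codon table is left out).

```php
<?php
// Hypothetical minimal Seq class; NOT the actual GenePHP seq class.
class Seq
{
    public $sequence;

    public function __construct(string $sequence)
    {
        $this->sequence = strtoupper($sequence);
    }

    // Reverse complement of a DNA string (A<->T, C<->G).
    public function reverseComplement(): string
    {
        return strrev(strtr($this->sequence, 'ACGT', 'TGCA'));
    }

    // Return the six reading frames as lists of codons:
    // three offsets on the forward strand, three on the reverse complement.
    // Trailing bases that do not fill a whole codon are dropped.
    public function sixFrames(): array
    {
        $strands = [
            'forward' => $this->sequence,
            'reverse' => $this->reverseComplement(),
        ];
        $frames = [];
        foreach ($strands as $name => $strand) {
            for ($offset = 0; $offset < 3; $offset++) {
                $usable = (string) substr($strand, $offset);
                $length = strlen($usable) - strlen($usable) % 3;
                $frames[$name][$offset] = $length > 0
                    ? str_split(substr($usable, 0, $length), 3)
                    : [];
            }
        }
        return $frames;
    }
}

$seq = new Seq('atggcc');
print_r($seq->sixFrames());
```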
> Kurt: Still haven't touched your code. Been busy lately
> (see above paragraph). I've been to the Vector NTI
> (Infomax?) site but I couldn't find any formal definition
> or specification of their molecule document format, which
> according to you, is supposed to be a superset of GenBank.
>
> My only other concern here is, given Nico's suggestion of
> having a function that "AUTO-DETECTS" a file, how would
> we then distinguish a Vector NTI file from a GenBank file
> (given they have a lot of similarities)?
Don't know, depends on the exact file format.
It is probably best to add the file format as an (optional) parameter to
the constructor of class parse. That way we can postpone
autodetection until we have a bunch of parsers written and
documentation for the various file formats. If you all don't mind, I
will put some work into coding the framework for class parse.
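
A minimal sketch of that constructor idea, under stated assumptions: the class name Parse, the method parseString(), the format label 'fasta' and the returned array layout are all invented for illustration, and the result is a plain array rather than one of Serge's seq objects. It uses present-day PHP syntax.

```php
<?php
// Hypothetical sketch: class name, method names and format labels are assumptions.
class Parse
{
    private $format;

    // The format is optional; null means "not specified".
    public function __construct(?string $format = null)
    {
        $this->format = $format === null ? null : strtolower($format);
    }

    // Dispatch on the declared format. Auto-detection is deliberately left out.
    public function parseString(string $data): array
    {
        if ($this->format === null) {
            throw new InvalidArgumentException(
                'No format given; auto-detection is not implemented'
            );
        }
        if ($this->format === 'fasta') {
            return $this->parseFasta($data);
        }
        throw new InvalidArgumentException("No parser for format: {$this->format}");
    }

    // Very small FASTA reader: one record per '>' header line.
    private function parseFasta(string $data): array
    {
        $records = [];
        $current = null;
        foreach (preg_split('/\R/', $data) as $line) {
            $line = trim($line);
            if ($line === '') {
                continue;
            }
            if ($line[0] === '>') {
                if ($current !== null) {
                    $records[] = $current;
                }
                $current = ['header' => substr($line, 1), 'sequence' => ''];
            } elseif ($current !== null) {
                $current['sequence'] .= $line;
            }
        }
        if ($current !== null) {
            $records[] = $current;
        }
        return $records;
    }
}

$parser = new Parse('fasta');
$records = $parser->parseString(">seq1 demo\nATGC\nGGTA\n");
echo $records[0]['header'], ' ', $records[0]['sequence'], "\n"; // seq1 demo ATGCGGTA
```

Because an unspecified format fails loudly instead of guessing, auto-detection could later be slotted into the null branch without changing how callers that pass a format use the class.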
>>
>> I have FASTA and Clustal (.aln) parsers in the module code section
>> already, if those are helpful at all.
Sean, what do your FASTA and Clustal parsers parse into? Serge's seq
objects? If not, I am not sure how we can use them. By the way, now is
probably a good time to look carefully at the seq object (I have not
done that myself), since lots of future work will depend on it.
>>
>>> The SQL stuff could be made self-contained in a similar fashion. I
>>> would strongly advise, though, to stop using the direct MySQL calls
>>> and instead immediately start using a database abstraction layer
>>> like adodb (my favorite, I can help out with this one) or PEAR
>>> (might finally be usable).
>>
>> I would personally vote for PEAR, mainly to minimize dependencies on
>> "non-default" components. Not that I would MANDATE it, even if I
>> thought I could get away with it...
I agree with that idea. I simply have much more experience with adodb
(and I distribute it with the phplabware project; people downloading it
are probably not even aware that they are using it). The choice is up to
the person writing the SQL interface.
Best,
Nico
Nico Stuurman
Vale Lab
HHMI / Dept. of Cellular and Molecular Pharmacology
University of California, San Francisco
Genentech Hall, Room N316
600 16th street
For mail:
San Francisco, CA 94143-2200
For deliveries:
San Francisco, CA 94107
email: nicos@itsa.ucsf.edu
phone: (415) 514-3927
fax: (415) 476-5233