[Biophp-dev] seqdb and seq classes in genephp

Serge Gregorio biophp-dev@bioinformatics.org
Sat, 26 Apr 2003 01:02:13 +0800


Just a quick note... yes, I got your attachment.  Will go 
through it this weekend.  As for the seq and seqdb code,
yes, the parser will have to be somehow attached to seq,
if we are to do a "quick or express parse" of any data
file.  As for the fate of seqdb, let's ask the opinion
of others first before "axe-ing it".  

Admittedly, it's a lot easier importing the parsed data
into a relational database like MySQL or PostgreSQL than
writing our own flat file database management system.

Where do you think we should put such code (that stores
parsed data into a MySQL database)?  I am not that comfy
about making it an official part of GenePHP/BioPHP as it
reflects a particular database structure/design.  (See 
the sample MySQL database schema on the GenePHP site.)

>> function parse_ANSI ()
>(Shouldn't that be "ASN.1"?)

Yes, that should be ASN.1.  Andres and I stand corrected. =)  

Lately, I've been busy writing scripts that actually do
something useful, like translating a nucleotide sequence in
all six reading frames, reverse translating a protein into
its possible nucleic acid counterparts, etc.  While it's admittedly
time-consuming, I am LEARNING A LOT about what needs to
be done with the existing code.  I've posted those demo
scripts at http://genephp.sourceforge.net/applist.html.
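
For anyone curious, here is a rough sketch of the six-frame idea in PHP.
It is NOT the code behind the demo scripts.  The function names
(codon_table, reverse_complement, translate_frame, six_frame_translation)
are made up for illustration, and it assumes the standard genetic code and
plain A/C/G/T input; any other codon comes out as "X".

```php
<?php
// Build the standard genetic code as codon => one-letter amino acid.
// Codons are enumerated in T, C, A, G order; '*' marks a stop codon.
function codon_table()
{
    $bases  = 'TCAG';
    $aminos = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    $table  = array();
    $i = 0;
    for ($a = 0; $a < 4; $a++) {
        for ($b = 0; $b < 4; $b++) {
            for ($c = 0; $c < 4; $c++) {
                $table[$bases[$a] . $bases[$b] . $bases[$c]] = $aminos[$i++];
            }
        }
    }
    return $table;
}

// Complement each base, then reverse the strand.
function reverse_complement($dna)
{
    return strrev(strtr(strtoupper($dna), 'ACGT', 'TGCA'));
}

// Translate one frame, starting at $offset (0, 1 or 2).
// A trailing partial codon is ignored.
function translate_frame($dna, $offset, $table)
{
    $dna = strtoupper($dna);
    $protein = '';
    for ($i = $offset; $i + 3 <= strlen($dna); $i += 3) {
        $codon = substr($dna, $i, 3);
        $protein .= isset($table[$codon]) ? $table[$codon] : 'X';
    }
    return $protein;
}

// Return all six frames: F1..F3 on the given strand,
// R1..R3 on its reverse complement.
function six_frame_translation($dna)
{
    $table  = codon_table();
    $rev    = reverse_complement($dna);
    $frames = array();
    for ($offset = 0; $offset < 3; $offset++) {
        $frames['F' . ($offset + 1)] = translate_frame($dna, $offset, $table);
        $frames['R' . ($offset + 1)] = translate_frame($rev, $offset, $table);
    }
    return $frames;
}
```

For example, six_frame_translation('ATGGCC') gives 'MA' for frame F1 and
'GH' for frame R1 (the reverse complement being GGCCAT).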

Kurt: Still haven't touched your code.  Been busy lately
(see above paragraph).   I've been to the Vector NTI
(InforMax?) site but I couldn't find any formal definition
or specification of their molecule document format, which, 
according to you, is supposed to be a superset of GenBank.

My only other concern here is, given Nicos' suggestion of
having a function that "AUTO-DETECTS" a file, how would 
we then distinguish a Vector NTI file from a GenBank file
(given they have a lot of similarities)?
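
To make the concern concrete, here is a minimal sketch of auto-detection
with a "forced" format override.  All names here (autodetect_format,
parse_fasta, parse_sequence_file) are hypothetical, not existing
GenePHP/BioPHP functions.  It only looks at the first non-blank line:
">" for FASTA, "LOCUS" for GenBank, "CLUSTAL" for Clustal alignments.
Only a toy FASTA parser is wired up.

```php
<?php
// Guess the format from the first non-blank line of the file contents.
function autodetect_format($text)
{
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if ($line[0] === '>') {
            return 'fasta';
        }
        if (strncmp($line, 'LOCUS', 5) === 0) {
            return 'genbank';
        }
        if (strncmp($line, 'CLUSTAL', 7) === 0) {
            return 'clustal';
        }
        return 'unknown';
    }
    return 'unknown';
}

// Toy FASTA parser: returns an array of id => sequence.
function parse_fasta($text)
{
    $records = array();
    $id = null;
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if ($line[0] === '>') {
            // The id is the first word after '>'.
            $parts = preg_split('/\s+/', substr($line, 1));
            $id = $parts[0];
            $records[$id] = '';
        } elseif ($id !== null) {
            $records[$id] .= $line;
        }
    }
    return $records;
}

// Wrapper: pass $format to skip auto-detection ("forced" parsing).
// Returns false when no parser is registered for the format.
function parse_sequence_file($text, $format = null)
{
    if ($format === null) {
        $format = autodetect_format($text);
    }
    // Only FASTA is wired up in this sketch.
    $parsers = array('fasta' => 'parse_fasta');
    if (!isset($parsers[$format])) {
        return false;
    }
    return call_user_func($parsers[$format], $text);
}
```

If a Vector NTI file really is a GenBank superset and also opens with a
LOCUS line, a check like this would report it as 'genbank', which is
exactly the ambiguity I am asking about.  The $format argument is the
escape hatch for such cases.
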

Sean: Where can I get/try out your eFetch code?  I've
visited your site but it says there "NO FILES AVAILABLE".
Am I missing something?



On Fri, 25 Apr 2003 09:59:52  
 mail-lists+biophpdev wrote:
>On Friday 25 April 2003 09:19 am, nicos@itsa.ucsf.edu wrote:
>> [...]I enclose a tar.gz file with the
>> code so that you can have a look (don't know if it makes it through the
>> mailing list, not a good idea to include a file, but....)
>Postings over 40k currently "pause" in the queue with a message to the list
>administrator, who can approve or reject it.  I just approved it, naturally...
>I like this idea, though the individual format parsers might end up
>being classes themselves (and "enclosed" within the "wrapper" parser).
>Of course, the really difficult part may be:
>> function autodetect () // figures out what seqfiletype this file is,
>Then again, that depends on how many different formats we want to be
>able to auto-detect.  It may also be worthwhile to have "forced" format
>parsing enabled (e.g. the ability to directly call a particular parser without
>going through auto-detection, in case auto-detection proves problematic
>for some formats).
>> function parse_ANSI ()
>(Shouldn't that be "ASN.1"?)
>I have FASTA and Clustal (.aln) parsers in the module code section already, if
>those are helpful at all.
>> The SQL stuff could be made self-contained in a similar fashion.  I would
>> strongly advise though to stop using the direct MySQL calls but instead
>> immediately start using a database abstraction layer like adodb (my
>> favorite, I can help out with this one) or PEAR (might finally be usable).
>I would personally vote for PEAR, mainly to minimize dependencies on
>"non-default" components.  Not that I would MANDATE it, even if I thought
>I could get away with it...