[Biodevelopers] Subcellular localization?

Ethan Strauss ethan.strauss at promega.com
Wed Jan 4 10:10:49 EST 2006


Hi All, 
	I asked this question quite a while ago and promised to send a
summary when I had one. Well, I think I finally have something that is
going to work for me. 
	The basic method I have used is to take the Gene Ontology
subcellular localization annotation from the Gene Database at NCBI, but
I am doing it in a fairly roundabout way....
	I ended up sort of throwing out all my RefSeq numbers as being
too difficult to work with for now, but I can map my results to them
when I am done. 
What I have done is:
1. Download the Gene Database from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
2. Upload all those tables into a local SQL database
3. Download the Gene Ontology database from
   http://www.geneontology.org/GO.downloads.shtml in MySQL format
4. Upload the Gene Ontology definitions into my local SQL database (this
   was a real pain as I had to convert from MySQL to SQL Server)
5. Download the Taxonomy and HomoloGene databases from
   http://www.ncbi.nlm.nih.gov/Ftp/ and
   ftp://ftp.ncbi.nih.gov/pub/HomoloGene/ and upload them
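
For anyone who wants to try the loading step without SQL Server, here is
a minimal sketch in Python. It is not what I actually ran: it swaps in
SQLite, and it assumes the gene-to-GO file is tab-delimited with the
columns tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed and
Category (check the header of the file you download). The sample rows,
the table name and the function name load_gene2go are made up for
illustration.

```python
import csv
import io
import sqlite3

# Made-up sample rows. The column order is an assumption about the
# gene-to-GO file; verify it against the header line of the real download.
SAMPLE = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t1001\tGO:0005634\tIDA\t-\tnucleus\t-\tComponent\n"
    "9606\t1001\tGO:0003677\tIEA\t-\tDNA binding\t-\tFunction\n"
    "10090\t2002\tGO:0005739\tTAS\t-\tmitochondrion\t-\tComponent\n"
)


def load_gene2go(conn, handle):
    """Load tab-delimited gene-to-GO rows into a (hypothetical) gene2go table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gene2go ("
        "tax_id INTEGER, gene_id INTEGER, go_id TEXT, evidence TEXT, "
        "qualifier TEXT, go_term TEXT, pubmed TEXT, category TEXT)"
    )
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip blank lines and the '#'-prefixed header line.
    rows = [r for r in reader if r and not r[0].startswith("#")]
    conn.executemany("INSERT INTO gene2go VALUES (?,?,?,?,?,?,?,?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
n_loaded = load_gene2go(conn, io.StringIO(SAMPLE))

# Keep only cellular component annotations, i.e. the location terms.
component_rows = conn.execute(
    "SELECT gene_id, go_term, evidence FROM gene2go "
    "WHERE category = 'Component' ORDER BY gene_id"
).fetchall()
```

For a real file, pass an open file handle instead of the io.StringIO
sample. The other tables would be loaded the same way.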

I can then pick out the location annotations from the gene database and,
in those cases where the human gene does not have a location annotation,
I can look at the homologous genes from other organisms and look for
location annotations for them.  
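
The fallback logic can be sketched roughly like this in Python. This is
only an illustration of the idea, not my actual SQL: the dictionaries
stand in for the database tables, the gene IDs and group IDs are made
up, and the function name locate is hypothetical.

```python
HUMAN_TAX_ID = 9606  # NCBI Taxonomy ID for Homo sapiens

# gene_id -> list of (GO location term, evidence code); made-up data
locations = {
    1001: [("nucleus", "IDA")],
    2002: [("mitochondrion", "TAS")],
}

# HomoloGene-style groups: group id -> list of (tax_id, gene_id); made-up data
homology_groups = {
    10: [(9606, 1001), (10090, 1501)],
    20: [(9606, 1002), (10090, 2002)],
    30: [(9606, 1003)],
}


def locate(human_gene_id):
    """Return (term, evidence, source_gene_id) tuples for a human gene.

    Annotations on the human gene itself are used when present; otherwise
    annotations on homologous genes from other organisms are returned.
    """
    direct = locations.get(human_gene_id)
    if direct:
        return [(term, ev, human_gene_id) for term, ev in direct]

    results = []
    for members in homology_groups.values():
        if (HUMAN_TAX_ID, human_gene_id) not in members:
            continue
        for tax_id, gene_id in members:
            if tax_id == HUMAN_TAX_ID:
                continue
            for term, ev in locations.get(gene_id, []):
                results.append((term, ev, gene_id))
    return results
```

Here locate(1001) uses the human annotation directly, locate(1002)
falls back to the homolog 2002, and locate(1003) returns an empty list
because neither the gene nor any homolog is annotated. Keeping the
source gene ID in the result makes it easy to tell borrowed annotations
from direct ones later.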

This gives me location info on ~7000 of the ~30000 human genes from the
gene database and includes most of the genes I am interested in. It also
gives me a Gene Ontology evidence code, which indicates what kind of
evidence supports each annotation and so serves as a rough measure of the
reliability of the data. I should also note that I am actually trying to
get much more than just the location information, so all the work I have
put into this is worthwhile for other purposes as well. 

Thanks everyone for your help!
Ethan

-----Original Message-----
From: biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org
[mailto:biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org]
On Behalf Of Ethan Strauss
Sent: Wednesday, November 23, 2005 2:22 PM
To: biodevelopers at bioinformatics.org
Subject: [Biodevelopers] Subcellular localization?

Hi, 
   I have a list of about 2500 accession numbers from GenBank RefSeq.
All of them are human coding sequences and I can easily get the complete
sequence and other information from GenBank, but I can't figure out a
way to get subcellular localization information. I have pulled some data
from UniProt and from DBSubLoc
(http://www.bioinfo.tsinghua.edu.cn/dbsubloc.html) and have been able to
match about 10% of my sequences to subcellular localizations from these
databases, but that still leaves about 90% unknown. One problem is that
I can't find a way to match GenBank accession numbers with the IDs in
Swiss-Prot and DBSubLoc, so I have just gone on sequence identity (so far
I only call it a match when it is 100% identical). 
Do you have any ideas about how I can get subcellular localization info
for the rest of my sequences? 
Thanks for any help or suggestions!
Ethan

Ethan Strauss Ph.D.
Bioinformatics Scientist
Promega Corporation
2800 Woods Hollow Rd.
Madison, WI 53711
608-274-4330
800-356-9526
ethan.strauss at promega.com

_______________________________________________
Biodevelopers mailing list
Biodevelopers at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/biodevelopers


