Hi All,

I asked this question quite a while ago and promised to send a summary when I had one. Well, I think I finally have something which is going to work for me.

The basic method I have used is to take the Gene Ontology subcellular localization annotation from the Gene database at NCBI, but I am doing it in a complex way. I ended up sort of throwing out all my RefSeq numbers as being too difficult to work with for now, but I can map my results to them when I am done.

What I have done is:

1. Download the Gene database from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
2. Upload all those tables into a local SQL database.
3. Download the Gene Ontology database from http://www.geneontology.org/GO.downloads.shtml in MySQL format.
4. Upload the Gene Ontology definitions into my local SQL database (this was a real pain as I had to convert from MySQL to SQL Server).
5. Download the Taxonomy and HomoloGene databases from http://www.ncbi.nlm.nih.gov/Ftp/ and ftp://ftp.ncbi.nih.gov/pub/HomoloGene/ and upload them.

I can then pick out the location annotations from the Gene database and, in those cases where the human gene does not have a location annotation, I can look at the homologous genes from other organisms and look for location annotations for them.

This gives me location info on ~7000 of the ~30000 human genes from the Gene database and includes most of the genes I am interested in. It also gives me a Gene Ontology evidence code, which is a measure of the reliability of the data.

I should also note that I am actually trying to get much more than just the location information, so all the work I have put into this is worthwhile for other purposes as well.

Thanks everyone for your help!
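For anyone who wants to try the same thing, here is a rough sketch of the lookup described above: use the human gene's own location annotation when there is one, otherwise fall back to annotations on its homologs. This is not my actual SQL Server setup. It uses Python's built-in sqlite3 as a stand-in, and the table layouts (`gene2go`, `homologene`), the column names, and the gene IDs in the demo rows are all simplified and invented for illustration, only loosely modeled on the downloaded files.

```python
import sqlite3

HUMAN_TAX_ID = 9606  # NCBI Taxonomy ID for Homo sapiens


def build_demo_db():
    """Create an in-memory database with hypothetical, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene2go (
            tax_id INTEGER, gene_id INTEGER, go_id TEXT,
            evidence TEXT, go_term TEXT, category TEXT
        );
        CREATE TABLE homologene (hid INTEGER, tax_id INTEGER, gene_id INTEGER);
    """)
    # Demo rows only: the gene IDs are made up, not real Gene records.
    conn.executemany("INSERT INTO gene2go VALUES (?, ?, ?, ?, ?, ?)", [
        (9606, 1, "GO:0005634", "IDA", "nucleus", "Component"),
        (9606, 1, "GO:0003677", "IEA", "DNA binding", "Function"),
        (10090, 102, "GO:0005739", "ISS", "mitochondrion", "Component"),
    ])
    conn.executemany("INSERT INTO homologene VALUES (?, ?, ?)", [
        (500, 9606, 2), (500, 10090, 102),  # human gene 2 ~ mouse gene 102
        (501, 9606, 3),                     # human gene 3 has no homolog rows
    ])
    return conn


def location_annotations(conn, human_gene_id):
    """Return (go_id, go_term, evidence, source) tuples for one human gene.

    source is "direct" when the human gene itself is annotated, and
    "homolog" when the annotation was borrowed from another organism.
    """
    direct = conn.execute(
        """SELECT go_id, go_term, evidence FROM gene2go
           WHERE tax_id = ? AND gene_id = ? AND category = 'Component'
           ORDER BY go_id""",
        (HUMAN_TAX_ID, human_gene_id),
    ).fetchall()
    if direct:
        return [row + ("direct",) for row in direct]

    # No direct annotation: look at genes in the same homology group.
    borrowed = conn.execute(
        """SELECT DISTINCT g.go_id, g.go_term, g.evidence
           FROM homologene AS h_human
           JOIN homologene AS h_other
             ON h_other.hid = h_human.hid
            AND h_other.tax_id <> h_human.tax_id
           JOIN gene2go AS g
             ON g.tax_id = h_other.tax_id AND g.gene_id = h_other.gene_id
           WHERE h_human.tax_id = ? AND h_human.gene_id = ?
             AND g.category = 'Component'
           ORDER BY g.go_id""",
        (HUMAN_TAX_ID, human_gene_id),
    ).fetchall()
    return [row + ("homolog",) for row in borrowed]


if __name__ == "__main__":
    demo = build_demo_db()
    for gene_id in (1, 2, 3):
        print(gene_id, location_annotations(demo, gene_id))
```

Keeping the evidence code and a direct/homolog flag on every row makes it easy to filter out the less reliable annotations later.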
Ethan

-----Original Message-----
From: biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org [mailto:biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org] On Behalf Of Ethan Strauss
Sent: Wednesday, November 23, 2005 2:22 PM
To: biodevelopers at bioinformatics.org
Subject: [Biodevelopers] Subcellular localization?

Hi,

I have a list of about 2500 accession numbers from GenBank RefSeq. All of them are human coding sequences and I can easily get the complete sequence and other information from GenBank, but I can't figure out a way to get subcellular localization information.

I have pulled some data from UniProt and from DBSubLoc (http://www.bioinfo.tsinghua.edu.cn/dbsubloc.html) and have been able to match about 10% of my sequences to subcellular localizations from these databases, but that still leaves about 90% unknown. One problem is that I can't find a way to match GenBank accession numbers with the IDs in Swiss-Prot and DBSubLoc. I have just gone on sequence identity (so far I only call it a match when it is 100% identical).

Do you have any ideas about how I can get subcellular localization info for the rest of my sequences?

Thanks for any help or suggestions!
Ethan

Ethan Strauss Ph.D.
Bioinformatics Scientist
Promega Corporation
2800 Woods Hollow Rd.
Madison, WI 53711
608-274-4330
800-356-9526
ethan.strauss at promega.com