Hi Ethan et al.,

I work on bacterial subcellular localization prediction, and at the PSORT.org site that our group maintains (http://www.psort.org), you'll find an extensive list of links to predictive tools for both prokaryotes and eukaryotes.

There are multiple tools available for eukaryotic protein prediction, and of the many choices I recommend Proteome Analyst (http://www.cs.ualberta.ca/%7Ebioinfo/PA/Sub/index.html). It uses an annotation keyword-based approach to prediction: it finds homologs of your query protein in SwissProt and passes keywords from the homologs' SwissProt entries to a machine learning-based classifier.

I would also recommend trying a few other methods and collating the results to form a consensus prediction. You can try signal peptide-based methods, but you have to screen your dataset and remove any membrane proteins beforehand, so that these don't get erroneously predicted as organellar.

BLASTing your sequences against a database of proteins with annotated localization sites also works quite well. In PSORTb we carry out an analysis using this technique, which requires a hit with an E-value below 1e-10 and an HSP region that spans at least 80% of the length of both the query and the subject (this avoids matches to only a single domain of a protein).

Hope that helps, and don't hesitate to drop me a line if you need any further advice on the subject.

Cheers
Jenn

-----------------------------
Jennifer Gardy, PhD Candidate
The Brinkman Lab
Simon Fraser University
Ph.
604 291 5414
www.sfu.ca/~jlgardy
-----------------------------

-----Original Message-----
From: biodevelopers-bounces+jlgardy=sfu.ca at bioinformatics.org [mailto:biodevelopers-bounces+jlgardy=sfu.ca at bioinformatics.org] On Behalf Of biodevelopers-request at bioinformatics.org
Sent: Thursday, November 24, 2005 9:00 AM
To: biodevelopers at bioinformatics.org
Subject: Biodevelopers Digest, Vol 8, Issue 1

Message: 1
Date: Wed, 23 Nov 2005 14:21:40 -0600
From: "Ethan Strauss" <ethan.strauss at promega.com>
Subject: [Biodevelopers] Subcellular localization?
To: <biodevelopers at bioinformatics.org>
Message-ID: <D8D8119118899D4A8EB5AD9BD24C1932011322E6 at MADMSG003.promega.com>
Content-Type: text/plain; charset="us-ascii"

Hi,

I have a list of about 2500 accession numbers from Genbank Refseq. All of them are human coding sequences and I can easily get the complete sequence and other information from Genbank, but I can't figure out a way to get subcellular localization information.

I have pulled some data from UniProt and from DBSubLoc (http://www.bioinfo.tsinghua.edu.cn/dbsubloc.html) and have been able to match about 10% of my sequences to subcellular localizations from these databases, but that still leaves about 90% unknown. One problem is that I can't find a way to match Genbank Accession # with the IDs in Swiss-Prot and DBSubLoc. I have just gone on sequence identity (So far I only call it a match when it is 100% identical).

Do you have any ideas about how I can get subcellular localization info for the rest of my sequences?

Thanks for any help or suggestions!
Ethan

Ethan Strauss Ph.D.
Bioinformatics Scientist
Promega Corporation
2800 Woods Hollow Rd.
Madison, WI 53711
608-274-4330
800-356-9526
ethan.strauss at promega.com
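
For anyone who wants to try the BLAST criteria mentioned in the reply above (E-value below 1e-10, HSP covering at least 80% of both query and subject), here is a minimal Python sketch. It is not PSORTb's actual code: the `Hsp` record, its field names, and the `coverage` and `passes_filter` helpers are hypothetical, and it assumes you have already parsed each hit's E-value, HSP coordinates, and full sequence lengths from your BLAST output.

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    """One high-scoring pair from a BLAST hit (hypothetical record layout)."""
    evalue: float
    q_start: int  # 1-based HSP start on the query
    q_end: int    # 1-based HSP end on the query
    q_len: int    # full length of the query sequence
    s_start: int  # 1-based HSP start on the subject
    s_end: int    # 1-based HSP end on the subject
    s_len: int    # full length of the subject sequence


def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence covered by the HSP (coordinates are inclusive)."""
    return (abs(end - start) + 1) / length


def passes_filter(hsp: Hsp, max_evalue: float = 1e-10, min_cov: float = 0.80) -> bool:
    """True if the HSP meets the E-value cutoff and covers enough of BOTH sequences."""
    return (
        hsp.evalue < max_evalue
        and coverage(hsp.q_start, hsp.q_end, hsp.q_len) >= min_cov
        and coverage(hsp.s_start, hsp.s_end, hsp.s_len) >= min_cov
    )


# Illustrative values only, not real BLAST results.
full_length_hit = Hsp(1e-50, 1, 95, 100, 5, 100, 110)
single_domain_hit = Hsp(1e-50, 1, 95, 100, 1, 95, 400)
weak_hit = Hsp(1e-5, 1, 100, 100, 1, 100, 100)
```

The check on the subject side is what rejects `single_domain_hit`: the query is almost fully aligned, but it only matches a small part of a much longer annotated protein, so transferring that protein's localization would be risky.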