[Biodevelopers] Subcellular localization?

Ethan Strauss ethan.strauss at promega.com
Mon Nov 28 12:28:26 EST 2005


Hi Dan, 
	Thanks for the advice. I am afraid that what you have suggested
is basically too hard for me to do without a lot of work for each entry.
I don't know a way to automate the 90% match for thousands of entries vs
thousands of possible targets. I think I could probably pull down a
BLAST program and work with that, but that is more effort than I want to
put in right now. Is there some "easy" way to do this? I also don't see
a way to analyze abstracts as you have described without looking at each
one individually. Is there a way to automate something like this?
	I have found several online servers which do predictions, and
that is working fairly well for me, so I think that is what I will go
with. I will send a summary of what I found and used when I finish.
Thanks again, 
Ethan 

-----Original Message-----
From: biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org
[mailto:biodevelopers-bounces+ethan.strauss=promega.com at bioinformatics.org] On Behalf Of dmb at mrc-dunn.cam.ac.uk
Sent: Wednesday, November 23, 2005 3:07 PM
To: General discussions about software development in bioinformatics
Cc: biodevelopers at bioinformatics.org
Subject: Re: [Biodevelopers] Subcellular localization?

> Hi,
>    I have a list of about 2500 accession numbers from Genbank Refseq.
> All of them are human coding sequences and I can easily get the 
> complete sequence and other information from Genbank, but I can't 
> figure out a way to get subcellular localization information. I have 
> pulled some data from UniProt and from DBSubLoc
> (http://www.bioinfo.tsinghua.edu.cn/dbsubloc.html) and have been able 
> to match about 10% of my sequences to subcellular localizations from 
> these databases, but that still leaves about 90% unknown. One problem 
> is that I can't find a way to match Genbank Accession # with the IDs 
> in Swiss-Prot and DBSubLoc. I have just gone on sequence identity (So 
> far I only call it a match when it is 100% identical).
> Do you have any ideas about how I can get subcellular localization 
> info for the rest of my sequences?
> Thanks for any help or suggestions!
> Ethan
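
The 100%-identity matching described above does not need pairwise comparisons at all. A minimal sketch, assuming both sets of sequences are already loaded into Python dictionaries that map an ID to a protein sequence: index every target by a digest of its residues, then look up each query. The IDs and sequences in the usage example are made up; MD5 is just a convenient choice here, and keying the index on the sequence string itself would work equally well.

```python
import hashlib


def seq_digest(seq: str) -> str:
    """MD5 hex digest of a sequence, after removing whitespace and upper-casing."""
    cleaned = "".join(seq.split()).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def match_identical(queries: dict[str, str], targets: dict[str, str]) -> dict[str, list[str]]:
    """Map each query ID to the target IDs whose sequence is 100% identical.

    Each target is hashed once and each query needs a single dictionary
    lookup, so the cost grows roughly linearly instead of queries x targets.
    """
    index: dict[str, list[str]] = {}
    for target_id, seq in targets.items():
        index.setdefault(seq_digest(seq), []).append(target_id)
    return {qid: index.get(seq_digest(seq), []) for qid, seq in queries.items()}


# Hypothetical IDs and sequences, for illustration only.
refseq = {"NP_hypothetical_1": "MKTAYIAKQR", "NP_hypothetical_2": "MSSHEGGKKK"}
swissprot = {"P_hyp_A": "MKTAYIAKQR", "P_hyp_B": "MSSHEGGKKA"}
print(match_identical(refseq, swissprot))
# {'NP_hypothetical_1': ['P_hyp_A'], 'NP_hypothetical_2': []}
```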

Hey Ethan.

What coverage do you get if you move to 90% identity matching? Dual
localisation and 'localisation shift' in evolution could cause you
problems, but my feeling is that very similar sequences will have
similar localisations.
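
To try the 90% cut-off you need a percent-identity figure for each pair, which means aligning the sequences first. A minimal sketch, assuming a plain Needleman-Wunsch global alignment with made-up unit scores (match +1, mismatch -1, gap -1) rather than a substitution matrix such as BLOSUM62, and defining identity as identical columns divided by alignment length (other definitions, e.g. dividing by the shorter sequence, give different numbers). The helper `homology_hits` is a hypothetical name.

```python
def percent_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Globally align a and b (Needleman-Wunsch, linear gap penalty) and
    return 100 * identical columns / alignment columns."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, counting columns and identities.
    i, j = n, m
    columns = identical = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


def homology_hits(query: str, targets: dict[str, str], threshold: float = 90.0) -> list[str]:
    """IDs of targets whose identity to the query meets the threshold."""
    return [tid for tid, seq in targets.items() if percent_identity(query, seq) >= threshold]


print(percent_identity("MKTAYIAKQR", "MKTAYIAKQA"))  # 90.0
```

This quadratic-time version is fine for checking a handful of candidate pairs, but it is far too slow for thousands of queries against thousands of targets; for that, a heuristic search such as BLAST is the usual route, with the identity cut-off applied to its hits.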

For the remainder you could try running software to predict localisation
signal peptides. I have no idea which software is best or how reliable
these predictions can be (it's a whole subfield in itself), but it is
probably worth investigating as part of an overall assignment strategy.

One strategy I was thinking about for minimum effort (to help a
colleague of mine) was to use the GI to PMID links in the Gene database
at the NCBI, and then look up keywords in the article abstracts (or
keywords section) in PubMed.

So you could find all abstracts that mention words to do with
experimental localisation techniques (I don't have a list, but we should
make one somewhere - biowiki?) and specific localisations, and then link
all those abstracts to genes. This is a very rough and ready approach,
but it gives you (hopefully) a lot of data, so you can measure the
reliability of an assignment by the 'weight' of data for a certain gene.
For example, you may find 'fluorescence tagging' + endoplasmic reticulum
in a certain paper, which is linked to 5 genes by the Gene database at
the NCBI.
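
The keyword-and-weight idea can be sketched in a few lines. This assumes the Gene-to-PMID links and the abstract texts have already been downloaded into dictionaries; the term lists, gene names and PMIDs below are illustrative placeholders rather than a vetted vocabulary, and plain substring matching is as rough and ready as the approach itself.

```python
from collections import Counter

# Illustrative term lists only; no agreed vocabulary exists yet.
TECHNIQUE_TERMS = ["fluorescence tagging", "immunofluorescence", "subcellular fractionation", "gfp fusion"]
LOCATION_TERMS = ["endoplasmic reticulum", "nucleus", "mitochondria", "plasma membrane", "golgi"]


def abstract_locations(text: str) -> set[str]:
    """Location terms in an abstract, counted only if it also mentions a technique term."""
    lowered = text.lower()
    if not any(term in lowered for term in TECHNIQUE_TERMS):
        return set()
    return {loc for loc in LOCATION_TERMS if loc in lowered}


def score_genes(gene_to_pmids: dict[str, list[str]], abstracts: dict[str, str]) -> dict[str, Counter]:
    """For each gene, count how many linked abstracts support each location."""
    scores: dict[str, Counter] = {}
    for gene, pmids in gene_to_pmids.items():
        counts: Counter = Counter()
        for pmid in pmids:
            counts.update(abstract_locations(abstracts.get(pmid, "")))
        scores[gene] = counts
    return scores


# Hypothetical genes, PMIDs and abstract text.
gene_to_pmids = {"GENE_A": ["1", "2"], "GENE_B": ["2"]}
abstracts = {
    "1": "Fluorescence tagging placed the protein in the endoplasmic reticulum.",
    "2": "Immunofluorescence showed staining of the endoplasmic reticulum and the nucleus.",
}
scores = score_genes(gene_to_pmids, abstracts)
print(scores["GENE_A"].most_common(1))  # [('endoplasmic reticulum', 2)]
```

Each gene ends up with a count of supporting abstracts per location, so `most_common(1)` gives the best-supported call together with its weight.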

Additionally you could use pre-computed GO annotation of PubMed articles
to link to genes by PMID.
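
Linking pre-computed GO annotations to genes is a join on PMID. A minimal sketch, assuming both inputs are lists of pairs: (gene, PMID) links from the Gene database and (PMID, GO ID) annotations. The gene names and PMIDs are placeholders; GO:0005783 (endoplasmic reticulum) and GO:0005634 (nucleus) are real cellular component terms used only for illustration.

```python
from collections import defaultdict


def genes_to_go(gene_pmid_links: list[tuple[str, str]],
                pmid_go_annotations: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Hash join: collect every GO ID annotated to any PMID linked to each gene."""
    go_by_pmid: defaultdict[str, set[str]] = defaultdict(set)
    for pmid, go_id in pmid_go_annotations:
        go_by_pmid[pmid].add(go_id)

    result: defaultdict[str, set[str]] = defaultdict(set)
    for gene, pmid in gene_pmid_links:
        # Genes whose PMIDs carry no annotation still get an (empty) entry.
        result[gene] |= go_by_pmid.get(pmid, set())
    return dict(result)


links = [("GENE_A", "1"), ("GENE_A", "2"), ("GENE_B", "3")]
annotations = [("1", "GO:0005783"), ("2", "GO:0005634"), ("2", "GO:0005783")]
print(genes_to_go(links, annotations))
```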

I think if done right these three approaches (homology, signal
sequences, literature mining) should help you a lot, but I didn't try
any of them personally.
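
One hypothetical way to merge the three approaches into a single call per gene is a simple vote. This is a sketch only: the source names and the tie-breaking order in `SOURCE_PRIORITY` are assumptions, since nothing here says how the sources should be ranked.

```python
from collections import Counter
from typing import Optional

# Hypothetical source names and tie-break order; not a recommended ranking.
SOURCE_PRIORITY = ["homology", "literature", "signal_prediction"]


def combine_calls(calls: dict[str, Optional[str]]) -> Optional[str]:
    """Return the location named by most sources; break ties using SOURCE_PRIORITY."""
    votes = Counter(loc for loc in calls.values() if loc is not None)
    if not votes:
        return None
    best = max(votes.values())
    tied = {loc for loc, n in votes.items() if n == best}
    for source in SOURCE_PRIORITY:
        if calls.get(source) in tied:
            return calls[source]
    return sorted(tied)[0]  # tied sources not in SOURCE_PRIORITY


print(combine_calls({"homology": None, "literature": "mitochondrion",
                     "signal_prediction": "endoplasmic reticulum"}))  # mitochondrion
```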

All the best,
Dan.

P.S. The GI <-> ACCN mapping is a perennial problem. Try searching
UniParc?


> Ethan Strauss Ph.D.
> Bioinformatics Scientist
> Promega Corporation
> 2800 Woods Hollow Rd.
> Madison, WI 53711
> 608-274-4330
> 800-356-9526
> ethan.strauss at promega.com
>
> _______________________________________________
> Biodevelopers mailing list
> Biodevelopers at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/biodevelopers
>
