[BiO BB] Re: Quickly retrieving cross-referenced records from NCBI

Gaj Stan (BIGCAT) Stan.Gaj at BIGCAT.unimaas.nl
Wed Dec 20 06:02:29 EST 2006


Dear Dale,

I ran into the same problem a few weeks ago, but in the opposite
direction: going from NM to NP. I wrote a Perl script for that, and I've
adjusted it to fit your needs (so it now goes from NP to NM).

If I'm correct, RefSeq splits its database into three parts: genomic,
mRNA and protein. For this script to work, you need a) to download a
species-specific RefSeq mRNA database (the file name ends with
.rna.gbff) from the NCBI FTP site and b) to have your own file of the
IDs to convert, in list form.
Note that this script will NOT detect version numbers: e.g. XP_12345.1
needs to be converted to XP_12345 in your list before it does its job!
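
For readers who do not have the Perl script at hand, the lookup described
above can be sketched roughly as follows. This is a Python illustration of
the idea, not the script itself. It assumes the .rna.gbff file follows the
usual GenBank flat-file layout, where each record has a VERSION line and
its CDS feature carries a /protein_id qualifier. The function names and the
sample accessions (XM_000001, XP_000001) are made up for illustration.
Unlike the script, this sketch strips version suffixes from the input IDs
itself.

```python
import re

# Matches the CDS qualifier that links an mRNA record to its protein,
# e.g.  /protein_id="XP_000001.1"
PROTEIN_ID = re.compile(r'/protein_id="([^"]+)"')


def strip_version(accession):
    """Drop the version suffix: 'XP_12345.1' -> 'XP_12345'."""
    return accession.strip().split(".")[0]


def build_protein_to_mrna_map(gbff_lines):
    """Map version-less protein accessions to the mRNA accession
    (taken from the VERSION line) of the record they appear in."""
    mapping = {}
    current_mrna = None
    for line in gbff_lines:
        if line.startswith("LOCUS"):
            current_mrna = None              # a new record starts here
        elif line.startswith("VERSION"):
            fields = line.split()
            if len(fields) > 1:
                current_mrna = fields[1]     # accession with version
        else:
            match = PROTEIN_ID.search(line)
            if match and current_mrna:
                mapping[strip_version(match.group(1))] = current_mrna
    return mapping


def convert_ids(protein_ids, mapping):
    """Return (protein ID, mRNA accession or None) pairs, skipping blanks."""
    return [(pid.strip(), mapping.get(strip_version(pid)))
            for pid in protein_ids if pid.strip()]


# Tiny in-memory stand-in for a .rna.gbff file (invented accessions).
sample_record = [
    "LOCUS       XM_000001               1200 bp    mRNA    linear",
    "VERSION     XM_000001.1",
    "     CDS             10..900",
    '                     /protein_id="XP_000001.1"',
    "//",
]

if __name__ == "__main__":
    lookup = build_protein_to_mrna_map(sample_record)
    for protein, mrna in convert_ids(["XP_000001.1", "XP_999999"], lookup):
        print(protein, mrna)
```

With a real download, the list passed to build_protein_to_mrna_map would
simply be the open file handle of the .rna.gbff file, and the protein IDs
would be read from your own list file, one per line. IDs that are not
found come back paired with None, so missing cross-references are easy to
spot.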

Although the code is far from perfect, it should do exactly what you
asked for (-;

Best wishes,

   Stan


-----Original Message-----
From: bio_bulletin_board-bounces+stan.gaj=bigcat.unimaas.nl at bioinformatics.org
[mailto:bio_bulletin_board-bounces+stan.gaj=bigcat.unimaas.nl at bioinformatics.org]
On Behalf Of Eugene Bolotin
Sent: 13 December 2006 19:30
To: General Forum at Bioinformatics.Org
Subject: Re: [BiO BB] Re: Quickly retrieving cross-referenced records
from NCBI

The quickest way is the UCSC Table Browser, batch retrieve. Read up on that.


On 12/12/06, Dale Richardson <dalesan at gmail.com> wrote:
>
> Hello All,
>
> Forgive me for posting, but this question is hard to condense into a
> good google search.  I am wondering if there is a quick way to batch
> retrieve all coding sequences (mRNA sequences) linked to a particular
> NCBI RefSeq Protein identifier.  For example, if I have a list of 10
> sequences with the following protein refseq IDs:
>
> XP_698519.1
> XP_697978.1
>
> and so on..
>
> How can I retrieve the cross-referenced XM_ identifiers for the
> coding sequences based on such protein accessions?  Must one write
> some kind of script to accomplish this or is there a quicker way?
>
> thanks,
>
> dale richardson
> university of cologne
>
> _______________________________________________
> General Forum at Bioinformatics.Org -
> BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>



-- 
Eugene Bolotin
Ph.D. candidate
Genetics Genomics and Bioinformatics
University of California Riverside
ybolo001 at student.ucr.edu
Dr. Frances Sladek Lab