[BiO BB] GI numbers

Michel Dumontier micheld at mshri.on.ca
Wed Mar 31 10:40:37 EST 2004

Hi Robson,

  Since 14600509 and 5103570 are identifiers for identical sequences from
different sources, they appear on the same definition line in the
non-redundant FASTA file that NCBI provides on its FTP site (as a BLAST
database distribution, nr.gz).
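
If you would rather work from the file directly, here is a rough Python sketch
of the idea. It assumes the definition lines use the "gi|<number>|<db>|<accession>|"
layout, with the entries for identical sequences joined on one line by a
Ctrl-A (\x01) character. The function names and the accessions in the sample
line are placeholders made up for illustration, not real records or part of
any NCBI tool.

```python
from typing import Dict, FrozenSet, Iterable, List


def gis_in_defline(defline: str) -> List[str]:
    """Return every GI number found on one nr definition line, in order."""
    gis = []
    # Assumption: entries for identical sequences are separated by Ctrl-A.
    for entry in defline.lstrip(">").split("\x01"):
        fields = entry.split("|")
        if len(fields) >= 2 and fields[0] == "gi" and fields[1].isdigit():
            gis.append(fields[1])
    return gis


def build_gi_groups(lines: Iterable[str]) -> Dict[str, FrozenSet[str]]:
    """Map each GI to the set of GIs sharing its definition line."""
    groups: Dict[str, FrozenSet[str]] = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines, only deflines carry GIs
        group = frozenset(gis_in_defline(line.rstrip("\n")))
        for gi in group:
            groups[gi] = group
    return groups


# Placeholder defline: the GIs come from this thread, the accessions are invented.
sample = [
    ">gi|14600509|ref|XX_000001.1| hypothetical protein"
    "\x01gi|5103570|dbj|YY000001.1| hypothetical protein\n",
    "MSEQUENCE\n",
]
groups = build_gi_groups(sample)
equivalents = sorted(groups["14600509"] - {"14600509"})
```

For the real file, the same function should accept a handle opened with
gzip.open("nr.gz", "rt"), so the archive does not need to be unpacked first.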

This file, including each definition line entry, has been imported into
Seqhound (http://seqhound.mshri.on.ca), where it is searchable through the
redundant group module via several programming interfaces (C/C++/Perl/Java).


----- Original Message ----- 
From: "Robson Francisco de Souza" <rfsouza at citri.iq.usp.br>
To: <info at ncbi.nlm.nih.gov>
Cc: <bio_bulletin_board at bioinformatics.org>
Sent: Friday, March 26, 2004 12:48 PM
Subject: [BiO BB] GI numbers

> Hi,
> I'm analyzing a set of sequences with regard to their classifications as
> homologs from both COG and Kegg databases of orthologs. Although both
> COG and Kegg provide tables relating gene names to GI (PID) numbers,
> I'm, up to this moment, unable to map GIs from one dataset to the other,
> in order to check classifications for genes in both catalogs.
> GIs from COG appear to be from RefSeq and those from Kegg seem to be
> from GenPept. How can I map GI numbers from Kegg to GI numbers from COG
> database? Is there any query I can make to download such info for 185904
> proteins in COG and their equivalents on Kegg Orthologs database?
> Here is an example:
> Sequence 14600509 is the protein coded by gene APE0180 from Aeropyrum
> pernix complete genome, as described in COG's table myva=gb. The same
> sequence is identified by GI 5103570 in Kegg. In this case, I was able to map
> COG's GI to Kegg's GI by using the gene identifier and annotation, a
> procedure that is not easily automated.
> How can I retrieve equivalent IDs for the whole COG gene set?
> Thanks in advance for any help.
> Robson
