[BiO BB] Getting PDB id from Swissprot entry
idoerg at burnham.org
Thu Jul 1 12:15:21 EDT 2004
I do not know for which particular purpose you are looking for SP <-->
PDB mappings, but beware the following perils & pitfalls:
1) The SP <--> PDB mapping can be many-to-many.
1.1) There may be several entries in PDB which correspond to a single
Swiss-Prot entry. This is because the same protein may have been
solved by different groups at different times, solved with
different ligands, point mutated, and so forth. Be very careful about
point mutations: the mutant does not have the same sequence in PDB as in SP.
1.2) At the same time, there may be several SP entries corresponding to
a single PDB entry. This may be due to SP redundancy (although database
curators are doing a fantastic job of keeping that down), close
homologs, or point mutations.
2) An SP amino-acid sequence is rarely the same as the PDB sequence.
Usually only part of a structure is solved. Gaps abound, because
crystallographers sometimes cannot see the loops. There are large
deletions, because there are bits which are not crystallizable, or, if
NMR, they are trying to keep the protein short.
3) There are many SP entries which do have an equivalent in PDB, but the
entry does not say so in its DR PDB cross-reference lines. See also the
"40%" comment below.
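
To make point 1 concrete, here is a minimal Python sketch that collects
the PDB cross-references from Swiss-Prot flat-file text and keeps the
mapping in both directions, so the many-to-many cases show up instead of
being silently collapsed. The two entries below are made up (the
accessions X00001 and X00002 and the PDB ids are not real), the function
names are my own, and I am assuming only that a cross-reference line
looks like "DR   PDB; <id>; ..." and that the first accession on the AC
line is the primary one. The fields after the PDB id differ between
releases, so the sketch ignores them.

```python
from collections import defaultdict

# Made-up flat-file text: two fake entries that share one PDB id.
EXAMPLE = """\
ID   FAKE1_HUMAN
AC   X00001; X00009;
DR   EMBL; Z00000; -.
DR   PDB; 1abc; X-ray.
DR   PDB; 2XYZ; NMR.
//
ID   FAKE2_HUMAN
AC   X00002;
DR   PDB; 2XYZ; NMR.
//
"""


def parse_pdb_xrefs(flatfile_text):
    """Return {primary accession: set of PDB ids} from flat-file text."""
    sp_to_pdb = defaultdict(set)
    accession = None
    for line in flatfile_text.splitlines():
        if line.startswith("AC   ") and accession is None:
            # Assumption: the first accession listed is the primary one.
            accession = line[5:].split(";")[0].strip()
        elif line.startswith("DR   ") and accession is not None:
            fields = [f.strip() for f in line[5:].split(";")]
            if len(fields) >= 2 and fields[0] == "PDB":
                sp_to_pdb[accession].add(fields[1].upper())
        elif line.startswith("//"):
            # End of entry: forget the accession before the next one.
            accession = None
    return sp_to_pdb


def invert(sp_to_pdb):
    """Return {PDB id: set of accessions}, the reverse direction."""
    pdb_to_sp = defaultdict(set)
    for acc, pdb_ids in sp_to_pdb.items():
        for pdb_id in pdb_ids:
            pdb_to_sp[pdb_id].add(acc)
    return pdb_to_sp


sp_to_pdb = parse_pdb_xrefs(EXAMPLE)
pdb_to_sp = invert(sp_to_pdb)

for acc in sorted(sp_to_pdb):
    print(acc, sorted(sp_to_pdb[acc]))
for pdb_id in sorted(pdb_to_sp):
    print(pdb_id, sorted(pdb_to_sp[pdb_id]))
```

An accession that is missing from the result only means the entry lists
no PDB cross-reference. As point 3 says, that is not the same as "no
structure exists".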
Sourangshu Bhattacharya wrote:
> Hi Dan,
> Thank you very much. I didn't know about MSD.
> There is also an entry HSSP in swissprot which gives homologues.
> Dan Bolser wrote:
>> On Wed, 30 Jun 2004, Sourangshu Bhattacharya wrote:
>>> Is there a direct way (without reading the protein name from
>>> swissprot and searching in PDB) of getting the PDB id of the protein
>>> corresponding to a particular Swissprot id ?
>> I would use the MSD database, which maintains a manually curated version
>> of the SwissProt to PDB mapping.
>>> Also, how do I know whether structure for a particular protein
>>> corresponding to a swissprot id has been determined or not ?
>> Strictly speeking, the above mapping gives you this. More realistically,
>> however, you can consider very close homologues to the above set as also
>> 'solved'. Where you draw the line is a matter of requirement, but you can
>> get reasonable models (allegedly) at > 40% sequence identity, or
>> reasonable 'fold prediction' at much larger distances (see SUPERFAMILY).
>> It all depends on what you want to do.
>>> Thank you very much..
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 713 9930