[ssml] Re: pdb-l: SEQRES and ATOM record mismatch

Mon Dec 13 16:30:59 EST 2004

Manisha Goel asked

>I have been using protein sequences of proteins with known structures
>(PDB database) derived from the SEQRES records.  But now that I need to
>run either DSSP or STRIDE on them.. I cannot map the secondary
>structure back to the sequence alignments because the SEQRES and ATOMS
>records do not agree on the residue number id.  So a residue numbered
>as 278 in SEQRES record is listed as 268 in the ATOMS record, messing
>up my alignments (I was using this numbering to map the secondary
>structure on to the sequence) I have come across quite a bit of
>discussion in some mailing lists about the need & proposed methods of
>modification of the PDB files, so that they can be made consistent.
>BUT ..
>
>Meanwhile can somebody suggest a method or resource .. which could
>either fix this discrepancy or maybe a roundabout way of taking care
>of this.  I guess with people working with structural/sequence mapping
>so often, some such fix would have definitely been devised by somebody.
>
>I just want to be able to modify the residue numbers in the ATOMS record
>to match the SEQRES records or something to that effect. CIF does not
>work because it does not segregate by chain numbers/id.

Unfortunately, it is more difficult than it looks, because STRIDE and
DSSP see different subsets of the residues, since they have different
standards about how complete the backbone has to be in order to make a
secondary structure determination.

Over the past few years we have evolved a system that usually lets us
align the DSSP and STRIDE results correctly to the original PDB file,
but it is a complex arrangement of Tcl scripts, and I doubt that we'll
be able to maintain it once the original author leaves.  I certainly
wouldn't ask anyone else to install or maintain it!

For many purposes we ended up creating a translation file that gives
the mapping between the position in the sequence and the PDB residue
numbers:

#
# chain=A
# note1=SEQRES aligned to ATOM residues with SAM
# numInserts=3
# numResidues=164
# pdbCode=2arc
# pdbFile=/projects/compbio/data/pdb/2arc.pdb.gz
# pdbId=2arcA
# pdbToolsVersion=2.0
#
POS	SEQNUM	AA3	AA
4N	6S	3S	2S
1	7	ASP	D
2	8	PRO	P
3	9	LEU	L
4	10	LEU	L
5	11	PRO	P
6	12	GLY	G
7	13	TYR	Y
8	14	SER	S
...
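
A file in this layout is easy to read back.  What follows is only a
minimal Python sketch, not our actual tools (those are the Tcl scripts
mentioned above).  The function name read_translation is made up for
illustration, and the code assumes whitespace-separated columns, "#"
header lines of the form key=value, and that the "4N 6S 3S 2S" row is
a column-format row that can be skipped.  SEQNUM is kept as a string,
since PDB residue numbers can carry insertion codes.

```python
def read_translation(lines):
    """Parse '# key=value' header lines and the POS/SEQNUM/AA3/AA rows.

    Returns (meta, rows): meta is a dict of header fields, rows is a list
    of dicts keyed by the column names found in the header row.
    """
    meta = {}
    rows = []
    header = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "...":
            continue  # blank line or elision marker
        if line.startswith("#"):
            body = line[1:].strip()
            if "=" in body:
                key, value = body.split("=", 1)
                meta[key.strip()] = value.strip()
            continue
        fields = line.split()
        if header is None:
            header = fields  # first non-comment line names the columns
            continue
        if not fields[0].isdigit():
            continue  # assumed column-format row such as "4N 6S 3S 2S"
        rows.append(dict(zip(header, fields)))
    return meta, rows


# Small excerpt in the same layout as the file shown above.
sample = """#
# chain=A
# pdbId=2arcA
#
POS\tSEQNUM\tAA3\tAA
4N\t6S\t3S\t2S
1\t7\tASP\tD
2\t8\tPRO\tP
3\t9\tLEU\tL
"""

meta, rows = read_translation(sample.splitlines())
# Sequence position (int) -> PDB residue number (string).
pos_to_pdb = {int(r["POS"]): r["SEQNUM"] for r in rows}
```

With the excerpt above, pos_to_pdb maps position 1 to PDB residue "7",
position 2 to "8", and so on.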

We have found it easiest to make such a mapping once, with one tool,
then use it to try to align things, rather than have several tools all
trying to figure out and maintain the mapping.