Manisha Goel asked:

>I have been using protein sequences of proteins with known structures
>(PDB database) derived from the SEQRES records. But now that I need to
>run either DSSP or STRIDE on them, I cannot map the secondary
>structure back to the sequence alignments because the SEQRES and ATOM
>records do not agree on the residue number id. So a residue numbered
>as 278 in the SEQRES record is listed as 268 in the ATOM record, messing
>up my alignments (I was using this numbering to map the secondary
>structure onto the sequence). I have come across quite a bit of
>discussion in some mailing lists about the need for, and proposed methods of,
>modifying the PDB files so that they can be made consistent.
>BUT ...
>
>Meanwhile, can somebody suggest a method or resource which could
>either fix this discrepancy or offer a roundabout way of taking care
>of it? I guess with people working on structural/sequence mapping
>so often, some such fix has definitely been devised by somebody.
>
>I just want to be able to modify the residue numbers in the ATOM records
>to match the SEQRES records, or something to that effect. CIF does not
>work because it does not segregate by chain numbers/id.

Unfortunately, it is more difficult than it looks, because STRIDE and DSSP see different subsets of the residues: they have different standards for how complete the backbone has to be before they will make a secondary-structure assignment. Over the past few years we have evolved a system that usually lets us align the DSSP and STRIDE results correctly to the original PDB file, but it is a complex arrangement of Tcl scripts, and I doubt that we'll be able to maintain it once the original author leaves. I certainly wouldn't ask anyone else to install or maintain it!
For many purposes, we ended up creating a translation file that gives the mapping between each position in the sequence and its PDB residue number:

    #
    # chain=A
    # note1=SEQRES aligned to ATOM residues with SAM
    # numInserts=3
    # numResidues=164
    # pdbCode=2arc
    # pdbFile=/projects/compbio/data/pdb/2arc.pdb.gz
    # pdbId=2arcA
    # pdbToolsVersion=2.0
    #
    POS  SEQNUM  AA3  AA
    4N   6S      3S   2S
    1    7       ASP  D
    2    8       PRO  P
    3    9       LEU  L
    4    10      LEU  L
    5    11      PRO  P
    6    12      GLY  G
    7    13      TYR  Y
    8    14      SER  S
    ...

We have found it easiest to make such a mapping once, with one tool, and then use it when aligning things, rather than having several tools each trying to work out and maintain the mapping.
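As a rough illustration of how a translation file like the one above might be consumed, here is a minimal Python sketch. It is not the tool described in this post. It assumes the layout shown: '#' comment lines, a column-name line (POS SEQNUM AA3 AA), a second line that we take to be column-format codes (4N 6S 3S 2S), and whitespace-separated data rows. The function names parse_translation and project_ss are hypothetical, and the secondary-structure input (a dict from PDB residue number to a DSSP or STRIDE code) is assumed to come from a separate parser that is not shown.

```python
from typing import Dict


def parse_translation(text: str) -> Dict[str, int]:
    """Map PDB residue number (kept as a string, since PDB numbers can carry
    insertion codes) -> 1-based sequence position.

    Hypothetical parser; assumed layout: '#' comment lines, then a
    column-name line (POS SEQNUM AA3 AA), then a column-format line
    (e.g. 4N 6S 3S 2S), then whitespace-separated data rows.
    """
    columns = None
    format_line_seen = False
    pdb_to_pos: Dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' metadata comments
        fields = line.split()
        if columns is None:
            columns = fields  # first non-comment line: column names
            continue
        if not format_line_seen:
            format_line_seen = True  # second line: column formats, ignored
            continue
        if len(fields) < len(columns):
            continue  # skip incomplete rows (e.g. an elided '...' line)
        row = dict(zip(columns, fields))
        pdb_to_pos[row["SEQNUM"]] = int(row["POS"])
    return pdb_to_pos


def project_ss(pdb_to_pos: Dict[str, int], ss_by_pdbnum: Dict[str, str],
               seq_length: int, gap: str = "-") -> str:
    """Place per-residue secondary-structure codes (keyed by PDB residue
    number) at their sequence positions; unassigned positions keep `gap`."""
    out = [gap] * seq_length
    for pdbnum, code in ss_by_pdbnum.items():
        pos = pdb_to_pos.get(pdbnum)
        if pos is not None and 1 <= pos <= seq_length:
            out[pos - 1] = code
    return "".join(out)


if __name__ == "__main__":
    sample = """#
# chain=A
#
POS SEQNUM AA3 AA
4N 6S 3S 2S
1 7 ASP D
2 8 PRO P
3 9 LEU L
"""
    mapping = parse_translation(sample)
    print(mapping)  # {'7': 1, '8': 2, '9': 3}
    print(project_ss(mapping, {"8": "H", "9": "H"}, 4))  # -HH-
```

Keying on the PDB number as a string rather than an integer keeps residues with insertion codes (such as 52A) distinct. Residues that DSSP or STRIDE skip simply remain as gaps, which is one way to cope with the two programs seeing different subsets of residues.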