[ssml] PDBe introduces a compound-based browser for the PDB archive

Gerard DVD Kleywegt gerard at xray.bmc.uu.se
Fri Mar 18 17:48:31 EDT 2011


Hi all,

As part of its recent winter update, the Protein Data Bank in Europe (PDBe; 
http://pdbe.org) introduced a new, chemistry-based module of its PDB archive 
browser (a.k.a. PDBeXplore). It can be accessed at:

                      http://pdbe.org/compounds

As you may (or may not) know, the PDB browser is an interface that enables you 
to retrieve and analyse information on subsets of structures in the PDB using 
various biological or chemical classifications. Previously released modules 
enable browsing of the archive based on the Enzyme Class (http://pdbe.org/ec), 
CATH domains (http://pdbe.org/cath), Pfam families (http://pdbe.org/pfam) or 
Fasta-based sequence-similarity searches (http://pdbe.org/fasta).

The new compound-based browser allows you to enter the name of a chemical 
compound of interest and analyse all the PDB entries that contain that 
compound. Once you start typing the name (or three-letter code, if you happen 
to know it) of a compound, a drop-down menu will show you matching compound 
names and you can select the compound of interest. For instance, if you are 
interested in Sildenafil, just start typing the name and once you get to 
"sild", the only remaining matching compound is:

VIA - 
5-{2-ETHOXY-5-[(4-METHYLPIPERAZIN-1-YL)SULFONYL]PHENYL}-1-METHYL-3-PROPYL-1H,6H,7H-PYRAZOLO[4,3-D]PYRIMIDIN-7-ONE

(Note: the auto-complete function uses information about synonyms from the 
wwPDB chemical component dictionary.)

Select this compound, click on the "Submit" button and the central panel of the 
browser will soon be filled with a table of all PDB entries that contain this 
compound (currently there are only five). The right-hand panel will contain 
more information about the compound you have selected, including a chemical 
diagram, formula and SMILES codes.

Note: if you don't know whether your compound occurs in the PDB or what its name is, 
you can use the search options of PDBeChem - at http://pdbe.org/pdbechem - 
including an option to draw a (sub)structure (to do this, click on the "edit" 
button for the "Non-Stereo SMILES (Has Sub-Structure)" field in the PDBeChem 
search form).

In order to demonstrate the powerful analysis options in the compound browser, 
select a more abundant compound, e.g. ATP, and hit the "Submit" button again 
(or click on this link: http://pdbe.org/compounds?ligand=ATP). The central 
panel will show a list of the PDB entries containing the compound you selected. 
The information here can be sorted by clicking on any of the column headings in 
the table (clicking again reverses the sort order).
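
The link above suggests that a compound can be selected through the "ligand" 
query parameter. A minimal Python sketch for building such links follows. Only 
the ATP link is given in this message, so the assumption that other 
three-letter codes (e.g. NAD, FAD) work the same way is untested, and the 
helper name compound_browser_url is made up for illustration.

```python
from urllib.parse import urlencode

# Base address of the compound browser, as given in this announcement.
BASE_URL = "http://pdbe.org/compounds"


def compound_browser_url(code: str) -> str:
    """Build a compound-browser link for a three-letter compound code.

    Assumption: the "ligand" parameter accepts any code in the same way
    as the ATP example; only the ATP link appears in the announcement.
    """
    code = code.strip().upper()
    if not code:
        raise ValueError("empty compound code")
    return f"{BASE_URL}?{urlencode({'ligand': code})}"


if __name__ == "__main__":
    for code in ("ATP", "NAD", "FAD"):
        print(compound_browser_url(code))
```

Upper-casing the code is a convenience so that "atp" and "ATP" give the same 
link; whether the browser itself is case-sensitive is not stated here.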

You will notice a number of tabs at the top of the central panel - they are 
labelled "PDB entries", "Ligands", etc. Selecting one of these tabs gives you a 
new "perspective" on the selected set of PDB entries (in this case, all entries 
containing ATP or whichever compound you selected):

* PDB entries: this is the default view that the browser will present once you 
have selected a compound. To download the entire table as a text file, use the 
link in the right-hand panel. If you move your mouse over the PDB code of an 
entry, it will show a miniature image of the structure; clicking the link will 
open the PDBe summary page for that entry. Clicking on the "view" link will 
load the structure in an interactive viewer so that you can study it in detail.

* Ligands: this view displays a table of information about the additional 
compounds found in all the PDB entries that contain your compound of interest. 
The table is ordered such that the compounds that occur most often are at the 
top. Each row in the table gives information about the three-letter code of the 
compound, its chemical structure, chemical formula and systematic name. The 
second column contains a link to information about the interaction statistics 
of the compound with the standard amino-acid types. The link "Get PDB entries" 
generates a list of all PDB entries containing both that compound and our 
compound of interest.

* Structure folds: this view displays information about the fold families 
(based on the CATH classification) encountered in the PDB entries containing 
the selected compound. The tab also shows the distribution of CATH classes and 
CATH architectures for the selected PDB entries as a pie chart. If you click on 
a pie slice (or in the legend), only the appropriate CATH categories will be 
shown in the table. By the way, the pie charts can also be printed or 
downloaded.

* Assemblies: this view provides information about the possible quaternary 
structure(s) of the selected PDB entries. A small table shows how many entries 
are monomeric, homomeric and heteromeric, and two (clickable) pie charts show a 
further breakdown of the homomeric and heteromeric structures respectively. The 
main table in the tab shows the possible quaternary structure(s) for the 
entries, together with (for non-monomeric structures) the accessible and buried 
surface areas of the complex and the estimated free energy gain upon formation 
of the complex.

* Sequence families: this view lists all Pfam families that are present in the 
selected PDB entries.

* Organisms: the source organisms found in all selected PDB entries are shown 
in a table. The clickable pie charts show the distribution of these organisms 
based on superkingdom (bacteria, archaea, etc.) and genus (Homo, Rattus, 
Bacillus, etc.).

* Publications: this table contains details about the (primary) publications of 
all the PDB entries with the selected compound.

* Authors: this tab lists the names of all the authors of the structures 
containing the selected compound in the PDB, sorted by the number of such 
entries on which each is an author. This information is useful to biologists 
and journal editors who wish to get in touch with, for instance, 
crystallographers who have solved many structures containing a particular 
ligand.

The information presented by the browser is taken from the PDBe database, which 
means that it is always up to date.

Using this browser, it is now child's play to dig up titbits such as:

- the compound that occurs most commonly in entries that also contain ATP is 
magnesium

- about 1 in 10 entries that contain NAD also contain FAD

- 95% of CATH domains occurring in entries with NAD are of the alpha-beta class

- there is only one hetero-hexameric assembly in all the entries that contain 
NAD, namely http://pdbe.org/3ket

- Johan Weigelt has deposited more structures of NAD-containing proteins than 
Michael Sundstrom
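
To make the counting behind the first two titbits concrete, here is a small 
Python sketch. It is not how the browser computes its statistics (those come 
from the PDBe database). The entry IDs and their contents are made-up 
placeholders rather than real PDB entries, and the function name co_occurrence 
is hypothetical.

```python
from collections import Counter

# Made-up placeholder data: these IDs and contents are NOT real PDB entries.
# Each key is an entry ID, each value the set of compound codes it contains.
entries = {
    "xxx1": {"NAD", "FAD", "MG"},
    "xxx2": {"NAD", "MG"},
    "xxx3": {"NAD"},
    "xxx4": {"ATP", "MG"},
    "xxx5": {"NAD", "SO4", "MG"},
}


def co_occurrence(entries, compound):
    """Count the other compounds found in entries that contain `compound`.

    Returns the number of entries containing `compound` and a Counter
    mapping each additional compound code to the number of those entries
    in which it also occurs.
    """
    selected = [codes for codes in entries.values() if compound in codes]
    counts = Counter(
        code for codes in selected for code in codes if code != compound
    )
    return len(selected), counts


if __name__ == "__main__":
    total, counts = co_occurrence(entries, "NAD")
    # Most frequent partner compounds first, as in the "Ligands" tab.
    for code, n in counts.most_common():
        print(f"{code}: {n} of {total} entries ({n / total:.0%})")
```

With real data, the same ratio (entries containing both compounds divided by 
entries containing the compound of interest) is what a statement such as "about 
1 in 10 entries that contain NAD also contain FAD" expresses.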

Note: currently, the statistics presented by the browser cover everything in 
the PDB entries that contain your compound of interest, i.e. all the 
macromolecules in those entries and not only the ones to which the compound is 
actually bound.

By the way, all the previously released browser modules have recently been 
updated: they now include clickable pie charts and retrieve results much faster 
than before.

We welcome your comments, bug reports and feature requests on the compound 
browser (and the other browser modules). Please use the feedback button at the 
top of any PDBe web page.

--Gerard

---
Gerard J. Kleywegt, PDBe, EMBL-EBI, Hinxton, UK
gerard at ebi.ac.uk ..................... pdbe.org
Secretary: Pauline Haslam  pdbe_admin at ebi.ac.uk


