[BiO BB] THESEUS, a program for maximum likelihood superpositioning of macromolecules
Douglas L. Theobald
dtheobald at brandeis.edu
Tue Sep 26 12:23:39 EDT 2006
Announcing a fundamentally new way to superimpose structures: Maximum
likelihood instead of least squares.
http://www.theseus3d.org/
The Program:
THESEUS is a Unix command-line program for performing maximum likelihood
(ML) superpositions and analysis of macromolecular structures. While all
conventional superpositioning methods use ordinary least-squares as the
optimization criterion, THESEUS uses maximum likelihood, which provides
superpositions with substantially improved accuracy (see the figure at
http://www.theseus3d.org/ for an example). When superpositioning
macromolecules with different residue sequences, other programs and
algorithms currently discard residues that are aligned with gaps.
THESEUS, however, uses a novel ML algorithm that includes all of the
available data.
The Rationale:
Over 30 years ago, Cox, Diamond, McLachlan, Kabsch, and others
investigated and solved the least-squares superposition problem for
macromolecular structures (Flower 1999), and the least-squares method
has been used effectively ever since for comparing structures. However,
least-squares is not ideal. As a fitting criterion, least-squares is
based theoretically on two strong assumptions: (1) that all atoms in a
structure have the same variability and (2) that all atoms are
independent and uncorrelated. We know that both of these assumptions are
false. Some regions of a structure are more variable than others, and
atoms are connected to each other via chemical bonds. The ML method used
by THESEUS properly down-weights variable structural regions and
corrects for correlations among atoms.
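To make the effect of assumption (1) concrete, here is a minimal sketch in Python. It is not the THESEUS algorithm: it works in 2D (where the optimal rotation has a closed form), it treats the per-atom variances as known rather than estimating them from the data, and it ignores correlations entirely. The coordinates, variances, and function names are made up for illustration. It only shows that weighting each atom by the inverse of its variance keeps one highly variable atom from dragging the whole fit.

```python
import math

def weighted_centroid(points, weights):
    """Weighted mean position of a list of 2D points."""
    total = sum(weights)
    cx = sum(w * x for w, (x, _) in zip(weights, points)) / total
    cy = sum(w * y for w, (_, y) in zip(weights, points)) / total
    return cx, cy

def fit_rotation(mobile, target, weights):
    """Angle (radians) that rotates `mobile` onto `target` so that the
    weighted sum of squared distances is minimal, after centering both
    point sets on their weighted centroids."""
    mcx, mcy = weighted_centroid(mobile, weights)
    tcx, tcy = weighted_centroid(target, weights)
    s_cos = 0.0
    s_sin = 0.0
    for w, (mx, my), (tx, ty) in zip(weights, mobile, target):
        ax, ay = mx - mcx, my - mcy
        bx, by = tx - tcx, ty - tcy
        s_cos += w * (ax * bx + ay * by)   # weighted dot products
        s_sin += w * (ax * by - ay * bx)   # weighted 2D cross products
    return math.atan2(s_sin, s_cos)

def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

# Hypothetical data: four rigid "core" atoms plus one flexible "tail" atom (last).
target = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (3.0, 0.0)]

# Build the second structure: move only the tail atom, then rotate the whole
# thing by -TRUE_ANGLE and shift it. A good fit should recover TRUE_ANGLE.
TRUE_ANGLE = 0.5
perturbed = target[:-1] + [(3.0, 2.0)]
mobile = [(x + 5.0, y - 2.0) for x, y in rotate(perturbed, -TRUE_ANGLE)]

# Assumed-known per-atom variances: the tail is far more variable than the core.
variances = [0.1, 0.1, 0.1, 0.1, 10.0]

unweighted_angle = fit_rotation(mobile, target, [1.0] * len(target))
weighted_angle = fit_rotation(mobile, target, [1.0 / v for v in variances])

print("true angle:      ", TRUE_ANGLE)
print("ordinary LS fit: ", unweighted_angle)
print("variance-weighted", weighted_angle)
```

In this toy case the ordinary least-squares fit is pulled far from the true rotation by the single tail atom, while the inverse-variance-weighted fit stays close to it without the tail having been removed by hand.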
The Benefits:
ML superpositioning is robust and insensitive to the specific atoms
included in the analysis. In current practice, regions of structures
that are considered "unsuperimposable" or divergent are subjectively
excluded from the superposition. However, when doing an ML
superposition, you do not need to hand-prune selected variable atomic
coordinates, since the variability is already accounted for in the ML
method. ML superpositioning will greatly improve our ability to
accurately compare biological macromolecules in many applications,
including analysis of NMR families, alternate crystal structures,
evolutionarily homologous molecules, molecular dynamics simulations, and
de novo structure predictions.
Output from THESEUS includes both likelihood-based and frequentist
statistics for evaluation of the adequacy of a superposition and for
reliable analysis of structural similarities and differences. Residue
ranges to include in or exclude from the superposition can be specified
on the command line. For ease of comparison, THESEUS also calculates
least-squares superpositions. Additionally, THESEUS performs principal
components analysis (PCA) for analyzing the complex correlations found
among the atoms and residues within a structural ensemble.
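As a rough illustration of what PCA on a structural ensemble extracts, the following Python sketch builds an atom-by-atom covariance matrix from a tiny made-up ensemble and finds its first principal component by power iteration. It does not reproduce the THESEUS procedure. To stay short it uses a single displacement value per atom per model instead of full 3D coordinates, and the numbers and function names are hypothetical.

```python
import math

def covariance_matrix(samples):
    """Sample covariance between columns (rows = models, columns = atoms)."""
    n_models = len(samples)
    n_atoms = len(samples[0])
    means = [sum(row[j] for row in samples) / n_models for j in range(n_atoms)]
    cov = [[0.0] * n_atoms for _ in range(n_atoms)]
    for row in samples:
        dev = [row[j] - means[j] for j in range(n_atoms)]
        for i in range(n_atoms):
            for j in range(n_atoms):
                cov[i][j] += dev[i] * dev[j] / (n_models - 1)
    return cov

def first_principal_component(cov, iterations=200):
    """Dominant eigenvalue and unit eigenvector of `cov` by power iteration."""
    n = len(cov)
    vec = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        prod = [sum(cov[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(p * p for p in prod))
        if norm == 0.0:
            break
        vec = [p / norm for p in prod]
        eigenvalue = norm
    return eigenvalue, vec

# Hypothetical ensemble: 4 models (rows) x 4 atoms (columns), one displacement
# per atom. Atoms 0 and 1 move together, atom 2 moves against them, and
# atom 3 barely moves at all.
displacements = [
    [ 1.0,  0.9, -0.5,  0.0],
    [-1.0, -1.1,  0.5,  0.1],
    [ 0.5,  0.6, -0.2, -0.1],
    [-0.5, -0.4,  0.2,  0.0],
]

cov = covariance_matrix(displacements)
eigenvalue, pc1 = first_principal_component(cov)
print("variance along PC1:", eigenvalue)
print("PC1 loadings per atom:", pc1)
```

Loadings with the same sign mark atoms that move together, opposite signs mark atoms that move against each other, and a loading near zero marks an atom that takes little part in the dominant motion.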
Source code and binaries for several platforms are available from:
http://www.theseus3d.org/
Refs:
Theobald, D. L. and Wuttke, D. S. (2006)
"THESEUS: Maximum likelihood superpositioning and analysis of
macromolecular structures."
Bioinformatics, 22(17), 2171.
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/17/2171
Overview of mathematical results and algorithm (supplementary materials
from Theobald & Wuttke 2006):
http://www.theseus3d.org/pdfs/Theobald_Wuttke_2006_Bioinformatics_THESEUS_SuppMat.pdf
Theobald, D. L. and Wuttke, D. S. (2006)
"Empirical Bayes hierarchical models for regularizing maximum likelihood
estimation in the matrix Gaussian Procrustes problem."
PNAS, in press
Cox, J. M. (1967)
"Mathematical methods used in the comparison of the quaternary
structures."
J Mol Biol, 28, 151–156.
Diamond, R. (1966)
"A mathematical model-building procedure for proteins."
Acta Crystallogr, 21, 253–266.
Diamond, R. (1976)
"On the comparison of conformations using linear and quadratic
transformations."
Acta Crystallogr A, 32, 1–10.
Flower, D. R. (1999)
"Rotational superposition: A review of methods."
J Mol Graph Model, 17, 238–244.
Kabsch, W. (1978)
"A discussion of the solution for the best rotation to relate two sets
of vectors."
Acta Crystallogr A, 34, 827–828.
McLachlan, A. (1972)
"A mathematical procedure for superimposing atomic coordinates of
proteins."
Acta Crystallogr A, 28, 656–657.