[OpenMMS] pdb-l: Cleanup of PDB data? Community Freebase Gridworks project?
dan.bolser at gmail.com
Thu May 13 14:59:47 EDT 2010
The problem wasn't with the ?/null cases; I just reported those for
the sake of consistency.
Will you look into the cases where, for example, 407 water molecules
are assigned an EC number?
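For anyone who wants to repeat this kind of check without Gridworks, here is a minimal Python sketch. It assumes the Entity table has already been loaded as one dict per row, keyed by the mmCIF item names (type, details, pdbx_mutation, pdbx_ec, pdbx_fragment) plus a hypothetical 'entry_id' column holding the PDB ID. It treats '?', '.', empty strings and NULL/None as "no value" and reports every water entity that carries anything else in those optional fields.

```python
from collections import defaultdict

# mmCIF uses '?' for "unknown" and '.' for "not applicable"; None and '' stand
# in for NULLs coming from a database or spreadsheet export of the table.
EMPTY = {None, "", "?", "."}

# Optional _entity items discussed in this thread (mmCIF names minus "_entity.").
OPTIONAL_FIELDS = ("details", "pdbx_mutation", "pdbx_ec", "pdbx_fragment")


def unexpected_water_values(rows):
    """Map each optional field to the (entry_id, value) pairs of water
    entities where that field holds a real value instead of a placeholder.

    'entry_id' is a hypothetical column name for the PDB ID of each row.
    """
    hits = defaultdict(list)
    for row in rows:
        if row.get("type") != "water":
            continue  # only water entities are checked
        for field in OPTIONAL_FIELDS:
            value = row.get(field)
            if isinstance(value, str):
                value = value.strip()  # so " ? " counts as a placeholder too
            if value not in EMPTY:
                hits[field].append((row.get("entry_id"), value))
    return dict(hits)


if __name__ == "__main__":
    # Made-up rows for illustration only, not real PDB data.
    rows = [
        {"entry_id": "demo1", "type": "water", "details": "?", "pdbx_ec": None},
        {"entry_id": "demo2", "type": "water", "pdbx_ec": "1.1.1.1"},
        {"entry_id": "demo3", "type": "polymer", "pdbx_ec": "2.7.7.7"},
    ]
    print(unexpected_water_values(rows))
    # {'pdbx_ec': [('demo2', '1.1.1.1')]}
```

Only the water row with a real EC number is reported; the polymer row is skipped because an EC number is expected there.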
On 13 May 2010 20:51, Christine Zardecki <zardecki at rcsb.rutgers.edu> wrote:
> Hi Dan,
> The cases that you describe are non-mandatory data items. ?/null can be
> used when the depositor does not have additional information to provide.
> The wwPDB would appreciate any comments or reports about data at
> info at wwpdb.org. We'll review them and evaluate whether there are errors that
> should be corrected immediately or that should be addressed in a future
> remediation of the archive.
> Christine Zardecki
> for the wwPDB
> On May 12, 2010, at 7:56 PM, Dan Bolser wrote:
>> Hi all,
>> There is an interesting tool called 'Freebase Gridworks':
>> Basically it makes 'cleaning up' tables of data really easy,
>> automating most of the things you find yourself doing when presented
>> with 'user input', including interrogating the data for
>> It seems ideal for 'looking at' biological data, so I decided to test
>> it out on the mmCIF Entity table from a recent dump of the PDB (May
>> The tool quickly let me identify the following 22 inconsistencies in the
>> data (focusing initially on the 52,289 water
>> entities, which are by far the most standard of the three types of
>> * 52,288 water entities have the 'details' field set to '?'; in entry
>> 1dcn it's set to "ARGININOSUCCINATE BOUND TO ONE ACTIVE SITE".
>> * 41,030 water entities have the 'pdbx_mutation' field set to '?', and
>> another 11,256 are NULL. In entry 3igf it's "A127T", in 1tm0 it's
>> "protein has C-tag (LEHHHHHH)", and in 2huk it's "C97A".
>> * The 'pdbx_ec' field is set to a value other than '?' or NULL in four
>> cases: 1fp3, 3dhy, 3d7s, and 1mpx.
>> * The 'pdbx_fragment' field is set to a value other than '?' or NULL in
>> 14 cases. In 8 cases it's set to water, Water or WATER (1dqd, 1em8, 1ijk,
>> 1pnz, 1po0, 1rc5, 1yvm, and 2g75). The six remaining cases, including
>> the interesting "WATER MOLECULES WITH RESIDUE NUMBER 1011 AND 1053 ARE
>> MOST PROBABLY AMMONIUM IONS." are: 1d5w, 1ee4, 1eke, 1ijq, 1rc7, and
>> I'm planning to keep looking at the various tables in the PDB;
>> however, my hope is that these changes can be pushed back into the
>> central PDB archive. Previously, changes like this were not readily
>> incorporated into the archive. Is this still true (post-remediation)?
>> I'm not sure how best to use Freebase Gridworks collaboratively.
>> Ideally, we could all work together on the same remediation project;
>> however, I'm not sure whether it supports that kind of collaborative
>> editing. In any case, I'm happy to share my current project file with
>> anyone who is interested. It should be interesting to start clustering
>> values to detect category typos and reconciling species names against
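On the clustering idea: Gridworks groups near-duplicate spellings by "key collision", offering values that reduce to the same normalized key as one cluster. The sketch below is a simplified, assumed version of that "fingerprint" keying (trim, lowercase, fold accents to ASCII, drop punctuation, sort and de-duplicate tokens), not the tool's actual code; fingerprint() and cluster() are hypothetical names.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key: trim, lowercase, fold accented characters
    to ASCII, strip punctuation, then sort and de-duplicate the tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop combining marks
    text = re.sub(r"[^\w\s]", "", text)  # remove punctuation
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group distinct values whose fingerprints collide; singletons are
    dropped because only multi-spelling groups are candidate variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    print(cluster(["water", "Water", "WATER", "HOH"]))
    # {'water': ['WATER', 'Water', 'water']}
```

Key collision is cheap and conservative: it catches case, punctuation and word-order variants such as the water/Water/WATER fragment values, but genuine misspellings need a distance-based comparison such as edit distance.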