[OpenMMS] pdb-l: Cleanup of PDB data? Community Freebase Gridworks project?

Dan Bolser dan.bolser at gmail.com
Thu May 13 14:59:47 EDT 2010


Hi Christine,

The problem wasn't with the ?/null cases; I only reported those for
the sake of consistency.

Will you look into the cases where, for example, 407 water molecules
are assigned an EC number?


Cheers,
Dan.

On 13 May 2010 20:51, Christine Zardecki <zardecki at rcsb.rutgers.edu> wrote:
> Hi Dan,
>
> The cases that you describe are non-mandatory data items.  ?/null can be
> used when the depositor does not have additional information to provide.
>
> The wwPDB would appreciate any comments or reports about data at
> info at wwpdb.org.  We'll review and evaluate to see if there are errors that
> should be corrected immediately, or that would be included in a future
> remediation of the archive.
>
> Sincerely,
>
> Christine Zardecki
> for the wwPDB
>
> On May 12, 2010, at 7:56 PM, Dan Bolser wrote:
>
>> Hi all,
>>
>> There is an interesting tool called 'Freebase Gridworks':
>>
>> http://code.google.com/p/freebase-gridworks/
>>
>>
>> Basically it makes 'cleaning up' tables of data really easy,
>> automating most of the things you find yourself doing when presented
>> with 'user input', including interrogating the data for
>> inconsistencies.
>>
>> It seems ideal for 'looking at' biological data, so I decided to test
>> it out on the mmCIF Entity table from a recent dump of the PDB (May
>> 9th).
>>
>> The tool quickly allowed the identification of the following list of
>> 22 inconsistencies in the data (focusing initially on the 52,289 water
>> entities, which are by far the most standard of the three types of
>> entity):
>>
>> * 52288 water entities have the 'details' field set to '?'; in entry
>> 1dcn it's set to "ARGININOSUCCINATE BOUND TO ONE ACTIVE SITE".
>>
>> * 41030 water entities have the 'pdbx_mutation' field set to '?' and
>> another 11256 are NULL. In entry 3igf it's "A127T", in 1tm0 it's
>> "protein has C-tag (LEHHHHHH)", and in 2huk it's "C97A".
>>
>> * The 'pdbx_ec' field is set to something other than '?' or NULL in
>> four cases: 1fp3, 3dhy, 3d7s, and 1mpx.
>>
>> * The 'pdbx_fragment' field is set to something other than '?' or NULL in 14
>> cases. In 8 cases it's set to water, Water or WATER (1dqd, 1em8, 1ijk,
>> 1pnz, 1po0, 1rc5, 1yvm, and 2g75). The six remaining cases, including
>> the interesting "WATER MOLECULES WITH RESIDUE NUMBER 1011 AND 1053 ARE
>> MOST PROBABLY AMMONIUM IONS." are: 1d5w, 1ee4, 1eke, 1ijq, 1rc7, and
>> 3l0l.
>>
>>
>>
>> I'm planning to keep looking at the various tables in the PDB.
>> However, my hope is that these changes can be pushed back onto the
>> central PDB archive. Previously, changes like this were not readily
>> incorporated into the archive. Is this still true (post-remediation)?
>>
>> I'm not sure how best to use Freebase Gridworks collaboratively.
>> Ideally we could all work together on the same remediation project,
>> but I'm not sure whether it supports that kind of collaborative
>> editing. In any case, I'm happy to share my current project file with
>> anyone who is interested. It should be interesting to start clustering
>> values to detect category typos and reconciling species names against
>> Freebase.
>>
>>
>> Cheers,
>> Dan.
>>
>
>
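
The water-entity checks listed in the quoted message (optional fields such as 'details', 'pdbx_mutation', 'pdbx_ec' and 'pdbx_fragment' holding something other than '?' or NULL) can also be scripted outside Gridworks. The following is only a minimal sketch, not the procedure used in the thread: it assumes the Entity table has already been loaded as a list of dictionaries, and the 'entry_id' key, the function names, the extra missing-value markers and the placeholder rows are hypothetical. The three populated values in the sample rows are the ones reported above.

```python
from collections import Counter

# Values treated as "no information". '?' and NULL (None) come from the
# message above; '' and '.' are added here as an assumption.
MISSING = {None, "", "?", "."}

# Optional fields that would be expected to be empty for a water entity.
CHECKED_FIELDS = ("details", "pdbx_mutation", "pdbx_ec", "pdbx_fragment")


def find_water_anomalies(rows, fields=CHECKED_FIELDS):
    """Return (entry_id, field, value) for each populated field on a water entity."""
    anomalies = []
    for row in rows:
        if row.get("type") != "water":
            continue  # only water entities are checked in this sketch
        for field in fields:
            value = row.get(field)
            if value not in MISSING:
                anomalies.append((row.get("entry_id"), field, value))
    return anomalies


def count_by_field(anomalies):
    """Count how many anomalies were found per field."""
    return Counter(field for _, field, _ in anomalies)


# Sample rows: the populated values are taken from the message above;
# the 'xxxx' and 'yyyy' rows are hypothetical placeholders.
SAMPLE_ROWS = [
    {"entry_id": "1dcn", "type": "water",
     "details": "ARGININOSUCCINATE BOUND TO ONE ACTIVE SITE"},
    {"entry_id": "3igf", "type": "water", "details": "?",
     "pdbx_mutation": "A127T"},
    {"entry_id": "1dqd", "type": "water", "pdbx_fragment": "water"},
    {"entry_id": "xxxx", "type": "water", "details": "?",
     "pdbx_mutation": None, "pdbx_ec": "?", "pdbx_fragment": "?"},
    {"entry_id": "yyyy", "type": "polymer", "pdbx_mutation": "C97A"},
]

if __name__ == "__main__":
    found = find_water_anomalies(SAMPLE_ROWS)
    for entry_id, field, value in found:
        print(f"{entry_id}\t{field}\t{value}")
    print(dict(count_by_field(found)))
```

The non-water row is skipped even though its mutation field is populated, since a mutation is a legitimate annotation for a polymer entity.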