[Pdbwiki-devel] PDB version 4
dan.bolser at gmail.com
Wed Jul 27 13:28:15 EDT 2011
On 27 July 2011 18:11, Jose M. Duarte <jose.m.duarte at gmail.com> wrote:
> So some good news regarding the OpenMMS pdbase batch pipeline:
> I've managed to fix the reading directly from gzip issue! Basically the
> support for reading from gz was already there, but there was a bug in the
> actual zip reading. It seems that they were using some old zip parser from
> Java which supported another zip format. I don't know exactly why, but it
> worked once I used the GZIPInputStream class.
> The new fixed jar is called OpenMMSbatch.jar and is in the pdbase dir in
> svn. To recreate it one only needs to check out the openmms/java dir from
> svn and it should all be self contained and build directly in eclipse. Once
> there to generate the jar do Export->as Runnable Jar and this creates a self
> contained jar that includes the mysql connector.
> Second thing I've done is fix a few issues with hard-coded db parameters and
> upper-case table names (it worked in lower case in molgen because we were
> using MySQL in ignore-case mode).
> So now it works in here too! :)
Awesome work so far!
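For anyone following along, the gzip fix described above amounts to wrapping
the file stream in java.util.zip.GZIPInputStream. A minimal sketch of that
idea, not the actual OpenMMS code: the class and method names (GzipLineReader,
readLines) are made up for illustration, and reading the content as UTF-8 text
is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of OpenMMS.
public class GzipLineReader {

    // Decompresses a gzip stream on the fly and returns its text lines.
    public static List<String> readLines(InputStream compressed) throws IOException {
        List<String> lines = new ArrayList<>();
        // GZIPInputStream handles the gzip format (.gz), which differs from
        // the zip archive format handled by ZipInputStream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed),
                                      StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Passing a java.io.FileInputStream opened on a .gz file to readLines would read
the entry without unpacking it to disk first.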
> Next step would be making it work in incremental mode. The good news is one
> can upload by batches (BTW the loader does nothing if one tries to load an
> already loaded file). In order to have a full incremental pipeline we still
> need to be able to remove entries from the db. Is that possible at all with
> the loader?
IIRC, the 'loader' also supports a delete command, where you pass it a
list of PDB entries to delete. We just need to work out how to
generate the list of files to delete (and then update) without
recourse to the file time stamps (which is what we use currently).
We discussed tracking the data in the OBSOLETE file, but rejected that
idea because it changes weekly, so we need to make sure not to skip a
week... However, in comparison to building the whole db from scratch
every week, it doesn't seem like too much of a compromise. I'm sure
there is a good solution in the data somewhere.
Once we have a reliable way to generate the list of deletes, updates,
and additions, we can then pass those to the loader appropriately.
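One way to get those three lists, sketched very roughly: compare a record of
what is already loaded against what is currently in the archive. Everything
here is an assumption for illustration. The EntryDiff class is hypothetical,
and the "fingerprint" per PDB id could be a checksum of the file or anything
else that changes when the entry changes; this is not something the loader
provides.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not part of OpenMMS or the loader.
public class EntryDiff {
    public final Set<String> additions = new TreeSet<>();
    public final Set<String> updates = new TreeSet<>();
    public final Set<String> deletes = new TreeSet<>();

    // Both maps go from PDB id to a fingerprint of the entry's content
    // (assumed non-null). 'loaded' describes what is in the db now,
    // 'current' describes the archive.
    public static EntryDiff compare(Map<String, String> loaded,
                                    Map<String, String> current) {
        EntryDiff diff = new EntryDiff();
        for (Map.Entry<String, String> e : current.entrySet()) {
            String id = e.getKey();
            if (!loaded.containsKey(id)) {
                diff.additions.add(id);          // new entry
            } else if (!loaded.get(id).equals(e.getValue())) {
                diff.updates.add(id);            // content changed
            }
        }
        for (String id : loaded.keySet()) {
            if (!current.containsKey(id)) {
                diff.deletes.add(id);            // gone from the archive
            }
        }
        return diff;
    }
}
```

Since the loader does nothing for a file that is already loaded, the ids in
'updates' would need to go through the delete command first and then be loaded
again, along with the ids in 'additions'.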
Fantastic to see the project hasn't died! :D
All the best,
> For the record the relevant dirs in the svn repo are
> svn://bioinformatics.org/svnroot/pdbwiki/trunk/openmms and