[OpenMMS] Batch load from gz files and incremental update

Dan Bolser dan.bolser at gmail.com
Thu Jul 28 10:25:49 EDT 2011

Great work Jose!

Good to see that the project is alive!

On 28 July 2011 15:04, Jose M. Duarte <jose.m.duarte at gmail.com> wrote:
> Some good news regarding the OpenMMS pdbase batch pipeline.
>
> I've managed to fix the issue with reading directly from gzip files!
> Basically, support for reading from gz was already there, but there was a
> bug in the actual zip reading: it seems they were using some old zip parser
> from Java which supported another zip format. I don't know exactly why, but
> it worked when I used the GZIPInputStream class. I can only guess that the
> pre-remediated cif files were compressed in zip format and the
> post-remediation ones in gzip, and the two formats are not the same (sorry,
> I don't know much about compression).
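
The guess above is consistent with how the formats are laid out: a gzip stream and a zip archive are different containers with different leading bytes, so a zip reader will not accept a .gz file. A minimal Java sketch of the distinction follows. The class and method names (GzipSketch, sniffFormat, firstLine, gzip) are hypothetical and are not taken from the OpenMMS source; only java.util.zip.GZIPInputStream and GZIPOutputStream are real JDK classes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not the OpenMMS loader code.
class GzipSketch {

    // gzip streams start with bytes 0x1f 0x8b; zip archives start with "PK".
    static String sniffFormat(byte[] head) {
        if (head.length >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return "gzip";
        }
        if (head.length >= 2 && head[0] == 'P' && head[1] == 'K') {
            return "zip";
        }
        return "unknown";
    }

    // Decompress a gzip stream and return its first line of text.
    static String firstLine(InputStream compressed) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.US_ASCII))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compress a string in memory so the example needs no files on disk.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up stand-in for the start of an mmCIF file.
        byte[] data = gzip("data_102D\n_entry.id 102D\n");
        System.out.println(sniffFormat(data));
        System.out.println(firstLine(new ByteArrayInputStream(data)));
    }
}
```

For a real file, the ByteArrayInputStream would be replaced by a FileInputStream on the .cif.gz path.
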
>
> The new fixed jar is called OpenMMSbatch.jar and is in the pdbase dir in
> svn. To recreate it, one only needs to check out the openmms/java dir from
> svn; it should all be self-contained and build directly in Eclipse. Once
> there, to generate the jar do Export->as Runnable Jar, which creates a
> self-contained jar that includes the MySQL connector.
>
> The second thing I've done is fix a few issues with hard-coded db
> parameters and upper-case table names in the load scripts (it worked in
> lower case in molgen because we were using MySQL in ignore-case mode).
> So now it should be portable enough to work at any site, as long as you
> have an rsync copy of the PDB mmCIF repo.
>
> The next step would be making it work in incremental mode. The good news is
> that one can load in batches (BTW, the loader does nothing if one tries to
> load an already loaded file). The loader can also delete entries when given
> a command like:
> java -cp OpenMMSbatch.jar org.rcsb.openmms.apps.rdb.PDBase LenientParse \
> data=/nfs/data/dbs/pdb/data/structures/all/mmCIF \
> manifest=file:///nfs/data/dbs/pdb/ls-lR \
> log=PDBASE.LOG \
> exclude=ExcludeStructureIDs.list \
> entries=102M,102D,103D,105D \
> pdblist=allpdb_ex1.list \
> dbUrl=jdbc:mysql://localhost/pdbase dbDrv=com.mysql.jdbc.Driver \
> dbUsr=user dbPwd=pwd \
> action=DeleteSingleEntry
> At the moment loadpdb.sh will only do a full load from scratch (or a batch
> load of a few entries). In principle it's possible to modify it to work in
> an incremental mode. I'll post again if I do so.
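
One possible way to plan such an incremental run is to compare the set of entry IDs already in the database with the set currently present in the mirrored repo, and derive what to delete and what to load. The Java sketch below only illustrates that set arithmetic. The class and method names (IncrementalPlan, difference, entriesArg) and the sample ID sets are assumptions, not part of loadpdb.sh or OpenMMS; in practice the two sets would come from the database and from the manifest. It also does not cover entries that were modified in place, which would presumably need a timestamp comparison.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical planning helper; not part of OpenMMS or loadpdb.sh.
class IncrementalPlan {

    // Elements of a that are not in b, in sorted order.
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    // Format IDs in the same style as the entries= argument shown above.
    static String entriesArg(Set<String> ids) {
        return "entries=" + String.join(",", ids);
    }

    public static void main(String[] args) {
        // Made-up example sets.
        Set<String> loaded = new TreeSet<>(Arrays.asList("102D", "102M", "103D"));
        Set<String> current = new TreeSet<>(Arrays.asList("102D", "103D", "105D"));

        // In the database but gone from the repo: candidates for deletion.
        Set<String> toDelete = difference(loaded, current);
        // In the repo but not yet in the database: candidates for loading.
        Set<String> toLoad = difference(current, loaded);

        System.out.println("delete: " + entriesArg(toDelete));
        System.out.println("load:   " + entriesArg(toLoad));
    }
}
```

The delete list could then be passed to the loader with action=DeleteSingleEntry as in the command above, and the load list handled as an ordinary batch load.
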
>
> For the record, the relevant dirs in the svn repo are:
>
> svn://bioinformatics.org/svnroot/pdbwiki/trunk/openmms
>   (modified java source code)
> svn://bioinformatics.org/svnroot/pdbwiki/trunk/pdbase
>   (stand-alone jar for openmms-batch and scripts to create pdbase from
>   scratch)
>
> Jose
> _______________________________________________
> OpenMMSusers-general mailing list
> OpenMMSusers-general at bioinformatics.org
> http://www.bioinformatics.org/mailman/listinfo/openmmsusers-general
