Some good news regarding the OpenMMS pdbase batch pipeline.<br><br>I've managed to fix the problem with reading directly from gzip files! Support for reading from gz was already there, but there was a bug in the actual decompression step: the code seems to have been using an old zip parser from Java that expects a different format. I don't know exactly why, but it worked once I used the GZIPInputStream class instead. I can only guess that the pre-remediation cif files were compressed in zip format and the post-remediation ones in gzip, and the two formats are not the same (sorry, I don't know much about compression).<br>
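<br>For anyone curious, reading gzip data through java.util.zip.GZIPInputStream looks roughly like the sketch below. This is only an illustration, not the actual OpenMMS code: the class and method names (GzipReadSketch, readGzipLines, isGzip) are made up for the example, and the data is compressed in memory instead of being read from a real .cif.gz file. The isGzip check relies on the fact that gzip streams start with the bytes 0x1f 0x8b, whereas zip archives start with "PK".<br>

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of OpenMMS.
class GzipReadSketch {

    // Compress a string to gzip bytes in memory (stands in for a .cif.gz file).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        // The stream is closed here, so the gzip trailer has been written.
        return buf.toByteArray();
    }

    // True if the data starts with the gzip magic bytes 0x1f 0x8b.
    static boolean isGzip(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    // Read all lines from a gzip-compressed stream.
    static List<String> readGzipLines(InputStream compressed) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up two-line sample, only to exercise the round trip.
        byte[] data = gzip("data_102D\n_entry.id 102D\n");
        System.out.println("gzip magic present: " + isGzip(data));
        for (String line : readGzipLines(new ByteArrayInputStream(data))) {
            System.out.println(line);
        }
    }
}
```

For a real file one would pass a FileInputStream to readGzipLines instead of the in-memory stream.<br>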


<br>The new fixed jar is called OpenMMSbatch.jar and is in the pdbase dir in svn. To recreate it, one only needs to check out the openmms/java dir from svn; it should be fully self-contained and build directly in Eclipse. To generate the jar from there, do Export->as Runnable Jar, which creates a self-contained jar that includes the MySQL connector.<br>

<br>The second thing I've done is fix a few issues in the load scripts: hard-coded db parameters, and table names that need to be upper case (lower case worked in molgen only because we were running MySQL in ignore-case mode).<br><br>So now it should be portable enough to work at any site, as long as you have an rsync copy of the PDB mmCIF repo.<br>

<br>The next step would be making it work in incremental mode. The good news is that one can upload in batches (BTW the loader does nothing if one tries to load a file that is already loaded). The loader can also delete entries, by passing a command like:<br><br><div>java -cp OpenMMSbatch.jar org.rcsb.openmms.apps.rdb.PDBase LenientParse \<br>
data=/nfs/data/dbs/pdb/data/structures/all/mmCIF \<br>
manifest=file:///nfs/data/dbs/pdb/ls-lR \<br>
log=PDBASE.LOG \<br>
exclude=ExcludeStructureIDs.list \<br>
entries=102M,102D,103D,105D \<br>
pdblist=allpdb_ex1.list \<br>
dbUrl=jdbc:mysql://localhost/pdbase dbDrv=com.mysql.jdbc.Driver dbUsr=user dbPwd=pwd \<br>
action=DeleteSingleEntry<br></div>


<br>At the moment loadpdb.sh will only do a full load from scratch (or a batch load of a few entries). In principle it's possible to modify it to work in incremental mode. I'll post again if I do so.<br>
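<br>To give an idea of what the incremental step might involve, here is a rough sketch. It is not something loadpdb.sh or the loader does today. It assumes one can somehow obtain two sets of entry IDs, those present in the local mmCIF mirror and those already loaded in the database, and it only computes which IDs would need loading or deleting, formatted like the entries= argument. The class and method names are hypothetical, and entries whose files merely changed are not handled.<br>

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical planning helper, not part of OpenMMS or loadpdb.sh.
class IncrementalPlanSketch {

    // IDs present in the mirror but not yet in the database: candidates for a batch load.
    static Set<String> toLoad(Set<String> inMirror, Set<String> inDatabase) {
        Set<String> result = new TreeSet<>(inMirror);
        result.removeAll(inDatabase);
        return result;
    }

    // IDs present in the database but gone from the mirror: candidates for deletion.
    static Set<String> toDelete(Set<String> inMirror, Set<String> inDatabase) {
        Set<String> result = new TreeSet<>(inDatabase);
        result.removeAll(inMirror);
        return result;
    }

    // Format the IDs as a comma-separated entries= argument.
    static String entriesArg(Set<String> ids) {
        return "entries=" + String.join(",", ids);
    }

    public static void main(String[] args) {
        // Made-up ID sets, only for illustration.
        Set<String> mirror = new TreeSet<>(Arrays.asList("102D", "103D", "105D"));
        Set<String> database = new TreeSet<>(Arrays.asList("102D", "102M"));
        System.out.println("load:   " + entriesArg(toLoad(mirror, database)));
        System.out.println("delete: " + entriesArg(toDelete(mirror, database)));
    }
}
```

The delete list would then go to a run with action=DeleteSingleEntry, and the load list to a normal batch load.<br>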

<br>For the record, the relevant dirs in the svn repo are svn://<a href="http://bioinformatics.org/svnroot/pdbwiki/trunk/openmms" target="_blank">bioinformatics.org/svnroot/pdbwiki/trunk/openmms</a> (modified Java source code) and svn://<a href="http://bioinformatics.org/svnroot/pdbwiki/trunk/pdbase" target="_blank">bioinformatics.org/svnroot/pdbwiki/trunk/pdbase</a> (stand-alone jar for openmms-batch and scripts to create pdbase from scratch).<br>

<br>Jose<br>