Some good news regarding the OpenMMS pdbase batch pipeline.<br><br>I've
managed to fix the problem with reading directly from gzipped files! Support
for reading .gz input was already there, but there was a bug in the
decompression step: the code seems to have used an old zip parser from Java
that expects a different compressed format. I don't know exactly why, but it
worked once I switched to the GZIPInputStream class. My guess is that the
pre-remediation cif files were compressed in zip format and the
post-remediation ones in gzip, and the two formats are not the same
(sorry, I don't know much about compression).<br>
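<br>To illustrate the difference, here is a minimal sketch (not the actual OpenMMS code) of reading a gzip-compressed stream with java.util.zip.GZIPInputStream. A gzip stream starts with the magic bytes 0x1f 0x8b, while a zip archive starts with "PK", so a zip reader will not accept gzip data. The class name GzipCifSketch, its helper methods and the sample content are made up for this example.<br>

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
class GzipCifSketch {

    // Compress text to gzip bytes in memory (stands in for a .cif.gz file).
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if the data starts with the gzip magic bytes 0x1f 0x8b.
    static boolean looksLikeGzip(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    // Decompress a gzip stream and return its first line of text.
    static String firstLine(InputStream raw) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(raw), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up two-line mmCIF fragment, compressed in memory.
        byte[] gz = gzip("data_102D\n_entry.id 102D\n");
        System.out.println("gzip magic present: " + looksLikeGzip(gz));
        System.out.println("first line: " + firstLine(new ByteArrayInputStream(gz)));
    }
}
```

For a real file one would wrap a FileInputStream instead of the in-memory ByteArrayInputStream; the rest stays the same.<br>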
<br>The fixed jar is called OpenMMSbatch.jar and is in the pdbase
dir in svn. To recreate it, one only needs to check out the openmms/java
dir from svn; it should be self-contained and build directly in
Eclipse. From there, generate the jar with Export->as Runnable Jar,
which creates a self-contained jar that includes the MySQL connector.<br>
<br>The second thing I've done is fix a few issues with hard-coded db
parameters and with the letter case of table names in the load scripts (lower-case names worked in
molgen only because we were running MySQL in ignore-case mode).<br><br>So now it should be portable enough to work at any site, as long as you have an rsync copy of the PDB mmCIF repo.<br>
<br>The next step would be making it work in incremental mode. The good news
is that one can load in batches (BTW, the loader does nothing if one tries
to load an already loaded file). The loader can also delete entries when given a command like:<br><br><div>java -cp OpenMMSbatch.jar org.rcsb.openmms.apps.rdb.PDBase LenientParse \<br>data=/nfs/data/dbs/pdb/data/structures/all/mmCIF \<br>
manifest=file:///nfs/data/dbs/pdb/ls-lR \<br>log=PDBASE.LOG \<br>
exclude=ExcludeStructureIDs.list \<br>entries=102M,102D,103D,105D \<br>pdblist=allpdb_ex1.list \<br>dbUrl=jdbc:mysql://localhost/pdbase dbDrv=com.mysql.jdbc.Driver dbUsr=user dbPwd=pwd \<br>action=DeleteSingleEntry<br></div>
<br>At the moment loadpdb.sh will only do a full load from scratch
(or a batch load of a few entries). In principle it's possible to modify
it to work in an incremental mode. I'll post again if I do so.<br>
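<br>As a rough idea of what an incremental mode could look like (this is only a sketch, not something loadpdb.sh or the loader does today): compare the set of entry IDs in the rsync mirror with the set already loaded, load the ones that are new, and delete the ones that have disappeared. The class IncrementalPlanSketch and its methods are hypothetical, and the example ID sets are made up. Entries whose files changed in place are not detected here; they would need a delete followed by a reload.<br>

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical planning helper, for illustration only.
class IncrementalPlanSketch {

    // Entries present in the mirror but not yet in the database.
    static Set<String> toLoad(Set<String> inMirror, Set<String> inDb) {
        Set<String> result = new TreeSet<>(inMirror);
        result.removeAll(inDb);
        return result;
    }

    // Entries still in the database but no longer in the mirror.
    static Set<String> toDelete(Set<String> inMirror, Set<String> inDb) {
        Set<String> result = new TreeSet<>(inDb);
        result.removeAll(inMirror);
        return result;
    }

    // Comma-separated list, in the same shape as the entries= argument
    // used in the delete command.
    static String entriesArg(Set<String> ids) {
        return String.join(",", ids);
    }

    public static void main(String[] args) {
        // Made-up example sets of PDB IDs.
        Set<String> mirror = new TreeSet<>(Arrays.asList("102D", "103D", "105D"));
        Set<String> db = new TreeSet<>(Arrays.asList("102M", "102D", "103D"));

        System.out.println("entries to load:   " + entriesArg(toLoad(mirror, db)));
        System.out.println("entries to delete: " + entriesArg(toDelete(mirror, db)));
    }
}
```

The list of entries to delete could then be passed as entries=... together with action=DeleteSingleEntry, and the entries to load handled as a normal batch load.<br>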
<br>For the record the relevant dirs in the svn repo are
svn://<a href="http://bioinformatics.org/svnroot/pdbwiki/trunk/openmms" target="_blank">bioinformatics.org/svnroot/pdbwiki/trunk/openmms</a> (modified java source code) and
svn://<a href="http://bioinformatics.org/svnroot/pdbwiki/trunk/pdbase" target="_blank">bioinformatics.org/svnroot/pdbwiki/trunk/pdbase</a> (stand-alone jar for openmms-batch and scripts to create pdbase from scratch)<br>
<br>Jose<br>