From Bioinformatics.Org Wiki
Upgrade to Java 1.6
In build.xml, compile.version.vm has been changed to 1.6, meaning that we are now compliant with Java 1.6. Even 1.6 is getting old. One word of caution: we have to be careful where we compile Java programs. If you compile a program on a system that defaults to a newer Java, it won't run on older Java versions. It would be good to move to Java 1.7, but first we have to determine whether the number of older systems still running 1.6 is negligible.
Making BIRCH documentation web-accessible
This can still be done using birchconfig, but in future releases it would be best to retire that program. We should be able to incorporate this function into Preferences --> Settings by running a script. It might also be possible to create a Java application from the existing birchconfig code.
Need mechanism for BioLegato to run commands in the background
At present, there is no way for PCD shell commands to run jobs in the background. That is, the Java Virtual Machine cannot terminate until every shell command has terminated. Even if the command ends with an ampersand, it must terminate before the JVM will terminate. That is an annoyance when we want displayed output to persist even after a BioLegato job has terminated, and a potentially major problem if we want to launch long-running or resource-intensive jobs from BioLegato.
It's probably best to write a short demo program to experiment with different approaches.
- src/BioPCD/parser/src/org/biopcd/parser/CommandThread.java is the class that calls Runtime.getRuntime() to execute commands in separate threads. See shellCommand in this file.
- As a temporary workaround, we can call scripts through a bash wrapper that uses nohup and & to run the script in the background.
- How do I launch a completely independent process from a Java program?
- Runtime.getRunTime().exec not behaving like C language “system()” command
- Is there a Null OutputStream in Java?
- Create threads to run in background
Solution: In BioLegato 1.0.3, CommandThread.java has been modified so that if a command line ends in '&', it will be run in the background.
Remaining issues (for further work in a later release):
- On Linux, we can run anything we want in the background using &. This doesn't always work on Mac OS X, for reasons that are not clear: if output is launched in the background, the text editor doesn't pop up until the parent BioLegato process has terminated. It looks like we should use & only when output is being sent to files, and not use it the rest of the time. This is probably okay: if you log out, you certainly don't expect or want viewers to persist, and if you don't log out, there's no need to kill BioLegato. One important first step is to remove & from the birch launcher and birchadmin. We need to do more thorough testing on the remaining BioLegato programs. In some cases, programs may need to be run in the background through wrappers.
- Most shell commands are run in the background. However, if temp files aren't explicitly saved, they get deleted before the command has a chance to process them. Therefore, we need to go through the .blmenu files and make sure that "save true" is included for all input file declarations, e.g. in1.
Completed: birch, birchadmin, bldna
Remaining ones will be done in BIRCH 3.40.
NUMSEQ, BACHREST and PRIMER3 don't launch the text editor until after the calling Java VM (i.e. BioLegato) has terminated. Why these three programs, and not, for example, TESTCODE or any of the others?
newstr.param gets overwritten: It appears that during an update, newstr.param gets overwritten by birch_install.py. A quick glance at the script suggests that the values from the existing copy of the file are read and should be retained in the copy that gets written out, but obviously there is a problem. This is mainly an issue when updating a system in which newstr.param contains URL information so that documentation can be read over HTTP.
Solution: birch_install.py has been modified to read the BIRCH.properties file, so that the correct values for birchhomeURL and birchURL are read during an update. These values are set by default in birch_install.py, which overwrites newstr.param with whatever values it has. Previously, birch_install.py never got the new values, which were stored only in newstr.param, so they were lost during an update. Now these values are stored in BIRCH.properties, which can only be changed using the newly added BIRCH Properties tool, launched from birchadmin. Still need to test on the albacore and psgendb BIRCH sites.
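Because birch_install.py is a Python script, the "read BIRCH.properties first, then fall back to defaults" step could look roughly like the sketch below. It assumes BIRCH.properties uses simple Java-style key=value lines with # or ! comments; the real file format is not described here. DEFAULTS, read_properties and site_settings are hypothetical names, not the actual birch_install.py code.

```python
"""Minimal sketch: keep site-specific URLs from BIRCH.properties during an update.

Assumption: BIRCH.properties holds simple "key=value" lines; escapes, ':' separators
and line continuations allowed in full Java properties files are not handled.
"""
from pathlib import Path

# Hypothetical defaults standing in for the values hard-coded in birch_install.py.
DEFAULTS = {"birchhomeURL": "", "birchURL": ""}


def read_properties(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        if "=" not in line:
            continue  # not something this simple parser understands
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def site_settings(properties_path):
    """Return DEFAULTS, overridden by any matching keys found in BIRCH.properties."""
    settings = dict(DEFAULTS)
    path = Path(properties_path)
    if path.is_file():
        props = read_properties(path.read_text())
        for key in DEFAULTS:
            if key in props:
                settings[key] = props[key]
    return settings
```

The values returned by site_settings would then be the ones written into newstr.param, so a locally configured URL survives an update instead of being replaced by the defaults.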
Compliance with NCBI phase-out of GI numbers
GI numbers still need to be handled if they are present, but must not be depended upon.
BLAST+ programs, when run through BioLegato, still display GI numbers rather than accession numbers in blnfetch and blpfetch. Oct. 4: This has been fixed for BLAST output, and for FASTA output for UniProt. However, for some databases, like Patented, FASTA hit lines start with 'gi|accession' rather than 'gi|GI'. We need to tweak dbsout.py to parse ACCESSION numbers correctly. Maybe we should check which of the tokens contains numerals only, and which contains letters as well as numerals? We also need to create or retrieve a test dataset without GI numbers; it may be best to write a script that removes GI numbers from GenBank entries. Programs that take GenBank entries as input need to be tested. The script for this is script/delgi.py. GenBank release notes were reviewed to make sure that delgi.py handles all legal GenBank files.
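The "numerals only versus letters plus numerals" test suggested above, and the kind of filtering a GI-removal script needs, can be sketched in Python as follows. This is not the real dbsout.py or delgi.py; is_gi, accession_from_header and strip_gi are hypothetical names. The accession pattern is an assumption that covers forms such as U49845.1, NM_000546.5 and A12345, but not every accession format (UniProt IDs like Q9H0H5 or PDB IDs would need a broader rule). The GI removal assumes GI numbers appear as "GI:nnnn" at the end of VERSION lines and as /db_xref="GI:nnnn" feature qualifiers.

```python
"""Minimal sketch (hypothetical helpers, not the real dbsout.py or delgi.py)."""
import re

GI_RE = re.compile(r"\d+")                                # GI numbers: digits only
ACCESSION_RE = re.compile(r"[A-Za-z]+_?\d+(\.\d+)?")      # assumed accession shape
VERSION_GI_RE = re.compile(r"\s+GI:\d+")                  # 'GI:nnnn' on a VERSION line
DBXREF_GI_RE = re.compile(r'\s+/db_xref="GI:\d+"\s*$')    # GI db_xref qualifier line


def is_gi(token):
    """True if the token consists of numerals only."""
    return GI_RE.fullmatch(token) is not None


def accession_from_header(header):
    """Return the first accession-like token of a FASTA ID such as
    '>gi|1293613|gb|U49845.1|', or None if no token looks like an accession."""
    seqid = header.split()[0].lstrip(">")
    for token in seqid.split("|"):
        if ACCESSION_RE.fullmatch(token):
            return token
    return None


def strip_gi(genbank_lines):
    """Yield GenBank lines with 'GI:nnnn' removed from VERSION lines
    and GI db_xref qualifier lines dropped; other lines pass through."""
    for line in genbank_lines:
        if line.startswith("VERSION"):
            line = VERSION_GI_RE.sub("", line)
        elif DBXREF_GI_RE.match(line):
            continue
        yield line
```

Searching the ID tokens for an accession-shaped token, instead of relying on a fixed position, handles both the 'gi|GI|db|accession' and the 'gi|accession' layouts described above.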
- Test programs for compliance, and check source code where possible:
BioLegato: tested reading and writing GenBank files; checked GenBank2008 class
features, splitdb, Open_XYLEMGB, Export Foreign, SaveSel, ViewSeq, CopyOut, PasteIn, Extract, PrettyPrint, TACGrest
BLASTN, BLASTX, TBLASTX, Artemis, dialign-tx, tcoffee
blncbi: The default Entrez summary only has GI numbers, so we need to create a custom summary that gives ACCESSION numbers instead. This was fixed in ncbiquery.py by replacing "Gi" in the -element list with "Caption". Still need to check whether this works. Once Accession.Version goes live, &id will change to Accession.Version in XML records. Also, NCBI says that starting in late Sept. you need to include the new &rettype=acc parameter in all Eutils requests. We should be able to find the BioLegato PCD files that use GenBank format with the find command.
BLAST/FASTA scripts need to be able to handle the new format, in which BLAST output will presumably no longer contain GI numbers.
Compliance with NCBI switch from HTTP to HTTPS
Upgrade BioPython to v1.68 (completed on linux-x86_64). Does this need to be propagated to OSX? A: It works on OSX. For the functions tested, BioPython appears to be platform-neutral.
- Upgrade NCBI applications:
BLAST+ (will be fixed in BLAST+ 2.5.0)
Sequin
Jalview? Jalview is probably not affected. It has a DBfetch utility, but that looks like it goes to PDB, not NCBI. The last version was released Oct. 15 (version 2.9.0b2).
Artemis? Apparently, Artemis only knows how to fetch sequences from EBI, so the NCBI change shouldn't affect Artemis. Artemis 16 already retrieves by accession number, not GI.
Probably need a common script for running BLAST and presenting output. This could potentially be fixed if we can just get BioLegato to truly launch programs in the background. Possible features:
multiple concurrent output formats
multicolumn output for blnfetch and blpfetch
For use by blnfetch and blpfetch, blsort.py needs to be upgraded to recognize and sort real numbers. Hopefully, Python already has a way to do this.
Is there an NCBI script or program that gets results back in something like ASN.1, and then lets you generate as many different output formats from that report as you want?
- Yes: blast_formatter. See the demo scripts blast2multi.sh and fasta2multi.sh in BIRCHDEV/test/BLASTbenchmark/BIRCHbenchmark. With BLAST, write the results with -outfmt 11 (BLAST archive format, ASN.1), and then use blast_formatter to create as many reports as you wish.
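A common BLAST script along these lines could be sketched in Python as below: one archive written with -outfmt 11 is turned into several reports by calling blast_formatter once per format, and the tabular report is then sorted by E-value, which is the real-number sorting blsort.py needs. This is an illustration under stated assumptions, not the blast2multi.sh demo: BLAST+ must be on the PATH, FORMATS and all function names are hypothetical, and blast_formatter may also need access to the database that was searched. Python's float() already parses values such as 2e-45 and 0.0; the 'e-150' case (mantissa omitted) is an assumption about older BLAST reports.

```python
"""Minimal sketch: one BLAST archive, several reports, hits sorted by E-value.

Assumptions: BLAST+ is on the PATH, the search was run with -outfmt 11,
and the FORMATS mapping and function names are hypothetical.
"""
import subprocess

# Hypothetical report suffix -> blast_formatter -outfmt value.
FORMATS = {
    "pairwise.txt": "0",  # standard pairwise report
    "report.xml": "5",    # BLAST XML
    "tabular.tsv": "6",   # tabular, default 12 columns
}


def formatter_commands(archive, prefix, formats=FORMATS):
    """Build one blast_formatter command line per requested report."""
    return [
        ["blast_formatter", "-archive", archive, "-outfmt", outfmt,
         "-out", f"{prefix}.{suffix}"]
        for suffix, outfmt in formats.items()
    ]


def write_reports(archive, prefix):
    """Run blast_formatter once per report; stops on the first failure."""
    for cmd in formatter_commands(archive, prefix):
        subprocess.run(cmd, check=True)


def parse_evalue(text):
    """Convert an E-value string such as '2e-45' or '0.0' to a float.

    Assumed legacy form: 'e-150' (no mantissa) is treated as 1e-150.
    """
    text = text.strip()
    if text[:1] in ("e", "E"):
        text = "1" + text
    return float(text)


def sort_tabular(lines, evalue_column=10):
    """Sort -outfmt 6 lines by E-value (column 11, index 10), smallest first."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    rows.sort(key=lambda row: parse_evalue(row[evalue_column]))
    return ["\t".join(row) for row in rows]
```

For example, after `blastn -query q.fasta -db nt -outfmt 11 -out hits.asn`, calling write_reports("hits.asn", "run1") would produce run1.pairwise.txt, run1.report.xml and run1.tabular.tsv from a single search, so adding another output format does not require re-running BLAST.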