The BioJava interface in STRAP
BioJava is a set of modules
and packages for biology, including sequence analysis, database
access, and parsers for sequence files.
Mark Schreiber maintains an excellent introduction, with many examples:
BioJava Cookbook.
An interface to BioJava allows authors of STRAP plugins or scripts to
use the BioJava API.
Conversely, BioJava projects can use the STRAP API.
Two classes are provided to convert objects between the two toolkits:
one creates a GappedSequence object from a StrapProtein instance, and
the other creates a StrapProtein object from a GappedSequence object.
Position-specific sequence features contained in the objects are converted as well.
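To give a feel for what such a conversion involves, here is a minimal sketch that uses only the Java standard library. It does not call the actual converter classes, and GappedRowSketch and its methods are hypothetical names that belong to neither STRAP nor BioJava. It assumes that a gapped sequence is a text row with '-' as the gap character and splits it into an ungapped residue array plus the alignment column of each residue, from which the row can be rebuilt.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of STRAP or BioJava. */
final class GappedRowSketch {
    /** Residues of a gapped row with all '-' characters removed. */
    static byte[] ungapped(String gappedRow) {
        byte[] out = new byte[gappedRow.length()];
        int n = 0;
        for (int col = 0; col < gappedRow.length(); col++) {
            char c = gappedRow.charAt(col);
            if (c != '-') out[n++] = (byte) c;
        }
        return Arrays.copyOf(out, n);
    }

    /** For each residue index, the 0-based alignment column it occupies. */
    static int[] residueColumns(String gappedRow) {
        int[] cols = new int[gappedRow.length()];
        int n = 0;
        for (int col = 0; col < gappedRow.length(); col++) {
            if (gappedRow.charAt(col) != '-') cols[n++] = col;
        }
        return Arrays.copyOf(cols, n);
    }

    /** Rebuilds the gapped row of the given width from residues and columns. */
    static String gapped(byte[] residues, int[] columns, int width) {
        char[] row = new char[width];
        Arrays.fill(row, '-');
        for (int i = 0; i < residues.length; i++) {
            row[columns[i]] = (char) residues[i];
        }
        return new String(row);
    }
}
```

In such a scheme, a feature attached to residue index i would map to alignment column residueColumns(row)[i], which is one way position-specific features could be carried across.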
Testing the STRAP-BioJava-interface
Plugins for STRAP can be created, started, and modified at runtime.
A few demo-plugins are enclosed in STRAP to exemplify the usage of plugins.
Once STRAP is running, there are several ways to load protein files for testing.
The Plugins menu of the toolbar contains the item
Start standard plugin .....
There you can select the BioJava example.
Comparing BioJava and the STRAP-API
Similarities:
- Both provide comprehensive collections of methods for protein sequences.
- Both are used by Java programmers to implement bioinformatics algorithms.
- Both separate definitions from implementations by using Java interfaces.
- Both are open source projects.
- Both can read and write many sequence file formats.
Differences between BioJava and STRAP:
-
BioJava handles both nucleotide and peptide sequences and can be applied to entire
genomes.
STRAP cannot cope with single sequences as long as an entire chromosome.
Instead, STRAP manipulates peptide sequences and 3D structures
of the size of single proteins.
Nevertheless, it can hold a large number of sequences and structures in memory.
STRAP is designed for protein sequences but can read
coding nucleotide files, which are then translated to peptide sequences.
-
STRAP is optimized for speed because its graphical user interface must be highly responsive.
BioJava is used where speed is less critical.
-
BioJava is well designed in terms of type safety, ontology and object design.
BioJava uses objects for sequences, annotations and sequence positions.
Even single amino acids or nucleotides are object references.
To enhance speed, STRAP avoids frequent object instantiation and the invocation of non-final
object methods.
-
In BioJava, peptide and nucleotide sequences are
lists of symbols. The symbols can be retrieved one after
the other with an iterator, or sub-sequences can be
obtained. The advantages are that the entire
sequence does not necessarily have to reside in memory and that programs are less
susceptible to programming errors.
Symbol objects are immutable elements of an alphabet. In
STRAP, however, simple byte arrays are used for sequences
and float arrays for coordinates. Besides speed, low
memory consumption is an important advantage of primitive data
types.
Classes in STRAP expose internal data.
Programmers might therefore introduce errors,
for example by manipulating byte arrays directly instead of using the setter methods.
Another disadvantage is that STRAP does not check whether the characters in a
sequence are valid with respect to an underlying alphabet.
-
In BioJava, sequence positions are represented by Location objects.
Discontiguous Location objects are composed of several contiguous
RangeLocation or PointLocation objects.
In the class StrapProtein, however, single residue positions are given as
integers between 0 and countResidues()-1.
Multiple positions are given by boolean arrays:
true at a given index means selected, false means not selected.
-
BioJava throws exceptions when methods are invoked with invalid parameters.
STRAP avoids the time-consuming creation of Throwable objects.
Instead, methods indicate errors by returning NaN, -1, or null.
From a program-design point of view, however, exceptions are cleaner.
-
In BioJava, a Sequence object is either a peptide sequence or a nucleotide sequence.
A StrapProtein can hold both at the same time if a coding nucleotide sequence
was read and translated into protein:
the nucleotide sequence and the peptide sequence are then contained
in the same StrapProtein object.
The coding and non-coding regions can be changed, and the peptide sequence changes accordingly.
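Regarding the missing alphabet check for byte-array sequences mentioned above: calling code can perform such a check itself. The following is a minimal sketch, assuming sequences are stored as ASCII one-letter codes. AlphabetCheck is a hypothetical class, and the choice of the 20 standard amino-acid letters as the alphabet is an assumption made for illustration. It is not part of the STRAP or BioJava API.

```java
/** Hypothetical helper for illustration; not part of STRAP or BioJava. */
final class AlphabetCheck {
    // Lookup table indexed by byte value; true for the 20 standard amino-acid letters.
    private static final boolean[] VALID = new boolean[128];
    static {
        for (char c : "ACDEFGHIKLMNPQRSTVWY".toCharArray()) {
            VALID[c] = true;
            VALID[Character.toLowerCase(c)] = true;
        }
    }

    /** Returns the index of the first invalid residue, or -1 if all are valid. */
    static int firstInvalid(byte[] seq) {
        for (int i = 0; i < seq.length; i++) {
            byte b = seq[i];
            if (b < 0 || !VALID[b]) return i;
        }
        return -1;
    }
}
```

The table lookup keeps the check free of object allocation, and the -1 return value follows the sentinel convention described for STRAP.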
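The two ways of describing positions compared above, boolean arrays and contiguous ranges, can be converted into each other. The sketch below is a standalone illustration using only the Java standard library. SelectionRanges is a hypothetical class, and plain int pairs stand in for range objects. It emits 1-based inclusive ranges on the assumption that this matches BioJava's coordinate convention, while the boolean mask is 0-based as described for StrapProtein.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of STRAP or BioJava. */
final class SelectionRanges {
    /** Converts a 0-based selection mask into 1-based inclusive {start, end} ranges. */
    static List<int[]> toRanges(boolean[] selected) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1;                       // 0-based start of the open run, or -1
        for (int i = 0; i <= selected.length; i++) {
            boolean on = i < selected.length && selected[i];
            if (on && start < 0) start = i;
            if (!on && start >= 0) {
                // 0-based run [start, i-1] becomes 1-based [start+1, i]
                ranges.add(new int[] {start + 1, i});
                start = -1;
            }
        }
        return ranges;
    }

    /** Inverse: marks every position covered by a range in a mask of the given length. */
    static boolean[] toMask(List<int[]> ranges, int length) {
        boolean[] mask = new boolean[length];
        for (int[] r : ranges) {
            for (int pos = r[0]; pos <= r[1]; pos++) mask[pos - 1] = true;
        }
        return mask;
    }
}
```

A range whose start equals its end corresponds to a single position, and a list with more than one entry corresponds to a discontiguous selection.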
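The two error-reporting styles compared above can be bridged by a thin wrapper. The following sketch is illustrative only. ErrorStyles and both of its methods are hypothetical and do not exist in STRAP or BioJava. The first method reports failure with the sentinel value -1, and the second converts that sentinel into an exception for callers that prefer one.

```java
/** Hypothetical helper for illustration; not part of STRAP or BioJava. */
final class ErrorStyles {
    /** Sentinel style: returns -1 when the residue is absent or the input is null. */
    static int indexOfResidue(byte[] seq, byte residue) {
        if (seq == null) return -1;
        for (int i = 0; i < seq.length; i++) {
            if (seq[i] == residue) return i;
        }
        return -1;
    }

    /** Exception style: wraps the sentinel version and throws on failure instead. */
    static int indexOfResidueOrThrow(byte[] seq, byte residue) {
        int i = indexOfResidue(seq, residue);
        if (i < 0) {
            throw new IllegalArgumentException("residue not found: " + (char) residue);
        }
        return i;
    }
}
```

The sentinel version allocates nothing on failure, which is the cost argument made for STRAP. The wrapper shows that the stricter behaviour can be added at the call site where it is wanted.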
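The idea of one object holding a nucleotide sequence together with its derived peptide sequence can be sketched as follows. This is not the StrapProtein implementation. CodingSequenceSketch and its methods are hypothetical, the standard genetic code is assumed, and the coding region is modelled as a boolean mask over the bases. The peptide is recomputed from the bases currently marked as coding, so changing the mask changes the translation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical class for illustration; not the StrapProtein implementation. */
final class CodingSequenceSketch {
    private static final String BASES = "TCAG";
    // Standard genetic code, indexed by 16*b1 + 4*b2 + b3 with bases ordered T, C, A, G.
    private static final String AMINO =
        "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    private final byte[] nucleotides;
    private boolean[] coding;   // true = base belongs to a coding region

    CodingSequenceSketch(String dna) {
        nucleotides = dna.toUpperCase().getBytes(StandardCharsets.US_ASCII);
        coding = new boolean[nucleotides.length];
        Arrays.fill(coding, true);          // assumption: everything is coding initially
    }

    /** Replaces the coding mask; its length must match the nucleotide sequence. */
    void setCoding(boolean[] mask) {
        if (mask.length != nucleotides.length) {
            throw new IllegalArgumentException("mask length mismatch");
        }
        coding = mask.clone();
    }

    /** Translates the concatenated coding bases; codons with unknown bases become 'X'. */
    String peptide() {
        StringBuilder sb = new StringBuilder();
        int[] codon = new int[3];
        int n = 0;
        for (int i = 0; i < nucleotides.length; i++) {
            if (!coding[i]) continue;
            codon[n++] = BASES.indexOf(nucleotides[i]);
            if (n == 3) {
                boolean known = codon[0] >= 0 && codon[1] >= 0 && codon[2] >= 0;
                sb.append(known ? AMINO.charAt(16 * codon[0] + 4 * codon[1] + codon[2]) : 'X');
                n = 0;
            }
        }
        return sb.toString();   // trailing incomplete codons are ignored
    }
}
```

Recomputing the peptide on each call keeps the sketch short. A real implementation would more likely cache the translation and invalidate it when the mask changes.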