Colombo v3.8
SigiHMM - Prediction of Genomic Islands in Prokaryotic Genomes 
--------------------------------------------------------------

WHAT IS COLOMBO?
COLOMBO is a software framework equipped with a GUI for the statistical
analysis of genome sequences. It can be supplied with different plugins
that perform the actual analysis. The current version comes with SigiHMM, a
tool for the prediction of Genomic Islands, as demonstrated at ECCB 2006.
Later versions of Colombo are planned to integrate plugins for various
other applications.


INSTALLING COLOMBO
To install Colombo, just copy the directory Colombo anywhere you like. You will
also need a working version of the Java Virtual Machine. Colombo assumes that
a Sun Java interpreter is installed and accessible by the command "java". 
If you don't have java in your path, or the Java runtime environment has a
different name on your system, edit the file Colombo and replace "java" with
the command you use to invoke the JRE. In this case, you might also need to
adapt the command line parameters to the needs of your Java distribution.


RUNNING COLOMBO
Colombo has been compiled and tested thoroughly under version 1.5.0 of the 
Sun Java interpreter. It should also run under Java 1.4.2, though problems
have been reported for some Java 1.4.2 packages shipped with SuSE Linux:
the Virtual Machine may crash due to an internal error.

To run Colombo on a Unix platform, just type
./Colombo
from within the directory "Colombo".
To run Colombo on a Windows platform, just type
Colombo.bat
from within the directory "Colombo".


CREATING POSTSCRIPT FILES
Problems can arise when creating a ps-file from a gff-file from within the
user interface because of a bug in the Java Runtime Environment: the
execution of scripts with large inputs or outputs does not work properly.
The newest version of the Java Runtime Environment seems to have fixed this
problem. If it does not work for you, it is still possible to produce the
ps-output by using gff2ps from the command line with the included options file.
All required files are located in the directory external/gff2ps.
You can produce gff2ps output by issuing the command

./run-gff2ps <gff-output> [-T <title> -t <subtitle>]

where title and subtitle are optional. It is also possible to pass
more gff2ps command-line parameters to the script, or change the
settings in the COLOMBO-specific options file (with the name
./external/gff2ps/colombo-gff2ps.rc). See the gff2ps manual for details.
Note that running gff2ps is only possible on UNIX systems.
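
For scripted use, the invocation shown above can be assembled in code. The
following Python sketch only builds the argument list for run-gff2ps with
the optional -T and -t flags; the function name build_gff2ps_command and
any file names passed to it are illustrative assumptions, not part of
Colombo, and the sketch itself does not execute anything.

```python
def build_gff2ps_command(gff_path, title=None, subtitle=None):
    """Return the argument list for ./run-gff2ps as documented above.

    Hypothetical helper: only the script name and the -T / -t flags
    come from this README.
    """
    cmd = ["./run-gff2ps", gff_path]
    if title is not None:
        cmd += ["-T", title]      # optional title
    if subtitle is not None:
        cmd += ["-t", subtitle]   # optional subtitle
    return cmd
```

The resulting list could be handed to subprocess.run on a UNIX system,
presumably with external/gff2ps as the working directory, since that is
where the required files are located.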


FILES AND PATHS
At present, Colombo only accepts EMBL format flat files as the input source for
the evaluated genomes. It also produces EMBL output. Enter the paths to the
corresponding files into the input lines.

In the Settings menu, you will be asked for a path to a list of Codon Usage
Tables presented in a *.cut file. Each such file contains a number of Codon
Usages serving as a model for possible donors. A Codon Usage is specified with 
a description, followed by a newline and 64 nonnegative numbers, interpreted 
as Codon occurrences. The order of Codons is the following:
CGA CGC CGG CGU AGA AGG CUA CUC CUG CUU UUA UUG UCA UCC UCG UCU 
AGC AGU ACA ACC ACG ACU CCA CCC CCG CCU GCA GCC GCG GCU GGA GGC 
GGG GGU GUA GUC GUG GUU AAA AAG AAC AAU CAA CAG CAC CAU GAA GAG 
GAC GAU UAC UAU UGC UGU UUC UUU AUA AUC AUU AUG UGG UAA UAG UGA
The default file is external/cut/default.cut
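
To illustrate the layout described above, here is a minimal Python sketch
that reads Codon Usage entries from the text of a *.cut file. It is not
Colombo's own parser. It assumes that the 64 numbers are separated by
whitespace (possibly spread over several lines) and that blank lines
between entries can be skipped; neither detail is stated above. The names
CODON_ORDER and parse_cut are hypothetical.

```python
# Codon order as listed above.
CODON_ORDER = (
    "CGA CGC CGG CGU AGA AGG CUA CUC CUG CUU UUA UUG UCA UCC UCG UCU "
    "AGC AGU ACA ACC ACG ACU CCA CCC CCG CCU GCA GCC GCG GCU GGA GGC "
    "GGG GGU GUA GUC GUG GUU AAA AAG AAC AAU CAA CAG CAC CAU GAA GAG "
    "GAC GAU UAC UAU UGC UGU UUC UUU AUA AUC AUU AUG UGG UAA UAG UGA"
).split()


def parse_cut(text):
    """Return a list of (description, {codon: count}) pairs."""
    entries = []
    lines = iter(text.splitlines())
    for line in lines:
        description = line.strip()
        if not description:
            continue  # assumption: blank lines between entries are allowed
        counts = []
        # Collect numbers from the following lines until 64 are read.
        while len(counts) < 64:
            try:
                row = next(lines)
            except StopIteration:
                raise ValueError("truncated entry: %r" % description)
            counts.extend(float(token) for token in row.split())
        if len(counts) != 64 or any(c < 0 for c in counts):
            raise ValueError("invalid entry: %r" % description)
        entries.append((description, dict(zip(CODON_ORDER, counts))))
    return entries
```

Calling parse_cut(open("external/cut/default.cut").read()) would then give
one dictionary per donor model, keyed by codon, provided the default file
follows the assumptions above.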

If you would like to run the genome viewer Artemis from within Colombo, you can
specify the corresponding path in the Paths dialog of the Settings menu.


RESULTS AND STATES
Colombo uses the following keywords to represent gene-classes:
    NORMAL = Normal/Native gene.
    PUTAL  = Putative alien gene. Possible donors are available in the advanced
             classification table.
    INCON  = Normal/native gene within an island. Those are referred to as
             inconspicuous genes.
Colombo makes use of an additional outlier test. If a gene is rated as an
outlier, the prefix "OUT" is added to the keyword, resulting in:
    OUTNORMAL
    OUTPUTAL


INCLUDING CUSTOM DONORS
You might want to include custom donor candidates in Colombo's prediction.
The Donors dialog enables you to add them from an EMBL file. You may save
the resulting CUT file somewhere else and include it through the Paths dialog
as described above.


OPTIONS
The Plugins option of the Settings menu is intended for the later integration
of additional models. At present there are two plugins, both versions of
SigiHMM (tbSigiHMM and okSigiHMM). Both should produce equivalent
output, but might have different running times: the latter appeared to be
faster in the majority of cases.


BATCH PROCESSING
There are also two standalone command-line versions of the plugins for
automated, script-driven runs. They can be invoked with
one of the commands

java SigiHMM [options]
java okSigiHMM [options]

again from within the directory Colombo. To see a list of available options and
their default values, pass "help" as the option.
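
For script-driven runs, the two commands above can be wrapped in a small
helper. The Python sketch below builds the command line and, as an example,
asks a plugin for its option list via "help". The function names and the
colombo_dir parameter are hypothetical; apart from "help", no option names
are assumed here, so real options should be taken from the help output.

```python
import subprocess

PLUGINS = ("SigiHMM", "okSigiHMM")  # the two commands documented above


def sigihmm_command(plugin="SigiHMM", options=()):
    """Return the argument list for a standalone plugin run."""
    if plugin not in PLUGINS:
        raise ValueError("unknown plugin: %r" % plugin)
    return ["java", plugin] + list(options)


def show_help(plugin="SigiHMM", colombo_dir="Colombo"):
    """Run the plugin with "help" from within the Colombo directory.

    colombo_dir is an assumed path to the installation directory.
    """
    return subprocess.run(
        sigihmm_command(plugin, ["help"]),
        cwd=colombo_dir,
        capture_output=True,
        text=True,
    )
```

A batch script could loop over input files and call subprocess.run with
sigihmm_command(plugin, options) for each one, using the same working
directory.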


If any problems arise, feel free to contact the authors:
Thomas.Brodag@T-Online.de
keller@cs.uni-goettingen.de
