Interactive example of overlapping residue annotations. The first
sequence has two residue selections indicated by cyan and red
background. The second sequence exhibits two residue
selections which are shown as red and green underlining. The
text information pops up when the mouse is moved.
BASE=http://www.bioinformatics.org/strap
FILE=$BASE/strap.jar
wget -N $FILE || curl -O $FILE
FILE=$BASE/scripts/toHTML1.txt
wget -N $FILE || curl -O $FILE
FILE=$BASE/toHTML/data/fly_temp.gif
wget -N $FILE || curl -O $FILE
FILE=$BASE/aa/alignment2html.jar
wget -N $FILE || curl -O $FILE
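The repeated download lines above can be folded into a small helper. This is only a sketch: the function name fetch is an illustrative assumption, while the wget and curl options are the ones already used above.

```shell
# fetch URL: download a file into the current directory, trying wget first
# and falling back to curl, exactly as in the commands above.
# The name "fetch" is hypothetical and not part of STRAP.
fetch() {
  wget -N "$1" || curl -O "$1"
}

# Possible usage (not executed here):
#   BASE=http://www.bioinformatics.org/strap
#   fetch "$BASE/strap.jar"
#   fetch "$BASE/scripts/toHTML1.txt"
```

Quoting "$1" keeps the helper safe for URLs that contain characters the shell would otherwise interpret.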
export JavaProxy=' -DproxyHost=proxy.institution.org -DproxyPort=8080 -Dhttp.nonProxyHosts="" '
Test the proxy settings. You should see Google's HTML code.
java $JavaProxy -jar strap.jar -testWeb http://www.google.com
java $JavaProxy -jar strap.jar -script=toHTML1.txt -toHTML=myOutput.html
The output file myOutput.html is ready to be displayed in a web browser.
java -jar alignment2html.jar -help
-help=script. Also see Scripting language.
The following three lines specify the number of characters per line, the minimal conservation of a residue position to be emphasized in boldface, and the residue color mode.
set_characters_per_line 24
set_conservation_threshold 70
set_color_mode chemical
Three sequences are created: Canus, Xenopus and
Drosophila. Dashes denote alignment gaps. Dashes would not be required
if the alignment were computed with the command
align *.
aa_sequence MVLSAADKGNVKAAWGKVGGHAAEYGAEALERMFLSFPTTKTYFP, Canus
aa_sequence -VLSAAERAQVKAAWGKI--QAGAHGAEALERMFLGFPTTKTYPF, Xenopus
aa_sequence MILSAAERAQIKAAWGKVG-NAGAHGAEALD--FLGYPTTKSYPY, Drosophila
Assigning the cleaved initial methionine the index 1, the Xenopus sequence starts with amino acid number 2:
set_residue_index_offset 1, Xenopus
The protein image icons are shown in the alignment row header. For the first two icons, the URL is given. The last icon is loaded from a local file, which is not accessible from other computers; therefore the image data is embedded in the HTML file:
icon http://www.goldenweb.it/software/immagini/icone/animals/water_animals/Frog.gif, Xenopus
icon http://www.goldenweb.it/software/immagini/icone/animals/misc_animals/dog1.gif, Canus
icon fly_temp.gif, Drosophila
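The residue numbering set with set_residue_index_offset earlier can be illustrated with a short shell sketch. It assumes that the offset is simply added to the position counted without gaps, which matches the statement that Xenopus starts with amino acid number 2. The function residue_index is hypothetical and not part of STRAP.

```shell
# residue_index ALIGNED_SEQUENCE COLUMN OFFSET
# Prints the residue number at a 1-based alignment column: the count of
# non-gap characters up to and including that column, plus the offset.
# For a gap column this yields the number of the preceding residue.
# Assumption: the offset is a plain additive shift.
residue_index() {
  printf '%s\n' "$1" | awk -v col="$2" -v off="$3" '{
    prefix = substr($0, 1, col)   # alignment row up to the column
    gsub(/-/, "", prefix)         # drop gap characters
    print length(prefix) + off
  }'
}

xenopus='-VLSAAERAQVKAAWGKI--QAGAHGAEALERMFLGFPTTKTYPF'
residue_index "$xenopus" 2 1    # prints 2  (the leading V)
residue_index "$xenopus" 21 1   # prints 19 (the Q after the two-gap stretch)
```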
If the database accession ID is given, a blue asterisk after the sequence name acts as a hyperlink:
accession_id UNIPROT:P0A7B8 , Canus
Residue selections are created with the command new_selection.
Two display styles are supported: STYLE_BACKGROUND and
STYLE_UNDERLINE.
Color, display style,
balloon-text and web-links are specified with add_annotation or
set_annotation.
new_selection 1-4, Canus/N-terminus
set_annotation Hyperrefs=http://en.wikipedia.org/wiki/N-terminus, Canus/N-terminus
set_annotation Style=STYLE_BACKGROUND, Canus/N-terminus
set_annotation Color=#00ffFF, Canus/N-terminus
add_annotation Balloon=Balloon text blablabla, Canus/N-terminus
The command set_annotation overrides any previous value, whereas add_annotation keeps already existing lines.
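The script commands shown so far can be collected into one file and passed to STRAP. The following is a minimal sketch: the file name myScript.txt and the second balloon line are hypothetical, and it has not been verified that STRAP accepts exactly this subset of commands. According to the description above, the second add_annotation should be kept alongside the first balloon line rather than replace it.

```shell
# Write a small STRAP script built from commands shown in this section.
# "myScript.txt" is a hypothetical file name.
cat > myScript.txt <<'EOF'
set_characters_per_line 24
set_conservation_threshold 70
set_color_mode chemical
aa_sequence MVLSAADKGNVKAAWGKVGGHAAEYGAEALERMFLSFPTTKTYFP, Canus
aa_sequence -VLSAAERAQVKAAWGKI--QAGAHGAEALERMFLGFPTTKTYPF, Xenopus
new_selection 1-4, Canus/N-terminus
set_annotation Style=STYLE_BACKGROUND, Canus/N-terminus
set_annotation Color=#00ffFF, Canus/N-terminus
add_annotation Balloon=Balloon text blablabla, Canus/N-terminus
add_annotation Balloon=A second balloon line, Canus/N-terminus
EOF

# Run STRAP only if the jar has been downloaded; $JavaProxy may be empty.
if [ -f strap.jar ]; then
  java $JavaProxy -jar strap.jar -script=myScript.txt -toHTML=myOutput.html
fi
```

The quoted heredoc delimiter ('EOF') prevents the shell from expanding "#" or "$" inside the script text.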
Splice variants of Hexokinase. The size of the alignment exceeds the window size and therefore it can be scrolled. These sequences are loaded from nucleotide sequence files. Therefore the coding triplet and exon number of the amino acid under the mouse pointer is shown.
project_coordinates AUTO, *
add_annotation 3D_view=3D_spheres, Canus/N-terminus
All annotations with the key 3D_view are executed sequentially.
open_3D NameOfView , List of proteins
If the 3D view contains more than one peptide chain, one of them can be specified with select_3D.
select_3D NameOfView , one_protein
All 3D commands have the prefix "3D_" and often resemble Rasmol and Jmol commands. The command 3D_select selects amino acids or atoms. It should not be confused with the command select_3D.
3D_select 20-100
Optionally, atom types such as carbon alpha and carbon beta can be appended.
3D_select 20-100.CA.CB
The style of these atoms can be changed with commands starting with the prefix "3D_".
3D_spheres on
load UNIPROT:P49722 UNIPROT:P0A272 PDB:1ryp_C
Example with URLs:
load http://www.bioinformatics.org/strap/dataFiles/hs_HelicobacterPylori.swiss http://www.bioinformatics.org/strap/dataFiles/hs_SalmonellaTyphi.swiss
A subsequence rather than the entire amino acid sequence may be loaded. The residue index interval is appended after an exclamation mark. Either of the two interval boundaries can be omitted. Example:
load UNIPROT:P49722!30-60
Optionally, a protein name can be given after a vertical bar. Example:
load PDB:1ryp_C|My_name
The name can contain the following variables: $ORGANISM, $ORGANISM_SCIENTIFIC, $ORGANISM5 (e.g. "DroMe" for Drosophila melanogaster), $NAME (the original name), $PDB, $SP (Swissprot name like "hslv_ecoli") and $SP1 (first part of the Swissprot name, like "hslv").
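To make the argument syntax of load concrete, the following shell sketch splits one argument into its identifier, optional residue interval (after "!") and optional name (after "|"). It only illustrates the notation described above and is not STRAP's own parser; the function name parse_load_spec is hypothetical.

```shell
# parse_load_spec SPEC
# Splits a load argument of the form ID[!FROM-TO][|NAME] into its parts.
# Illustration of the notation only; not how STRAP itself parses it.
parse_load_spec() {
  spec=$1 name= range=
  # A protein name may follow a vertical bar.
  case $spec in *'|'*) name=${spec#*'|'}; spec=${spec%%'|'*} ;; esac
  # A residue index interval may follow an exclamation mark.
  case $spec in *'!'*) range=${spec#*'!'}; spec=${spec%%'!'*} ;; esac
  printf 'id=%s range=%s name=%s\n' "$spec" "$range" "$name"
}

parse_load_spec 'UNIPROT:P49722!30-60'   # prints: id=UNIPROT:P49722 range=30-60 name=
parse_load_spec 'PDB:1ryp_C|My_name'     # prints: id=PDB:1ryp_C range= name=My_name
```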
align *
The wildcard "*" or ".*" means all sequences. Alternatively, a space-separated list of protein names, database IDs and regular expressions matching protein names is accepted. By default, ClustalW (precompiled binary for Intel) and CE/CL (Java) are used. The 3D-alignment program TM-align (Fortran) is faster than CE/CL. You can install TM-align from the software manager of your computer. Under Debian:
apt-get install clustalw tm-align
Alternatively, install a Fortran compiler. Then add the program option -a3d=tm_align.
There are a few alternatives to ClustalW, some of which produce more accurate results but require more time. They are expected in the /usr/bin/ directory, for example /usr/bin/t_coffee. They can also be loaded and installed automatically. The unattended software installation from source code requires the software installation tools make and a C++ compiler.
DAS_features CSA%20-%20extended uniprot cbs_total netphos netoglyc , *
The GFF features from the Expasy server are loaded with
GFF_expasy_features *
The "%20" in the feature name is the URL-encoded form (hexadecimal character code 20) of a space character. After loading the data from the remote servers, the sequence positions are underlined in the alignment. The DAS annotation providers are listed in the standard BioDAS registry file or in supplementary registry files given on the command line. Underlining these sequence annotations is time-consuming. At least the identification of the UNIPROT identifier can be accelerated by a local BLAST database and a local Uniprot as described below.
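The "%20" notation can be produced and reversed with sed. This sketch handles only spaces, which is all the feature name above requires; the function names are hypothetical.

```shell
# encode_spaces TEXT: replace each space by %20, as needed for feature
# names passed to DAS_features.
encode_spaces() {
  printf '%s\n' "$1" | sed 's/ /%20/g'
}

# decode_spaces TEXT: reverse the replacement.
decode_spaces() {
  printf '%s\n' "$1" | sed 's/%20/ /g'
}

encode_spaces 'CSA - extended'       # prints: CSA%20-%20extended
decode_spaces 'CSA%20-%20extended'   # prints: CSA - extended
```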
for i in ; do
FILE=http://www.bioinformatics.org/strap/toHTML/scripts/$i.txt
wget -N $FILE || curl -O $FILE
java $JavaProxy -jar strap.jar -script=$i.txt -toHTML=$i.html || break
done
christophgil (at) googlemail.com