Alignment Annotator
Browser-based sequence alignment visualization with JavaScript
Scripting interface
This page describes the scripting interface and explains the scripting language of Alignment Annotator using basic examples. Scripts are used in two situations:
When Alignment Annotator is used as a visualization tool within other software, the script text is generated programmatically.
The software can be a web service or locally running software.
The script texts in the variables beforeAC and afterAC are run before and after alignment computation, respectively.
These two web variables may contain either the script text itself or the URL of a script file.
There are two possible ways to display the alignment document:
An IFRAME displays the sequence alignment
The web service pages contain buttons or web links. Clicking one opens the alignment in an IFRAME or in another tab.
Alignment Annotator can be used from the main server or installed on any Unix/Linux web server.
The installation instructions can be requested from the author.
When advanced program features of Alignment Annotator are needed that are not accessible via the graphical user interface.
The two script texts are entered into the text fields in the section "Scripts" of the web interface.
The Tab key can be used for word completion of sequence names and script commands.
Script line syntax
Each line starts with a script command and may contain parameters.
The set of commands is a subset of the commands supported in Strap.
Each script line follows one of the following four syntaxes:
Command
Command parameters
Command list-of-sequences-residue-selections
Command parameters, list-of-sequences-residue-selections
The last comma (,) in the line marks the end of the parameters and the beginning of the list of sequences and residue selections.
Consequently, the white-space separated list of sequences and residue selections must not contain a comma.
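The splitting rule can be illustrated with a short Python sketch. It is not the parser used by Alignment Annotator; the class and function names are hypothetical, and two behaviours are assumptions: lines starting with "#" are treated as comments (as in the examples below), and for a line without a comma everything after the command is returned unclassified in rest, because whether it holds parameters or a list of sequences depends on the command.

```python
from typing import NamedTuple, Optional


class ScriptLine(NamedTuple):
    command: str
    parameters: Optional[str]  # text before the last comma, None if the line has no comma
    targets: list[str]         # white-space separated sequences/selections after the last comma
    rest: Optional[str]        # arguments of a line without comma; meaning depends on the command


def parse_script_line(line: str) -> Optional[ScriptLine]:
    """Split one script line according to the 'last comma' rule (illustrative sketch)."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank lines and '#' comment lines carry no command
    parts = line.split(maxsplit=1)
    command = parts[0]
    args = parts[1] if len(parts) > 1 else ""
    if "," in args:
        # Only the LAST comma separates parameters from the sequences/selections,
        # so commas inside parameters such as join(37..156,458..659) are kept.
        parameters, _, selection = args.rpartition(",")
        return ScriptLine(command, parameters.strip(), selection.split(), None)
    return ScriptLine(command, None, [], args or None)
```

With this sketch, "translate_cds join(37..156,458..659,744..799), oxytocin" yields the parameter "join(37..156,458..659,744..799)" and the single target "oxytocin".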
Server side program flow
The input data is processed in the following order:
The sequences in the text-pane "sequences" are loaded (if any).
The script "beforeAC" is interpreted (if any).
The alignment is computed, unless the input already contains gaps.
ClustalW is currently used as the standard method.
If at this time 3D structures are loaded, mixed sequence/structure alignment is performed.
TM-align is currently the standard 3D alignment method.
Residue annotations are loaded from UniProt, CSA or BioDAS servers if the respective check boxes are activated.
The script "afterAC" is run (if any).
Homologous 3D-structures are identified if the check-box "3D" is activated.
Since the alignment has already been computed at this point, structures inferred here have no impact on alignment computation.
Conversely, structures loaded in the text-field "sequences" or inferred by the command project_coordinates in "beforeAC" are considered for alignment computation.
Script Examples
1. Creating sequences
1.1. Amino acid sequences
1.2. Aligned amino acid sequences
1.3. Nucleotide sequences
beforeAC
set_alignment_type_CN
nt_sequence ATGGCATATCCCATACAACTAGGATTCCAAGATGCAACATCACCAATCATAGAAGAACTACTTCACTTTCATGACCACACGCTAATAATTGTCTTCTTAATTAGCTCATTAGTACTTTACATTATTTCACTAATACTAACGACAAAGCTGACCCATACAAGCACGATAGATGCACAAGAAGTAGAGACAATCTGAACCATTCTGCCCGCCATCATCTTAATTCTAATTGCTCTTCCTTCTTTACGAATTCTATACATAATAGATGAAATCAATAACCCATCTCTTACAGTAAAAACCATAGGACATCAGTGATACTGAAGCTATGAGTATACAGATTATGAGGACTTAAGCTTCGACTCCTACATAATTCCAACATCAGAATTAAAGCCAGGGGAGCTACGACTATTAGAAGTCGATAATCGAGTTGTACTACCAATAGAAATAACAATCCGAATGTTAGTCTCCTCTGAAGACGTATTACACTCATGAGCTGTGCCCTCTCTAGGACTAAAAACAGACGCAATCCCAGGCCGTCTAAACCAAACAACCCTTATATCGTCCCGTCCAGGCTTATATTACGGTCAATGCTCAGAAATTTGCGGGTCAAACCACAGTTTCATACCCATTGTCCTTGAGTTAGTCCCACTAAAGTACTTTGAAAAATGATCTGCGTCAATATTATAA, Cow
nt_sequence ATGGCACACCCAACGCAACTAGGTTTCAAGGACGCGGCCATACCCGTTATAGAGGAACTTCTTCACTTCCACGACCACGCATTAATAATTGTGCTCCTAATTAGCACTTTAGTTTTATATATTATTACTGCAATGGTATCAACTAAACTTACTAATAAATATATTCTAGACTCCCAAGAAATCGAAATCGTATGAACCATTCTACCAGCCGTCATTTTAGTACTAATCGCCCTGCCCTCCCTACGCATCCTGTACCTTATAGACGAAATTAACGACCCTCACCTGACAATTAAAGCAATAGGACACCAATGATACTGAAGTTACGAGTATACAGACTATGAAAATCTAGGATTCGACTCCTATATAGTACCAACCCAAGACCTTGCCCCCGGACAATTCCGACTTCTGGAAACAGACCACCGAATAGTTGTTCCAATAGAATCCCCAGTCCGTGTCCTAGTATCTGCTGAAGACGTGCTACATTCTTGAGCTGTTCCATCCCTTGGCGTAAAAATGGACGCAGTCCCAGGACGACTAAATCAAGCCGCCTTTATTGCCTCACGCCCAGGGGTCTTTTACGGACAATGCTCTGAAATTTGTGGAGCTAATCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCTCTCGAACACTTCGAAAACTGATCCTCATTAATACTAGAAGACGCCTCGCTAGGAAGCTAA, Carp
nt_sequence ATGGCCAACCACTCCCAACTAGGCTTTCAAGACGCCTCATCCCCCATCATAGAAGAGCTCGTTGAATTCCACGACCACGCCCTGATAGTCGCACTAGCAATTTGCAGCTTAGTACTCTACCTTCTAACTCTTATACTTATAGAAAAACTATCATCAAACACCGTAGATGCCCAAGAAGTTGAACTAATCTGAACCATCCTACCCGCTATTGTCCTAGTCCTGCTTGCCCTCCCCTCCCTCCAAATCCTCTACATAATAGACGAAATCGACGAACCTGATCTCACCCTAAAAGCCATCGGACACCAATGATACTGAACCTATGAATACACAGACTTCAAGGACCTCTCATTTGACTCCTACATAACCCCAACAACAGACCTCCCCCTAGGCCACTTCCGCCTACTAGAAGTCGACCATCGCATTGTAATCCCCATAGAATCCCCCATTCGAGTAATCATCACCGCTGATGACGTCCTCCACTCATGAGCCGTACCCGCCCTCGGGGTAAAAACAGACGCAATCCCTGGACGACTAAATCAAACCTCCTTCATCACCACTCGACCAGGAGTGTTTTACGGACAATGCTCAGAAATCTGCGGAGCTAACCACAGCTACATACCCATTGTAGTAGAGTCTACCCCCCTAAAACACTTTGAAGCCTGATCCTCACTACTGTCATCTTAA, Chicken
nt_sequence ATGGCACATGCAGCGCAAGTAGGTCTACAAGACGCTACTTCCCCTATCATAGAAGAGCTTATCACCTTTCATGATCACGCCCTCATAATCATTTTCCTTATCTGCTTCCTAGTCCTGTATGCCCTTTTCCTAACACTCACAACAAAACTAACTAATACTAACATCTCAGACGCTCAGGAAATAGAAACCGTCTGAACTATCCTGCCCGCCATCATCCTAGTCCTCATCGCCCTCCCATCCCTACGCATCCTTTACATAACAGACGAGGTCAACGATCCCTCCCTTACCATCAAATCAATTGGCCACCAATGGTACTGAACCTACGAGTACACCGACTACGGCGGACTAATCTTCAACTCCTACATACTTCCCCCATTATTCCTAGAACCAGGCGACCTGCGACTCCTTGACGTTGACAATCGAGTAGTACTCCCGATTGAAGCCCCCATTCGTATAATAATTACATCACAAGACGTCTTGCACTCATGAGCTGTCCCCACATTAGGCTTAAAAACAGATGCAATTCCCGGACGTCTAAACCAAACCACTTTCACCGCTACACGACCGGGGGTATACTACGGTCAATGCTCTGAAATCTGTGGAGCAAACCACAGTTTCATGCCCATCGTCCTAGAATTAATTCCCCTAAAAATCTTTGAAATAGGGCCCGTATTTACCCTATAG, Human
nt_sequence ATGGCACATCCCACACAATTAGGATTCCAAGACGCGGCCTCACCCGTAATAGAAGAACTTCTTCACTTCCATGACCATGCCCTAATAATTGTATTTTTGATTAGCGCCCTAGTACTTTATGTTATTATTACAACCGTCTCAACAAAACTCACTAACATATATATTTTGGACTCACAAGAAATTGAAATCGTATGAACTGTGCTCCCTGCCCTAATCCTCATTTTAATCGCCCTCCCCTCACTACGAATTCTATATCTTATAGACGAGATTAATGACCCCCACCTAACAATTAAGGCCATGGGGCACCAATGATACTGAAGCTACGAGTATACTGATTATGAAAACTTAAGTTTTGACTCCTACATAATCCCCACCCAGGACCTAACCCCTGGACAATTCCGGCTACTAGAGACAGACCACCGAATGGTTGTTCCCATAGAATCCCCTATTCGCATTCTTGTTTCCGCCGAAGATGTACTACACTCCTGGGCCCTTCCAGCCATGGGGGTAAAGATAGACGCGGTCCCAGGACGCCTTAACCAAACCGCCTTTATTGCCTCCCGCCCCGGGGTATTCTATGGGCAATGCTCAGAAATCTGTGGAGCAAACCACAGCTTTATACCCATCGTAGTAGAAGCGGTCCCACTATCTCACTTCGAAAACTGGTCCACCCTTATACTAAAAGACGCCTCACTAGGAAGCTAA, Loach
nt_sequence ATGGCCTACCCATTCCAACTTGGTCTACAAGACGCCACATCCCCTATTATAGAAGAGCTAATAAATTTCCATGATCACACACTAATAATTGTTTTCCTAATTAGCTCCTTAGTCCTCTATATCATCTCGCTAATATTAACAACAAAACTAACACATACAAGCACAATAGATGCACAAGAAGTTGAAACCATTTGAACTATTCTACCAGCTGTAATCCTTATCATAATTGCTCTCCCCTCTCTACGCATTCTATATATAATAGACGAAATCAACAACCCCGTATTAACCGTTAAAACCATAGGGCACCAATGATACTGAAGCTACGAATATACTGACTATGAAGACCTATGCTTTGATTCATATATAATCCCAACAAACGACCTAAAACCTGGTGAACTACGACTGCTAGAAGTTGATAACCGAGTCGTTCTGCCAATAGAACTTCCAATCCGTATATTAATTTCATCTGAAGACGTCCTCCACTCATGAGCAGTCCCCTCCCTAGGACTTAAAACTGATGCCATCCCAGGCCGACTAAATCAAGCAACAGTAACATCAAACCGACCAGGGTTATTCTATGGCCAATGCTCTGAAATTTGTGGATCTAACCATAGCTTTATGCCCATTGTCCTAGAAATGGTTCCACTAAAATATTTCGAAAACTGATCTGCTTCAATAATTTAA, Mouse
nt_sequence ATGGCTTACCCATTTCAACTTGGCTTACAAGACGCTACATCACCTATCATAGAAGAACTTACAAACTTTCATGACCACACCCTAATAATTGTATTCCTCATCAGCTCCCTAGTACTTTATATTATTTCACTAATACTAACAACAAAACTAACACACACAAGCACAATAGACGCCCAAGAAGTAGAAACAATTTGAACAATTCTCCCAGCTGTCATTCTTATTCTAATTGCCCTTCCCTCCCTACGAATTCTATACATAATAGACGAGATTAATAACCCAGTTCTAACAGTAAAAACTATAGGACACCAATGATACTGAAGCTATGAATATACTGACTATGAAGACCTATGCTTTGACTCCTACATAATCCCAACCAATGACCTAAAACCAGGTGAACTTCGTCTATTAGAAGTTGATAATCGGGTAGTCTTACCAATAGAACTTCCAATTCGTATACTAATCTCATCCGAAGACGTCCTGCACTCATGAGCCATCCCTTCACTAGGGTTAAAAACCGACGCAATCCCCGGCCGCCTAAACCAAGCTACAGTCACATCAAACCGACCAGGTCTATTCTATGGCCAATGCTCTGAAATTTGCGGCTCAAATCACAGCTTCATACCCATTGTACTAGAAATAGTGCCTCTAAAATATTTCGAAAACTGATCAGCTTCTATAATTTAA, Rat
nt_sequence ATGGCATACCCCCTACAAATAGGCCTACAAGATGCAACCTCTCCCATTATAGAGGAGTTACTACACTTCCATGACCACACATTAATAATTGTGTTCCTAATTAGCTCATTAGTACTCTACATTATCTCACTTATACTAACCACGAAACTCACCCACACAAGTACAATAGACGCACAAGAAGTGGAAACGGTGTGAACGATCCTACCCGCTATCATTTTAATTCTCATTGCCCTACCATCATTACGAATCCTCTACATAATGGACGAGATCAATAACCCTTCCTTGACCGTAAAAACTATAGGACATCAGTGATACTGAAGCTATGAGTACACAGACTACGAAGACCTGAACTTTGACTCATATATGATCCCCACACAAGAACTAAAGCCCGGAGAACTACGACTGCTAGAAGTAGACAATCGAGTAGTCCTCCCAATAGAAATAACAATCCGCATACTAATCTCATCAGAAGATGTACTCCACTCATGAGCCGTACCGTCCCTAGGACTAAAAACTGATGCTATCCCAGGACGACTAAACCAAACAACCCTAATAACCATACGACCAGGACTGTACTACGGTCAATGCTCAGAAATCTGTGGTTCAAACCACAGCTTCATACCTATTGTCCTCGAATTGGTCCCACTATCCCACTTCGAGAAATGATCTACCTCAATGCTTTAA, Seal
nt_sequence ATGGCATATCCATTCCAACTAGGTTTCCAAGATGCAGCATCACCCATCATAGAAGAGCTCCTACACTTTCACGATCATACACTAATAATCGTTTTTCTAATTAGCTCTTTAGTTCTCTACATTATTACCCTAATGCTTACAACCAAATTAACACATACTAGTACAATAGACGCCCAAGAAGTAGAAACTGTCTGAACTATCCTCCCAGCCATTATCTTAATTTTAATTGCCTTGCCTTCATTACGGATCCTTTACATAATAGACGAAGTCAATAACCCCTCCCTCACTGTAAAAACAATAGGTCACCAATGATATTGAAGCTATGAGTATACCGACTACGAAGACCTAAGCTTCGACTCCTATATAATCCCAACATCAGACCTAAAGCCAGGAGAACTACGATTATTAGAAGTAGATAACCGAGTTGTCTTACCTATAGAAATAACAATCCGAATATTAGTCTCATCAGAAGACGTACTCCACTCATGGGCCGTACCCTCCTTGGGCCTAAAAACAGATGCAATCCCAGGACGCCTAAACCAAACAACCTTAATATCAACACGACCAGGCCTATTTTATGGACAATGCTCAGAGATCTGCGGCTCAAACCACAGTTTCATACCAATTGTCCTAGAACTAGTACCCCTAGAAGTCTTTGAAAAATGATCTGTATCAATACTATAA, Whale
nt_sequence ATGGCACACCCATCACAATTAGGTTTTCAAGACGCAGCCTCTCCAATTATAGAAGAATTACTTCACTTCCACGACCATACCCTCATAGCCGTTTTTCTTATTAGTACGCTAGTTCTTTACATTATTACTATTATAATAACTACTAAACTAACTAATACAAACCTAATGGACGCACAAGAGATCGAAATAGTGTGAACTATTATACCAGCTATTAGCCTCATCATAATTGCCCTTCCATCCCTTCGTATCCTATATTTAATAGATGAAGTTAATGATCCACACTTAACAATTAAAGCAATCGGCCACCAATGATACTGAAGCTACGAATATACTAACTATGAGGATCTCTCATTTGACTCTTATATAATTCCAACTAATGACCTTACCCCTGGACAATTCCGGCTGCTAGAAGTTGATAATCGAATAGTAGTCCCAATAGAATCTCCAACCCGACTTTTAGTTACAGCCGAAGACGTCCTCCACTCGTGAGCTGTACCCTCCTTGGGTGTCAAAACAGATGCAATCCCAGGACGACTTCATCAAACATCATTTATTGCTACTCGTCCGGGAGTATTTTACGGACAATGTTCAGAAATTTGCGGAGCAAACCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCGCTAACCGACTTTGAAAACTGATCTTCATCAATACTAGAAGCATCACTAAGA, Frog
Description
The nucleotide sequences are defined with the command nt_sequence.
The default sequence type is peptide. The command set_alignment_type_CN prevents the nucleotide sequences from being translated.
2. Loading sequences from files
2.1. Sequence files from URLs
2.2. Sequence name
2.3. Sequence files from databases
2.4. Alignment files
3. Translating nucleotide sequences
3.1. Translating coding sequences
3.2. Genomic sequence positions
beforeAC
set_alignment_type_P
load UNIPROT:P01185|vasopressin_UniProt
nt_sequence ATGCCTGACACCATGCTGCCCGCCTGCTTCCTCGGCCTACTGGCCTTCTCCTCCGCGTGCTACTTCCAGAACTGCCCGAGGGGCGGCAAGAGGGCCATGTCCGACCTGGAGCTGAGACAGTGCCTCCCCTGCGGCCCCGGGGGCAAAGGCCGCTGCTTCGGGCCCAGCATCTGCTGCGCGGACGAGCTGGGCTGCTTCGTGGGCACGGCTGAGGCGCTGCGCTGCCAGGAGGAGAACTACCTGCCGTCGCCCTGCCAGTCCGGCCAGAAGGCGTGCGGGAGCGGGGGCCGCTGCGCCGCCTTCGGCGTTTGCTGCAACGACGAGAGCTGCGTGACCGAGCCCGAGTGCCGCGAGGGCTTTCACCGCCGCGCCCGCGCCAGCGACCGGAGCAACGCCACGCAGCTGGACGGGCCGGCCGGGGCCTTGCTGCTGCGGCTGGTGCAGCTGGCCGGGGCGCCCGAGCCCTTCGAGCCCGCCCAGCCCGACGCCTAC,complement(75..247,422..623,2000..2119),vasopressin
load UNIPROT:P01178|oxytocin_UniProt
nt_sequence ATGGCCGGCCCCAGCCTCGCTTGCTGTCTGCTCGGCCTCCTGGCGCTGACCTCCGCCTGCTACATCCAGAACTGCCCCCTGGGAGGCAAGAGGGCCGCGCCGGACCTCGACGTGCGCAAGTGCCTCCCCTGCGGCCCCGGGGGCAAAGGCCGCTGCTTCGGGCCCAATATCTGCTGCGCGGAAGAGCTGGGCTGCTTCGTGGGCACCGCCGAAGCGCTGCGCTGCCAGGAGGAGAACTACCTGCCGTCGCCCTGCCAGTCCGGCCAGAAGGCGTGCGGGAGCGGGGGCCGCTGCGCGGTCTTGGGCCTCTGCTGCAGCCCGGACGGCTGCCACGCCGACCCTGCCTGCGACGCGGAAGCCACCTTCTCCCAGCGC, join(37..156,458..659,744..799), oxytocin
GFF oxytocin_UniProt||Signalpeptide|1|19|||||Color=#aaAA00;
GFF oxytocin_UniProt||Oxytocin|20|28|||||Style=BACKGROUND; Color=#55FF55;
GFF oxytocin_UniProt||Neurophysin_1|32|125|||||Color=#6666FF;
GFF vasopressin_UniProt||Signalpeptide|1|19|||||Color=#aaAA00;
GFF vasopressin_UniProt||ARG-Vasopressin|20|28|||||Style=BACKGROUND; Color=#55FF55;
GFF vasopressin_UniProt||Neurophysin_2|32|124|||||Color=#6666FF;
new_nucleotide_selection 2000-2119, vasopressin/Exon1
new_nucleotide_selection 422-623, vasopressin/Exon2
new_nucleotide_selection 75-247, vasopressin/Exon3
new_nucleotide_selection 1-156, oxytocin/Exon1
new_nucleotide_selection 458-659, oxytocin/Exon2
new_nucleotide_selection 744-898, oxytocin/Exon3
color #FF0000, */Exon1 */Exon3
color #0000FF, */Exon2
add_annotation Style=UNDERLINE, */Ex.*
Description
Optionally, the corresponding genomic positions can be given as the second parameter.
They are reported for the triplet under the mouse pointer and allow residue selections that refer to genomic positions.
The command new_nucleotide_selection creates amino acid selections based on nucleotide positions.
Please note that oxytocin and vasopressin are transcribed in opposite directions.
Move the mouse pointer slowly from the N-terminus towards the C-terminus and observe the genomic DNA positions.
Note that the genomic positions decrease in vasopressin while they increase in oxytocin.
The residue selections for exons 2 and 3 overlap by one residue:
in the triplet at the exon boundary, the first codon position belongs to exon 2, while codon positions 2 and 3 belong to exon 3.
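The relation between residues and genomic triplets can be sketched in Python. The sketch assumes GenBank-style locations (start..end ranges combined with join and complement) with 1-based positions; the function names are hypothetical and this is not Alignment Annotator's own code. With the oxytocin coordinates used above, residue 108 is encoded by the codon that spans the boundary between exons 2 and 3.

```python
import re


def expand_location(location: str) -> list[int]:
    """1-based genomic positions of a coding sequence, listed in reading (5'->3') order."""
    loc = location.replace(" ", "")
    reverse_all = False
    outer = re.fullmatch(r"complement\((.*)\)", loc)
    if outer:  # complement(a..b,c..d,...): the whole CDS lies on the opposite strand
        reverse_all = True
        loc = outer.group(1)
    joined = re.fullmatch(r"join\((.*)\)", loc)
    if joined:
        loc = joined.group(1)
    positions: list[int] = []
    for is_complement, start, end in re.findall(r"(complement\()?(\d+)\.\.(\d+)", loc):
        segment = list(range(int(start), int(end) + 1))
        if is_complement:  # complement(a..b) inside join() is read from b down to a
            segment.reverse()
        positions.extend(segment)
    if reverse_all:
        positions.reverse()
    return positions


def codon_positions(positions: list[int], residue: int) -> list[int]:
    """Genomic positions of the triplet encoding the given 1-based residue."""
    return positions[3 * (residue - 1): 3 * residue]


oxytocin = expand_location("join(37..156,458..659,744..799)")
print(codon_positions(oxytocin, 108))  # last base of exon 2, first two bases of exon 3
```

Both spellings of the vasopressin location used on this page, complement(75..247,422..623,2000..2119) and join(complement(2000..2119),complement(422..623),complement(75..247)), yield the same positions in reading order with this sketch.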
3.3. Genomic sequences from EMBL or Genbank files
beforeAC
load UNIPROT:P01185|vasopressin_UniProt http://www.bioinformatics.org/strap/aa/sampleData/vasopressin_ENSG00000101200.data.txt|vasopressin
load UNIPROT:P01178|oxytocin_UniProt http://www.bioinformatics.org/strap/aa/sampleData/oxytocin_ENSG00000101405.data.txt|oxytocin
translate_cds join(37..156,458..659,744..799), oxytocin
translate_cds join(complement(2000..2119),complement(422..623), complement(75..247)), vasopressin
GFF oxytocin_UniProt||Signalpeptide|1|19|||||Color=#aaAA00;
GFF oxytocin_UniProt||Oxytocin|20|28|||||Style=BACKGROUND; Color=#55FF55;
GFF oxytocin_UniProt||Neurophysin_1|32|125|||||Color=#6666FF;
GFF vasopressin_UniProt||Signalpeptide|1|19|||||Color=#aaAA00;
GFF vasopressin_UniProt||ARG-Vasopressin|20|28|||||Style=BACKGROUND; Color=#55FF55;
GFF vasopressin_UniProt||Neurophysin_2|32|124|||||Color=#6666FF;
new_nucleotide_selection 2000-2119, vasopressin/Exon1
new_nucleotide_selection 422-623, vasopressin/Exon2
new_nucleotide_selection 75-247, vasopressin/Exon3
new_nucleotide_selection 1-156, oxytocin/Exon1
new_nucleotide_selection 458-659, oxytocin/Exon2
new_nucleotide_selection 744-898, oxytocin/Exon3
color #FF0000, */Exon1 */Exon3
color #0000FF, */Exon2
add_annotation Style=UNDERLINE, */Ex.*
Description
The genomic sequence is loaded with the command load.
The amino acid sequence is predicted from the exon positions given with the command translate_cds.
There is a potential performance problem:
the loaded files can be huge because they contain the entire gene sequence, including UTRs and introns.
Therefore, the notation of the previous example (3.2) should be preferred.
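The translation step performed by translate_cds can be approximated as follows: concatenate the exons, reverse-complement them for genes on the opposite strand, and translate codon by codon. This Python sketch is an assumption, not the program's implementation: it uses the standard genetic code (NCBI table 1), starts reading at the first exon base, drops an incomplete trailing codon and writes X for codons containing other characters. The Python function shares the command's name but is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def translate_cds(genomic: str, exons: list[tuple[int, int]], minus_strand: bool = False) -> str:
    """Translate exons (1-based, inclusive, ascending genomic order) of a genomic sequence."""
    cds = "".join(genomic[start - 1:end] for start, end in exons)
    if minus_strand:
        cds = reverse_complement(cds)  # read the opposite strand 5'->3'
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))
```

For oxytocin this would correspond to translate_cds(gene, [(37, 156), (458, 659), (744, 799)]) and for vasopressin to translate_cds(gene, [(75, 247), (422, 623), (2000, 2119)], minus_strand=True), where gene holds the genomic sequence from the loaded file.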
4. Pruning
A subrange of the original sequence can be displayed. Residue numbering still refers to the original full-length sequence.
4.1. Displaying a range of residues
4.2. Pruning alignments left or right
5. Sequence Groups
Sequence groups are named subsets of all loaded sequences. They can be activated by buttons in the alignment GUI. Open the menu "Sequence Groups" in the alignment frame.
5.1. Sequence groups
5.2. Sequence groups by taxonomy
6. Shorter scripts - compactness
Using variables, brace expansion and regular expressions improves the readability of scripts and reduces their size.
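How variables and brace expansion shorten scripts can be sketched in Python. The syntax details are assumptions: variables are defined with let $NAME=value (as in the examples on this page) and referenced as $NAME, and brace expansion is assumed to work like in the bash shell on individual words, so that */Exon{1,3} stands for */Exon1 */Exon3. Nested braces are not handled, and the function names are hypothetical.

```python
import re


def substitute_variables(text: str, variables: dict[str, str]) -> str:
    """Replace $NAME by its value; undefined names are left unchanged."""
    return re.sub(r"\$(\w+)", lambda m: variables.get(m.group(1), m.group(0)), text)


def preprocess(script: str) -> list[str]:
    """Evaluate 'let $NAME=value' lines and substitute variables in all other lines."""
    variables: dict[str, str] = {}
    lines = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("let "):
            # Split at the first '=' only: values may contain '=' themselves (see $BALLOON).
            name, _, value = line[4:].partition("=")
            variables[name.strip().lstrip("$")] = substitute_variables(value.strip(), variables)
            continue
        lines.append(substitute_variables(line, variables))
    return lines


def expand_braces(word: str) -> list[str]:
    """Bash-like brace expansion of one word, e.g. '*/Exon{1,3}' -> ['*/Exon1', '*/Exon3']."""
    match = re.search(r"\{([^{},]*,[^{}]*)\}", word)  # leftmost non-nested group with a comma
    if not match:
        return [word]
    head, tail = word[:match.start()], word[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded
```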
6.1. Variables
6.2. Brace expansion
6.3. Regular expressions and asterisks
6.4. Aliases for web addresses
7. Sequence Attributes
7.1. Balloon messages
7.2. Accession IDs and cross references
beforeAC
aa_sequence AGYDRHITIFSPEGRLYQVEYAFKATNQTNINSLAVRGKDCTVVISQKKVPDKLLDPTTVSYIFCISRTIGMVVNGPIPDARNAALRAKAEAAEFRYKYGYDMPCDVLAKRMANLSQIYTQRAYMRPLGVILTFVSVDEELGPSIYKTDPAGYYVGYKATATGPKQQEITTNLENHFKKSKIDHINEESWEKVVEFAITHMIDALGTEFSKNDLEVGVATKDKFFTLSAENIEERLVAIAEQD, a1_SaccharomycesCerevisiae
aa_sequence TSIMAVTFKDGVILGADSRTTTGAYIANRVTDKLTRVHDKIWCCRSGSAADTQAIADIVQYHLELYTSQYGTPSTETAASVFKELCYENKDNLTAGIIVAGYDDKNKGEVYTIPLGGSVHKLPYAIAGSGSTFIYGYCDKNFRENMSKEETVDFIKHSLSQAIKWDGSSGGVIRMVVLTAAGVERLIFYPDEYEQL, b1_SaccharomycesCerevisiae
aa_sequence TTIVSVRRNGHVVIAGDGQATLGNTVMKGNVKKVRRLYNDKVIAGFAGGTADAFTLFELFERKLEMHQGHLVKAAVELAKDWRTDRMLRKLEALLAVADETASLIITGNGDVVQPENDLIAIGSGGPYAQAAARALLENTELSAREIAEKALDIAGDICIYTNHFHTIEELSYKAEFHHH, hs_EscherichiaColi
let $BALLOON=Balloon=<HTML><BODY><OL> <LI>Thr<sub>1</sub></LI> <LI>Lys<sub>33</sub></LI> <LI>Ser<sub>129</sub></LI></OL> </BODY></HTML>
GFF b1_SaccharomycesCerevisiae||Active_site|1,33,129|||||$BALLOON
GFF hs_EscherichiaColi||Active_site|1,33,124|||||$BALLOON
let $HS=hs_EscherichiaColi
accession_id UNIPROT:PSA1_YEAST, a1_SaccharomycesCerevisiae
accession_id UNIPROT:PSB1_YEAST, b1_SaccharomycesCerevisiae
accession_id PDB:1NED_A, $HS
add_xref UNIPROT:HSLV_ECO24, $HS
Description
Database IDs are either set explicitly with the commands accession_id, add_xref and balloon_text,
or predicted by sequence search with the command find_uniprot_id.
In Alignment Annotator, the UniProt ID is very important: it is
used by uniprot_features, DAS_features and taxonomy_group.
7.3. Icons
7.4. Residue index offsets
7.5. Secondary structure
7.6. Secondary structure
8. Residue selections and annotations
Residue selections are displayed by underline or filled background.
They can be defined explicitly, referring to the sequence index, the PDB residue number and insertion code, or the nucleotide index of the DNA sequence from which an amino acid was predicted.
They can also be obtained from annotation databases.
Residue selections can have attributes like color, balloon messages and 3D-commands.
8.1. GFF notation
8.2. GFF: with attributes
8.3. GFF: non-consecutive positions
8.4. GFF: referring to PDB-Resnum
8.5. Adding attributes to residue selections
8.6. Residues in proximity to a ligand
8.7. Solvent Accessibility
beforeAC
# load PFAM:PF00941
# load UNIPROT:P23639
# project_coordinates AUTO, *
let $area=10
load PDB:1RYP_B|example_1
load PDB:1RYP_B|example_2
load PDB:1RYP_B|example_3
load PDB:1RYP_B|example_4
new_selection MIN_ACCESSIBILITY=$area, example_1/surface
new_selection MIN_ACCESSIBILITY=$area SUBUNITS=ALL, example_2/surface
new_selection MIN_ACCESSIBILITY=$area SUBUNITS=http://www.bioinformatics.org/strap/aa/sampleData/pdb1ryp.data.txt, example_3/surface
new_selection MIN_ACCESSIBILITY=$area SUBUNITS="PDB:1ryp_A PDB:1ryp_C", example_4/surface
set_annotation Color=#3322FF, */surface
set_annotation Style=BACKGROUND, */surface
Description
With the attribute "MIN_ACCESSIBILITY=..." of the command new_selection,
residues are highlighted whose solvent accessible surface area is greater than the given value in square Ångström (Å²).
The computation is performed with the program mkdssp, an implementation of the DSSP method by Kabsch and Sander, which
must be in the executable path.
Only sequences with known 3D structure (see project_coordinates) are considered.
Alignment Annotator expects mkdssp in /usr/bin/ or bin/.
example_1: Only amino acids are considered; hetero atoms and nucleic acids are ignored.
In multimeric proteins (here the proteasome), amino acids at the interfaces between the subunits are also highlighted, even though they are not solvent exposed in the multimer.
To exclude residues at the inter-subunit interfaces, the parameter SUBUNITS can refer to other structure files that are considered during the computation.
example_2: For proteins loaded from the PDB, the attribute SUBUNITS=ALL denotes the original structure containing all subunits.
example_3: A file with all other subunits can be provided.
example_4: The two neighbouring subunits are given as PDB references. Lists of space-separated entries must be enclosed in double quotes.
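The MIN_ACCESSIBILITY filter can be sketched in Python on top of mkdssp output. This assumes the classic DSSP text format, in which the residue table follows a header line starting with "  #  RESIDUE", the residue number occupies columns 6-10, the chain column 12, the amino acid column 14 and the accessibility (ACC, in Å²) columns 35-38. The function is hypothetical and ignores insertion codes.

```python
def surface_residues(dssp_text: str, min_accessibility: float) -> list[tuple[str, int, str, int]]:
    """(chain, residue number, amino acid, ACC) for residues with ACC > min_accessibility."""
    selected = []
    in_residue_table = False
    for line in dssp_text.splitlines():
        if line.startswith("  #  RESIDUE"):
            in_residue_table = True  # residue records follow this header line
            continue
        if not in_residue_table or len(line) < 38:
            continue
        amino_acid = line[13]
        if amino_acid == "!":
            continue  # chain-break marker, not a residue
        residue_number = int(line[5:10])
        chain = line[11]
        accessibility = int(line[34:38])
        if accessibility > min_accessibility:  # strictly greater, as described above
            selected.append((chain, residue_number, amino_acid, accessibility))
    return selected
```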
8.8. Residue selections from UniProt
beforeAC
aa_sequence AGYDRHITIFSPEGRLYQVEYAFKATNQTNINSLAVRGKDCTVVISQKKVPDKLLDPTTVSYIFCISRTIGMVVNGPIPDARNAALRAKAEAAEFRYKYGYDMPCDVLAKRMANLSQIYTQRAYMRPLGVILTFVSVDEELGPSIYKTDPAGYYVGYKATATGPKQQEITTNLENHFKKSKIDHINEESWEKVVEFAITHMIDALGTEFSKNDLEVGVATKDKFFTLSAENIEERLVAIAEQD, a1_SaccharomycesCerevisiae
aa_sequence TSIMAVTFKDGVILGADSRTTTGAYIANRVTDKLTRVHDKIWCCRSGSAADTQAIADIVQYHLELYTSQYGTPSTETAASVFKELCYENKDNLTAGIIVAGYDDKNKGEVYTIPLGGSVHKLPYAIAGSGSTFIYGYCDKNFRENMSKEETVDFIKHSLSQAIKWDGSSGGVIRMVVLTAAGVERLIFYPDEYEQL, b1_SaccharomycesCerevisiae
aa_sequence TTIVSVRRNGHVVIAGDGQATLGNTVMKGNVKKVRRLYNDKVIAGFAGGTADAFTLFELFERKLEMHQGHLVKAAVELAKDWRTDRMLRKLEALLAVADETASLIITGNGDVVQPENDLIAIGSGGPYAQAAARALLENTELSAREIAEKALDIAGDICIYTNHFHTIEELSYKAEFHHH, hs_EscherichiaColi
let $BALLOON=Balloon=<HTML><BODY><OL> <LI>Thr<sub>1</sub></LI> <LI>Lys<sub>33</sub></LI> <LI>Ser<sub>129</sub></LI></OL> </BODY></HTML>
GFF b1_SaccharomycesCerevisiae||Active_site|1,33,129|||||$BALLOON
GFF hs_EscherichiaColi||Active_site|1,33,124|||||$BALLOON
find_uniprot_id *
uniprot_features *
Description
The UniProt ID is obtained by sequence search.
Alternatively, it can be set explicitly with add_xref or accession_id.
The command uniprot_features highlights all sequence features stored in UniProt.
Since the data is available on the Alignment Annotator server,
UniProt features are loaded instantly.
BioDAS annotations used to be loaded with the command DAS_features.
Unfortunately, most BioDAS servers and the registry are no longer available.
Additional BioDAS registries can be added by the administrator.
9. Display
9.1. Residue color
9.2. Residue background color
9.3. Alignment title
9.4. Characters per line
10. 3D-Visualization
Currently, 3D-visualization is based on Java, but a JavaScript based 3D-visualization (probably JSmol) and a desktop application will be included soon.
The type of 3D viewer does not affect the scripting language, which is independent of the specific implementation.
First, install Java.
Use a web browser that still supports Java applets: Firefox, Iceweasel, Opera and Internet Explorer.
MS Edge, Chromium and Chrome, on the other hand, do not support Java applets.
The 3D views are not shown automatically when the alignment document is displayed in the browser, because
there might be several 3D views. Each is represented by a button, and a 3D view is displayed by pushing the respective button.
There are two different locations for these buttons:
The context panels (Right-click) of the sequence names in the alignment panel contain buttons to open 3D views.
For any sequence with associated 3D structure there will be at least the default 3D view.
In addition, 3D views are listed which have been created with open_3D .
Activating "Tab" in the segmented control and then clicking 3D brings up a panel with buttons for all available 3D-views.
3D views are created with the command
open_3D, which takes a unique ID and a list of loaded proteins or of structures that are not part of the sequence alignment.
The latter are given as file paths, URLs or database references.
To be recognized as file paths, file paths must start with a slash, dot-slash or dot-dot-slash (/, ./ or ../).
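The rule for recognizing file paths can be sketched in Python. Only the file path rule is taken from the text above; the checks for URLs and database references (PREFIX:ID, as in PDB:1ryp_A) and the fallback to a loaded protein name are assumptions, and the function name is hypothetical.

```python
import re


def classify_reference(ref: str) -> str:
    """Guess how an argument of open_3D is interpreted (illustrative rules)."""
    if ref.startswith(("/", "./", "../")):
        return "file"  # the only way a file path is recognized
    if re.match(r"[A-Za-z][A-Za-z0-9+.-]*://", ref):
        return "url"
    if re.fullmatch(r"[A-Z][A-Z0-9_]*:\S+", ref):
        return "database"  # e.g. PDB:1ryp_A or UNIPROT:P00780
    return "sequence"  # otherwise assumed to name a loaded protein
```

Under these rules a bare file name such as 1ryp.pdb is not treated as a file; it has to be written as ./1ryp.pdb.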
A 3D view is referred to by its ID with the command
select_3D, together with one or several of the loaded proteins or structures. Proteins
are best specified by their sequence name, and PDB files must be
referred to by exactly the same file path, URL or database reference
used in the open_3D command.
Once a view has been selected with select_3D, 3D script commands can be applied. These commands start with "3D_".
Usually, the next command is
3D_select
10.1. Superimposing protein structures
10.2. Style of single atoms
10.3. Attaching 3D styles to residue selections
beforeAC
load PDB:1RYP_B
color #FF0000, 1RYP_B
let $SEL=1RYP_B/My_Special_Residues
new_selection 100:-110:.CA.CB, $SEL
add_annotation 3D_view=3D_spheres, $SEL
color #ff0000, $SEL
# Or type set_annotation Color=#ff0000, $SEL
add_annotation 3D_view=3D_color #FFffFF, $SEL
add_annotation Atoms=.CB, $SEL
add_annotation 3D_view=3D_color #FF00FF, $SEL
add_annotation Atoms=.CA, $SEL
add_annotation 3D_view=3D_color #FF0077, $SEL
Description
Another method for changing 3D styles is to create a residue selection and to attach annotations of the type "3D_view".
10.4. Residue annotations from UniProt
beforeAC
load PDB:1SBC_A UNIPROT:P00780
# Delete all residue annotations from the PDB file
delete 1SBC_A/*
# Infer coordinates of the PDB structures onto the UniProt sequence.
project_coordinates PDB:1SBC_A, P00780
# Load Annotations from Uniprot.
uniprot_features *
# Going to attach styles to annotations that have the name "Active_site" ...
let $SEL=*/Active_site
# All atoms white.
#add_annotation 3D_view=3D_color #FFFFFF, $SEL
add_annotation Atoms=.CB, $SEL
# C-beta atoms red and sphere.
add_annotation 3D_view=3D_color #FF0000, $SEL
add_annotation 3D_view=3D_spheres, $SEL
# C-alpha atoms blue and sphere.
add_annotation Atoms=.CA, $SEL
add_annotation 3D_view=3D_color #0000FF, $SEL
add_annotation 3D_view=3D_spheres, $SEL
Description
With the command add_annotation, 3D styles are attached to the residue annotations named "Active_site" that were loaded from UniProt.
All entries of type "3D_view" are evaluated one after the other.
Initially all atoms of the amino acids are considered.
The current set of atoms is altered with a command like "add_annotation Atoms=.CA, residue-selection ".
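The sequential evaluation of 3D_view entries can be modelled in Python. The sketch represents the annotations of one residue selection as (key, value) pairs in the order in which they were added; this representation and the label ALL for the initial atom set are hypothetical.

```python
def evaluate_3d_views(annotations: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Pair each 3D_view entry with the atom set that is current when it is evaluated."""
    current_atoms = "ALL"  # initially all atoms of the selected amino acids
    applied = []
    for key, value in annotations:
        if key == "Atoms":
            current_atoms = value  # later 3D_view entries refer to these atoms
        elif key == "3D_view":
            applied.append((current_atoms, value))
    return applied


active_site = [("Atoms", ".CB"), ("3D_view", "3D_color #FF0000"), ("3D_view", "3D_spheres"),
               ("Atoms", ".CA"), ("3D_view", "3D_color #0000FF"), ("3D_view", "3D_spheres")]
print(evaluate_3d_views(active_site))
```

For the script above, the C-beta atoms receive the red color and spheres, and the C-alpha atoms the blue color and spheres.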
This example exhibits a rare problem:
Why are the UniProt annotations not shown on PDB:1SBC_A? Because the PDB sequence differs from the UniProt sequence, as indicated by entries in the PDB file like
SEQADV 1SBC SER A 103 UNP P00780 THR 207 CONFLICT
Perhaps in later releases of Alignment Annotator, mismatches indicated in this way may be tolerated and the UniProt annotations loaded.
10.5. Multimeric proteins
10.6. DNA
11. Structure alignment
11.1. Sequence or structure based alignment
beforeAC
aa_sequence AGYDRHITIFSPEGRLYQVEYAFKATNQTNINSLAVRGKDCTVVISQKKVPDKLLDPTTVSYIFCISRTIGMVVNGPIPDARNAALRAKAEAAEFRYKYGYDMPCDVLAKRMANLSQIYTQRAYMRPLGVILTFVSVDEELGPSIYKTDPAGYYVGYKATATGPKQQEITTNLENHFKKSKIDHINEESWEKVVEFAITHMIDALGTEFSKNDLEVGVATKDKFFTLSAENIEERLVAIAEQD, a1_SaccharomycesCerevisiae
aa_sequence TSIMAVTFKDGVILGADSRTTTGAYIANRVTDKLTRVHDKIWCCRSGSAADTQAIADIVQYHLELYTSQYGTPSTETAASVFKELCYENKDNLTAGIIVAGYDDKNKGEVYTIPLGGSVHKLPYAIAGSGSTFIYGYCDKNFRENMSKEETVDFIKHSLSQAIKWDGSSGGVIRMVVLTAAGVERLIFYPDEYEQL, b1_SaccharomycesCerevisiae
aa_sequence TTIVSVRRNGHVVIAGDGQATLGNTVMKGNVKKVRRLYNDKVIAGFAGGTADAFTLFELFERKLEMHQGHLVKAAVELAKDWRTDRMLRKLEALLAVADETASLIITGNGDVVQPENDLIAIGSGGPYAQAAARALLENTELSAREIAEKALDIAGDICIYTNHFHTIEELSYKAEFHHH, hs_EscherichiaColi
let $BALLOON=Balloon=<HTML><BODY><OL> <LI>Thr<sub>1</sub></LI> <LI>Lys<sub>33</sub></LI> <LI>Ser<sub>129</sub></LI></OL> </BODY></HTML>
GFF b1_SaccharomycesCerevisiae||Active_site|1,33,129|||||$BALLOON
GFF hs_EscherichiaColi||Active_site|1,33,124|||||$BALLOON
project_coordinates AUTO, *
afterAC
title ClustalW
Description
Here, the command project_coordinates is run before alignment computation.
Therefore the 3D coordinates of Cα atoms are used for alignment computation.
For comparison, move the command line "project_coordinates..." to the second script text box and observe the alignment of the active site residue Ser129.
Since the sequence similarity of these remote homologs is low, the alignment quality obtained by ClustalW is poor.
This can be seen from the active site triad, which is aligned only if the 3D structure is used.
Another indicator is the secondary structure, which can be displayed with a check box in the toolbar.
Since insertions and deletions rarely occur in helices and beta sheets, these should be almost devoid of gaps.
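Whether the active site residues ended up in the same alignment columns can be checked with a small Python sketch. It assumes that the aligned sequences are available as strings in which "-" or "." marks a gap and that residues are numbered from 1 in the ungapped sequence, as in the GFF lines of this example (positions 1, 33 and 129 in b1_SaccharomycesCerevisiae, 1, 33 and 124 in hs_EscherichiaColi). The function names are hypothetical.

```python
GAP_CHARACTERS = "-."  # assumption: these characters mark gaps in aligned rows


def column_of(aligned_row: str, residue_number: int) -> int:
    """0-based alignment column of the residue with the given 1-based sequence index."""
    count = 0
    for column, char in enumerate(aligned_row):
        if char not in GAP_CHARACTERS:
            count += 1
            if count == residue_number:
                return column
    raise ValueError(f"residue {residue_number} is beyond the end of the sequence")


def residues_aligned(rows: dict[str, str], residues: dict[str, list[int]]) -> bool:
    """True if, for every k, the k-th listed residue of all sequences shares one column."""
    columns = [[column_of(rows[name], r) for r in residues[name]] for name in residues]
    return all(cols == columns[0] for cols in columns)
```

For this example one would call residues_aligned(rows, {"b1_SaccharomycesCerevisiae": [1, 33, 129], "hs_EscherichiaColi": [1, 33, 124]}) with rows taken from an exported alignment.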
12. Abnormal computation
12.1. Abnormal program termination
12.2. Very long computation
12.3. Exceptions