Alignment Annotator
This page describes the scripting interface and explains the scripting language of Alignment Annotator through basic examples. Scripts are used in two situations:
- When Alignment Annotator is used as a visualization tool within other software, the script text is generated programmatically.
The other software can be a web service or a locally running program.
The script texts in the web variables beforeAC and
afterAC are run before and after alignment computation, respectively.
These two web variables may contain the script text directly or the URL of a script file.
Alignment Annotator can be used from the main server or installed on any Unix/Linux web server.
The installation instructions can be requested from the author.
There are two possible ways to display the alignment document:
- An IFRAME displays the sequence alignment.
- The web service pages contain buttons or web links. On click, the alignment is opened in an IFRAME or in another tab.
- When advanced program features of Alignment Annotator are not accessible via the graphical user interface.
The two script texts are entered into text fields in the section "Scripts" of the web interface.
The tabulator key can be used for word-completion of sequence names and script commands.
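For the first situation, a calling program has to place the generated script text into the two web variables. The following Python sketch builds such a request URL. The base URL, the placeholder script text and the use of a GET query string are assumptions made for illustration; only the variable names beforeAC and afterAC come from the description above.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace it with the address of the actual service.
BASE_URL = "https://example.org/alignment-annotator"


def build_request_url(before_ac, after_ac, base_url=BASE_URL):
    """Return a URL carrying both scripts as the web variables beforeAC and afterAC.

    Each value may be script text or the URL of a script file.
    Sending them as a GET query string is an assumption of this sketch.
    """
    query = urlencode({"beforeAC": before_ac, "afterAC": after_ac})
    return f"{base_url}?{query}"


# Placeholder text, not real script commands.
before_script = "first script line\nsecond script line"
# Hypothetical location of a script file.
after_script_url = "https://example.org/scripts/after.txt"

url = build_request_url(before_script, after_script_url)
print(url)
```

The URL encoding matters because script text usually spans several lines and contains spaces and commas.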
Script line syntax
Each line starts with a script command and may contain parameters.
The set of commands is a subset of the commands supported in Strap.
Each script line follows one of the following four syntaxes:
- Command
- Command parameters
- Command list-of-sequences-residue-selections
- Command parameters, list-of-sequences-residue-selections
The last comma (,) in the line marks the end of the parameters and the beginning of the list of sequences and residue selections.
Consequently, the whitespace-separated list of sequences and residue selections must not contain a comma.
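The last-comma rule can be illustrated with a short Python sketch. It is not the parser used by Alignment Annotator; the function name and the command name some_command are hypothetical, and the sketch only shows how a line splits at its last comma.

```python
def split_script_line(line):
    """Split a script line at its last comma.

    Returns (head, selections): head is the command with its parameters,
    selections is the whitespace-separated list that follows the last comma.
    """
    head, comma, tail = line.rpartition(",")
    if not comma:
        # Without a comma, parameters and selections cannot be told apart
        # unless the command itself is known, so the line is returned as is.
        return line.strip(), []
    return head.strip(), tail.split()


# some_command, seqA and seqB are made-up names for illustration.
head, selections = split_script_line("some_command p1, p2, seqA seqB")
print(head)        # some_command p1, p2
print(selections)  # ['seqA', 'seqB']
```

Because the split happens at the last comma, parameters may themselves contain commas, whereas the list of sequences and residue selections may not.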
Server side program flow
The input data is processed in the following order:
- The sequences in the text-pane "sequences" are loaded (if any).
- The script "beforeAC" is interpreted (if any).
- The alignment is computed, unless there are already gaps in the input.
ClustalW is currently used as standard method.
If at this time 3D structures are loaded, mixed sequence/structure alignment is performed.
TM-align is currently the standard 3D alignment method.
- Residue annotations are loaded from UniProt, CSA or BioDAS servers if the respective check-boxes are activated.
- The script "afterAC" is run (if any).
- Homologous 3D-structures are identified if the check-box "3D" is activated.
Since alignment computation is performed earlier, structures inferred here have no impact on alignment computation.
Conversely, structures loaded in the text-field "sequences" or inferred by the command project_coordinates in "beforeAC" are considered for alignment computation.
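The order of the steps above can be sketched as a small Python function. The function and parameter names are hypothetical, and the sketch only lists which steps would run in which order; it performs no computation.

```python
def processing_steps(has_sequences, has_before_ac, input_has_gaps,
                     has_3d_structures, annotation_boxes_checked,
                     has_after_ac, box_3d_checked):
    """Return the server-side steps in the order described above."""
    steps = []
    if has_sequences:
        steps.append("load sequences")
    if has_before_ac:
        steps.append("run beforeAC")
    if not input_has_gaps:
        # Structures known at this point take part in the alignment.
        if has_3d_structures:
            steps.append("align sequences and structures")
        else:
            steps.append("align sequences")
    if annotation_boxes_checked:
        steps.append("load residue annotations")
    if has_after_ac:
        steps.append("run afterAC")
    if box_3d_checked:
        # Runs after the alignment, so it cannot influence it.
        steps.append("identify homologous 3D structures")
    return steps


for step in processing_steps(True, True, False, False, True, True, True):
    print(step)
```

The sketch makes the dependency visible: structure inference triggered by the check-box "3D" always comes after the alignment step.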
1. Creating sequences
1.1. Amino acid sequences
1.2. Aligned amino acid sequences
1.3. Nucleotide sequences
2. Loading sequences from files
2.1. Sequence files from URLs
2.2. Sequence name
2.3. Sequence files from databases
2.4. Alignment files
3. Translating nucleotide sequences
3.1. Translating coding sequences
3.2. Genomic sequence positions
3.3. Genomic sequences from EMBL or Genbank files
A subrange of the original sequence can be displayed. Residue numbering still refers to the original full-length sequence.
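The numbering rule can be illustrated with a minimal Python sketch. The sequence, the function name and the 1-based inclusive range convention are assumptions chosen for the example.

```python
def displayed_residues(sequence, first, last):
    """Return (original_number, residue) pairs for the range first..last.

    Numbers are 1-based and inclusive, and they refer to the full-length
    sequence, not to the position inside the displayed subrange.
    """
    return [(number, sequence[number - 1]) for number in range(first, last + 1)]


full_length = "MKTAYIAKQR"  # made-up example sequence
print(displayed_residues(full_length, 4, 6))
# [(4, 'A'), (5, 'Y'), (6, 'I')]
```

The first displayed residue keeps the number 4 instead of being renumbered to 1.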
4.1. Displaying a range of residues
4.2. Pruning alignments left or right
5. Sequence Groups
Sequence groups are named subsets of all loaded sequences. They can be activated by buttons in the alignment GUI. Open the menu "Sequence Groups" in the alignment frame.
5.1. Sequence groups
5.2. Sequence groups by taxonomy
6. Shorter scripts - compactness
Variables, brace expansion and regular expressions can improve the readability of a script and reduce its size.
6.2. Brace expansion
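As a general illustration of brace expansion, the following Python sketch expands shell-style alternatives such as {a,b}. It assumes comma-separated alternatives inside braces; the exact syntax supported by Alignment Annotator may differ, and the names used are made up.

```python
import re

# Matches an innermost brace group, i.e. one without nested braces.
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_braces(text):
    """Return all strings obtained by expanding {x,y,...} groups in text."""
    match = _BRACE.search(text)
    if match is None:
        return [text]
    head, tail = text[:match.start()], text[match.end():]
    results = []
    for alternative in match.group(1).split(","):
        # Recurse to expand any remaining groups.
        results.extend(expand_braces(head + alternative + tail))
    return results


print(expand_braces("protein_{human,mouse,yeast}"))
# ['protein_human', 'protein_mouse', 'protein_yeast']
```

One pattern thus stands for several similar names, which is where the reduction in script size comes from.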
6.3. Regular expressions and asterisks
6.4. Aliases for web addresses
7. Sequence Attributes
7.1. Balloon messages
7.2. Accession IDs and cross references
7.4. Residue index offsets
7.5. Secondary structure
7.6. Secondary structure
8. Residue selections and annotations
Residue selections are displayed by underline or filled background.
They can be defined explicitly, referring to the sequence index, the PDB resnum and insertion code, or the nucleotide index of the DNA sequence an amino acid was predicted from.
They can also be obtained from annotation databases.
Residue selections can have attributes like color, balloon messages and 3D-commands.
8.2. GFF: with attributes
8.3. GFF: non-consecutive positions
8.4. GFF: referring to PDB-Resnum
8.5. Adding attributes to residue selections
8.6. Residues in proximity to a ligand
8.7. Solvent Accessibility
8.8. Residue selections from UniProt
9.1. Residue color
9.2. Residue background color
9.3. Alignment title
9.4. Characters per line
The type of 3D viewer does not affect the scripting language, which is independent of the specific implementation.
First install Java.
Use a web browser that still supports Java applets: Firefox, Iceweasel, Opera and IE.
On the other hand, MS-Edge, Chromium and Chrome do not support Java applets.
The 3D views are not shown automatically when the alignment document is displayed in the browser, because
there might be several 3D-views. Each is represented by a button. A 3D view is displayed by pushing the respective button.
There are two different locations for these buttons:
- The context panels (Right-click) of the sequence names in the alignment panel contain buttons to open 3D views.
For any sequence with an associated 3D structure there will be at least the default 3D view.
In addition, 3D views are listed which have been created with open_3D.
- Activating "Tab" in the segmented control and then clicking 3D brings up a panel with buttons for all available 3D-views.
3D-views are created with the command open_3D,
which takes a unique ID and a list of loaded proteins or of structures that are not part of the sequence alignment.
The latter are given as file paths, URLs or database references.
To be recognized as such, file paths must start with a slash (/), dot-slash (./) or dot-dot-slash (../).
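The file path rule can be expressed as a one-line check. This Python sketch is only an illustration of the rule as stated; the function name and the example paths are hypothetical.

```python
def looks_like_file_path(reference):
    """Return True if the reference starts with '/', './' or '../'."""
    return reference.startswith(("/", "./", "../"))


# Made-up example references.
for reference in ["/data/model.pdb", "./model.pdb", "../model.pdb", "model.pdb"]:
    print(reference, looks_like_file_path(reference))
```

A bare relative name such as model.pdb fails the check, so it would have to be written as ./model.pdb to be treated as a file path.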
With the command select_3D, a 3D-view can be referred to by its ID,
together with one or several of the loaded proteins or structures. Proteins
are best specified by their sequence name, and PDB files must be
referred to by exactly the same file path, URL or database reference
used in the open_3D command.
Once a 3D-view has been selected with select_3D, 3D script commands can be applied. These commands start with "3D_".
Usually, the next command is 3D_select.
10.1. Superimposing protein structures
10.2. Style of single atoms
10.3. Attaching 3D styles to residue selections
10.4. Residue annotations from UniProt
10.5. Multimeric proteins
11. Structure alignment
11.1. Sequence or structure based alignment
12. Abnormal computation
12.1. Abnormal program termination
12.2. Very long computation