Alignment Annotator   -   Browser-based sequence alignment visualization with JavaScript

Acknowledgements

Summary

Alignment Annotator annotates and renders sequence alignments. The resulting HTML document is interactive and can be included in web pages.

Entering sequences

The start page has a text area where the sequences can be entered in different ways. Both amino acid sequences and nucleotide sequences are supported. Nucleotide sequences are assumed if all sequences are composed only of the letters A, C, T, G and N, unless the alignment type is explicitly set with the script command set_alignment_type_P in the first script text. Residue annotations from bioinformatics services and 3D visualization are available only for amino acid sequences.
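
The detection rule above can be illustrated with a short Python sketch. It is not the server's implementation: the function name, the case-insensitive comparison and the skipping of gap characters are assumptions made for illustration.

```python
# Letters that, per the rule above, indicate a nucleotide sequence.
NUCLEOTIDE_LETTERS = set("ACGTN")

def guess_alignment_type(sequences):
    """Return 'nucleotide' if every sequence uses only A, C, G, T and N.

    Gap and whitespace characters are ignored and case is not significant
    (both are assumptions of this sketch, not documented behaviour).
    """
    for seq in sequences:
        residues = {ch for ch in seq.upper() if ch not in "-. \n"}
        if not residues <= NUCLEOTIDE_LETTERS:
            return "protein"
    return "nucleotide"

print(guess_alignment_type(["ACGTN", "ac-gt"]))
print(guess_alignment_type(["ACGT", "MKVL"]))
```

A short peptide that happens to contain only the residues A, C, G, T and N would be classified as nucleotide by such a rule, which is why the type can be set explicitly with set_alignment_type_P.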

The sequence alignment display and its user interface

After the sequences are submitted, the alignment is displayed as an interactive view. The menus and tool-bars are initially hidden and become visible with a click. There are two levels:
  1. The top panel outside the black frame is a classical web form. It allows changing those features of the alignment that require computation on the server side. Changes take effect only after pressing the Upload button.
  2. The area surrounded by a black frame is an embedded HTML frame inside the main HTML page. Its user interface responds immediately without accessing the server. For a detailed description, see the documentation of the alignment view.

Translation into peptide sequences:

If coding nucleotide sequences are provided, the amino acid sequences can be predicted with the script command translate_cds. In this case the amino acid sequence alignment is shown. Example translating the first to the 99th nucleotide:
    translate_cds  1..99 , *
  
Example translating the first to the last nucleotide:
    translate_cds  1.. , *
  
Example with intron spanning position 32 to 103:
    translate_cds  complement(20..31,104..222) , sequenceName
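
To make the range notation concrete, the following Python sketch extracts the segments named by a range expression and translates them with the standard genetic code. It only illustrates the idea and is not the translate_cds implementation: the function names are hypothetical, ranges are assumed to be 1-based and inclusive, and the complement(...) wrapper and the sequence selector after the comma are not modelled.

```python
# Standard genetic code, enumerated with the bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def extract_cds(nucleotides, ranges):
    """Concatenate 1-based inclusive segments such as '20..31,104..222' or '1..'."""
    parts = []
    for segment in ranges.split(","):
        start, sep, end = segment.strip().partition("..")
        first = int(start)
        if end:
            last = int(end)
        else:
            # An open range ('1..') runs to the last nucleotide;
            # a bare number denotes a single position.
            last = len(nucleotides) if sep else first
        parts.append(nucleotides[first - 1:last])
    return "".join(parts)

def translate(cds):
    """Translate complete codons; codons not in the table (e.g. with N) become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))

print(translate(extract_cds("ATGGCCTAA", "1..")))
print(translate(extract_cds("ATGTTTGCC", "1..3,7..9")))
```

In the second call the middle codon is skipped, mimicking an intron between two coding segments.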
  

Alignment computation

The submitted sequences are aligned on the server unless at least one of the sequences contains the gap character (dash, "-"). In most cases, the alignment is computed with ClustalW. More accurate methods such as T-Coffee can be selected with the script command use_aligner in the first script.
Mixed sequence / 3D structure alignment can improve the alignment quality of remote homologs. It is conducted if structure data is loaded for at least two sequences at the time of alignment computation. 3D structure alignment is time consuming. By default, the program TM-align is used, which aligns two structures at a time. If more than two structures are provided, the server program compiles all n × (n-1) pair alignments. The advantage of computing each pair alignment separately is that each intermediate result is stored in the cache, so interrupted computations can be resumed. Conversely, structure alignment methods that natively align more than two structures often yield better results and can be selected with the script command use_aligner3D; however, only their final result is stored in the cache, and interrupted computations are lost.
For time-consuming computations, please use the locally running Strap program (Java) rather than the Alignment Annotation server. Finally, the result of the 3D alignment is used to also align the sequences without 3D structure, using ClustalW and T-Coffee.
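
The benefit of caching each pair alignment can be sketched in a few lines of Python. This is a simplified illustration, not the server code: align_all_pairs, the dictionary used as cache and the placeholder alignment function are all hypothetical, and a real cache would be persistent.

```python
from itertools import permutations

def align_all_pairs(structures, align_pair, cache):
    """Compute all n * (n - 1) ordered pair alignments, reusing cached results.

    'structures' is a list of hashable identifiers, 'align_pair' a callable
    standing in for a pairwise aligner such as TM-align, and 'cache' a dict.
    """
    for a, b in permutations(structures, 2):
        if (a, b) not in cache:
            # Each result is stored as soon as it is available, so a later
            # run only has to compute the pairs that are still missing.
            cache[(a, b)] = align_pair(a, b)
    return cache

calls = []

def fake_align(a, b):
    """Placeholder aligner that only records which pairs were requested."""
    calls.append((a, b))
    return f"{a}~{b}"

cache = {}
align_all_pairs(["s1", "s2", "s3"], fake_align, cache)
align_all_pairs(["s1", "s2", "s3"], fake_align, cache)  # second run hits the cache
print(len(cache), len(calls))
```

If the first run were interrupted after some pairs, the second run would compute only the remaining ones. A method that aligns all structures in a single step has no such intermediate results to reuse.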

Long-Computation-Mode

If the computation time exceeds a certain number of seconds, the computation may be stopped by requests submitted later by the same or by other users. This will be indicated in the browser page. The user can resume the computation by reloading the page, or switch to Long-Computation-Mode, in which the job runs much longer before it can be interrupted by another job. The disadvantage is that, if the queue is not empty, it can take a long time before the job is processed.

Annotations

Residue annotations are shown either as a colored background or as an underline of the residues. All attached information can be inspected by opening the context panel (right-click). On touch screens, the cursor can be roughly positioned with the finger and then moved precisely onto the annotated residue with two arrow buttons. When the cursor lies on an annotated residue, another button becomes active which opens the context panel.



Explicit definition: Annotations can be defined explicitly in GFF format, with tab or vertical bar as field separator, in the text box in Change > Annotations > Own. The nine fields are: Sequence | Source (leave empty) | Name | Start | End | Score | Nucleotide Strand | (leave empty) | Attributes.
The data can be typed directly into the text field, using the tab key for automatic sequence name completion. It can also be prepared in a spreadsheet program like MS-Excel. To transfer the lines from Excel to the text field, use Ctrl-C / Ctrl-V for copy and paste. For example, the following line will select residue 24 of the specified sequence:
    seqNameOrNumber|.|Modified residue|24|24|.|.|.|
  
Colors are written as Red-Green-Blue Hexadecimal triplets and can be set in the field Attributes:
    seqNameOrNumber|.|Modified residue|24|24|.|.|.|Color=#00ff00
  
Otherwise, the color is taken from a table of feature names and colors, which can be edited with the command feature_colors:
    feature_colors Modified_residue=#00AA00 Phosphoserine=#00ff00
  
If there is no matching entry, the default color is used. The following specifies the color, balloon message, style and 3D style:
    seqNameOrNumber|.|Modified residue|24|24|.|.|.|Color=#00ff00; Balloon="hello world"; 3D_view=spheres; Style=BACKGROUND
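
The line format and the color lookup described above can be mimicked with a small Python parser. It is a sketch under assumptions, not the program's parser: the function name, the dictionary keys, the value of DEFAULT_COLOR and the mapping of spaces in feature names to underscores for the feature_colors table are hypothetical. Attribute values containing a semicolon are not handled.

```python
import re

FIELD_NAMES = ["sequence", "source", "name", "start", "end",
               "score", "strand", "unused", "attributes"]
DEFAULT_COLOR = "#ffff00"  # hypothetical; the actual default color is not stated

def parse_annotation(line, feature_colors=None):
    """Parse one annotation line with nine fields separated by '|' or tab."""
    fields = re.split(r"[|\t]", line.rstrip("\r\n"))
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    record = dict(zip(FIELD_NAMES, (f.strip() for f in fields)))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    # Attributes are 'key=value' pairs separated by semicolons.
    attributes = {}
    for item in record["attributes"].split(";"):
        key, sep, value = item.partition("=")
        if sep:
            attributes[key.strip()] = value.strip().strip('"')
    record["attributes"] = attributes

    # Color precedence: explicit attribute, then feature table, then default.
    table = feature_colors or {}
    table_key = record["name"].replace(" ", "_")  # assumed naming convention
    record["color"] = attributes.get("Color") or table.get(table_key, DEFAULT_COLOR)
    return record

example = parse_annotation(
    'seqNameOrNumber|.|Modified residue|24|24|.|.|.|'
    'Color=#00ff00; Balloon="hello world"; 3D_view=spheres; Style=BACKGROUND'
)
print(example["color"], example["attributes"]["Balloon"])
```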
  
If the amino acid sequence is translated from a coding nucleotide sequence as described above, the positions can refer to the coding nucleotide sequence by typing "+" into field 7. The attribute Hide=true combined with the default style (UNDERLINE) deactivates the residue selection. It can then be activated (and deactivated) with a toggle button after the sequence line.
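
When positions refer to the coding nucleotide sequence, each codon of three nucleotides corresponds to one residue. The arithmetic can be sketched as follows, assuming 1-based positions counted from the first base of the first translated codon and no introns inside the range; the function name is hypothetical and the program's exact convention may differ.

```python
def nucleotide_to_residue_range(nt_start, nt_end):
    """Map a 1-based inclusive nucleotide range to the residue range it overlaps."""
    if nt_start < 1 or nt_end < nt_start:
        raise ValueError("expected 1 <= nt_start <= nt_end")
    # Nucleotides 1-3 belong to residue 1, nucleotides 4-6 to residue 2, and so on.
    return (nt_start - 1) // 3 + 1, (nt_end - 1) // 3 + 1

print(nucleotide_to_residue_range(70, 72))
```

Under these assumptions, nucleotides 70 to 72 form the codon of residue 24.
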
There are the following enhancements over the standard GFF format: Lines starting with let are variable assignments. Variables can be used for frequently used text fragments to reduce the amount of typed text. Example:
    let $t=Hello world
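
The effect of such assignments can be pictured as plain text substitution. The Python sketch below assumes that a variable is referenced later as $name and that an assignment applies to all following lines. Both points are assumptions for illustration, as is the function name.

```python
import re

def expand_variables(lines):
    """Apply 'let $name=value' assignments to the lines that follow them."""
    variables = {}
    expanded = []
    for line in lines:
        match = re.match(r"let\s+\$(\w+)\s*=(.*)", line)
        if match:
            variables[match.group(1)] = match.group(2).strip()
            continue  # assignment lines themselves produce no annotation
        # Replace each known $name; unknown names are left untouched.
        expanded.append(
            re.sub(r"\$(\w+)", lambda m: variables.get(m.group(1), m.group(0)), line)
        )
    return expanded

print(expand_variables([
    "let $t=Hello world",
    'seq1|.|Site|5|5|.|.|.|Balloon="$t"',
]))
```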
  
Annotations can also be defined with script commands, which is, however, more verbose than the GFF notation.

Retrieval from services: The UniProt sequence features and Catalytic Site Atlas residues are stored directly on the server and are quickly available. The BioDAS services, however, are loaded from remote servers, which causes some delay. The default BioDAS services are cbs_total, netphos and netoglyc. Any BioDAS residue annotation is available with the script command DAS_features name, sequences.
Limitations: Currently, only UniProt-centered BioDAS annotations are supported.

Scripts

Advanced program features not accessible by the graphical user interface require script commands.

Save/Backup

All project data can be downloaded to the local hard disk so that the project can be continued at any time on any Alignment Annotation server. In addition, a Strap script file can be downloaded, which allows the project to be continued in the desktop program Strap.

Export

A Zip file can be downloaded. It has all required files for embedding the alignment in web sites.

Using Alignment Annotator as a view option for sequence alignments for other web services

See Programming interface.

Publications