AA-Annotator

Alignment Annotator


Author: Christoph Gille (christoph.gille at charite.de), Institut für Biochemie, Charité, Berlin

Summary

Alignment Annotator annotates and renders sequence alignments. Program features:

Annotations
- Coping with huge numbers of annotations
- Retrieval of information from annotation services and 3D-structure files
- Optimized layout of underlined residue selections for compactness
3D
- Sequence alignment interlinked with 3D structure visualization
Compatibility
- All major web browsers (except IE)
- All operating systems (Windows, Apple, UNIX, iOS, Android)
- Tablet support
Export
- MS-Word, Libre-Office, Open-Office. This allows further editing, for example to create figures for publications.
- Clustal, Fasta
Alignment Annotator does not require Java.

The resulting HTML document is interactive and can either stand alone or be included in other web pages.

Entering sequences

The start page has a text area where the sequences can be entered in several different ways. See examples by activating the check-box Sample input data:

- As an alignment. All standard sequence and alignment formats are supported. Gaps are denoted by dashes.
- As sequences without gaps. All standard multiple sequence file formats are supported. In this case the system aligns the sequences automatically.
- As a reference (URL or database ID) to alignment documents, sequence files or PDB structure files.

Amino acid sequences and nucleotide sequences are supported. Nucleotide sequences are assumed if all sequences are composed of the letters A, C, T, G and N, unless the alignment type is explicitly set with the command set_alignment_type_N or set_alignment_type_P in the first script text. Annotations from services and 3D visualization are only available for amino acid sequences.
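The detection rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the server's code: the function name guess_alignment_type and its explicit_type parameter are hypothetical, and ignoring gap characters and letter case is an assumption.

```python
def guess_alignment_type(sequences, explicit_type=None):
    """Return 'N' (nucleotide) or 'P' (protein) for a list of sequences.

    explicit_type mimics set_alignment_type_N / set_alignment_type_P:
    when it is 'N' or 'P', it overrides the automatic guess.
    """
    if explicit_type in ("N", "P"):
        return explicit_type
    nucleotide_letters = set("ACTGN")
    for seq in sequences:
        # Assumption: gaps are ignored and the test is case-insensitive.
        residues = seq.upper().replace("-", "")
        if not set(residues) <= nucleotide_letters:
            return "P"
    return "N"
```

Note that under this rule a short peptide made up only of the letters A, C, G, T and N would be taken for a nucleotide sequence, which is the situation the explicit commands are meant to cover.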

Translation into peptide sequences:

If coding nucleotide sequences are provided and amino acid sequences are predicted with the script command translate_cds in the first script text, an amino acid sequence alignment is shown. Example translating the first to the 99th nucleotide:

 translate_cds  1..99 , *

Example translating the first to the last nucleotide:

 translate_cds  1.. , *

Example with intron spanning position 32 to 103:

 translate_cds  complement(20..31,104..222) , sequenceName

Instead of an explicit CDS expression, a specific protein name from a given EMBL- or GenBank-formatted file can be entered. The translate_cds command should be entered into the first script text so that alignment computation acts on the amino acid sequences.
If the translate_cds command were in the second script text, the alignment would be computed for the nucleotide sequences, and only then would the nucleotide sequences be translated.
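To make the range notation concrete, the following Python sketch extracts the 1-based, inclusive ranges of a CDS expression and translates the result with the standard genetic code. It is an illustration only: extract_cds and translate are hypothetical helper names, an open end such as 1.. is assumed to mean "to the last nucleotide", and operators such as complement(...) are not interpreted.

```python
import re

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def extract_cds(nucleotides, expression):
    """Concatenate the 1-based inclusive ranges listed in expression."""
    parts = []
    for start, end in re.findall(r"(\d+)\.\.(\d*)", expression):
        first = int(start) - 1
        # An empty end ("1..") is assumed to mean the last nucleotide.
        last = int(end) if end else len(nucleotides)
        parts.append(nucleotides[first:last])
    return "".join(parts)

def translate(cds):
    """Translate complete codons; unknown codons become 'X'."""
    cds = cds.upper()
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )
```

For example, translate(extract_cds("ATGGCCTAAGGG", "1..9")) yields "MA*", where the asterisk marks the stop codon.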

Alignment computation

In most cases, server-side alignment computation is performed with ClustalW. More accurate methods such as T-Coffee can be selected with the script command use_aligner in the first script. Mixed sequence / 3D structure alignment can improve the alignment quality of remote homologs. It is conducted if structure data is loaded for at least two sequences at the time of alignment computation. This is the case if

- the input data contains PDB IDs,
- or PDB files are loaded by script commands in the first script text,
- or the second script contains an align command while mapping of homologous structures is enabled.

3D structure alignment is time-consuming. By default, the program TM-align is used, which aligns two structures at a time. If more than two structures are provided, the server computes all n times (n-1) pairwise alignments. The advantage of computing each pairwise alignment separately is that every intermediate result is stored in the cache, so interrupted computations can be resumed. Conversely, structure alignment methods that natively align more than two structures often yield better results and can be selected with the script command use_aligner3D; however, only the final result is stored in the cache, and interrupted computations are lost. For time-consuming computations, please use the locally running Strap program (Java) rather than the Alignment Annotator server. Finally, the result of the 3D alignment is used to also align those sequences without 3D structure using ClustalW and T-Coffee.
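The benefit of caching every pairwise result can be shown with a minimal Python sketch. It does not describe the server's actual cache: align_all_pairs is a hypothetical function, align_pair stands in for a call to a pairwise aligner such as TM-align, and a plain dictionary plays the role of the cache.

```python
from itertools import permutations

def align_all_pairs(structure_ids, align_pair, cache):
    """Compute all n * (n - 1) ordered pair alignments, reusing cached ones.

    align_pair(a, b) is a placeholder for the pairwise structure aligner.
    cache maps (a, b) to a stored result; pairs already present are
    skipped, which is what makes an interrupted run resumable.
    """
    for a, b in permutations(structure_ids, 2):
        if (a, b) not in cache:
            cache[(a, b)] = align_pair(a, b)
    return cache
```

If a run is interrupted after some pairs have been stored, calling the function again with the same cache only computes the missing pairs.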

The alignment view

After the data is submitted to the server, the rendered, interactive alignment is shown in an embedded frame. The job may terminate prematurely due to a timeout. In this case, the computation can be resumed by reloading the browser page. The alignment view will look like:

The alignment view has a graphical user interface which is independent of the server. It allows:

- Changing the order of sequences by dragging with the mouse
- Hiding sequences and annotations by dragging them into the trash
- Changing the color mode and conservation threshold
- Wrapping long sequences

It is described in detail in the documentation of the alignment view.

Annotations

Following the GFF syntax, annotations are entered into the text field in Change > Annotations > Own. The tabulator key triggers word-completion for sequence names.

Explicit definition: Annotations can be defined explicitly in GFF format with the tabulator or the vertical bar as field separator.
The nine fields are: Sequence | Source (leave empty) | Name | Start | End | Score | Nucleotide Strand | (leave empty) | Attributes.
For example, the following line selects residue 24 of the specified sequence:

seqNameOrNumber|.|Modified residue|24|24|.|.|.|

Colors are written as hexadecimal red-green-blue triplets and can be set in the Attributes field:

seqNameOrNumber|.|Modified residue|24|24|.|.|.|Color=#00ff00

Otherwise, a table of feature names and colors is used, which can be edited with the command feature_colors.

 
feature_colors Modified_residue=#00AA00 Phosphoserine=#00ff00

If there is no matching entry, the default color is used. The following specifies the color, balloon message, style and 3D style:

seqNameOrNumber|.|Modified residue|24|24|.|.|.|Color=#00ff00; Balloon="hello world"; 3D_view=spheres; Style=BACKGROUND
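The nine-field layout and the Attributes syntax shown above can be read with a short Python sketch. This is an illustration rather than the parser used by Alignment Annotator: parse_annotation and the dictionary keys are hypothetical names, and the sketch assumes that attribute values contain no semicolons.

```python
import re

FIELDS = ("sequence", "source", "name", "start", "end",
          "score", "strand", "unused", "attributes")

def parse_annotation(line):
    """Split one annotation line on tabs or vertical bars into nine fields."""
    values = [v.strip() for v in re.split(r"[|\t]", line.rstrip("\n"), maxsplit=8)]
    # Pad short lines so that every field name gets a value.
    values += [""] * (len(FIELDS) - len(values))
    record = dict(zip(FIELDS, values))
    attributes = {}
    # Assumption: attributes are "key=value" pairs separated by semicolons.
    for item in record["attributes"].split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip().strip('"')
    record["attributes"] = attributes
    return record
```

Applied to the last example line, the attributes dictionary holds Color, Balloon, 3D_view and Style, with the quotes around the balloon text removed.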

If the amino acid sequence is translated from a coding nucleotide sequence as described above, the positions can refer to the nucleotide sequence by setting the field Nucleotide Strand to "+". A minus sign, which denotes the reverse complementary strand in GFF, is not supported. The attribute Hide=true combined with the default style (UNDERLINE) deactivates the residue selection, which can then be activated with a check-box.
The following enhancements over the standard GFF format are available:

- If Start equals End, i.e. only one position is selected, the field End can be omitted.
- Start can contain a complex expression allowing for non-consecutive sequence positions. In this case End must remain empty. For example, 10-20,30-40 selects residues 10 to 20 plus 30 to 40. As in Rasmol/Jmol, 20:-30: refers to PDB residue numbers 20 to 30. The chain ID after the colon can be omitted.
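How such a Start expression expands into individual positions can be sketched as follows. The function expand_positions is a hypothetical helper. It handles only plain sequence positions such as 10-20,30-40 and does not interpret the PDB residue notation with colons.

```python
def expand_positions(expression):
    """Expand an expression like '10-20,30-40' into a list of positions."""
    positions = []
    for part in expression.split(","):
        part = part.strip()
        if not part:
            continue
        first, dash, last = part.partition("-")
        start = int(first)
        # A single number without a dash selects exactly one position.
        end = int(last) if dash else start
        positions.extend(range(start, end + 1))
    return positions
```

For example, expand_positions("10-12,30") returns [10, 11, 12, 30].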

Lines starting with let are variable assignments. Variables can be used for frequently used text fragments to reduce the amount of typed text. Example:

let $t=Hello world

Annotations can also be defined with script commands, which is, however, more verbose than the GFF notation.

Retrieval from services: The UniProt sequence features and Catalytic Site Atlas residues are stored directly on the server and are quickly available. The BioDAS services, however, are loaded from remote servers, which causes some delay. The default BioDAS services are cbs_total, netphos and netoglyc. All are available with the script command DAS_features name, sequences. The selection of the color is described above.
Limitations: Currently, only UniProt centered BioDAS-annotations are supported.

Scripts

Advanced program features not accessible by the graphical user interface require script commands.

Download/Save

A Zip file can be downloaded. It contains all files required for embedding the alignment in web sites, as well as the URL for opening Alignment Annotator and a Strap file for opening the alignment in Strap.

Embedding in web services

Read here about the Programming interface.
Alignment Annotator can be employed for visualization in Bioinformatics web services.
Acknowledgements