Alignment Annotator - Browser-based sequence alignment visualization with JavaScript

Scripting interface

This page describes the scripting interface and explains the scripting language of Alignment Annotator through basic examples. Scripts are used in two situations:

When Alignment Annotator is used as a visualization tool within other software, the script text is generated programmatically. The software can be a web service or a locally running program. The script text in the variables
```
beforeAC
```
and
```
afterAC
```
are run before and after alignment computation, respectively. These two web variables may contain the script text directly or the URL of a script file. There are two possible ways to display the alignment document:
- An IFRAME displays the sequence alignment
- The web service pages contain buttons or web links. On click, the alignment is opened in an IFRAME or another tab.
Alignment Annotator can be used from the main server or installed on any Unix/Linux web server. The installation instructions can be requested from the author.
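
As a rough illustration of programmatic use, the following Python sketch builds a URL that could serve as the address of an IFRAME or web link. Only the variable names beforeAC and afterAC come from this page. The base URL is a placeholder, and passing the two variables as URL query parameters is an assumption.

```python
from urllib.parse import urlencode

# Placeholder address (assumption); replace it with the server actually in use.
BASE_URL = "https://example.org/alignment-annotator"


def build_alignment_url(before_ac: str, after_ac: str = "") -> str:
    """Return a URL carrying the two script texts as query parameters.

    Sending the scripts as query parameters is an assumed transport,
    not something this page specifies.
    """
    params = {"beforeAC": before_ac}
    if after_ac:
        params["afterAC"] = after_ac
    # urlencode percent-encodes the script text so that it survives in a URL.
    return BASE_URL + "?" + urlencode(params)


url = build_alignment_url("load UNIPROT:HBA_PIG", "clip_N_term HBA_PIG/10")
print(url)
```

Since each variable may also hold the URL of a script file, the same helper could be called with such a URL instead of the script text.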
When advanced program features of Alignment Annotator are not accessible via the graphical user interface, the two script texts are entered into text fields in the section "Scripts" of the web interface. The Tab key can be used for word completion of sequence names and script commands.

Script line syntax

Each line starts with a script command and may contain parameters. The set of commands is a subset of the commands supported in Strap. Each script line follows one of the following four syntaxes:

Command
Command parameters
Command list-of-sequences-residue-selections
Command parameters, list-of-sequences-residue-selections
The last comma (,) in the line marks the end of parameters and the beginning of the list of sequences and residue selections.
Consequently the list of white space separated sequences and residue selections must not contain a comma.
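
The last-comma rule can be sketched in Python as follows. This is a minimal illustration, not the parser used by Alignment Annotator, and the function name split_script_line is hypothetical. Without a comma, the sketch cannot tell whether the remainder holds parameters or a list of sequences, so it returns the remainder unchanged.

```python
def split_script_line(line: str):
    """Split a script line into (command, text_before_last_comma, targets).

    targets is the whitespace-separated list after the last comma,
    or None when the line contains no comma.
    """
    parts = line.strip().split(None, 1)
    if not parts:
        return "", "", None
    command = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    if "," not in rest:
        # No comma: the remainder is either parameters or a target list.
        return command, rest.strip(), None
    # rpartition splits at the LAST comma, so earlier commas stay in the parameters.
    parameters, _, target_text = rest.rpartition(",")
    return command, parameters.strip(), target_text.split()


print(split_script_line("color #FF0000, /Exon1 /Exon3"))
print(split_script_line("balloon_text Bla, bla bla text, HBA_PIG"))
```

The second call shows why the list of sequences must not contain a comma: every comma except the last one is treated as part of the parameters.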

Server side program flow

The input data is processed in the following order:

The sequences in the text-pane "sequences" are loaded (if any).
The script "beforeAC" is interpreted (if any).
The alignment is computed, unless there are already gaps in the input. ClustalW is currently used as the standard method. If 3D structures are loaded at this time, a mixed sequence/structure alignment is performed. TM-align is currently the standard 3D alignment method.
Residue annotations are loaded from UniProt, CSA or BioDAS servers if the respective check-boxes are activated.
The script "afterAC" is run (if any).
Homologous 3D-structures are identified if the check-box "3D" is activated. Since alignment computation is performed earlier, structures inferred here have no impact on alignment computation. Conversely, structures loaded in the text-field "sequences" or inferred by the command project_coordinates in "beforeAC" are considered for alignment computation.

Script Examples

1. Creating sequences

1.1. Amino acid sequences

beforeAC	aa_sequence MGDSQYSFSLTTFSPSGKLVQIEHALTAVGSGQTSLGIKASNGVVIATEKKLPSILVDEASVQKIQHLTPNIGVVYSGMGPDFRVLVRKSRKQAEQYLRLYKEPIPVTQLVRETATVMQEFTQSGGVRPFGVSLLVAGYDDKGPQLYQVDPSGSYFSWKASAMGKNVSNAKTFLEKRYTEDMELDDAIHTAILTLKEGFEGEISSKNIEIGKIGADKVFRVLTPAEIDDYLAEVE, ArabidopsisThaliana aa_sequence MATERYSFSLTTFSPSGKLVQLEYALAAVSGGAPSVGIIASNGVVIATENKHKSPLYEQHSVHRVEMIYNHIGMVYSGMGPDYRLLVKQARKIAQTYYLTYKEPIPVSQLVQRVATLMQEYTQSGGVRPFGVSLLICGWDNDRPYLYQSDPSGAYFAWKATAMGKNAVNGKTFLEKRYSEDLELDDAVHTAILTLKEGFEGKMTADNIEIGICDQNGFQRLDPASIKDYLASIP, DrosophilaMelanogaster aa_sequence SFSLTTFSPSGKLVQIEYALAAVAAGAPSVGIKATNGVVLATEKKQKSILYDEQSAHKVEPITKHIGMVYSGMGPDYRVLVRRARKLAQQYYLVYQEPIPTAQLVQRVASVMQEYTQSGGVRPFGVSLLIAGWDEGRPYLFQSDPSGAYFAWKATAMGKNYVNGKTFLEKRYNEDLELEDAIHTAILTLKESFEGQMTEDNIEVGICNEAGFKRLTPAEVKDYLAAIA, XenopusLaevis
Description	The command aa_sequence creates amino acid sequences from the one-letter-code.
	Display alignment

1.2. Aligned amino acid sequences

beforeAC	aa_sequence ----MGDSQY-----------------------SFSLTTFSPSGKLVQIEHALTAVGSGQTSLGIKASNGVVIATEKKLPSILVDEASVQKIQHLTPNIGVVYSGMGPDFRVLVRKSRKQAEQYLRLYKEPIPVTQLVRETATVMQEFTQSGGVRPFGVSLLVAGYDDKGPQLYQVDPSGSYFSWKASAMGKNVSNAKTFLEKRYTEDMELDDAIHTAILTLKEGFEGEISSKNIEIGKIGADKVFRVLTPAEIDDYLAEVE, ArabidopsisThaliana aa_sequence ---------------------------MATERYSFSLTTFSPSGKLVQLEYALAAVSGGAPSVGIIASNGVVIATENKHKSPLYEQHSVHRVEMIYNHIGMVYSGMGPDYRLLVKQARKIAQTYYLTYKEPIPVSQLVQRVATLMQEYTQSGGVRPFGVSLLICGWDNDRPYLYQSDPSGAYFAWKATAMGKNAVNGKTFLEKRYSEDLELDDAVHTAILTLKEGFEGKMTADNIEIGICDQNGFQRLDPASIKDYLASIP----------------------------, DrosophilaMelanogaster aa_sequence ---------------------------------SFSLTTFSPSGKLVQIEYALAAVAAGAPSVGIKATNGVVLATEKKQKSILYDEQSAHKVEPITKHIGMVYSGMGPDYRVLVRRARKLAQQYYLVYQEPIPTAQLVQRVASVMQEYTQSGGVRPFGVSLLIAGWDEGRPYLFQSDPSGAYFAWKATAMGKNYVNGKTFLEKRYNEDLELEDAIHTAILTLKESFEGQMTEDNIEVG-ICNEAGFKRLTPAEVKDYLAAIA, XenopusLaevis
Description	Dash characters in the sequences are interpreted as alignment gaps. Alignment computation is skipped.
	Display alignment

1.3. Nucleotide sequences

beforeAC	set_alignment_type_CN nt_sequence ATGGCATATCCCATACAACTAGGATTCCAAGATGCAACATCACCAATCATAGAAGAACTACTTCACTTTCATGACCACACGCTAATAATTGTCTTCTTAATTAGCTCATTAGTACTTTACATTATTTCACTAATACTAACGACAAAGCTGACCCATACAAGCACGATAGATGCACAAGAAGTAGAGACAATCTGAACCATTCTGCCCGCCATCATCTTAATTCTAATTGCTCTTCCTTCTTTACGAATTCTATACATAATAGATGAAATCAATAACCCATCTCTTACAGTAAAAACCATAGGACATCAGTGATACTGAAGCTATGAGTATACAGATTATGAGGACTTAAGCTTCGACTCCTACATAATTCCAACATCAGAATTAAAGCCAGGGGAGCTACGACTATTAGAAGTCGATAATCGAGTTGTACTACCAATAGAAATAACAATCCGAATGTTAGTCTCCTCTGAAGACGTATTACACTCATGAGCTGTGCCCTCTCTAGGACTAAAAACAGACGCAATCCCAGGCCGTCTAAACCAAACAACCCTTATATCGTCCCGTCCAGGCTTATATTACGGTCAATGCTCAGAAATTTGCGGGTCAAACCACAGTTTCATACCCATTGTCCTTGAGTTAGTCCCACTAAAGTACTTTGAAAAATGATCTGCGTCAATATTATAA, Cow nt_sequence ATGGCACACCCAACGCAACTAGGTTTCAAGGACGCGGCCATACCCGTTATAGAGGAACTTCTTCACTTCCACGACCACGCATTAATAATTGTGCTCCTAATTAGCACTTTAGTTTTATATATTATTACTGCAATGGTATCAACTAAACTTACTAATAAATATATTCTAGACTCCCAAGAAATCGAAATCGTATGAACCATTCTACCAGCCGTCATTTTAGTACTAATCGCCCTGCCCTCCCTACGCATCCTGTACCTTATAGACGAAATTAACGACCCTCACCTGACAATTAAAGCAATAGGACACCAATGATACTGAAGTTACGAGTATACAGACTATGAAAATCTAGGATTCGACTCCTATATAGTACCAACCCAAGACCTTGCCCCCGGACAATTCCGACTTCTGGAAACAGACCACCGAATAGTTGTTCCAATAGAATCCCCAGTCCGTGTCCTAGTATCTGCTGAAGACGTGCTACATTCTTGAGCTGTTCCATCCCTTGGCGTAAAAATGGACGCAGTCCCAGGACGACTAAATCAAGCCGCCTTTATTGCCTCACGCCCAGGGGTCTTTTACGGACAATGCTCTGAAATTTGTGGAGCTAATCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCTCTCGAACACTTCGAAAACTGATCCTCATTAATACTAGAAGACGCCTCGCTAGGAAGCTAA, Carp nt_sequence 
ATGGCCAACCACTCCCAACTAGGCTTTCAAGACGCCTCATCCCCCATCATAGAAGAGCTCGTTGAATTCCACGACCACGCCCTGATAGTCGCACTAGCAATTTGCAGCTTAGTACTCTACCTTCTAACTCTTATACTTATAGAAAAACTATCATCAAACACCGTAGATGCCCAAGAAGTTGAACTAATCTGAACCATCCTACCCGCTATTGTCCTAGTCCTGCTTGCCCTCCCCTCCCTCCAAATCCTCTACATAATAGACGAAATCGACGAACCTGATCTCACCCTAAAAGCCATCGGACACCAATGATACTGAACCTATGAATACACAGACTTCAAGGACCTCTCATTTGACTCCTACATAACCCCAACAACAGACCTCCCCCTAGGCCACTTCCGCCTACTAGAAGTCGACCATCGCATTGTAATCCCCATAGAATCCCCCATTCGAGTAATCATCACCGCTGATGACGTCCTCCACTCATGAGCCGTACCCGCCCTCGGGGTAAAAACAGACGCAATCCCTGGACGACTAAATCAAACCTCCTTCATCACCACTCGACCAGGAGTGTTTTACGGACAATGCTCAGAAATCTGCGGAGCTAACCACAGCTACATACCCATTGTAGTAGAGTCTACCCCCCTAAAACACTTTGAAGCCTGATCCTCACTACTGTCATCTTAA, Chicken nt_sequence ATGGCACATGCAGCGCAAGTAGGTCTACAAGACGCTACTTCCCCTATCATAGAAGAGCTTATCACCTTTCATGATCACGCCCTCATAATCATTTTCCTTATCTGCTTCCTAGTCCTGTATGCCCTTTTCCTAACACTCACAACAAAACTAACTAATACTAACATCTCAGACGCTCAGGAAATAGAAACCGTCTGAACTATCCTGCCCGCCATCATCCTAGTCCTCATCGCCCTCCCATCCCTACGCATCCTTTACATAACAGACGAGGTCAACGATCCCTCCCTTACCATCAAATCAATTGGCCACCAATGGTACTGAACCTACGAGTACACCGACTACGGCGGACTAATCTTCAACTCCTACATACTTCCCCCATTATTCCTAGAACCAGGCGACCTGCGACTCCTTGACGTTGACAATCGAGTAGTACTCCCGATTGAAGCCCCCATTCGTATAATAATTACATCACAAGACGTCTTGCACTCATGAGCTGTCCCCACATTAGGCTTAAAAACAGATGCAATTCCCGGACGTCTAAACCAAACCACTTTCACCGCTACACGACCGGGGGTATACTACGGTCAATGCTCTGAAATCTGTGGAGCAAACCACAGTTTCATGCCCATCGTCCTAGAATTAATTCCCCTAAAAATCTTTGAAATAGGGCCCGTATTTACCCTATAG, Human nt_sequence 
ATGGCACATCCCACACAATTAGGATTCCAAGACGCGGCCTCACCCGTAATAGAAGAACTTCTTCACTTCCATGACCATGCCCTAATAATTGTATTTTTGATTAGCGCCCTAGTACTTTATGTTATTATTACAACCGTCTCAACAAAACTCACTAACATATATATTTTGGACTCACAAGAAATTGAAATCGTATGAACTGTGCTCCCTGCCCTAATCCTCATTTTAATCGCCCTCCCCTCACTACGAATTCTATATCTTATAGACGAGATTAATGACCCCCACCTAACAATTAAGGCCATGGGGCACCAATGATACTGAAGCTACGAGTATACTGATTATGAAAACTTAAGTTTTGACTCCTACATAATCCCCACCCAGGACCTAACCCCTGGACAATTCCGGCTACTAGAGACAGACCACCGAATGGTTGTTCCCATAGAATCCCCTATTCGCATTCTTGTTTCCGCCGAAGATGTACTACACTCCTGGGCCCTTCCAGCCATGGGGGTAAAGATAGACGCGGTCCCAGGACGCCTTAACCAAACCGCCTTTATTGCCTCCCGCCCCGGGGTATTCTATGGGCAATGCTCAGAAATCTGTGGAGCAAACCACAGCTTTATACCCATCGTAGTAGAAGCGGTCCCACTATCTCACTTCGAAAACTGGTCCACCCTTATACTAAAAGACGCCTCACTAGGAAGCTAA, Loach nt_sequence ATGGCCTACCCATTCCAACTTGGTCTACAAGACGCCACATCCCCTATTATAGAAGAGCTAATAAATTTCCATGATCACACACTAATAATTGTTTTCCTAATTAGCTCCTTAGTCCTCTATATCATCTCGCTAATATTAACAACAAAACTAACACATACAAGCACAATAGATGCACAAGAAGTTGAAACCATTTGAACTATTCTACCAGCTGTAATCCTTATCATAATTGCTCTCCCCTCTCTACGCATTCTATATATAATAGACGAAATCAACAACCCCGTATTAACCGTTAAAACCATAGGGCACCAATGATACTGAAGCTACGAATATACTGACTATGAAGACCTATGCTTTGATTCATATATAATCCCAACAAACGACCTAAAACCTGGTGAACTACGACTGCTAGAAGTTGATAACCGAGTCGTTCTGCCAATAGAACTTCCAATCCGTATATTAATTTCATCTGAAGACGTCCTCCACTCATGAGCAGTCCCCTCCCTAGGACTTAAAACTGATGCCATCCCAGGCCGACTAAATCAAGCAACAGTAACATCAAACCGACCAGGGTTATTCTATGGCCAATGCTCTGAAATTTGTGGATCTAACCATAGCTTTATGCCCATTGTCCTAGAAATGGTTCCACTAAAATATTTCGAAAACTGATCTGCTTCAATAATTTAA, Mouse nt_sequence 
ATGGCTTACCCATTTCAACTTGGCTTACAAGACGCTACATCACCTATCATAGAAGAACTTACAAACTTTCATGACCACACCCTAATAATTGTATTCCTCATCAGCTCCCTAGTACTTTATATTATTTCACTAATACTAACAACAAAACTAACACACACAAGCACAATAGACGCCCAAGAAGTAGAAACAATTTGAACAATTCTCCCAGCTGTCATTCTTATTCTAATTGCCCTTCCCTCCCTACGAATTCTATACATAATAGACGAGATTAATAACCCAGTTCTAACAGTAAAAACTATAGGACACCAATGATACTGAAGCTATGAATATACTGACTATGAAGACCTATGCTTTGACTCCTACATAATCCCAACCAATGACCTAAAACCAGGTGAACTTCGTCTATTAGAAGTTGATAATCGGGTAGTCTTACCAATAGAACTTCCAATTCGTATACTAATCTCATCCGAAGACGTCCTGCACTCATGAGCCATCCCTTCACTAGGGTTAAAAACCGACGCAATCCCCGGCCGCCTAAACCAAGCTACAGTCACATCAAACCGACCAGGTCTATTCTATGGCCAATGCTCTGAAATTTGCGGCTCAAATCACAGCTTCATACCCATTGTACTAGAAATAGTGCCTCTAAAATATTTCGAAAACTGATCAGCTTCTATAATTTAA, Rat nt_sequence ATGGCATACCCCCTACAAATAGGCCTACAAGATGCAACCTCTCCCATTATAGAGGAGTTACTACACTTCCATGACCACACATTAATAATTGTGTTCCTAATTAGCTCATTAGTACTCTACATTATCTCACTTATACTAACCACGAAACTCACCCACACAAGTACAATAGACGCACAAGAAGTGGAAACGGTGTGAACGATCCTACCCGCTATCATTTTAATTCTCATTGCCCTACCATCATTACGAATCCTCTACATAATGGACGAGATCAATAACCCTTCCTTGACCGTAAAAACTATAGGACATCAGTGATACTGAAGCTATGAGTACACAGACTACGAAGACCTGAACTTTGACTCATATATGATCCCCACACAAGAACTAAAGCCCGGAGAACTACGACTGCTAGAAGTAGACAATCGAGTAGTCCTCCCAATAGAAATAACAATCCGCATACTAATCTCATCAGAAGATGTACTCCACTCATGAGCCGTACCGTCCCTAGGACTAAAAACTGATGCTATCCCAGGACGACTAAACCAAACAACCCTAATAACCATACGACCAGGACTGTACTACGGTCAATGCTCAGAAATCTGTGGTTCAAACCACAGCTTCATACCTATTGTCCTCGAATTGGTCCCACTATCCCACTTCGAGAAATGATCTACCTCAATGCTTTAA, Seal nt_sequence 
ATGGCATATCCATTCCAACTAGGTTTCCAAGATGCAGCATCACCCATCATAGAAGAGCTCCTACACTTTCACGATCATACACTAATAATCGTTTTTCTAATTAGCTCTTTAGTTCTCTACATTATTACCCTAATGCTTACAACCAAATTAACACATACTAGTACAATAGACGCCCAAGAAGTAGAAACTGTCTGAACTATCCTCCCAGCCATTATCTTAATTTTAATTGCCTTGCCTTCATTACGGATCCTTTACATAATAGACGAAGTCAATAACCCCTCCCTCACTGTAAAAACAATAGGTCACCAATGATATTGAAGCTATGAGTATACCGACTACGAAGACCTAAGCTTCGACTCCTATATAATCCCAACATCAGACCTAAAGCCAGGAGAACTACGATTATTAGAAGTAGATAACCGAGTTGTCTTACCTATAGAAATAACAATCCGAATATTAGTCTCATCAGAAGACGTACTCCACTCATGGGCCGTACCCTCCTTGGGCCTAAAAACAGATGCAATCCCAGGACGCCTAAACCAAACAACCTTAATATCAACACGACCAGGCCTATTTTATGGACAATGCTCAGAGATCTGCGGCTCAAACCACAGTTTCATACCAATTGTCCTAGAACTAGTACCCCTAGAAGTCTTTGAAAAATGATCTGTATCAATACTATAA, Whale nt_sequence ATGGCACACCCATCACAATTAGGTTTTCAAGACGCAGCCTCTCCAATTATAGAAGAATTACTTCACTTCCACGACCATACCCTCATAGCCGTTTTTCTTATTAGTACGCTAGTTCTTTACATTATTACTATTATAATAACTACTAAACTAACTAATACAAACCTAATGGACGCACAAGAGATCGAAATAGTGTGAACTATTATACCAGCTATTAGCCTCATCATAATTGCCCTTCCATCCCTTCGTATCCTATATTTAATAGATGAAGTTAATGATCCACACTTAACAATTAAAGCAATCGGCCACCAATGATACTGAAGCTACGAATATACTAACTATGAGGATCTCTCATTTGACTCTTATATAATTCCAACTAATGACCTTACCCCTGGACAATTCCGGCTGCTAGAAGTTGATAATCGAATAGTAGTCCCAATAGAATCTCCAACCCGACTTTTAGTTACAGCCGAAGACGTCCTCCACTCGTGAGCTGTACCCTCCTTGGGTGTCAAAACAGATGCAATCCCAGGACGACTTCATCAAACATCATTTATTGCTACTCGTCCGGGAGTATTTTACGGACAATGTTCAGAAATTTGCGGAGCAAACCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCGCTAACCGACTTTGAAAACTGATCTTCATCAATACTAGAAGCATCACTAAGA, Frog
Description	The nucleotide sequences are defined with the command nt_sequence. The default sequence type is peptide. The command set_alignment_type_CN prevents the nucleotide sequences from being translated.
	Display alignment

2. Loading sequences from files

2.1. Sequence files from URLs

beforeAC	load http://www.ebi.ac.uk/cgi-bin/dbfetch?db=uniprotkb&style=raw&id=HBA_PIG load http://www.ebi.ac.uk/cgi-bin/dbfetch?db=uniprotkb&style=raw&id=HBA1_XENLA load http://www.rcsb.org/pdb/files/2DN1.pdb.gz
Description	The command load loads sequences from text documents in any format. It takes URLs, absolute file-paths or database references. URLs or file-paths of Gnu-zipped files must end with ".gz".
	Display alignment

2.2. Sequence name

beforeAC	load http://www.ebi.ac.uk/cgi-bin/dbfetch?db=uniprotkb&style=raw&id=HBA_PIG|Pig load http://www.ebi.ac.uk/cgi-bin/dbfetch?db=uniprotkb&style=raw&id=HBA1_XENLA|Frog load http://www.rcsb.org/pdb/files/2DN1.pdb.gz|Human
Description	The sequence name can be specified by adding a suffix with a vertical bar.
	Display alignment

2.3. Sequence files from databases

beforeAC	load UNIPROT:A0AQI8 UNIPROT:A0AQI9 PDB:1YEW_B
Description	The command load loads sequences from text documents in any format. It takes URLs, absolute file-paths or database references. URLs or file-paths of Gnu-zipped files must end with ".gz".
	Display alignment

2.4. Alignment files

beforeAC	load PFAM:PF08792
Description	The command load can load alignments in Clustal, MSF, Multiple Fasta and Stockholm format given as URL or PFAM-ID.
	Display alignment

3. Translating nucleotide sequences

3.1. Translating coding sequences

beforeAC	set_alignment_type_P load UNIPROT:P01185\|vasopressin_UniProt nt_sequence ATGCCTGACACCATGCTGCCCGCCTGCTTCCTCGGCCTACTGGCCTTCTCCTCCGCGTGCTACTTCCAGAACTGCCCGAGGGGCGGCAAGAGGGCCATGTCCGACCTGGAGCTGAGACAGTGCCTCCCCTGCGGCCCCGGGGGCAAAGGCCGCTGCTTCGGGCCCAGCATCTGCTGCGCGGACGAGCTGGGCTGCTTCGTGGGCACGGCTGAGGCGCTGCGCTGCCAGGAGGAGAACTACCTGCCGTCGCCCTGCCAGTCCGGCCAGAAGGCGTGCGGGAGCGGGGGCCGCTGCGCCGCCTTCGGCGTTTGCTGCAACGACGAGAGCTGCGTGACCGAGCCCGAGTGCCGCGAGGGCTTTCACCGCCGCGCCCGCGCCAGCGACCGGAGCAACGCCACGCAGCTGGACGGGCCGGCCGGGGCCTTGCTGCTGCGGCTGGTGCAGCTGGCCGGGGCGCCCGAGCCCTTCGAGCCCGCCCAGCCCGACGCCTAC, vasopressin load UNIPROT:P01178\|oxytocin_UniProt nt_sequence ATGGCCGGCCCCAGCCTCGCTTGCTGTCTGCTCGGCCTCCTGGCGCTGACCTCCGCCTGCTACATCCAGAACTGCCCCCTGGGAGGCAAGAGGGCCGCGCCGGACCTCGACGTGCGCAAGTGCCTCCCCTGCGGCCCCGGGGGCAAAGGCCGCTGCTTCGGGCCCAATATCTGCTGCGCGGAAGAGCTGGGCTGCTTCGTGGGCACCGCCGAAGCGCTGCGCTGCCAGGAGGAGAACTACCTGCCGTCGCCCTGCCAGTCCGGCCAGAAGGCGTGCGGGAGCGGGGGCCGCTGCGCGGTCTTGGGCCTCTGCTGCAGCCCGGACGGCTGCCACGCCGACCCTGCCTGCGACGCGGAAGCCACCTTCTCCCAGCGC, oxytocin
Description	The command nt_sequence takes the coding sequence. If the alignment type is set to peptide with set_alignment_type_P at the beginning of the script, or if any sequence contains letters other than A, C, T, G or N, the sequence is translated to amino acids. The first three letters are the triplet for the first amino acid.
	Display alignment
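
The translation step can be illustrated with a short Python sketch. It assumes the standard genetic code, which this page does not state explicitly, and it starts at the first letter, in line with the rule that the first three letters form the first triplet. The function name translate is hypothetical.

```python
# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides: str) -> str:
    """Translate a coding sequence, reading triplets from the first letter.

    Codons with letters other than A, C, G, T are reported as 'X';
    a trailing incomplete triplet is ignored.
    """
    seq = nucleotides.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First five codons of the vasopressin example above.
print(translate("ATGCCTGACACCATG"))
```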

3.2. Genomic sequence positions

beforeAC	set_alignment_type_P load UNIPROT:P01185\|vasopressin_UniProt nt_sequence ATGCCTGACACCATGCTGCCCGCCTGCTTCCTCGGCCTACTGGCCTTCTCCTCCGCGTGCTACTTCCAGAACTGCCCGAGGGGCGGCAAGAGGGCCATGTCCGACCTGGAGCTGAGACAGTGCCTCCCCTGCGGCCCCGGGGGCAAAGGCCGCTGCTTCGGGCCCAGCATCTGCTGCGCGGACGAGCTGGGCTGCTTCGTGGGCACGGCTGAGGCGCTGCGCTGCCAGGAGGAGAACTACCTGCCGTCGCCCTGCCAGTCCGGCCAGAAGGCGTGCGGGAGCGGGGGCCGCTGCGCCGCCTTCGGCGTTTGCTGCAACGACGAGAGCTGCGTGACCGAGCCCGAGTGCCGCGAGGGCTTTCACCGCCGCGCCCGCGCCAGCGACCGGAGCAACGCCACGCAGCTGGACGGGCCGGCCGGGGCCTTGCTGCTGCGGCTGGTGCAGCTGGCCGGGGCGCCCGAGCCCTTCGAGCCCGCCCAGCCCGACGCCTAC,complement(75..247,422..623,2000..2119),vasopressin load UNIPROT:P01178\|oxytocin_UniProt nt_sequence ATGGCCGGCCCCAGCCTCGCTTGCTGTCTGCTCGGCCTCCTGGCGCTGACCTCCGCCTGCTACATCCAGAACTGCCCCCTGGGAGGCAAGAGGGCCGCGCCGGACCTCGACGTGCGCAAGTGCCTCCCCTGCGGCCCCGGGGGCAAAGGCCGCTGCTTCGGGCCCAATATCTGCTGCGCGGAAGAGCTGGGCTGCTTCGTGGGCACCGCCGAAGCGCTGCGCTGCCAGGAGGAGAACTACCTGCCGTCGCCCTGCCAGTCCGGCCAGAAGGCGTGCGGGAGCGGGGGCCGCTGCGCGGTCTTGGGCCTCTGCTGCAGCCCGGACGGCTGCCACGCCGACCCTGCCTGCGACGCGGAAGCCACCTTCTCCCAGCGC, join(37..156,458..659,744..799), oxytocin GFF oxytocin_UniProt\|\|Signalpeptide\|1\|19\|\|\|\|\|Color=#aaAA00; GFF oxytocin_UniProt\|\|Oxytocin\|20\|28\|\|\|\|\|Style=BACKGROUND; Color=#55FF55; GFF oxytocin_UniProt\|\|Neurophysin_1\|32\|125\|\|\|\|\|Color=#6666FF; GFF vasopressin_UniProt\|\|Signalpeptide\|1\|19\|\|\|\|\|Color=#aaAA00; GFF vasopressin_UniProt\|\|ARG-Vasopressin\|20\|28\|\|\|\|\|Style=BACKGROUND; Color=#55FF55; GFF vasopressin_UniProt\|\|Neurophysin_2\|32\|124\|\|\|\|\|Color=#6666FF; new_nucleotide_selection 2000-2119, vasopressin/Exon1 new_nucleotide_selection 422-623, vasopressin/Exon2 new_nucleotide_selection 75-247, vasopressin/Exon3 new_nucleotide_selection 1-156, oxytocin/Exon1 new_nucleotide_selection 458-659, oxytocin/Exon2 new_nucleotide_selection 744-898, oxytocin/Exon3 color #FF0000, /Exon1 /Exon3 color #0000FF, /Exon2 add_annotation Style=UNDERLINE, /Ex.*
Description	Optionally, the corresponding genomic positions can be given in the second parameter. They are reported for the triplet at the mouse pointer and allow residue selections that refer to genomic positions. The command new_nucleotide_selection creates amino acid selections based on nucleotide positions. Note that oxytocin and vasopressin are transcribed in opposite directions. Move the mouse pointer slowly from the N-terminus towards the C-terminus and observe the genomic DNA positions: they decline in vasopressin and rise in oxytocin. Exons 2 and 3 overlap. For the triplet at the exon boundary, the first codon position belongs to exon 2 while codon positions 2 and 3 belong to exon 3.
	Display alignment
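
The relation between codons and genomic positions can be sketched in Python. The sketch handles only the flat forms join(...) and complement(...) used in the example above, not nested expressions. Reading complement ranges from the highest position downwards is an assumption that matches the declining positions described for vasopressin. The function names are hypothetical.

```python
import re


def coding_positions(location: str) -> list[int]:
    """Expand a flat join(...) or complement(...) expression into the
    genomic positions of the coding sequence, in coding order."""
    ranges = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]
    positions = [p for start, end in sorted(ranges) for p in range(start, end + 1)]
    if location.strip().startswith("complement"):
        # Assumed reading: the coding sequence runs from high to low positions.
        positions.reverse()
    return positions


def codon_positions(location: str, codon_index: int) -> list[int]:
    """Return the three genomic positions of the codon with the given 0-based index."""
    positions = coding_positions(location)
    return positions[3 * codon_index:3 * codon_index + 3]


oxytocin = "join(37..156,458..659,744..799)"
vasopressin = "complement(75..247,422..623,2000..2119)"
print(codon_positions(oxytocin, 0))     # first codon of oxytocin
print(codon_positions(oxytocin, 107))   # codon spanning the exon 2 / exon 3 boundary
print(codon_positions(vasopressin, 0))  # first codon of vasopressin
```

In the oxytocin example, exons 1 and 2 together hold 322 nucleotides, which is not a multiple of three. The codon with index 107 therefore takes its first position from the second range and its other two positions from the third range.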

3.3. Genomic sequences from EMBL or Genbank files

beforeAC	load UNIPROT:P01185\|vasopressin_UniProt http://www.bioinformatics.org/strap/aa/sampleData/vasopressin_ENSG00000101200.data.txt\|vasopressin load UNIPROT:P01178\|oxytocin_UniProt http://www.bioinformatics.org/strap/aa/sampleData/oxytocin_ENSG00000101405.data.txt\|oxytocin translate_cds join(37..156,458..659,744..799), oxytocin translate_cds join(complement(2000..2119),complement(422..623), complement(75..247)), vasopressin GFF oxytocin_UniProt\|\|Signalpeptide\|1\|19\|\|\|\|\|Color=#aaAA00; GFF oxytocin_UniProt\|\|Oxytocin\|20\|28\|\|\|\|\|Style=BACKGROUND; Color=#55FF55; GFF oxytocin_UniProt\|\|Neurophysin_1\|32\|125\|\|\|\|\|Color=#6666FF; GFF vasopressin_UniProt\|\|Signalpeptide\|1\|19\|\|\|\|\|Color=#aaAA00; GFF vasopressin_UniProt\|\|ARG-Vasopressin\|20\|28\|\|\|\|\|Style=BACKGROUND; Color=#55FF55; GFF vasopressin_UniProt\|\|Neurophysin_2\|32\|124\|\|\|\|\|Color=#6666FF; new_nucleotide_selection 2000-2119, vasopressin/Exon1 new_nucleotide_selection 422-623, vasopressin/Exon2 new_nucleotide_selection 75-247, vasopressin/Exon3 new_nucleotide_selection 1-156, oxytocin/Exon1 new_nucleotide_selection 458-659, oxytocin/Exon2 new_nucleotide_selection 744-898, oxytocin/Exon3 color #FF0000, /Exon1 /Exon3 color #0000FF, /Exon2 add_annotation Style=UNDERLINE, /Ex.*
Description	The genomic sequence is loaded with the command load. The amino acid sequence is predicted from the exon positions given with the command translate_cds. There is a potential performance problem: the loaded files can be huge because they contain the entire gene sequence including UTRs and introns. Therefore, the notation of the previous example should be preferred.
	Display alignment

4. Pruning

A subrange of the original sequence can be displayed. Residue numbering still refers to the original full-length sequence.

4.1. Displaying a range of residues

beforeAC	load UNIPROT:A0AQI8!10-100 UNIPROT:A0AQI9!10-100 PDB:1YEW_B!10-100
Description	The residue range can be specified by adding a suffix with an exclamation mark. Note that the displayed residue positions start at 10.
	Display alignment
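
A small Python sketch of the range suffix follows. It assumes that the range is 1-based and inclusive, which fits the example but is not stated explicitly. Both function names are hypothetical.

```python
def parse_load_reference(token: str):
    """Split 'UNIPROT:A0AQI8!10-100' into ('UNIPROT:A0AQI8', 10, 100).

    Without the exclamation mark suffix, the range values are None.
    """
    reference, separator, residue_range = token.partition("!")
    if not separator:
        return reference, None, None
    first, _, last = residue_range.partition("-")
    return reference, int(first), int(last)


def prune(sequence: str, first: int, last: int):
    """Return (original_index, residue) pairs for the displayed subrange.

    The indices keep referring to the full-length sequence.
    """
    return [(i, sequence[i - 1]) for i in range(first, min(last, len(sequence)) + 1)]


print(parse_load_reference("PDB:1YEW_B!10-100"))
print(prune("ACDEFGHIKL", 3, 5))
```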

4.2. Pruning alignments left or right

beforeAC	load UNIPROT:HBB_HORSE UNIPROT:HBB_SHEEP UNIPROT:HBB_RABIT UNIPROT:HBB_ANSIN UNIPROT:HBB_MACFA UNIPROT:HBB_PAGBE UNIPROT:HBB_PANTR
afterAC	clip_N_term HBB_HORSE/10 clip_C_term HBB_HORSE/150
Description	The command clip_N_term takes a residue position in one specific reference sequence. If the residue position is omitted, the alignment is truncated at the first residue of that sequence. Although the position refers to a single sequence, the command acts on all aligned sequences and clips off all residues to the left of this position. The command clip_C_term works analogously and removes residues at the C-terminus.
	Display alignment
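
The clipping logic can be sketched in Python on a toy alignment. The sketch assumes 1-based residue positions and assumes that the residue at the given position itself is kept. The function names are hypothetical and do not reflect the actual implementation.

```python
def column_of_residue(aligned: str, position: int) -> int:
    """Return the 0-based alignment column of the residue at the
    1-based position, skipping gap characters."""
    count = 0
    for column, char in enumerate(aligned):
        if char != "-":
            count += 1
            if count == position:
                return column
    raise ValueError("position lies beyond the end of the sequence")


def clip_n_term(alignment: dict[str, str], reference: str, position: int = 1) -> dict[str, str]:
    """Remove all columns to the left of the reference residue, in every sequence."""
    column = column_of_residue(alignment[reference], position)
    return {name: row[column:] for name, row in alignment.items()}


def clip_c_term(alignment: dict[str, str], reference: str, position: int) -> dict[str, str]:
    """Remove all columns to the right of the reference residue, in every sequence."""
    column = column_of_residue(alignment[reference], position)
    return {name: row[:column + 1] for name, row in alignment.items()}


toy = {"ref": "--ABCDE", "other": "XYZ-CDE"}
print(clip_n_term(toy, "ref", 2))
print(clip_c_term(toy, "ref", 4))
```

The position is counted in residues of the reference sequence, but the cut is applied to an alignment column, which is why the other sequences are clipped as well.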

5. Sequence Groups

Sequence groups are named subsets of all loaded sequences. They can be activated by buttons in the alignment GUI. Open the menu "Sequence Groups" in the alignment frame.

5.1. Sequence groups

beforeAC	load UNIPROT:HBB_HORSE UNIPROT:HBB_SHEEP UNIPROT:HBB_RABIT UNIPROT:HBB_ANSIN UNIPROT:HBB_MACFA UNIPROT:HBB_PAGBE UNIPROT:HBB_PANTR UNIPROT:HBB_PIG UNIPROT:HBB1_MOUSE UNIPROT:HBB1_XENLA UNIPROT:HBB_ACCGE UNIPROT:HBB_AILFU UNIPROT:HBB_AILME UNIPROT:HBB_ALCAA UNIPROT:HBB_ALLMI UNIPROT:HBB_ANAPL UNIPROT:HBB_ANSAN UNIPROT:HBB_ANSSE UNIPROT:HBB_ANTPA UNIPROT:HBA_PAGBE UNIPROT:HBA_PIG UNIPROT:HBA1_XENLA UNIPROT:HBA_AILFU UNIPROT:HBA_AILME UNIPROT:HBA_ALCAA UNIPROT:HBA_ALLMI UNIPROT:HBA_AMBME
afterAC	sequence_group Alpha, HBA.* sequence_group Beta, HBB.*
Description	The command sequence_group takes a group name and a list of sequences.
	Display alignment
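
How a pattern such as HBA.* selects the members of a group can be sketched in Python. Matching the pattern against the entire sequence name is an assumption, and the function name sequence_group is borrowed from the command for illustration only.

```python
import re


def sequence_group(pattern: str, names: list[str]) -> list[str]:
    """Return the sequence names that match the regular expression as a whole."""
    return [name for name in names if re.fullmatch(pattern, name)]


names = ["HBB_HORSE", "HBA_PIG", "HBA1_XENLA", "HBB1_MOUSE"]
groups = {
    "Alpha": sequence_group("HBA.*", names),
    "Beta": sequence_group("HBB.*", names),
}
print(groups)
```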

5.2. Sequence groups by taxonomy

beforeAC	aa_sequence MGDSQYSFSLTTFSPSGKLVQIEHALTAVGSGQTSLGIKASNGVVIATEKKLPSILVDEASVQKIQHLTPNIGVVYSGMGPDFRVLVRKSRKQAEQYLRLYKEPIPVTQLVRETATVMQEFTQSGGVRPFGVSLLVAGYDDKGPQLYQVDPSGSYFSWKASAMGKNVSNAKTFLEKRYTEDMELDDAIHTAILTLKEGFEGEISSKNIEIGKIGADKVFRVLTPAEIDDYLAEVE, ArabidopsisThaliana aa_sequence MATERYSFSLTTFSPSGKLVQLEYALAAVSGGAPSVGIIASNGVVIATENKHKSPLYEQHSVHRVEMIYNHIGMVYSGMGPDYRLLVKQARKIAQTYYLTYKEPIPVSQLVQRVATLMQEYTQSGGVRPFGVSLLICGWDNDRPYLYQSDPSGAYFAWKATAMGKNAVNGKTFLEKRYSEDLELDDAVHTAILTLKEGFEGKMTADNIEIGICDQNGFQRLDPASIKDYLASIP, DrosophilaMelanogaster aa_sequence SFSLTTFSPSGKLVQIEYALAAVAAGAPSVGIKATNGVVLATEKKQKSILYDEQSAHKVEPITKHIGMVYSGMGPDYRVLVRRARKLAQQYYLVYQEPIPTAQLVQRVASVMQEYTQSGGVRPFGVSLLIAGWDEGRPYLFQSDPSGAYFAWKATAMGKNYVNGKTFLEKRYNEDLELEDAIHTAILTLKESFEGQMTEDNIEVGICNEAGFKRLTPAEVKDYLAAIA, XenopusLaevis
afterAC	find_uniprot_id * aa_sequence acdefghiklmnpqrstvwy, Not_In_UniProt taxonomy Eukaryota;Metazoa;Craniata;Vertebrata;Euteleostomi;Mammalia;Primates;Hominidae;Homo, Not_In_UniProt taxonomy_group Vertebrata Viruses Eukaryota Bacteria Archaea Fungi Mammalia , *
Description	The command taxonomy_group acts on sequences with known taxonomy data. The taxonomy data is obtained in one of three ways: it is set by the command taxonomy, loaded from UniProt-formatted sequence files, or obtained from UniProt data if the UniProt ID is known. For each word like "Eukaryota" or "Metazoa", a sequence group is created. It includes all sequences whose taxonomy data contains the respective text string. Open the menu "Sequence Groups" and find "Vertebrata #2", "Eukaryota #4" and "Mammalia #1".
	Display alignment
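
The grouping rule can be sketched in Python as a substring test on the taxonomy text. The lineage of Not_In_UniProt is taken from the script above. The other three lineages are abbreviated stand-ins for what UniProt would supply, and skipping empty groups is an assumption.

```python
def taxonomy_groups(words: list[str], taxonomy_by_sequence: dict[str, str]) -> dict[str, list[str]]:
    """For each word, collect the sequences whose taxonomy text contains it."""
    groups = {}
    for word in words:
        members = [name for name, lineage in taxonomy_by_sequence.items() if word in lineage]
        if members:  # assumption: words without any member yield no group
            groups[word] = members
    return groups


taxonomy = {
    # Abbreviated, illustrative lineages (not retrieved from UniProt here).
    "ArabidopsisThaliana": "Eukaryota;Viridiplantae",
    "DrosophilaMelanogaster": "Eukaryota;Metazoa;Arthropoda",
    "XenopusLaevis": "Eukaryota;Metazoa;Craniata;Vertebrata;Amphibia",
    # Lineage set with the command taxonomy in the script above.
    "Not_In_UniProt": "Eukaryota;Metazoa;Craniata;Vertebrata;Euteleostomi;Mammalia;Primates;Hominidae;Homo",
}
words = ["Vertebrata", "Viruses", "Eukaryota", "Bacteria", "Archaea", "Fungi", "Mammalia"]
groups = taxonomy_groups(words, taxonomy)
for word, members in groups.items():
    print(f"{word} #{len(members)}")
```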

6. Shorter scripts - compactness

Using variables, brace expansion and regular expressions, readability of the script can be improved and its size reduced.

6.1. Variables

beforeAC	let $s=sequence1 let $msg=HELLO WORLD let $r=Aacc aa_sequence ACDEFGHIKLMNPQRSTVWY, $s aa_sequence $r$r$r$r, Repeated balloon_text $msg Or with curly braces ${msg}, $s
Description	A variable is defined with the command let. Subsequent script lines can contain references to that variable. When a script line is run, all variable references are replaced by their assigned values. As in UNIX shells, there are two notations for variable references: a dollar sign followed by the name, like $msg, and a dollar sign followed by the name in curly braces, like ${msg}. The latter notation is required when the variable reference is followed directly by a word character (letter, digit or underscore).
	Display alignment
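
The substitution of variable references can be sketched in Python with one regular expression covering both notations. Leaving references to undefined variables unchanged is an assumption, and the function name is hypothetical.

```python
import re

# Matches ${name} (group 1) or $name (group 2).
_VARIABLE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand_variables(line: str, variables: dict[str, str]) -> str:
    """Replace $name and ${name} references by their assigned values."""
    def replace(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        # Unknown variables are left as they are (assumption).
        return variables.get(name, match.group(0))
    return _VARIABLE.sub(replace, line)


variables = {"s": "sequence1", "msg": "HELLO WORLD", "r": "Aacc"}
print(expand_variables("aa_sequence $r$r$r$r, Repeated", variables))
print(expand_variables("balloon_text $msg Or with curly braces ${msg}, $s", variables))
print(expand_variables("${r}x versus $rx", variables))
```

The last call shows why the curly-brace notation is needed: $rx is read as a reference to a variable named rx, whereas ${r}x is the value of r followed by the letter x.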

6.2. Brace expansion

beforeAC	load UNIPROT:{HBB_HORSE HBB_SHEEP HBB_RABIT HBB_ANSIN HBB_MACFA HBB_PAGBE HBB_PANTR} # Two adjacent braces like {a b c d}{1 2 3} are supported # Three braces like {a b c d}{1 2 3}{x y} are not supported balloon_text This explodes to 4*3 tokens: {a b c d}{1 2 3} , HBB_HORSE
Description	Brace expansion is described in Wikipedia. Here, the elements surrounded by curly braces are separated by white space.
	Display alignment
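
Brace expansion with white-space separated elements can be sketched in Python. The sketch expands any number of adjacent braces and therefore does not enforce the limit of two adjacent braces mentioned in the script comments. The function name is hypothetical.

```python
import itertools
import re

# A run of one or more adjacent brace groups with an optional prefix and suffix.
_BRACE_RUN = re.compile(r"([^\s{}]*)((?:\{[^{}]*\})+)([^\s{}]*)")
_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_braces(text: str) -> str:
    """Expand 'PRE{a b}{1 2}SUF' into 'PREa1SUF PREa2SUF PREb1SUF PREb2SUF'."""
    def replace(match: re.Match) -> str:
        prefix, groups, suffix = match.groups()
        alternatives = [group.split() for group in _GROUP.findall(groups)]
        return " ".join(
            prefix + "".join(choice) + suffix
            for choice in itertools.product(*alternatives)
        )
    return _BRACE_RUN.sub(replace, text)


print(expand_braces("load UNIPROT:{HBB_HORSE HBB_SHEEP HBB_RABIT}"))
print(expand_braces("{ HBB_HORSE HBB_ANSIN }/Histidine63"))
print(len(expand_braces("{a b c d}{1 2 3}").split()))
```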

6.3. Regular expressions and asterisks

beforeAC	load UNIPROT:{HBB_HORSE HBB_SHEEP HBB_RABIT HBB_ANSIN HBB_MACFA HBB_PAGBE HBB_PANTR UNIPROT:HBA_RABIT UNIPROT:HBA_PIG UNIPROT:HBA_AILME UNIPROT:HBA_ALCAA} color #FF0000, HBA_.* color #0000FF, HBB_.* balloon_text Haemoglobin, * icon image/jpeg;base64,/9j/4AAQSkZJRgABAQEAXwBfAAD/2wBDABMNDxEPDBMREBEWFRMXHTAfHRsbHTsqLSMwRj5KSUU+RENNV29eTVJpU0NEYYRiaXN3fX59S12Jkoh5kW96fXj/2wBDARUWFh0ZHTkfHzl4UERQeHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHj/wgARCAAwADADAREAAhEBAxEB/8QAGQAAAwEBAQAAAAAAAAAAAAAAAwQFAQIG/8QAFwEBAQEBAAAAAAAAAAAAAAAAAAECA//aAAwDAQACEAMQAAABkHYIqE44Hod57T6ZeIlnJRhmXM7Z3jz1mDsNS9S1NZ83S4eClDFa0l6k5DLsEDmUNP/EACAQAAIBBAIDAQAAAAAAAAAAAAECAAMEEiEQERMiIzH/2gAIAQEAAQUCGyyMvFnTVw4xeYiKdVqQ6o/O2bbQQRT711+XKmKcW/RWTCpAYDFYEAy728wM6MD9QXAmOc8U/8QAGBEBAAMBAAAAAAAAAAAAAAAAAQAgIUD/2gAIAQMBAT8BubEqZHm//8QAGREAAgMBAAAAAAAAAAAAAAAAAAERIDAx/9oACAECAQE/Ab8Ex0iTmqx//8QAHxAAAQMEAwEAAAAAAAAAAAAAAQACERAgMVESIUEi/9oACAEBAAY/Al2KOLhPicNGzmzHo0p32ibIIXz4LebqEDFwjVuCpp//xAAeEAEAAgEFAQEAAAAAAAAAAAABABEhEDFBUWEgcf/aAAgBAQABPyEUG19ziY740EWNlwRciGh1zYtPjswO1hXHlaF6DqNzmS8J3GGHGx5qMpIAg5wQTwYXsB8TmJZMOzCdFlElG0p4s809i8nEV+kXP//aAAwDAQACAAMAAAAQAgiHCiJAEqjg9apiPl//xAAZEQADAQEBAAAAAAAAAAAAAAAAAREgECH/2gAIAQMBAT8QJpEpASwzaohifmYQeLx5vnf/xAAaEQEBAQEBAQEAAAAAAAAAAAABABEgITEQ/9oACAECAQE/EId52VfkDLCP1+EbH6cj7sO85kuUsyNsv//EACAQAQACAgIDAQEBAAAAAAAAAAEAESExQVEQYXGhgdH/2gAIAQEAAT8QrFEqvQi5emt18fA5EnQ5a9xgqAewfF4xhKCwpC/oRgSDmz9D1KmNj6HX4T2xPhXRK1XufQDmnJAMjinAJuMzlM5iVHDvD9ZYEsMjyRbrvlxhjMlLEHcIAcBVyuTEG6P8RSALUV3f+iO6cnTiAoYFW1cLLrW2+4jqf//Z , ._RABIT new_selection 63 , { HBB_HORSE HBB_ANSIN }/Histidine63 new_selection 64 , { HBB_RABIT HBB_MACFA HBB_PAGBE HBB_PANTR }/Histidine64 color #0000FF, /Histidine.*
Description	Asterisk: An asterisk stands for all sequences. A sequence name followed by a slash and an asterisk represents all residue selections in that sequence. An asterisk followed by a slash and a selection name stands for all residue selections of that name in any sequence. Note that asterisks within regular expressions have a different meaning. Regular expressions: Regular expressions can be used for lists of sequences or residue selections. The expression HBA_.* denotes all sequences with a name starting with HBA_. The expression /Histidine.* means all residue selections in all proteins with a name starting with Histidine.
	Display alignment

6.4. Aliases for web addresses

beforeAC	alias_for_url WIKT:=http://en.wiktionary.org/wiki/ aa_sequence ACDEFGHIKLMNPQRSTVWY, seq1 balloon_text Be WIKT:happy !, seq1
Description	Without aliases, a web link within the balloon text of sequences and residue selections would require the complicated HTML syntax <A target="_blank" href="...">...</A>. Aliases make this easier and also allow the same URL, or its constant part, to be used several times. The alias for the entire web address or for its constant part is defined with alias_for_url. In this example the alias "WIKT:" is defined. It stands for "http://en.wiktionary.org/wiki/". This URL is not complete. Therefore "WIKT:" is followed by a word, here "happy". Thus "WIKT:happy" stands for "http://en.wiktionary.org/wiki/happy". The alias can be used with the commands balloon_text, add_annotation Balloon and GFF. Open the example alignment and invoke the balloon text by moving the mouse over the sequence name "seq1". Then click with the right mouse button, which brings up a web link.
	Display alignment
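
The effect of an alias can be sketched in Python as a text substitution. How Alignment Annotator delimits the word after the alias is not specified here, so the sketch assumes that the word extends to the next white space. The function name is hypothetical.

```python
import re


def expand_aliases(text: str, aliases: dict[str, str]) -> str:
    """Replace every 'ALIAS:word' by the aliased URL followed by the word."""
    for alias, url in aliases.items():
        pattern = re.escape(alias) + r"(\S+)"
        # A function is used as replacement so that the URL is inserted literally.
        text = re.sub(pattern, lambda match, url=url: url + match.group(1), text)
    return text


aliases = {"WIKT:": "http://en.wiktionary.org/wiki/"}
print(expand_aliases("Be WIKT:happy !", aliases))
```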

7. Sequence Attributes

7.1. Balloon messages

beforeAC	load UNIPROT:HBA_PIG load UNIPROT:HBA1_XENLA balloon_text <HTML><BODY>I love <U>frogs</U><br>because they live by the water.</BODY></HTML>, HBA1_XENLA balloon_text Bla, bla bla text with URLs and x-refs: PFAM:PF00069 WIKI:Pig PMID:13464827 http://www.pnas.org/content/112/20/6425, HBA_PIG
Description	The command balloon_text takes plain text or HTML code. It can contain web links and database references. These references act as web-links. On desktop computers, the user needs to perform a right-click in order to activate the web-links. For mobile devices this is not necessary. Also see alias_for_url.
	Display alignment

7.2. Accession IDs and cross references

beforeAC	aa_sequence AGYDRHITIFSPEGRLYQVEYAFKATNQTNINSLAVRGKDCTVVISQKKVPDKLLDPTTVSYIFCISRTIGMVVNGPIPDARNAALRAKAEAAEFRYKYGYDMPCDVLAKRMANLSQIYTQRAYMRPLGVILTFVSVDEELGPSIYKTDPAGYYVGYKATATGPKQQEITTNLENHFKKSKIDHINEESWEKVVEFAITHMIDALGTEFSKNDLEVGVATKDKFFTLSAENIEERLVAIAEQD, a1_SaccharomycesCerevisiae aa_sequence TSIMAVTFKDGVILGADSRTTTGAYIANRVTDKLTRVHDKIWCCRSGSAADTQAIADIVQYHLELYTSQYGTPSTETAASVFKELCYENKDNLTAGIIVAGYDDKNKGEVYTIPLGGSVHKLPYAIAGSGSTFIYGYCDKNFRENMSKEETVDFIKHSLSQAIKWDGSSGGVIRMVVLTAAGVERLIFYPDEYEQL, b1_SaccharomycesCerevisiae aa_sequence TTIVSVRRNGHVVIAGDGQATLGNTVMKGNVKKVRRLYNDKVIAGFAGGTADAFTLFELFERKLEMHQGHLVKAAVELAKDWRTDRMLRKLEALLAVADETASLIITGNGDVVQPENDLIAIGSGGPYAQAAARALLENTELSAREIAEKALDIAGDICIYTNHFHTIEELSYKAEFHHH, hs_EscherichiaColi let $BALLOON=Balloon=<HTML><BODY><OL> <LI>Thr<sub>1</sub></LI> <LI>Lys<sub>33</sub></LI> <LI>Ser<sub>129</sub></LI></OL> </BODY></HTML> GFF b1_SaccharomycesCerevisiae\|\|Active_site\|1,33,129\|\|\|\|\|$BALLOON GFF hs_EscherichiaColi\|\|Active_site\|1,33,124\|\|\|\|\|$BALLOON let $HS=hs_EscherichiaColi accession_id UNIPROT:PSA1_YEAST, a1_SaccharomycesCerevisiae accession_id UNIPROT:PSB1_YEAST, b1_SaccharomycesCerevisiae accession_id PDB:1NED_A, $HS add_xref UNIPROT:HSLV_ECO24, $HS
Description	Database IDs are either set explicitly with the commands accession_id, add_xref and balloon_text or predicted by sequence search with the command find_uniprot_id. In Alignment Annotator, the UniProt ID is very important and is used for uniprot_features, DAS_features and taxonomy_group.
	Display alignment

7.3. Icons

beforeAC	load UNIPROT:HBA_PIG load UNIPROT:HBA1_XENLA icon http://www.goldenweb.it/software/immagini/icone/animals/water_animals/Frog.gif, HBA1_XENLA icon image/gif;base64,R0lGODlhGgAaAIABADMzM////yH5BAEKAAEALAAAAAAaABoAAAI5jI+py+0Po0SgWmpry9nwq3yimIwmh51nqo5s+71wd8yuPIc2Xe7AZnvkIK3JauJBIVlLXfMJjUIKADs= , HBA_PIG
Description	The command icon takes gif, jpg and png images as URL or base64 data. File paths of files on the server can also be used.
	Display alignment

7.4. Residue index offsets

beforeAC	load UNIPROT:A0AQI8!10-150 UNIPROT:A0AQI9!10-150 PDB:1YEW_B!10-150 GFF ||My_Site|50|60 set_residue_index_offset 30, A0AQI9.
Description	The command set_residue_index_offset is used when residue numbering does not start at one. In this example the displayed residue range starts at 10, so that the first index is 10 + 30 = 40. The offset affects the position of residue selections.
	Display alignment

7.5. Secondary structure

beforeAC	load http://www.bioinformatics.org/strap/aa/sampleData/onlyAtomLines.pdb
Description	If the secondary structure elements are not recorded in the 3D structure file, the secondary structure is computed by dssp. Helices are drawn red and beta sheets yellow.
	Display alignment

7.6. Secondary structure

beforeAC	aa_sequence ACDEFGHIKLMNPQRSTVWY, seq1 aa_sequence ACDEFGHIKLMNPQRSTVWY, seq2 secondary_structure --HHHH---EEE---EEE-, seq1 secondary_structure -HHHHH---EEE---EEE-, seq2
Description	Residue structure elements can be assigned with the command secondary_structure. E=extended sheet, H=helix. To see the secondary structure for each individual sequence, the check-box "Helices & Sheets" in the tool-bar of the alignment needs to be activated. If more than one sequence has secondary structure information, the one that spans most alignment positions is taken for the secondary structure cartoon. This can be changed with set_ruler_secondary_structure.
	Display alignment
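
To check a secondary structure string before it is sent, it can help to list the helix and sheet segments it encodes. The following sketch does this for the string assigned to seq1 above; the function name segments and the 1-based numbering of the output are assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Tuple


def segments(secondary_structure: str) -> List[Tuple[str, int, int]]:
    """Return (symbol, first, last) for every run of H or E, 1-based."""
    result = []
    position = 1
    for symbol, run in groupby(secondary_structure):
        length = len(list(run))
        if symbol in ("H", "E"):
            result.append((symbol, position, position + length - 1))
        position += length
    return result


print(segments("--HHHH---EEE---EEE-"))
# [('H', 3, 6), ('E', 10, 12), ('E', 16, 18)]
```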

8. Residue selections and annotations

Residue selections are displayed by underline or filled background. They can be defined explicitly, referring to the sequence index, the PDB resnum and insertion code, or the nucleotide index of the DNA sequence an amino acid was predicted from. They can also be obtained from annotation databases. Residue selections can have attributes like color, balloon messages and 3D-commands.

8.1. GFF notation

beforeAC	load UNIPROT:HBA_PIG load UNIPROT:HBA1_XENLA GFF HBA_PIG||My_Special_Residues|12|40
Description	The command GFF takes GFF-formatted annotations. Fields are separated by a tab character or a vertical bar. For more details open Change > Annotations > Own in Alignment Annotator.
	Display alignment
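
The field layout can be illustrated with a small sketch that splits the argument of the GFF command at tab characters or vertical bars and names the fields after the nine standard GFF columns. The function split_gff is hypothetical. Limiting the split to the first eight separators is an assumption, so that a vertical bar inside the attribute field would be kept.

```python
import re
from typing import Dict

# Column names of the nine standard GFF fields.
GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def split_gff(line: str) -> Dict[str, str]:
    """Split a GFF line at tabs or vertical bars into named fields."""
    fields = re.split(r"[\t|]", line, maxsplit=8)
    # Trailing fields that were left out are treated as empty.
    fields += [""] * (len(GFF_COLUMNS) - len(fields))
    return dict(zip(GFF_COLUMNS, fields))


record = split_gff("HBA_PIG||My_Special_Residues|12|40")
print(record["seqid"], record["type"], record["start"], record["end"])
```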

8.2. GFF: with attributes

beforeAC	load UNIPROT:HBA_PIG load UNIPROT:HBA1_XENLA let $BALLOON=<HTML><BODY><u>underlined text</u><br><b>This is bold text</b></BODY></HTML> GFF HBA_PIG||My_Special_Residues|12|40|.|.|.|Balloon=Bla bla; Color=#ff00FF; Style=BACKGROUND; 3D_view=spheres GFF HBA_PIG||My_Special_Residues|30|50|.|.|.|Balloon=$BALLOON; Color=#ff0000; Style=BACKGROUND; 3D_view=sticks
Description	The 9th field can contain attributes such as Balloon. The attribute text ends at the next semicolon. If the attribute text itself contains a semicolon, then the text needs to be surrounded by double quotes.
	Display alignment
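
The quoting rule for the 9th field can be sketched as a small parser: a semicolon ends an attribute unless it is inside double quotes. The function parse_attributes is hypothetical and only illustrates the rule stated above. It returns a list of pairs rather than a dictionary because an attribute name may occur more than once.

```python
from typing import List, Tuple


def parse_attributes(field: str) -> List[Tuple[str, str]]:
    """Split a GFF attribute field into (name, value) pairs."""
    pairs: List[Tuple[str, str]] = []
    token: List[str] = []
    in_quotes = False

    def flush() -> None:
        text = "".join(token).strip()
        token.clear()
        if text:
            name, _, value = text.partition("=")
            pairs.append((name.strip(), value.strip()))

    for ch in field:
        if ch == '"':
            in_quotes = not in_quotes  # quotes protect semicolons and are dropped
        elif ch == ";" and not in_quotes:
            flush()
        else:
            token.append(ch)
    flush()
    return pairs


print(parse_attributes('Balloon="one; two"; Color=#ff0000; Style=BACKGROUND'))
```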

8.3. GFF: non-consecutive positions

beforeAC	load UNIPROT:HBA_PIG GFF HBA_PIG||My_Special_Residues|12,14,16,18,20-22
Description	When the 5th field (End-position) is omitted, the 4th field is interpreted as an expression of positions. It can contain comma separated ranges and single positions such as 10-20,40-100,102. This is not standard GFF.
	Display alignment
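
The meaning of such a position expression can be illustrated by expanding it into single positions. The sketch below handles only positive natural indices; PDB residue numbers with a colon and negative positions are deliberately left out. The function name expand_positions is an assumption.

```python
from typing import List


def expand_positions(expression: str) -> List[int]:
    """Expand an expression like '12,14,20-22' into single positions."""
    positions: List[int] = []
    for part in expression.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            positions.extend(range(int(first), int(last) + 1))
        else:
            positions.append(int(part))
    return positions


print(expand_positions("12,14,16,18,20-22"))
# [12, 14, 16, 18, 20, 21, 22]
```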

8.4. GFF: referring to PDB-Resnum

beforeAC	load PDB:1RYP_2 PDB:1JD2_2 GFF 1RYP_2||My_Special_Residues|12:-40: GFF 1RYP_2||In_Prosequence|-4:--2: GFF 1JD2_2||With_Insertion_code|179:,179C:,184H:-184M:
Description	When the 5th field (End-position) is empty, the 4th field can contain a complex specification of residue positions consisting of single positions and intervals separated by space. For the PDB residue positions of 3D structure files, the Rasmol notation is used: PDB residue number - colon - (optional) chain letter. Thus the colon indicates that the number refers to the PDB numbering rather than the natural numbering. Rarely, adjacent residues share the same residue number and are distinguished by the so-called insertion code. This is a single upper case letter between the residue number and the colon. There is another deviation from the natural numbering: Zero and negative positions (not PDB residue numbers) are displayed as one minus the number. This is because the number zero is usually omitted.
	Display alignment
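
The structure of a single PDB position (residue number, optional insertion code, colon, optional chain letter) can be made explicit with a regular expression. This is a sketch for single positions only; intervals such as 184H:-184M: are not handled. The names PdbPosition and parse_pdb_position are assumptions, and the server's own parser may accept more variants.

```python
import re
from typing import NamedTuple, Optional


class PdbPosition(NamedTuple):
    resnum: int
    insertion_code: str  # empty string if absent
    chain: str           # empty string if absent


# residue number, optional upper case insertion code, colon, optional chain
_TOKEN = re.compile(r"^(-?\d+)([A-Z]?):([A-Za-z0-9]?)$")


def parse_pdb_position(token: str) -> Optional[PdbPosition]:
    """Parse one position like '179C:'; return None if there is no colon form."""
    match = _TOKEN.match(token)
    if match is None:
        return None
    return PdbPosition(int(match.group(1)), match.group(2), match.group(3))


for token in ("12:", "179C:", "-4:", "12"):
    print(token, parse_pdb_position(token))
```

A token without a colon, such as 12, yields None because it refers to the natural numbering.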

8.5. Adding attributes to residue selections

beforeAC	load UNIPROT:HBA_PIG let $BALLOON=<HTML><BODY><u>underlined text</u><br><b>This is bold text</b></BODY></HTML> let $sel=HBA_PIG/My_Special_Residues new_selection 12-40, $sel add_annotation 3D_view=spheres, $sel color #ff0000, $sel add_annotation Style=BACKGROUND, $sel add_annotation Balloon=$BALLOON, $sel
Description	Residue selections can also be created with new_selection. To address positions of the underlying nucleotide sequence of an amino acid sequence, new_nucleotide_selection is used instead. Information can be attached with add_annotation or set_annotation. Compared to the GFF command, this notation is much more verbose. Variables are beneficial, see let.
	Display alignment
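
Because this form is verbose, software that generates the script text may want to produce the lines from a data structure. The following is a minimal sketch; the function selection_script is hypothetical, and it writes the selection name out in every line instead of using a let variable.

```python
from typing import Iterable, Tuple


def selection_script(target: str, positions: str,
                     annotations: Iterable[Tuple[str, str]]) -> str:
    """Emit a new_selection line followed by one add_annotation line each."""
    if "," in target:
        # The last comma of a line separates parameters from the selections.
        raise ValueError("the selection name must not contain a comma")
    lines = [f"new_selection {positions}, {target}"]
    for name, value in annotations:
        lines.append(f"add_annotation {name}={value}, {target}")
    return "\n".join(lines)


script = selection_script(
    "HBA_PIG/My_Special_Residues",
    "12-40",
    [("3D_view", "spheres"), ("Style", "BACKGROUND")],
)
print(script)
```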

8.6. Residues in proximity to a ligand

beforeAC	load PFAM:PF00941 new_selection AROUND=FAD ANGSTROM=5, /FAD # Uncomment the following line to use atom coordinates of homologs # project_coordinates AUTO, set_annotation Color=#3322FF, /FAD set_annotation Style=BACKGROUND, /FAD
Description	With the attribute "AROUND" for the command new_selection, residues in proximity to a ligand (Hetero or DNA/RNA) are selected. Allowed constructs are "AROUND=DNA", "AROUND=RNA" and "AROUND=NucleotideChainLetter"
	Display alignment

8.7. Solvent Accessibility

beforeAC	# load PFAM:PF00941 # load UNIPROT:P23639 # project_coordinates AUTO, * let $area=10 load PDB:1RYP_B|example_1 load PDB:1RYP_B|example_2 load PDB:1RYP_B|example_3 load PDB:1RYP_B|example_4 new_selection MIN_ACCESSIBILITY=$area, example_1/surface new_selection MIN_ACCESSIBILITY=$area SUBUNITS=ALL, example_2/surface new_selection MIN_ACCESSIBILITY=$area SUBUNITS=http://www.bioinformatics.org/strap/aa/sampleData/pdb1ryp.data.txt, example_3/surface new_selection MIN_ACCESSIBILITY=$area SUBUNITS="PDB:1ryp_A PDB:1ryp_C", example_4/surface set_annotation Color=#3322FF, /surface set_annotation Style=BACKGROUND, /surface
Description	With the attribute "MIN_ACCESSIBILITY=..." for the command new_selection, residues are highlighted that have a solvent accessible surface area greater than the given value in square Angstrom. Computation is performed with the program mkdssp by Kabsch and Sander, which must be in the executable path. Alignment Annotator expects mkdssp in /usr/bin/ or bin/. Only sequences with known 3D-structure (see project_coordinates) are considered. example_1: Only amino acids are considered; hetero atoms and nucleic acids are ignored. In a multimeric protein (here the proteasome), the amino acids at the interfaces between the subunits are also highlighted even though they are not solvent exposed in the multimer. To exclude those residues at the inter-subunit interfaces, the parameter SUBUNITS can hold a reference to other structure files to be considered during computation. example_2: For proteins loaded from the PDB, the attribute SUBUNITS=ALL denotes the original structure containing all subunits. example_3: A file with all other subunits can be provided. example_4: The two neighbouring subunits are given as PDB references. Lists of space separated entries must be enclosed in double quotes.
	Display alignment

8.8. Residue selections from UniProt

beforeAC	aa_sequence AGYDRHITIFSPEGRLYQVEYAFKATNQTNINSLAVRGKDCTVVISQKKVPDKLLDPTTVSYIFCISRTIGMVVNGPIPDARNAALRAKAEAAEFRYKYGYDMPCDVLAKRMANLSQIYTQRAYMRPLGVILTFVSVDEELGPSIYKTDPAGYYVGYKATATGPKQQEITTNLENHFKKSKIDHINEESWEKVVEFAITHMIDALGTEFSKNDLEVGVATKDKFFTLSAENIEERLVAIAEQD, a1_SaccharomycesCerevisiae aa_sequence TSIMAVTFKDGVILGADSRTTTGAYIANRVTDKLTRVHDKIWCCRSGSAADTQAIADIVQYHLELYTSQYGTPSTETAASVFKELCYENKDNLTAGIIVAGYDDKNKGEVYTIPLGGSVHKLPYAIAGSGSTFIYGYCDKNFRENMSKEETVDFIKHSLSQAIKWDGSSGGVIRMVVLTAAGVERLIFYPDEYEQL, b1_SaccharomycesCerevisiae aa_sequence TTIVSVRRNGHVVIAGDGQATLGNTVMKGNVKKVRRLYNDKVIAGFAGGTADAFTLFELFERKLEMHQGHLVKAAVELAKDWRTDRMLRKLEALLAVADETASLIITGNGDVVQPENDLIAIGSGGPYAQAAARALLENTELSAREIAEKALDIAGDICIYTNHFHTIEELSYKAEFHHH, hs_EscherichiaColi let $BALLOON=Balloon=<HTML><BODY><OL> <LI>Thr<sub>1</sub></LI> <LI>Lys<sub>33</sub></LI> <LI>Ser<sub>129</sub></LI></OL> </BODY></HTML> GFF b1_SaccharomycesCerevisiae||Active_site|1,33,129|||||$BALLOON GFF hs_EscherichiaColi||Active_site|1,33,124|||||$BALLOON find_uniprot_id * uniprot_features *
Description	The UniProt ID is obtained by sequence search. Alternatively, it can be set explicitly with add_xref or accession_id. The command uniprot_features highlights all sequence features stored in UniProt. Since the data is available on the Alignment Annotator server, UniProt features are loaded instantly. BioDAS annotations used to be loaded with the command DAS_features. Unfortunately, most BioDAS servers and the registry are no longer available. Additional BioDAS registries can be added by the administrator.
	Display alignment

9. Display

9.1. Residue color

beforeAC	load PFAM:PF00097 set_color_mode chemical set_conservation_threshold -95
Description	The color mode is specified with the command set_color_mode. A positive percentage value for set_conservation_threshold highlights conserved positions, and a negative value highlights diverse positions.
	Display alignment

9.2. Residue background color

beforeAC	load PDB:1SBC_A!10- set_residue_index_offset 3, 1SBC_A bg_color 15:,18:,20:-22:=#FF0000 100-110,112=#00FF00, 1SBC_A.*
Description	The command bg_color specifies the background color of individual residues. PDB residue numbers can be referred to by appending a colon to the position number.
	Display alignment

9.3. Alignment title

beforeAC	load UNIPROT:HBA_PIG load UNIPROT:HBA1_XENLA title Frog and pig
Description	The command title sets the document title. The window title "Frog and pig" can only be observed if the alignment view is opened in a tab or window of its own. In most browsers you can open the link either with Ctrl-left-click or right-click.
	Display alignment

9.4. Characters per line

beforeAC	load UNIPROT:HBA_PIG load UNIPROT:HBA1_XENLA set_characters_per_line 20
Description	With the command set_characters_per_line the number of characters (gaps plus residues) per line can be specified.
	Display alignment

10. 3D-Visualization

Currently, 3D-visualization is based on Java, but a JavaScript-based 3D-visualization (probably JSmol) and a desktop application will be included soon. The type of 3D viewer does not affect the scripting language, which is independent of the specific implementation.

First install Java. Use a web browser that still supports Java applets: Firefox, Iceweasel, Opera and IE. On the other hand, MS-Edge, Chromium and Chrome do not support Java applets.

The 3D views are not shown automatically when the alignment document is displayed in the browser, because there might be several 3D-views. Each is represented by a button. 3D views are displayed by pushing the respective button. There are two different locations for these buttons:

The context panels (Right-click) of the sequence names in the alignment panel contain buttons to open 3D views. For any sequence with associated 3D structure there will be at least the default 3D view. In addition, 3D views are listed which have been created with open_3D.
Activating "Tab" in the segmented control and then clicking 3D brings up a panel with buttons for all available 3D-views.

3D-views are created with the command open_3D, which takes a unique ID and a list of loaded proteins or of structures that are not part of the sequence alignment. The latter are given as file paths, URLs or database references. To be recognized as file paths, they must start with slash, dot-slash or dot-dot-slash. A 3D-view is then addressed with the command select_3D, which takes the ID of the view and one or several of the loaded proteins or structures. Proteins are best specified by their sequence name, and PDB files must be referred to by exactly the same file path, URL or database reference used in the open_3D command. Once selected with select_3D, 3D script commands can be applied. These commands start with "3D_". Usually, the next command is 3D_select.
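
The rule for recognizing file paths can be sketched as a small classifier. The function classify_reference and its three labels are assumptions made for illustration. In particular, treating anything that contains "://" as a URL is a simplification, and the sketch does not try to tell database references from sequence names.

```python
def classify_reference(reference: str) -> str:
    """Classify an open_3D argument according to the rule stated above."""
    if reference.startswith(("/", "./", "../")):
        return "file path"
    if "://" in reference:
        return "URL"
    return "database reference or sequence name"


for ref in ("./structures/a.pdb", "structures/a.pdb", "PDB:1RYP_A",
            "http://www.bioinformatics.org/strap/aa/sampleData/onlyAtomLines.pdb"):
    print(ref, "->", classify_reference(ref))
```

Note that a relative path without a leading dot-slash, such as structures/a.pdb, is not recognized as a file path under this rule.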

10.1. Superimposing protein structures

beforeAC	load PDB:1RYP_{B C D} # What do the curly braces mean? See bash brace expansion color #FF0000, 1RYP_B color #00FF00, 1RYP_C color #3333FF, 1RYP_D superimpose 1RYP_{B C D} open_3D MyView, 1RYP_{B C D} 3D_render align *
Description	The command superimpose superimposes several protein structures. Sequences without attached 3D-coordinates are ignored. The program determines the optimal reference structure. All other structures are superimposed upon this reference structure. Start the 3D-applet from the context-menu of the sequence name (Right-click). Alternatively, go to menu "3D" and open the 3D-view "MyView".
	Display alignment
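
The brace shorthand 1RYP_{B C D} in the script above can be illustrated by a sketch that expands it into the three sequence names. Bash brace expansion separates alternatives with commas, whereas the example here uses spaces; the sketch follows the example. The function expand_braces is hypothetical and handles only simple, non-nested groups.

```python
import re
from typing import List


def expand_braces(token: str) -> List[str]:
    """Expand 'prefix{A B C}suffix' into one string per alternative."""
    match = re.search(r"\{([^{}]*)\}", token)
    if match is None:
        return [token]
    prefix, suffix = token[:match.start()], token[match.end():]
    expanded: List[str] = []
    for alternative in match.group(1).split():
        # Recurse so that a second brace group in the suffix is expanded too.
        expanded.extend(expand_braces(prefix + alternative + suffix))
    return expanded


print(expand_braces("1RYP_{B C D}"))
# ['1RYP_B', '1RYP_C', '1RYP_D']
```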

10.2. Style of single atoms

beforeAC	load PDB:1RYP_B color #FF0000, 1RYP_B open_3D My_3D_View, 1RYP_B 3D_render 3D_select 100-110.CA 3D_color #FF00ff 3D_spheres 3D_select 100-110.CB 3D_color #FF0000 3D_spheres 3D_label Hello
Description	Atoms are selected with the command 3D_select. The following style commands are available: 3D_cartoon, 3D_dots, 3D_label, 3D_lines, 3D_mesh, 3D_sa_surface, 3D_spheres, 3D_sticks, 3D_surface, 3D_color, 3D_ribbons. Further 3D commands are: 3D_render, 3D_center, 3D_center_amino, 3D_object_delete, 3D_rotate, 3D_script_panel, 3D_select, 3D_selection_name, 3D_zoom.
	Display alignment

10.3. Attaching 3D styles to residue selections

beforeAC	load PDB:1RYP_B color #FF0000, 1RYP_B let $SEL=1RYP_B/My_Special_Residues new_selection 100:-110:.CA.CB, $SEL add_annotation 3D_view=3D_spheres, $SEL color #ff0000, $SEL # Or type set_annotation Color=#ff0000, $SEL add_annotation 3D_view=3D_color #FFffFF, $SEL add_annotation Atoms=.CB, $SEL add_annotation 3D_view=3D_color #FF00FF, $SEL add_annotation Atoms=.CA, $SEL add_annotation 3D_view=3D_color #FF0077, $SEL
Description	Another method for changing 3D-styles is to create a residue selection and to attach annotations of the type "3D_view".
	Display alignment

10.4. Residue annotations from UniProt

beforeAC	load PDB:1SBC_A UNIPROT:P00780 # Delete all residue annotations from the PDB file delete 1SBC_A/* # Infer coordinates of the PDB structures onto the UniProt sequence. project_coordinates PDB:1SBC_A, P00780 # Load Annotations from Uniprot. uniprot_features * # Going to attach styles to annotations that have the name "Active_site" ... let $SEL=*/Active_site # All atoms white. #add_annotation 3D_color=#FF00FF, $SEL add_annotation Atoms=.CB, $SEL # C-beta atoms red and sphere. add_annotation 3D_view=3D_color #FF0000, $SEL add_annotation 3D_view=3D_spheres, $SEL # C-alpha atoms blue and sphere. add_annotation Atoms=.CA, $SEL add_annotation 3D_view=3D_color #0000FF, $SEL add_annotation 3D_view=3D_spheres, $SEL
Description	With the command add_annotation, 3D-styles are attached to residue annotations with the name "Active_site" loaded from UniProt. All entries of type "3D_view" are evaluated one after the other. Initially all atoms of the amino acids are considered. The current set of atoms is altered with a command like "add_annotation Atoms=.CA, residue-selection". This example exhibits a rare problem: Why are the UniProt annotations not shown on PDB:1SBC_A? Because the PDB sequence differs from the UniProt sequence, as indicated by entries in the PDB file like: SEQADV 1SBC SER A 103 UNP P00780 THR 207 CONFLICT. Perhaps in later releases of Alignment Annotator, mismatches indicated in this way may be tolerated and the UniProt annotations loaded.
	Display alignment

10.5. Multimeric proteins

beforeAC	load PDB:1RYP_B open_3D With_Chains_A_B_C, PDB:1RYP_B PDB:1RYP_A PDB:1RYP_C select_3D With_Chains_A_B_C, PDB:1RYP_B 3D_render select_3D With_Chains_A_B_C, PDB:1RYP_A PDB:1RYP_C 3D_select $ALL 3D_lines off 3D_select .CA.C.N 3D_lines on 3D_color #FFff00
Description	Users might want to know whether a specific amino acid is in proximity to another subunit of a multimeric protein. The 3D view can contain molecules or subunits that are not in the sequence alignment. For this purpose the command open_3D takes not only references to sequences in the alignment but also (1) PDB references, as demonstrated in this case, (2) paths of structure files, which must start with slash, dot-slash or dot-dot-slash, and (3) URLs. In the current example PDB:1RYP_B is part of the alignment while PDB:1RYP_A and PDB:1RYP_C are references to PDB chains. Molecules that are not part of the alignment can nevertheless be displayed three-dimensionally, and the display style of their single residues or atoms can be altered.
	Display alignment

10.6. DNA

beforeAC	load PDB:1gd2_E open_3D View_with_all_subunits, PDB:1gd2_E PDB:1gd2_F PDB:1gd2_G PDB:1gd2_H PDB:1gd2_I PDB:1gd2_J PDB:1gd2_A PDB:1gd2_B PDB:1gd2_C PDB:1gd2_D 3D_render
Description	This is a leucine zipper transcription factor. Chains E to J are peptide chains. Chains A, B, C and D are DNA chains. Limitation: currently it is not possible to select specific nucleotides.
	Display alignment

11. Structure alignment

11.1. Sequence or structure based alignment

beforeAC	aa_sequence AGYDRHITIFSPEGRLYQVEYAFKATNQTNINSLAVRGKDCTVVISQKKVPDKLLDPTTVSYIFCISRTIGMVVNGPIPDARNAALRAKAEAAEFRYKYGYDMPCDVLAKRMANLSQIYTQRAYMRPLGVILTFVSVDEELGPSIYKTDPAGYYVGYKATATGPKQQEITTNLENHFKKSKIDHINEESWEKVVEFAITHMIDALGTEFSKNDLEVGVATKDKFFTLSAENIEERLVAIAEQD, a1_SaccharomycesCerevisiae aa_sequence TSIMAVTFKDGVILGADSRTTTGAYIANRVTDKLTRVHDKIWCCRSGSAADTQAIADIVQYHLELYTSQYGTPSTETAASVFKELCYENKDNLTAGIIVAGYDDKNKGEVYTIPLGGSVHKLPYAIAGSGSTFIYGYCDKNFRENMSKEETVDFIKHSLSQAIKWDGSSGGVIRMVVLTAAGVERLIFYPDEYEQL, b1_SaccharomycesCerevisiae aa_sequence TTIVSVRRNGHVVIAGDGQATLGNTVMKGNVKKVRRLYNDKVIAGFAGGTADAFTLFELFERKLEMHQGHLVKAAVELAKDWRTDRMLRKLEALLAVADETASLIITGNGDVVQPENDLIAIGSGGPYAQAAARALLENTELSAREIAEKALDIAGDICIYTNHFHTIEELSYKAEFHHH, hs_EscherichiaColi let $BALLOON=Balloon=<HTML><BODY><OL> <LI>Thr<sub>1</sub></LI> <LI>Lys<sub>33</sub></LI> <LI>Ser<sub>129</sub></LI></OL> </BODY></HTML> GFF b1_SaccharomycesCerevisiae||Active_site|1,33,129|||||$BALLOON GFF hs_EscherichiaColi||Active_site|1,33,124|||||$BALLOON project_coordinates AUTO, *
afterAC	title ClustalW
Description	Here, the command project_coordinates is run before alignment computation. Therefore the 3D coordinates of Cα atoms are used for alignment computation. For comparison, move the command line "project_coordinates..." to the second script text box and observe the alignment of the active site residue Ser₁₂₉. Since the sequence similarity of these remote homologs is low, the alignment quality obtained by ClustalW is poor. This can be seen from the active site triad, which is aligned only if the 3D structure is used. Another indicator is the secondary structure, which is displayed with a check-box in the tool-bar. Since insertions and deletions hardly occur in helices and beta sheets, these elements should be almost devoid of gaps.
	Display alignment
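
The last criterion, that helices and sheets should be almost devoid of gaps, can be turned into a rough check. The sketch below counts gap characters that fall between two consecutive residues belonging to the same helix or sheet. It assumes an aligned row with "-" as the gap character and a per-residue secondary structure string as used by the command secondary_structure. The function name and the toy input are hypothetical.

```python
def gaps_inside_elements(aligned_row: str, secondary_structure: str) -> int:
    """Count gap characters lying inside a helix (H) or sheet (E)."""
    residue_count = len(aligned_row.replace("-", ""))
    if residue_count != len(secondary_structure):
        raise ValueError("one secondary structure symbol per residue is required")
    count = 0
    seen = 0  # number of residues to the left of the current column
    for ch in aligned_row:
        if ch != "-":
            seen += 1
        elif 0 < seen < residue_count:
            left = secondary_structure[seen - 1]
            right = secondary_structure[seen]
            if left == right and left in ("H", "E"):
                count += 1
    return count


# Toy example: residues ACDE form a helix, FG are unstructured.
print(gaps_inside_elements("AC-DE--FG", "HHHH--"))  # 1
```

A lower count for the structure-based alignment than for the sequence-based one would be consistent with the observation above, although the count alone does not prove that an alignment is correct.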

12. Abnormal computation

12.1. Abnormal program termination

beforeAC	die aa_sequence ACDFGHI, seq
Description	There are two reasons why computation can be terminated prematurely: (1) Another job has been queued while the computation time exceeded a maximum. In this case the current job gets killed to allow the new job to be executed. (2) Technical problems, programming errors or server failure. This example simulates abnormal program termination due to technical problems or programming errors in Alignment Annotator. The user can modify the program parameters and script lines in the hope that the error does not occur on re-submission. In this example, you can open the section Change - Script and remove the command "die" from the script. Then re-start computation to obtain the alignment. For administrators: Ctrl-left-click into the parent page of the alignment opens the debug panel.
	Display alignment

12.2. Very long computation

beforeAC	aa_sequence acdefghiklmn, seq sleep 99000
Description	Under certain conditions (here a sleep command), Alignment Annotator may run for a very long time. Besides technical server problems, possible reasons are: time-consuming alignment computation and 3D-superposition; a very busy server; large data being loaded from remote computers; remote computers that neither send a result nor an HTTP error code, or that answer with delay. In this case the user can stop the computation and modify the script before resubmitting the job. For testing, edit the section Scripts of the alignment view and remove the sleep command. After clicking submit the alignment appears.
	Display alignment

12.3. Exceptions

beforeAC	aa_sequence abcedfghklmn, seq simulate_exception
Description	In case of programming errors, so-called exceptions such as NullPointerException or IndexOutOfBoundsException occur and the program may not be able to produce a result. Adding the parameter "true" to the command simulates an exception that is caught.
	Display alignment