Forming Web-links for protein files and alignments
This page describes how Web-links for the protein and alignment
viewer STRAP are formed using web variables such as load=.
Clicking these Web-links in a web browser opens amino acid sequences, protein structures
or multiple sequence files in STRAP.
Web variables such as load=
contain one or several protein
references. A protein reference may be a URL or a database ID followed by a colon and an entry ID.
Unless they are loaded with the variable "load=",
proteins are aligned after loading. For this purpose, the 3D-superposition program TM-align
and the sequence alignment program ClustalW are combined. With
"alignAndRearange=", the proteins are also reordered
according to sequence similarity.
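The following Python sketch shows how such a Web-link could be assembled from a web variable and a list of protein references. The base address `https://example.org/strap.jnlp` and the helper name `strap_link` are hypothetical, and joining several references with a space is an assumption that this page does not confirm.

```python
from urllib.parse import quote, urlencode

# Hypothetical base address; replace it with the real address of the STRAP start file.
BASE_URL = "https://example.org/strap.jnlp"


def strap_link(variable, references, base_url=BASE_URL, **options):
    """Build a Web-link such as ...?load=PDB:1ryp_A%20PDB:1ryp_B&rename=sp.

    variable   -- web variable holding the references, e.g. "load"
    references -- protein references such as "PDB:1ryp_A"
    options    -- further web variables, e.g. rename="sp"
    """
    # Assumption: several references are separated by a space.
    params = {variable: " ".join(references)}
    params.update(options)
    # quote() percent-encodes special characters; ":" is kept readable.
    return base_url + "?" + urlencode(params, quote_via=quote, safe=":")


print(strap_link("load", ["PDB:1ryp_A", "PDB:1ryp_B"], rename="sp"))
```

Passing "alignAndRearange" instead of "load" as the first argument gives the corresponding link for the reordering variable.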
Additional web variables provide further options:
- rename=sp Renames the protein using the
Swiss-Prot mnemonic name found in the protein file.
A Swiss-Prot name like "PSA2_CARAU" consists of:
- Protein designation. Here "PSA2".
- Five-letter organism name. Here "CARAU" for Carassius auratus.
- no3D=hexadecimal_number. The argument is a
hexadecimal number that acts as
a bit mask for those proteins that
should not be displayed in 3D.
Example with no3D=4.
The binary representation
of 0x4 is 000000100. The third bit from the right is "1" and therefore the third
peptide, PDB:1ryp_C, is not shown in 3D. You see only two superimposed backbones.
Same without "no3D=".
- noSP=hexadecimal_number. The hexadecimal number is a bit mask
for the proteins that should not be superposed
three-dimensionally. They will be shown in their original
coordinates defined in the PDB file.
Same example without
- dasFeatures=DAS title1|DAS title2 Obtain
DAS features. A vertical-bar separated list of titles from the DAS-registry. Examples:
DAS-CSA - extended
and Cosmic Protein Mutations on P51587 Cosmic+Protein+Mutations. Also see Spice.
- separateInstance=t The proteins are shown in a new window.
Example 1prn
- script=script_lines. See Using script commands
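The hexadecimal bit masks used by no3D= and noSP= can be computed rather than worked out by hand. The Python sketch below assumes, in line with the no3D=4 example, that the n-th protein in the link corresponds to the n-th bit counted from the right. The function name `protein_mask` is only illustrative.

```python
def protein_mask(positions):
    """Return the hexadecimal bit mask for the given 1-based protein positions."""
    mask = 0
    for position in positions:
        if position < 1:
            raise ValueError("protein positions start at 1")
        # The n-th protein sets the n-th bit counted from the right.
        mask |= 1 << (position - 1)
    return format(mask, "x")


print("no3D=" + protein_mask([3]))     # third protein only
print("noSP=" + protein_mask([1, 3]))  # first and third protein
```

For the third protein alone this yields 4, which matches the example above.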
Additional information for the protein entries
The database reference or URL of the protein can optionally be
followed by fields separated by "|" (vertical bar).
- 1st Field: Database reference such as "UNIPROT:hslv_ecoli" or "PDB:1ryp_A" or a plain Web address for a protein file. This field is mandatory; the others are optional.
- 2nd Field: Protein name. Example "ExampleName" Example "otherName".
If the protein link refers to more than one chain, then the respective chain identifiers are appended to the name: Example "ExampleName" with chains and Example "otherName" with chains.
- 3rd Field: URL of protein icon.
- 4th Field: Underlined residues. This field can contain
several subfields, each preceded by a Web color like
"#FF00FF". It can contain the following 3D-renderings: "sticks",
"dots", "spheres", "ribbon"
Example1 (red and yellow) and Example 2 (green)
- 5th Field: The coding sequence CDS expression in EMBL or Genbank style. See Embl examples
- 6th Field: The matrices that are applied to the asymmetrical units for displaying the 3D-molecule. See Biological unit
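A protein entry with its optional fields can be composed as in the following Python sketch. It assumes that an unused field in the middle is left empty between two bars so that later fields keep their position, which this page does not state explicitly. The vertical bar is written in its percent-encoded form %7C. The function name `protein_entry` and its parameter names are illustrative.

```python
from urllib.parse import quote


def protein_entry(reference, name="", icon_url="", residues="", cds="", matrices=""):
    """Join the six fields with "|" and percent-encode the result."""
    fields = [reference, name, icon_url, residues, cds, matrices]
    # Drop empty fields at the end; empty fields in the middle are kept
    # (assumption) so that the remaining fields stay in position.
    while len(fields) > 1 and fields[-1] == "":
        fields.pop()
    # "|" becomes %7C; ":" is kept readable.
    return quote("|".join(fields), safe=":")


print(protein_entry("UNIPROT:hslv_ecoli", "ExampleName"))
print(protein_entry("PDB:1ryp_A", matrices="8"))
```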
The percent encoding of
"|" is %7C. Strictly speaking, "|" should be written as %7C in URLs. However, web browsers apparently tolerate a "|" that is not properly encoded.
Complex Uniprot Expressions
GenomeNet (Kegg) Examples
EMBL or Genbank nucleotide example
EMBL and Genbank files have a nucleotide sequence block.
Coding sequences (CDS) are defined by an enumeration of nucleotide positions of the form
FT CDS join(25240..25717,29079..29174,31348..31417,39382..39809,
or in case of reverse complement
FT CDS complement(5226515..5227132)
This expression is used to compute the amino acid sequence. The
following exemplifies how this expression can be changed or how the n-th
CDS can be selected.
- M57965 Myosin from EMBL
- M57965 Myosin from Genbank
- M57965 Myosin. Overriding CDS: "join(20..30,40,50)"
- M57965 Myosin. Overriding CDS: "complement(20..30,40,50)"
- M57965 Selecting CDS No 1
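To make the CDS notation concrete, the following Python sketch splits an expression of the forms shown above into nucleotide ranges. It is a simplified reader written for illustration, not the parser used by STRAP: it handles only join(...), complement(...), ranges such as 20..30 and single positions such as 40, and it ignores other location syntax such as partial-end markers.

```python
import re


def parse_cds(expression):
    """Return (segments, reverse) for a simple CDS expression.

    segments -- list of (start, end) nucleotide positions, 1-based, inclusive
    reverse  -- True if the expression is wrapped in complement(...)
    """
    text = expression.replace(" ", "")
    reverse = False
    match = re.fullmatch(r"complement\((.*)\)", text)
    if match:
        reverse = True
        text = match.group(1)
    match = re.fullmatch(r"join\((.*)\)", text)
    if match:
        text = match.group(1)
    segments = []
    for part in text.split(","):
        match = re.fullmatch(r"(\d+)(?:\.\.(\d+))?", part)
        if not match:
            raise ValueError("unsupported CDS element: " + part)
        start = int(match.group(1))
        # A single position such as "40" is a range of length one.
        end = int(match.group(2)) if match.group(2) else start
        segments.append((start, end))
    return segments, reverse


print(parse_cds("complement(5226515..5227132)"))
print(parse_cds("join(20..30,40,50)"))
```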
Ensembl (under reconstruction)
Proteins with nucleic acid:
- 1gd2 Example of protein with nucleic acid. Leucine Zipper.
- 1al2 Large virus
- 1al0 Large virus
- 2wbs Zinc finger
- 1q82 Ribosome. Example for huge protein with many different chains and nucleic acid
Setting the biological unit:
The matrices which are applied to the asymmetric unit
can be specified in the 6th field in the form of a bit mask given as a hexadecimal number. For
example, 8 means the 4th matrix, as the binary representation of 0x8 is 00000001000. Minus 1 denotes the
1(wrong, not existing)
10000(wrong, not existing)
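To check which matrices a given mask selects, the mask can be decoded as in the Python sketch below. It assumes that matrices are numbered from 1 and that the n-th matrix corresponds to the n-th bit from the right, as in the example where 8 selects the 4th matrix. Negative values are rejected because their meaning is not covered here. The function name `selected_matrices` is illustrative.

```python
def selected_matrices(hex_mask):
    """Return the 1-based numbers of the matrices selected by a hexadecimal mask."""
    mask = int(hex_mask, 16)
    if mask < 0:
        raise ValueError("negative masks are not handled by this sketch")
    # Collect the position of every bit that is set, counted from the right.
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(selected_matrices("8"))
print(selected_matrices("10000"))
```

The second call shows why a mask such as 10000 fails for most entries: it asks for the 17th matrix, which rarely exists.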
Hetero-Compounds, DNA, RNA:
PDB files often contain non-peptide structures such as flavin or NADH and DNA/RNA structures, which are treated in the following way:
Those hetero compounds that share a chain identifier with a peptide are added
to the respective peptide object.
This is indicated by a vertical green (nucleotide) or red (hetero compound) bar in the protein labels of the alignment row headers.
But if the hetero compound has a chain of its own, then things are more complicated:
- PD000033 Medium sized Prodom file
Large Prodom file with 12489 sequences.
Example with direct Web address
Instead of referring to a protein by database-colon-accession-ID, a plain Web address of the
protein file can be used.
Special characters of the URL like the two slashes in "http://" must be percent encoded.
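The percent encoding of a Web address can be produced with the Python standard library, as in the short sketch below. The address `http://www.example.org/proteins/file.pdb` is a made-up placeholder, and the function name `encode_protein_url` is illustrative.

```python
from urllib.parse import quote, unquote


def encode_protein_url(url):
    """Percent-encode every reserved character, including ":" and "/"."""
    return quote(url, safe="")


# Placeholder address, not a real protein file.
address = "http://www.example.org/proteins/file.pdb"
encoded = encode_protein_url(address)
print(encoded)
# Decoding restores the original address.
print(unquote(encoded) == address)
```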
The client computer needs Java
version 1.5 or higher.
The links in this document point to a jnlp file. The jnlp file must be opened by the browser with the
program bin/javaws which is part of the Java system.
Occasionally, browsers fail to locate this program. In these
cases the user needs to find the location of javaws on the
hard-disk. See Browser settings
- Jalview Displays the alignment of the specified proteins in Jalview.
- Spice: Displays hslv_ecoli in Spice.
Failure to download:
Occasionally, protein entries are removed from databases and are no longer available.
What happens if STRAP tries to download a non-existing file or when the server is not responding?
The result depends on the server response.
- Some servers return an error message which is then interpreted as a protein sequence by STRAP.
- A few servers return an http error. STRAP will skip these entries.
- It may happen that the server returns neither a message nor an HTTP error code.
It just blocks.
This is the worst case because all downloads from this server are blocked during the current STRAP session.
The user will need to restart STRAP.
Here are some examples of non-existing entries:
Time consuming alignments:
Frame size and location
The location of the application frame is specified with the option geometry=width x height + offsetX + offsetY
geometry=400x300+100+100 This means width=400 height=300, Position at pixel 100,100.
geometry=500x300-100+100 Negative offsetX refers to the right screen margin.
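The geometry value can be read as in the following Python sketch, which returns width, height and the two signed offsets. The helper `left_edge` additionally assumes the common X11 convention that a negative offsetX is the distance between the right edge of the frame and the right screen margin; this page only states that a negative offsetX refers to the right margin. Both function names are illustrative.

```python
import re

_GEOMETRY = re.compile(r"(\d+)x(\d+)([+-]\d+)([+-]\d+)")


def parse_geometry(text):
    """Return (width, height, offset_x, offset_y) with signed offsets."""
    match = _GEOMETRY.fullmatch(text.replace(" ", ""))
    if not match:
        raise ValueError("not a geometry value: " + text)
    width, height, offset_x, offset_y = (int(group) for group in match.groups())
    return width, height, offset_x, offset_y


def left_edge(geometry, screen_width):
    """Pixel column of the left frame edge (X11-style reading, an assumption)."""
    width, _, offset_x, _ = geometry
    if offset_x >= 0:
        return offset_x
    # Negative offset: measured from the right screen margin.
    return screen_width - width + offset_x


print(parse_geometry("400x300+100+100"))
print(left_edge(parse_geometry("500x300-100+100"), 1920))
```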