Forming Web-links for protein files and alignments
Introduction
This page describes how Web-links for the protein and alignment viewer STRAP are formed using the web variables load or align.
Clicking these Web-links in a Web-browser opens amino acid sequences, protein structures or multiple sequence files in STRAP.
Client side Requirements:
Java is needed on the client PC.
Server side Requirements:
There are no particular requirements for the Web-server, as the
links are static links.
Application:
This technique provides an efficient way to present scientific results.
Amino acid sequences, protein alignments and protein structures
which are relevant for specific research projects can be
exhibited on a Web page.
Certain residues such as mutations and site directed mutagenesis sites can be highlighted in the alignment and in 3D.
These Web-links can also be included in computer programs for Systems Biology, office
documents and PDF-documents to efficiently document and communicate alignments and 3D-structures.
Scientists can send Emails to colleagues and collaborators containing these Web-links.
The recipient can open the same alignment with all residue highlightings.
Finally, these Web links provide a way by which Bioinformatics databases can use
STRAP as a viewer for proteins and alignments.
Advantages
In Web pages, alignments are often represented as static
documents (.html, .pdf, .doc, .rtf) or shown dynamically with
Browser embedded Java applets. All information needed to display the
alignment resides on the server. Other servers are not involved when the alignment is displayed on the client computer.
Here, a different approach is suggested: only the protein references are encoded in the Web link, so that up-to-date versions of the
protein files are loaded from the original protein databases. This has the following advantages:
- The sequences are downloaded from original databases in
their current version with the most recently added
sequence features and cross-references.
- Sequence feature files are freshly downloaded from
BioDAS-servers and Expasy before
the features are highlighted on the sequences and
3D-structures. Although teams of curators are continually improving
sequence features, the information displayed on
the client is therefore up-to-date.
- These protein
files can then be conveniently transported with the mouse to
any location on the file system or to other computer
applications using STRAP's Drag-and-Drop facility.
- The Web link is a very condensed representation because the Web-address contains
only the references of the protein entries in the public file repositories, but not the sequences or
3D-coordinates themselves. Usually, the alignment gaps do not need to be included because
the alignment can be re-computed on the client from the 3D-coordinates and amino acid sequences.
- In case of Wiki-articles or Emails, the user does not need to attach files. The information in the URL is sufficient.
- The program checks whether a related 3D-structure is already
known using precomputed Blast-results.
Limitations
Occasionally, Java programs do not start for various reasons.
A Web-address must not exceed a critical size.
There is a workaround for this problem: large amounts of information can be included in so-called forms.
The technique described on this page is suitable only for simply loading and aligning proteins.
For complex tasks, see Using script commands.
Automated generation of the Web-address
STRAP can generate a Web link which, when clicked, loads the sequences of the current project.
With the STRAP dialog "Publish alignment in Web pages" in the menu "File", users can automatically
generate the Web-address (URL).
Web-variables in the URL
The variables load=, align= or alignAndRearange= contain one or several protein
references. A protein reference may be a URL or a database ID followed by a colon and an entry ID.
Unless loaded with the variable "load=", proteins will be aligned after loading.
For this purpose the 3D-superposition program TM-align
and the sequence alignment program ClustalW are combined. With
"alignAndRearange=" the proteins are reordered
according to sequence similarity.
Additional web variables provide further options:
- rename=sp Renames the protein using the Swissprot mnemonic name found in the protein file.
This Swissprot name, such as "PSA2_CARAU", consists of:
- Protein designation. Here "PSA2".
- Underscore
- Five-letter organism name. Here "CARAU" for Carassius auratus.
Example
- no3D=hexadecimal_number. The argument is a hexadecimal number that acts as a bit mask for those proteins that should not be displayed in 3D.
Example with no3D=4.
The binary representation of 0x4 is 000000100. The third bit from the right is "1" and therefore the third
peptide PDB:1ryp_C is not shown in 3D. You see only two superimposed backbones.
Same without "no3D=".
- noSP=hexadecimal_number. The hexadecimal number is a bit mask
for the proteins that should not be superposed
three-dimensionally. They will be shown in their original
coordinates defined in the PDB file.
Example and
same example without "noSP=".
- dasFeatures=DAS title1|DAS title2 Obtain
DAS features. A vertical-bar separated list of titles from the DAS-registry. Examples:
DAS-uniprot and
DAS-CSA - extended
and Cosmic Protein Mutations on P51587 Cosmic+Protein+Mutations. Also see Spice.
- separateInstance=t Example 1prn
The proteins are shown in a new Window.
- script=script_lines. See Using script commands
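The bit masks taken by no3D= and noSP= can be computed instead of being worked out by hand. The following Python sketch assumes that proteins are counted from 1 in the order in which they appear in the link, which matches the no3D=4 example where the third protein is hidden. The function name bitmask_hex is hypothetical, and the sketch emits bare hexadecimal digits without a "0x" prefix, as in that example.

```python
def bitmask_hex(positions):
    """Return a hexadecimal bit mask with one bit set per 1-based position."""
    mask = 0
    for pos in positions:
        if pos < 1:
            raise ValueError("positions are counted from 1")
        mask |= 1 << (pos - 1)  # position 1 maps to the lowest bit
    return format(mask, "x")


if __name__ == "__main__":
    print("no3D=" + bitmask_hex([3]))     # third protein only -> no3D=4
    print("noSP=" + bitmask_hex([1, 5]))  # first and fifth protein -> noSP=11
```

Note that the result is hexadecimal: the mask for the first and fifth protein is binary 10001, which is written as 11.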
Additional information for the protein entries
The database reference or URL of the protein can optionally be
followed by fields separated by "|" (vertical bar).
- 1st Field: Database reference such as "UNIPROT:hslv_ecoli" or "PDB:1ryp_A" or a crude Web address for a protein file. This field is mandatory; the others are optional.
- 2nd Field: Protein name. Example "ExampleName" Example "otherName".
If the protein link refers to more than one chain, then the respective chain identifiers are appended to the name: Example "ExampleName" with chains and Example "otherName" with chains.
- 3rd Field: URL of protein icon.
Example
- 4th Field: Underlined residues. This field can contain
several subfields, each preceded by a Web color like
"#FF00FF". It can contain the following 3D-renderings: "sticks",
"dots", "spheres", "ribbon"
Example1 (red and yellow) and Example 2 (green)
- 5th Field: The coding sequence CDS expression in EMBL or Genbank style. See Embl examples
- 6th Field: The matrices that are applied to the asymmetrical units for displaying the 3D-molecule. See Biological unit
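A protein reference with optional fields can be assembled and percent-encoded programmatically. The sketch below is only an illustration in Python: the helper name protein_reference is hypothetical, and it simply joins the given fields with "|" and encodes the result with the standard library. Since "|" is not allowed literally in a URL, it comes out as %7C. How skipped optional fields should be written is not stated on this page, so the example uses only the first two fields.

```python
from urllib.parse import quote


def protein_reference(*fields):
    """Join the fields of one protein entry with "|" and percent-encode them."""
    raw = "|".join(fields)
    # ":" is kept readable; letters, digits, "_", ".", "-" and "~" are never
    # encoded by quote(). Everything else, including "|", is percent-encoded.
    return quote(raw, safe=":")


if __name__ == "__main__":
    print(protein_reference("PDB:1ryp_A", "ExampleName"))
    # PDB:1ryp_A%7CExampleName
```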
Note: The percent encoding of "|" is %7C. Strictly speaking, "|" should be written as %7C in URLs, but Web browsers apparently tolerate an unencoded "|".
Uniprot Examples
Complex Uniprot Expressions
GenomeNet (Kegg) Examples
Entrez Examples
EMBL or Genbank nucleotide example
EMBL and Genbank files have a nucleotide sequence block.
Coding sequences (CDS) are defined by an enumeration of nucleotide positions of the form
FT CDS join(25240..25717,29079..29174,31348..31417,39382..39809,
or in case of reverse complement
FT CDS complement(5226515..5227132)
This expression is used to compute the amino acid sequence. The
following exemplifies how this expression can be changed or how the n-th
CDS can be selected.
- M57965 Myosin from EMBL
- M57965 Myosin from Genbank
- M57965 Myosin. Overriding CDS: "join(20..30,40,50"
- M57965 Myosin. Overriding CDS: "complement(20..30,40,50)"
- M57965 Selecting CDS No 1
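To illustrate how a CDS expression selects nucleotides, here is a simplified Python sketch. It is not STRAP's own parser. It assumes 1-based inclusive positions, as used in EMBL and Genbank feature tables, and handles only plain ranges, single positions, join(...) and an outer complement(...). Partial-location markers such as "<" and ">" and references to other entries are not supported. The function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_cds(expression):
    """Return (ranges, is_complement) for a simple CDS location expression."""
    expr = expression.replace(" ", "")
    is_complement = expr.startswith("complement(") and expr.endswith(")")
    if is_complement:
        expr = expr[len("complement("):-1]
    if expr.startswith("join(") and expr.endswith(")"):
        expr = expr[len("join("):-1]
    ranges = []
    for part in expr.split(","):
        match = re.fullmatch(r"(\d+)(?:\.\.(\d+))?", part)
        if match is None:
            raise ValueError(f"unsupported location: {part!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)  # a single position is a range of one
        ranges.append((start, end))
    return ranges, is_complement


def extract_cds(nucleotides, expression):
    """Cut the coding sequence out of a nucleotide string."""
    ranges, is_complement = parse_cds(expression)
    coding = "".join(nucleotides[start - 1:end] for start, end in ranges)
    if is_complement:
        # Reverse complement: complement every base, then reverse the string.
        coding = coding.translate(_COMPLEMENT)[::-1]
    return coding


if __name__ == "__main__":
    print(parse_cds("complement(20..30,40,50)"))
    print(extract_cds("AAACCCGGGTTT", "join(1..3,7..9)"))  # AAAGGG
```

The resulting nucleotide string would then be translated codon by codon into the amino acid sequence, a step this sketch leaves out.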
Ensembl (under reconstruction)
PDB-Examples
Proteins with nucleic acid:
- 1gd2 Example of protein with nucleic acid. Leucine Zipper.
- 1l4p
- 1al2 Large virus
- 1al0 Large virus
- 2wbs Zinc finger
- 1q82 Ribosome. Example of a huge protein with many different chains and nucleic acid
Setting the biological unit:
The matrices which are applied to the asymmetric unit
can be specified in the 6th field in the form of a bit-mask given as a hexadecimal number. For
example, 8 means the 4th matrix, as the binary representation of 0x8 is 00000001000. Minus 1 denotes the
asymmetric unit.
-1
all matrices
1(wrong, not existing)
2
3
4
8
10
20
40
10000(wrong, not existing)
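Because the mask is hexadecimal, values such as 10 or 20 are easy to misread as decimal numbers. The following Python sketch decodes a mask into the matrix numbers it selects, assuming that matrices are counted from 1 starting at the lowest bit, in line with the statement that 8 means the 4th matrix. The function name is hypothetical, and the special value -1 is reported as an empty selection.

```python
def selected_matrices(hex_mask):
    """List the 1-based matrix numbers whose bits are set in a hexadecimal mask."""
    text = hex_mask.strip()
    if text == "-1":
        return []  # -1 denotes the asymmetric unit: no matrix is selected
    value = int(text, 16)
    if value < 0:
        raise ValueError("only -1 is meaningful as a negative value")
    return [bit + 1 for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    for mask in ["3", "8", "10", "40"]:
        print(mask, selected_matrices(mask))
    # 3 [1, 2]
    # 8 [4]
    # 10 [5]
    # 40 [7]
```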
Hetero-Compounds, DNA, RNA:
PDB files often contain non-peptide structures such as flavin or NADH and DNA/RNA structures, which are treated in the following way:
Those hetero compounds that share a chain identifier with a peptide are added
to the respective peptide object.
This is indicated by a vertical green (nucleotides) or red (hetero compounds) bar on the protein labels in the alignment row headers.
But if the hetero compound has a chain of its own, then things are more complicated:
SCOP- Examples
PFAM Examples
Prodom Examples
- PD000033 Medium sized Prodom file
- PD000006
Large Prodom file with 12489 sequences.
Example with direct Web address
Instead of referring to a protein by database-colon-accession-ID, a crude Web address of the
protein file can be used.
Special characters of the URL like the two slashes in "http://" must be percent encoded.
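The percent encoding of a direct Web address can be done with a standard library call. This is a minimal Python sketch: the address below is a made-up placeholder on example.org and the helper name is hypothetical. Passing safe="" makes quote() encode the slashes and the colon, which it would otherwise partly leave untouched.

```python
from urllib.parse import quote, unquote


def encode_protein_url(url):
    """Percent-encode a Web address so it can be embedded in another URL."""
    return quote(url, safe="")


if __name__ == "__main__":
    # Placeholder address, not a real protein file.
    encoded = encode_protein_url("http://example.org/proteins/my_protein.pdb")
    print(encoded)
    # http%3A%2F%2Fexample.org%2Fproteins%2Fmy_protein.pdb
    print(unquote(encoded))  # decodes back to the original address
```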
Technical details
The client computer needs
Java version 1.5 or higher.
The links in this document point to a jnlp file. The jnlp file must be opened by the browser with the
program bin/javaws which is part of the Java system.
Occasionally, browsers fail to locate this program. In these
cases the user needs to find the location of javaws on the
hard-disk. See
Browser settings.
External applications
Seqvista cannot be used in the Strap-Lite version yet.
- Jalview Displays the alignment of the specified proteins in Jalview.
- Seqvista: Displays M57965 in Seqvista.
- Spice: Displays hslv_ecoli in Spice.
What happens in case of download errors:
Occasionally, protein entries are removed from databases and are no longer available.
What happens if STRAP tries to download a non-existing file or when the server is not responding?
The result depends on the server response.
- Some servers return an error message which is then interpreted as a protein sequence by STRAP.
- A few servers return an http error. STRAP will skip these entries.
- It may happen that the server returns neither a message nor an HTTP error code.
It just blocks.
This is the worst case because all downloads from this server are blocked during the current STRAP session.
The user will need to restart STRAP.
Here are some examples of non-existing entries:
Time consuming alignments:
Frame size and location
The size and location of the application frame are specified with the option geometry=width x height + offsetX + offsetY.
- geometry=400x300+100+100 This means width=400 height=300, Position at pixel 100,100.
- geometry=400x300+200+100
- geometry=500x300+100+100
- geometry=500x300-100+100 Negative offsetX refers to the right screen margin.
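The structure of a geometry value can be made explicit with a small parser. This Python sketch only illustrates the format shown in the examples above and is not STRAP's own code. The function name and the dictionary keys are hypothetical.

```python
import re

# width x height, followed by two signed offsets
_GEOMETRY = re.compile(r"(\d+)x(\d+)([+-]\d+)([+-]\d+)")


def parse_geometry(value):
    """Split a geometry value such as "400x300+100+100" into its four numbers."""
    match = _GEOMETRY.fullmatch(value)
    if match is None:
        raise ValueError(f"not a geometry value: {value!r}")
    width, height, offset_x, offset_y = (int(group) for group in match.groups())
    return {"width": width, "height": height,
            "offset_x": offset_x, "offset_y": offset_y}


if __name__ == "__main__":
    print(parse_geometry("400x300+100+100"))
    # A negative offset_x is measured from the right screen margin.
    print(parse_geometry("500x300-100+100"))
```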
Related resources:
- Jmol is a protein
3D viewer which can be integrated in Web pages and controlled by buttons in the Web page.
- Jalview,
and
are alignment
applets for Web pages.