Forming Web-links for protein files and alignments

Introduction

This page describes how Web-links for the protein and alignment viewer STRAP are formed using the web variables load or align. Clicking these Web-links in a Web-browser opens amino acid sequences, protein structures or multiple sequence files in STRAP.

Client side Requirements: Java is needed on the client PC. Server side Requirements: There are no particular requirements for the Web-server, as the links are static links.

Application: This technique provides an efficient way to present scientific results. Amino acid sequences, protein alignments and protein structures which are relevant for specific research projects can be exhibited on a Web page. Certain residues, such as mutations and site-directed mutagenesis sites, can be highlighted in the alignment and in 3D. These Web-links can also be included in computer programs for Systems Biology, office documents and PDF-documents to efficiently document and communicate alignments and 3D-structures. Scientists can send e-mails containing these Web-links to colleagues and collaborators. The recipient can then open the same alignment with all residue highlights. Finally, these Web-links provide a way by which Bioinformatics databases can use STRAP as a viewer for proteins and alignments.

Advantages

In Web pages, alignments are often represented as static documents (.html, .pdf, .doc, .rtf) or shown dynamically with Browser embedded Java applets. All information needed to display the alignment resides on the server. Other servers are not involved when the alignment is displayed on the client computer.

Here, a different approach is suggested: only the protein references are encoded in the Web-link, so that up-to-date versions of the protein files are loaded from the original protein databases. This has the following advantages:

Limitations

Occasionally, Java programs do not start for various reasons.

A Web-address must not exceed a critical size. There is a workaround for this problem: large amounts of information can be included in so-called forms.

The technique described on this page is suitable only for simply loading and aligning proteins. For complex tasks, see "Using script commands".

Automated generation of the Web-address

STRAP can generate the Web-link which, if clicked, loads the sequences of the current project. With the STRAP dialog "Publish alignment in Web pages" in the menu "File", users can generate the Web-address (URL) automatically.

Web-variables in the URL

The variables load=, align= or alignAndRearange= contain one or several protein references. A protein reference may be a URL or a database ID followed by a colon and an entry ID. Unless loaded with the variable "load=", proteins are aligned after loading. For this purpose, the 3D-superposition program TM-align and the sequence alignment program ClustalW are combined. With "alignAndRearange=", the proteins are reordered according to sequence similarity.

Additional web variables provide further options:

Additional information for the protein entries

The database reference or URL of the protein can optionally be followed by fields separated by "|" (vertical bar). Note: The percent encoding of "|" is %7C. Strictly speaking, "|" should be written as %7C in URLs, but Web browsers apparently tolerate an unencoded "|".
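The encoding of the vertical bar can be illustrated with a minimal Python sketch. The function name `encode_entry`, the reference `DB:ENTRY_ID` and the field values are hypothetical placeholders and not taken from STRAP; only the rule that "|" is written as %7C comes from the text above.

```python
def encode_entry(reference, *fields):
    """Append optional fields to a protein reference, writing each '|' as %7C."""
    # Join the reference and its fields with the vertical bar first ...
    raw = "|".join((reference, *fields))
    # ... then replace every bar by its percent encoding.
    return raw.replace("|", "%7C")


# Placeholder reference and field values, for illustration only.
print(encode_entry("DB:ENTRY_ID", "field1", "field2"))
```

Only the vertical bar is encoded here; other characters are left untouched, which is an assumption of this sketch.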

Uniprot Examples

Complex Uniprot Expressions

GenomeNet (Kegg) Examples

Entrez Examples

EMBL or Genbank nucleotide example

EMBL and Genbank files have a nucleotide sequence block. Coding sequences (CDS) are defined by an enumeration of nucleotide positions of the form
FT   CDS             join(25240..25717,29079..29174,31348..31417,39382..39809,
or in case of reverse complement
FT   CDS             complement(5226515..5227132)
This expression is used to compute the amino acid sequence. The following exemplifies how this expression can be changed or how the n-th CDS can be selected.
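How such a location expression selects nucleotides can be sketched in Python. This is an illustration under simplifying assumptions and not STRAP's implementation: it handles only plain ranges inside join(...) and a complement(...) that wraps the whole expression, and it stops at the coding nucleotide sequence without translating it into amino acids. The function names are hypothetical.

```python
import re

# Translation table for the complementary strand (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_cds_location(expr):
    """Return (ranges, is_complement) for a simple CDS location string.

    ranges is a list of 1-based, inclusive (start, end) tuples.
    """
    expr = expr.strip()
    is_complement = expr.startswith("complement(")
    ranges = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", expr)]
    return ranges, is_complement


def extract_cds(nucleotides, expr):
    """Cut the ranges out of the nucleotide block and join them."""
    ranges, is_complement = parse_cds_location(expr)
    # Convert 1-based inclusive positions to Python slices.
    cds = "".join(nucleotides[start - 1:end] for start, end in ranges)
    if is_complement:
        # Reverse complement: complement each base, then reverse the string.
        cds = cds.translate(_COMPLEMENT)[::-1]
    return cds


print(extract_cds("AAACCCGGGTTT", "join(1..3,7..9)"))   # AAAGGG
print(extract_cds("AAACCCGGGTTT", "complement(1..4)"))  # GTTT
```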

Ensembl (under reconstruction)

PDB-Examples

Proteins with nucleic acid:

Setting the biological unit:

The matrices to be applied can be specified in the 5th field as a bit-mask given as a hexadecimal number. For example, 8 means the 4th matrix, as the binary representation of 0x8 is 00000001000. Minus 1 denotes the asymmetric unit.
Examples: -1, all matrices, 1 (wrong, not existing), 2, 3, 4, 8, 10, 20, 40, 10000 (wrong, not existing)
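The relation between the hexadecimal bit-mask and the matrix numbers can be sketched as follows. The function name `selected_matrices` is hypothetical, and returning `None` for -1 is merely a convention of this sketch to mark the asymmetric unit. The numbering follows the statement above that 8 selects the 4th matrix.

```python
def selected_matrices(mask_hex):
    """Return the 1-based matrix numbers selected by a hexadecimal bit-mask.

    Returns None for "-1", which denotes the asymmetric unit.
    """
    mask = int(mask_hex, 16)
    if mask == -1:
        return None
    if mask < 0:
        raise ValueError("unexpected negative mask: " + mask_hex)
    # Bit k (counted from 0) being set selects matrix number k + 1.
    return [k + 1 for k in range(mask.bit_length()) if mask >> k & 1]


print(selected_matrices("8"))   # [4]
print(selected_matrices("3"))   # [1, 2]
print(selected_matrices("10"))  # [5]
```

Whether a selected matrix actually exists depends on the PDB entry; the sketch does not check this.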

Hetero-Compounds, DNA, RNA:

PDB files often contain non-peptide structures such as flavin or NADH, as well as DNA/RNA structures. They are treated in the following way: hetero compounds that share a chain identifier with a peptide are added to the respective peptide object. This is indicated by a vertical green (nucleotides) or red (hetero compounds) bar in the protein labels of the alignment row headers. If the hetero compound has a chain of its own, things are more complicated:

SCOP Examples

PFAM Examples

Prodom Examples


Example with direct Web address

Instead of referring to a protein by database-colon-accession-ID, the direct Web address of the protein file can be used. Special characters of the URL, like the two slashes in "http://", must be percent encoded.
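A minimal Python sketch of this percent encoding, using the standard library function `urllib.parse.quote`. The address `http://www.example.org/protein.pdb` is a made-up placeholder, and encoding every reserved character (including the colon) is an assumption of this sketch; the text above only names the slashes explicitly.

```python
from urllib.parse import quote, unquote


def encode_protein_url(url):
    """Percent-encode all reserved characters of a protein file address."""
    # safe="" makes quote() escape '/' and ':' as well.
    return quote(url, safe="")


encoded = encode_protein_url("http://www.example.org/protein.pdb")
print(encoded)           # http%3A%2F%2Fwww.example.org%2Fprotein.pdb
print(unquote(encoded))  # the original address again
```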

Technical details

The client computer needs Java version 1.5 or higher. The links in this document point to a jnlp file. The jnlp file must be opened by the browser with the program bin/javaws which is part of the Java system. Occasionally, browsers fail to locate this program. In these cases the user needs to find the location of javaws on the hard-disk. See Browser settings.

External applications

Sequista cannot be used in the Strap-Lite version yet.

What happens in case of download errors:

Occasionally, protein entries are removed from databases and are no longer available. What happens if STRAP tries to download a non-existing file or if the server does not respond? The result depends on the server response. Here are some examples of non-existing entries:

Time consuming alignments:

Frame size and location

The size and location of the application frame are specified with the option geometry=width x height + offsetX + offsetY following the .
  1. geometry=400x300+100+100 This means width=400, height=300, positioned at pixel 100,100.
  2. geometry=400x300+200+100
  3. geometry=500x300+100+100
  4. geometry=500x300-100+100 Negative offsetX refers to the right screen margin.
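How a geometry value of this form breaks down into its four numbers can be sketched in Python. The function name `parse_geometry` is hypothetical, and the regular expression only covers the width x height, signed offsetX, signed offsetY pattern used in the examples above.

```python
import re

# width x height, followed by two signed offsets, e.g. 500x300-100+100
_GEOMETRY = re.compile(r"^(\d+)x(\d+)([+-]\d+)([+-]\d+)$")


def parse_geometry(value):
    """Return (width, height, offset_x, offset_y) for a geometry string."""
    match = _GEOMETRY.match(value)
    if match is None:
        raise ValueError("not a geometry value: " + value)
    width, height, offset_x, offset_y = (int(g) for g in match.groups())
    return width, height, offset_x, offset_y


print(parse_geometry("400x300+100+100"))  # (400, 300, 100, 100)
print(parse_geometry("500x300-100+100"))  # (500, 300, -100, 100)
```

A negative offset_x, as in the last example, is measured from the right screen margin.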

Related resources: