Forming Web-links for protein files and alignments

Introduction

This page describes how Web-links for the protein and alignment viewer STRAP are formed using the web variables load or align. Clicking these Web-links in a Web-browser opens amino acid sequences, protein structures or multiple sequence files in STRAP.

Client side Requirements: Java is needed on the client PC. Server side Requirements: There are no particular requirements for the Web-server, as the links are static links.

Application: This technique provides an efficient way to present scientific results. Amino acid sequences, protein alignments and protein structures which are relevant for specific research projects can be exhibited on a Web page. Certain residues such as mutations and site-directed mutagenesis sites can be highlighted in the alignment and in 3D. These Web-links can also be included in computer programs for Systems Biology, office documents and PDF-documents to efficiently document and communicate alignments and 3D-structures. Scientists can send Emails containing these Web-links to colleagues and collaborators. The recipient can open the same alignment with all residue highlightings. Finally, these Web-links provide a way by which Bioinformatics databases can use STRAP as a viewer for proteins and alignments.

Advantages

In Web pages, alignments are often represented as static documents (.html, .pdf, .doc, .rtf) or shown dynamically with Browser embedded Java applets. All information needed to display the alignment resides on the server. Other servers are not involved when the alignment is displayed on the client computer.

Here, a different approach is suggested: Only the protein references are encoded in the Web-link, so that up-to-date versions of the protein files are loaded from the original protein databases. This has the following advantages:

The sequences are downloaded from the original databases in their current version with the most recently added sequence features and cross-references.
Sequence feature files are freshly downloaded from BioDAS-servers and Expasy before the features are highlighted on the sequences and 3D-structures. Teams of curators are continually improving sequence features, and because the features are fetched at display time, the information shown on the client is always up-to-date.
These protein files can then be conveniently transported with the mouse to any location on the file system or to other computer applications using STRAP's Drag-and-Drop facility.
The Web link is a very condensed representation because the Web-address contains only the references of the protein entries in the public file repositories, but not the sequences or 3D-coordinates themselves. Usually, the alignment gaps do not need to be included because the alignment can be re-computed on the client from the 3D-coordinates and amino acid sequences.
In case of Wiki-articles or Emails, the user does not need to attach files. The information in the URL is sufficient.
The program checks whether a related 3D-structure is already known using precomputed Blast-results.

Limitations

Occasionally, Java programs do not start for various reasons.

A Web-address must not exceed a critical size. There is a workaround for this problem: Large amounts of information can be included in so-called forms.

The technique described on this page is suitable only for simply loading and aligning proteins. For complex tasks, see Using script commands.

Automated generation of the Web-address

STRAP assists in generating a Web link which, when clicked, loads the sequences of the current project. With the STRAP dialog Publish alignment in Web pages in the menu "File", users can automatically generate the Web-address (URL).

Web-variables in the URL

The variables load= or align= or alignAndRearange= contain one or several protein references. A protein reference may be a URL or a database ID followed by a colon and an entry ID.

Unless loaded with the variable "load=" proteins will be aligned after loading. For this purpose the 3D-superposition program TM-align and the sequence alignment program ClustalW are combined. With "alignAndRearange=" the proteins are reordered according to sequence similarity.
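
As a rough illustration, such a Web-address could also be assembled by a small program. The following Python sketch is not part of STRAP. The base address BASE_URL is only a placeholder, because the real launch address is not given here, and joining several references with a space (percent-encoded as %20) is an assumption of this sketch.

```python
from urllib.parse import quote

# Placeholder only: the real STRAP launch address is not given on this page.
BASE_URL = "http://example.org/strap.jnlp"

# Web variables named in the text above.
VARIABLES = ("load", "align", "alignAndRearange")


def strap_link(variable, references, separator=" "):
    """Build a Web-link from a web variable and a list of protein references.

    The separator between references is an assumption of this sketch.
    """
    if variable not in VARIABLES:
        raise ValueError("unknown web variable: " + variable)
    if not references:
        raise ValueError("at least one protein reference is required")
    # Keep ':' readable and percent-encode all other unsafe characters.
    value = quote(separator.join(references), safe=":")
    return BASE_URL + "?" + variable + "=" + value


print(strap_link("align", ["UNIPROT:hslv_ecoli", "PDB:1ryp_A"]))
```

With "load" instead of "align" the same function yields a link that, according to the description above, loads the proteins without aligning them.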

Additional web variables provide further options:

rename=sp Renames the protein using the Swissprot mnemonic name found in the protein file. A Swissprot name like "PSA2_CARAU" consists of:
1. Protein designation. Here "PSA2".
2. Underscore
3. Organism code of up to five letters. Here "CARAU" for Carassius auratus.
Example
no3D=hexadecimal_number. The argument is a hexadecimal number that acts as a bit mask for those proteins that should not be displayed in 3D. Example with no3D=4. The binary representation of 0x4 is 000000100. The third digit from the right is "1" and therefore the third peptide PDB:1ryp_C is not shown in 3D. You see only two superimposed backbones. Same without "no3D=".
noSP=hexadecimal_number. The hexadecimal number is a bit mask for the proteins that should not be superposed three-dimensionally. They will be shown in their original coordinates defined in the PDB file. Example and Same example without
dasFeatures=DAS title1|DAS title2 Obtain DAS features. A vertical-bar separated list of titles from the DAS-registry. Examples: DAS-uniprot and DAS-CSA - extended and Cosmic Protein Mutations on P51587 Cosmic+Protein+Mutations. Also see Spice.
separateInstance=t Example 1prn The proteins are shown in a new Window.
script=script_lines. See Using script commands
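
The bit masks for no3D= and noSP= can be computed from the positions of the proteins in the link. The following Python sketch assumes, in line with the no3D=4 example, that the first protein corresponds to the lowest bit. Whether upper-case hexadecimal digits are accepted as well is not stated here, so lower-case digits are produced.

```python
def bit_mask(positions):
    """Return the hexadecimal bit mask for 1-based protein positions."""
    mask = 0
    for position in positions:
        if position < 1:
            raise ValueError("positions are counted from 1")
        # Protein number n sets bit n-1, so the third protein gives 0x4.
        mask |= 1 << (position - 1)
    return format(mask, "x")


print(bit_mask([3]))     # 4
print(bit_mask([1, 3]))  # 5
print(bit_mask([5]))     # 10
```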

Additional information for the protein entries

The database reference or URL of the protein can optionally be followed by further fields separated by "|" (vertical bar).

1st Field: Database reference such as "UNIPROT:hslv_ecoli" or "PDB:1ryp_A" or a direct Web address for a protein file. This field is mandatory; the others are optional.
2nd Field: Protein name. Example "ExampleName" Example "otherName". If the protein link refers to more than one chain, then the respective chain identifiers are appended to the name: Example "ExampleName" with chains and Example "otherName" with chains.
3rd Field: URL of protein icon. Example
4th Field: Underlined residues. This field can contain several subfields, each preceded by a Web color like "#FF00FF". It can contain the following 3D-renderings: "sticks", "dots", "spheres", "ribbon" Example1 (red and yellow) and Example 2 (green)
5th Field: The coding sequence CDS expression in EMBL or Genbank style. See Embl examples
6th Field: The matrices that are applied to the asymmetrical units for displaying the 3D-molecule. See Biological unit

Note: The percent encoding of "|" is %7C. Strictly speaking, "|" should be written as %7C in URLs. In practice, Web browsers appear to tolerate a "|" that is not encoded.
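
A minimal Python sketch of how a protein entry with additional fields might be composed and percent-encoded is given below. Only the first two fields are used because their format is clear from the description above. The function names are invented for this illustration and are not part of STRAP.

```python
from urllib.parse import quote


def protein_entry(reference, *optional_fields):
    """Join the mandatory reference and the optional fields with '|'."""
    if not reference:
        raise ValueError("the 1st field (reference) is mandatory")
    fields = [reference, *optional_fields]
    # Drop empty trailing fields so that no dangling '|' remains.
    while fields and fields[-1] == "":
        fields.pop()
    return "|".join(fields)


def encode_entry(entry):
    """Percent-encode an entry: '|' becomes %7C, ':' stays readable."""
    return quote(entry, safe=":")


entry = protein_entry("PDB:1ryp_A", "ExampleName")
print(entry)                # PDB:1ryp_A|ExampleName
print(encode_entry(entry))  # PDB:1ryp_A%7CExampleName
```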

Uniprot Examples

Complex Uniprot Expressions

Proteins with name:hemoglobin Warning: this query returns many proteins. It is better to open a new instance and not to start the alignment automatically. Users who want the proteins in the same view can move them there with Drag-and-Drop.
Proteins with gene:hbb
Proteins with name:subtilisin Displaying Swissprot names like "GER2_WHEAT" instead of "P15290"
Proteins with EC 1.2.3.4

GenomeNet (Kegg) Examples

Entrez Examples

Three hexokinases

EMBL or Genbank nucleotide example

EMBL and Genbank files have a nucleotide sequence block. Coding sequences (CDS) are defined by an enumeration of nucleotide positions of the form

FT   CDS             join(25240..25717,29079..29174,31348..31417,39382..39809,

or in case of reverse complement

FT   CDS             complement(5226515..5227132)

This expression is used to compute the amino acid sequence. The following examples show how this expression can be changed or how the n-th CDS can be selected.

M57965 Myosin from EMBL
M57965 Myosin from Genbank
M57965 Myosin. Overriding CDS: "join(20..30,40,50"
M57965 Myosin. Overriding CDS: "complement(20..30,40,50)"
M57965 Selecting CDS No 1
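
To make the structure of a CDS expression concrete, the following Python sketch splits a simple expression into nucleotide ranges. It is not STRAP's own parser and covers only plain positions, ranges, join(...) and one enclosing complement(...), not the complete EMBL/Genbank location syntax.

```python
import re

# A single position such as "40" or a range such as "20..30".
RANGE = re.compile(r"(\d+)(?:\.\.(\d+))?")


def parse_cds(expression):
    """Return (is_complement, [(start, end), ...]) for a simple CDS expression."""
    text = expression.strip()
    is_complement = text.startswith("complement(") and text.endswith(")")
    if is_complement:
        text = text[len("complement("):-1]
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    segments = []
    for part in text.split(","):
        match = RANGE.fullmatch(part.strip())
        if match is None:
            raise ValueError("unsupported CDS segment: " + part)
        start = int(match.group(1))
        end = int(match.group(2) or start)  # a single position is a range of one
        segments.append((start, end))
    return is_complement, segments


def coding_length(segments):
    """Number of nucleotides covered by the segments (positions are inclusive)."""
    return sum(end - start + 1 for start, end in segments)


reverse, segments = parse_cds("complement(5226515..5227132)")
print(reverse, segments, coding_length(segments))
```

For the reverse complement example above, the single range covers 618 nucleotides, which is a multiple of three as expected for a coding sequence.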

Ensembl (under reconstruction)

ENSP00000369497 Protein
ENSG00000106633 Gene Hexokinase
Transcripts of Hexokinase GCK-001 GCK-002 GCK-003 GCK-201 GCK-202
Peptides of Hexokinase
OTTHUMG00000017411 OTTMUSP00000000526 Manually curated Vega genes
Alignment of peptides
ENSP00000369497 Loading Into Seqvista
ENSP00000369497 Colored highlightings

PDB-Examples

1prn Example of PDB entry with only one peptide chain.
1aab Example of an NMR file. STRAP loads only the first model.
1ryp Example of a protein with many different chains.
1ryp_C Example for specifying one particular chain. Here chain "C".
1gg2 Hetero-trimeric G-protein with alpha, beta and gamma subunit.
1g28 Fibroblast growth factor.
Flavo proteins: 1fiq 1szf 1jnw 1szg 1jqi 1t57 1p0n 1v4b 1qzu 1v5e 1reo 1v5f 1ryi 1vcf 1sbz 1vcg 1siq 1vp8 1sze 1xdi 1ybh 1y56 1g63
Problems: 2i7j 1ijs 1a34

Proteins with nucleic acid:

1gd2 Example of protein with nucleic acid. Leucine Zipper.
1l4p
1al2 Large virus
1al0 Large virus
2wbs Zinc finger
1q82 Ribosome. Example for huge protein with many different chains and nucleic acid

Setting the biological unit:

The matrices which are applied to the asymmetric unit can be specified in the 6th field in the form of a bit mask given as a hexadecimal number. For example 8 means the 4th matrix, as the binary representation of 0x8 is 00000001000. Minus 1 denotes the asymmetric unit.
-1 all matrices 1(wrong, not existing) 2 3 4 8 10 20 40 10000(wrong, not existing)
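
The meaning of such a hexadecimal value can be checked with a few lines of Python. The sketch below lists the 1-based numbers of the matrices whose bits are set. It assumes that the first matrix corresponds to the lowest bit, which agrees with the example 8 for the 4th matrix. The special value -1 is not decoded and is returned as None.

```python
def selected_matrices(hex_mask):
    """Return the 1-based matrix numbers whose bits are set in the mask."""
    if hex_mask.strip() == "-1":
        return None  # special value, not a bit mask
    mask = int(hex_mask, 16)
    numbers = []
    position = 1
    while mask:
        if mask & 1:
            numbers.append(position)
        mask >>= 1
        position += 1
    return numbers


print(selected_matrices("8"))   # [4]
print(selected_matrices("10"))  # [5]
print(selected_matrices("3"))   # [1, 2]
```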

Hetero-Compounds, DNA, RNA:

PDB files often contain non-peptide structures such as flavin or NADH and DNA/RNA structures, which are treated in the following way: Those hetero compounds that share the chain identifier with a peptide are added to the respective peptide object. This is indicated by a vertical green (nucleotide) or red (heteros) bar on the protein labels in the alignment row headers. But if the hetero compound has a chain of its own, then things are more complicated:

Nucleotide chain A and B of 1gd2 The two nucleotide chains are displayed in a 3D-view. They are not associated to any peptide.
Nucleotide chain A and two peptide chains of 1gd2 The nucleotide chain is added to the peptide with the least Euclidean distance, which is chain F. The protein label 1gd2_F has the small green vertical bar.
Parenthetical group Association of nucleotide chain A to peptide chain E even if it is closer to peptide F. This time 1gd2_E is marked by a green vertical bar.

SCOP- Examples

SCOP-sunid 50499 alpha-Lytic protease from Lysobacter enzymogenes
SCOP-sunid 52766 Serine-carboxyl proteinase, SCP from Pseudomonas sp., sedolisin

PFAM Examples

Prodom Examples

PD000033 Medium sized Prodom file
PD000006 Large Prodom file with 12489 sequences.

Example with direct Web address

Instead of referring to a protein by database-colon-accession-ID, a direct Web address of the protein file can be used.

Special characters of the URL like the two slashes in "http://" must be percent encoded.
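
The percent encoding of a Web address can be produced with the Python standard library, as in the following short sketch. The address is hypothetical and serves only as an illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical address of a protein file, for illustration only.
address = "http://example.org/proteins/my_protein.pdb"

# safe="" also encodes ':' and '/', which quote() would otherwise partly keep.
encoded = quote(address, safe="")
print(encoded)  # http%3A%2F%2Fexample.org%2Fproteins%2Fmy_protein.pdb

# Decoding restores the original address.
print(unquote(encoded) == address)  # True
```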

Technical details

The client computer needs Java version 1.5 or higher. The links in this document point to a jnlp file. The jnlp file must be opened by the browser with the program bin/javaws which is part of the Java system. Occasionally, browsers fail to locate this program. In these cases the user needs to find the location of javaws on the hard-disk. See Browser settings.

External applications

Seqvista cannot be used in the Strap-Lite version yet.

Jalview: Displays the alignment of the specified proteins in Jalview.
Seqvista: Displays M57965 in Seqvista.
Spice: Displays hslv_ecoli in Spice.

What happens in case of download errors:

Occasionally, protein entries are removed from databases and are no longer available. What happens if STRAP tries to download a non-existing file or if the server does not respond? The result depends on the server response.

Some servers return an error message which is then interpreted as a protein sequence by STRAP.
A few servers return an http error. STRAP will skip these entries.
It may happen that the server returns neither a message nor an http error code. It just blocks. This is the worst case because all downloads from this server are blocked during the current STRAP session. The user will need to restart STRAP.

Here are some examples of non-existing entries:

Time consuming alignments:

Frame size and location

The size and location of the application frame are specified with the option geometry=width x height + offsetX + offsetY following the .

geometry=400x300+100+100 This means width=400 height=300, Position at pixel 100,100.
geometry=400x300+200+100
geometry=500x300+100+100
geometry=500x300-100+100 Negative offsetX refers to the right screen margin.
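
How such a geometry value breaks down into its parts can be sketched in Python as follows. This is an illustration and not STRAP's own code. That a negative offsetY counts from the bottom screen margin is an assumption, since only the negative offsetX is described above.

```python
import re

# width x height, followed by signed x and y offsets, e.g. "500x300-100+100"
GEOMETRY = re.compile(r"(\d+)x(\d+)([+-])(\d+)([+-])(\d+)")


def parse_geometry(value):
    """Split a geometry value into size, offsets and offset directions."""
    match = GEOMETRY.fullmatch(value)
    if match is None:
        raise ValueError("not a geometry value: " + value)
    width, height, sign_x, off_x, sign_y, off_y = match.groups()
    return {
        "width": int(width),
        "height": int(height),
        "offset_x": int(off_x),
        "offset_y": int(off_y),
        # A negative offsetX is measured from the right screen margin.
        "x_from_right": sign_x == "-",
        # Assumption: a negative offsetY would count from the bottom margin.
        "y_from_bottom": sign_y == "-",
    }


print(parse_geometry("500x300-100+100"))
```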

Related resources:

Jmol is a protein 3D-view which can be integrated in Web pages and controlled by buttons in the Web-page.
Jalview, and are alignment applets for Web pages.