Data analysis

From Bioinformatics.Org Wiki

Jump to: navigation, search

Bioinformatics tools can be used to obtain sequences of genes or proteins of interest, either from material obtained, labeled, prepared and examined in electric fields by individual researchers/groups or from repositories of sequences from previously investigated material.


Analysis of data

Both types of sequence can then be analyzed in many ways with bioinformatics tools.

They can be assembled. Note that this is one of the occasions when the meaning of a biological term differs markedly from a computational one (see the amusing confusion over the issue at Web-based geek forum Slashdot). Computer scientists, banish from your mind any thought of assembly language. Sequencing can only be performed for relatively short stretches of a biomolecule and finished sequences are therefore prepared by arranging overlapping "reads" of monomers (single beads on a molecular chain) into a single continuous passage of "code". This is the bioinformatic sense of assembly.

They can be mapped***---that is, their sequences can be parsed to find sites where so-called "restriction enzymes" will cut them.

They can be compared, usually by aligning corresponding segments and looking for matching and mismatching letters in their sequences. Genes or proteins that are sufficiently similar are likely to be related and are therefore said to be "homologous" to each other---the whole truth is rather more complicated than this. Such cousins are called "homologs".

If a homolog (a related molecule) exists, then a newly discovered protein may be modeled---that is the three dimensional structure of the gene product can be predicted without doing laboratory experiments.

Bioinformatics is used in primer design. Primers are short sequences needed to make many copies of (amplify) a piece of DNA as used in PCR (the Polymerase Chain Reaction).

Bioinformatics is used to attempt to predict the function of actual gene products.

Information about the similarity, and, by implication, the relatedness of proteins is used to trace the "family trees" of different molecules through evolutionary time.

There are various other applications of computer analysis to sequence data, but, with so much raw data being generated by the Human Genome Project and other initiatives in biology, computers are presently essential for many biologists just to manage their day-to-day results

Molecular modelling / structural biology is a growing field which can be considered part of bioinformatics. There are, for example, tools which allow you (often via the Net) to make pretty good predictions of the secondary structure of proteins arising from a given amino acid sequence, often based on known "solved" structures and other sequenced molecules acquired by structural biologists.

Structural biologists use "bioinformatics" to handle the vast and complex data from X-ray crystallography, nuclear magnetic resonance (NMR) and electron microscopy investigations and create the 3-D models of molecules that seem to be everywhere in the media.


Unfortunately the word "map" is used in several different ways in biology/genetics/bioinformatics. The definition given above is the one most frequently used in this context, but a gene can be said to be "mapped" when its parent chromosome has been identified, when its physical or genetic distance from other genes is established and---less frequently---when the structure and locations of its various coding components (its "exons") are established.


See also

Personal tools
wiki navigation