# Hidden Markov Model

### From Bioinformatics.Org Wiki

(Added narrative) |
m |
||

Line 12: | Line 12: | ||

* Switches from one genomic region to another are the state transitions. | * Switches from one genomic region to another are the state transitions. | ||

** 5'[[UTR]] to [[CDS]] | ** 5'[[UTR]] to [[CDS]] | ||

- | * CDS to [[intron]] | + | ** CDS to [[intron]] |

- | * intron to CDS | + | ** intron to CDS |

- | * CDS to 3'UTR | + | ** CDS to 3'UTR |

- | * 4 state transitions equals a probability of ¼. | + | ** 4 state transitions equals a probability of ¼. |

Difficulties with the HMM method include the need for accurate, applicable, and sufficiently sized [[training set|training sets]] of data. As for the example of gene detection, in order to accurately predict genes in the [[human genome]], many genes in the genome must be accurately known. | Difficulties with the HMM method include the need for accurate, applicable, and sufficiently sized [[training set|training sets]] of data. As for the example of gene detection, in order to accurately predict genes in the [[human genome]], many genes in the genome must be accurately known. |

## Latest revision as of 21:37, 4 September 2009

Markov chains are named for Russian mathematician Andrei Markov (1856-1922), and they are defined as observed sequences. A Markov model is a system that produces a Markov chain, and a hidden Markov model is one where the rules for producing the chain are unknown or "hidden." The rules include two probabilities: (i) that there will be a certain observation and (ii) that there will be a certain state transition, given the state of the model at a certain time.^{[1]}

The Hidden Markov Model (HMM) method is a mathematical approach to solving certain types of problems: (i) given the model, find the probability of the observations; (ii) given the model and the observations, find the most likely state transition trajectory; and (iii) maximize either *i* or *ii* by adjusting the model's parameters. For each of these problems, algorithms have been developed: (i) Forward-Backward, (ii) Viterbi, and (iii) Baum-Welch (and the Segmental K-means alternative).^{[1]}^{[2]}

The HMM method has been traditionally used in signal processing, speech recognition, and, more recently, bioinformatics. It may generally be used in pattern recognition problems, anywhere there may be a model producing a sequence of observations. In bioinformatics, it has been used in sequence alignment, *in silico* gene detection, structure prediction, data-mining literature, and so on.

Here is a simple example of the use of the HMM method in *in silico* gene detection:

- Codons (or DNA triplets) are the observations.
- 64 codons equals a probability of 1/64.

- The DNA sequence is the Markov chain (set of observations).
- Switches from one genomic region to another are the state transitions.

Difficulties with the HMM method include the need for accurate, applicable, and sufficiently sized training sets of data. As for the example of gene detection, in order to accurately predict genes in the human genome, many genes in the genome must be accurately known.

## Software

- HMMER - Profile hidden Markov models for biological sequence analysis
- SAM - Sequence Alignment and Modeling System, a collection of flexible software tools for creating, refining, and using linear hidden Markov models for biological sequence analysis
- VEIL - A hidden Markov model for finding genes in vertebrate DNA
- UGENE - A free multi-platform graphical interface for algortihms based on HMMER2 and HMMER3 packages

## References

- ↑
^{1.0}^{1.1}Dugad, R. and Desai, U.B. 1996. A Tutorial on Hidden Markov Models. http://vision.ai.uiuc.edu/dugad/hmm_tut.html - ↑ Dean, T. et al. 1998. Hidden Markov Models. http://www.cs.brown.edu/research/ai/dynamics/tutorial/Documents/HiddenMarkovModels.html