Class: BaumWelchTrainer Bio/HMM/Trainer.py

Trainer that uses the Baum-Welch algorithm to estimate parameters.

This trainer should be used when a training sequence for an HMM has unknown paths for the actual states, and you need to estimate the model parameters from the observed emissions.

This uses the Baum-Welch algorithm, first described in Baum, L.E. 1972, Inequalities 3:1-8. The implementation is based on the description in section 3.3 of Biological Sequence Analysis by Durbin et al.

This algorithm is guaranteed to converge to a local maximum, but not necessarily to the global maximum, so use it with care!

Base Classes   
AbstractTrainer
Methods   
__init__
train
update_emissions
update_transitions
  __init__ 
__init__ ( self,  markov_model )

Initialize the trainer.

Arguments:

  • markov_model -- The model we are going to estimate parameters for. This should have its parameters set to some initial estimates that we can build on.
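
In Biopython, the initial model is typically built with MarkovModelBuilder from Bio.HMM.MarkovModel. The following is a minimal sketch, assuming a recent Biopython release in which the builder takes plain tuples of state and emission letters (older releases expected Alphabet objects); the two-state heads/tails alphabet is chosen purely for illustration:

    from Bio.HMM.MarkovModel import MarkovModelBuilder
    from Bio.HMM.Trainer import BaumWelchTrainer

    # Hypothetical two-state model emitting heads/tails; the letters are
    # placeholders, not anything prescribed by the module.
    states = ("F", "B")        # e.g. fair and biased
    emissions = ("H", "T")

    builder = MarkovModelBuilder(states, emissions)
    builder.allow_all_transitions()
    builder.set_random_probabilities()   # rough initial estimates to refine
    initial_model = builder.get_markov_model()

    trainer = BaumWelchTrainer(initial_model)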

  train 
train (
        self,
        training_seqs,
        stopping_criteria,
        dp_method=ScaledDPAlgorithms,
        )

Estimate the parameters using training sequences.

The algorithm for this is taken from Durbin et al. p64, so this is a good place to go for a reference on what is going on.

Arguments:

  • training_seqs -- A list of TrainingSequence objects to be used for estimating the parameters.

  • stopping_criteria -- A function that, when passed the change in log likelihood and a threshold, indicates whether we should stop the estimation iterations.

  • dp_method -- The class implementing the dynamic programming algorithm used to calculate the forward and backward variables. By default, we use the scaling method (ScaledDPAlgorithms).
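
A minimal sketch of a training run, continuing from the constructor sketch above. The stopping function below only inspects its first argument (the change in log likelihood) and ignores the second, and the observed sequences are invented placeholders; each TrainingSequence pairs an emission sequence with an empty state path, since the true paths are unknown:

    from Bio.HMM.Trainer import TrainingSequence
    from Bio.Seq import Seq

    def stop_when_converged(log_likelihood_change, _ignored, threshold=0.001):
        # Stop once successive iterations barely change the log likelihood.
        return log_likelihood_change < threshold

    # Placeholder observations; the state path is left empty because the
    # true paths are unknown (that is the point of Baum-Welch).
    observed = [Seq("HHTHTTHH"), Seq("TTTHTHHT")]
    training_seqs = [TrainingSequence(emissions, Seq("")) for emissions in observed]

    trained_model = trainer.train(training_seqs, stop_when_converged)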

  update_emissions 
update_emissions (
        self,
        emission_counts,
        training_seq,
        forward_vars,
        backward_vars,
        training_seq_prob,
        )

Add the contribution of a new training sequence to the emission counts.

Arguments:

  • emission_counts -- A dictionary of the current counts for the emissions.

  • training_seq -- The training sequence we are working with.

  • forward_vars -- Probabilities calculated using the forward algorithm.

  • backward_vars -- Probabilities calculated using the backward algorithm.

  • training_seq_prob -- The probability of the current sequence.

This calculates E_{k}(b) (the estimated emission counts for emission letter b from state k) using formula 3.21 in Durbin et al.
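
The following is a rough sketch of the per-sequence contribution described by formula 3.21, not the module's actual implementation. It assumes the forward and backward variables are dictionaries keyed by (state, position), that emission_counts is keyed by (state, emission letter), and that the list of states is passed in explicitly:

    def add_emission_contribution(emission_counts, emissions, forward_vars,
                                  backward_vars, seq_prob, states):
        # Accumulate f_k(i) * b_k(i) / P(x) into the expected count of
        # state k emitting the letter seen at position i (Durbin eq. 3.21).
        for k in states:
            for i, letter in enumerate(emissions):
                expected = forward_vars[(k, i)] * backward_vars[(k, i)] / seq_prob
                emission_counts[(k, letter)] = (
                    emission_counts.get((k, letter), 0.0) + expected)
        return emission_counts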

  update_transitions 
update_transitions (
        self,
        transition_counts,
        training_seq,
        forward_vars,
        backward_vars,
        training_seq_prob,
        )

Add the contribution of a new training sequence to the transitions.

Arguments:

  • transition_counts -- A dictionary of the current counts for the transitions.

  • training_seq -- The training sequence we are working with.

  • forward_vars -- Probabilities calculated using the forward algorithm.

  • backward_vars -- Probabilities calculated using the backward algorithm.

  • training_seq_prob -- The probability of the current sequence.

This calculates A_{kl} (the estimated transition counts from state k to state l) using formula 3.20 in Durbin et al.
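
As above, a rough sketch of the per-sequence contribution from formula 3.20, not the module's own code. Besides the forward and backward variables, it needs the current model parameters, assumed here to be dictionaries: transition_prob keyed by (from_state, to_state) for a_{kl}, and emission_prob keyed by (state, letter) for e_l(x):

    def add_transition_contribution(transition_counts, emissions, forward_vars,
                                    backward_vars, seq_prob,
                                    transition_prob, emission_prob, states):
        # Accumulate f_k(i) * a_kl * e_l(x_{i+1}) * b_l(i+1) / P(x) into the
        # expected number of k -> l transitions (Durbin eq. 3.20).
        for k in states:
            for l in states:
                for i in range(len(emissions) - 1):
                    expected = (forward_vars[(k, i)]
                                * transition_prob[(k, l)]
                                * emission_prob[(l, emissions[i + 1])]
                                * backward_vars[(l, i + 1)]
                                / seq_prob)
                    transition_counts[(k, l)] = (
                        transition_counts.get((k, l), 0.0) + expected)
        return transition_counts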

