MaxEnt.py
Maximum Entropy code.
Uses Improved Iterative Scaling:
XXX ref
# XXX need to define terminology
Imported modules
from Bio.Tools import listfns
from Numeric import *
import math
Functions
_calc_empirical_expects
_calc_empirical_expects(xs, ys, classes, features) -> list of expectations
Calculate the expectation of each feature function from the training
data. These expectations are the constraints on the maximum entropy
distribution. Return a list of expectations, parallel to the list of
features.
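The description above can be sketched as an average over the observed (instance, class) pairs. This is a minimal illustration, not the module's code: the name empirical_expects_sketch is hypothetical, and it assumes each feature is already stored as a sparse dictionary keyed by (training set index, class index), as described for _eval_feature_fn.

```python
def empirical_expects_sketch(xs, ys, classes, features):
    """Average each feature over the observed (instance, class) pairs."""
    class_index = {cls: j for j, cls in enumerate(classes)}
    n = len(xs)
    expects = []
    for feature in features:
        # Only the class actually observed for instance i contributes.
        total = sum(feature.get((i, class_index[y]), 0) for i, y in enumerate(ys))
        expects.append(total / n)
    return expects


xs = ["x0", "x1", "x2", "x3"]
ys = ["a", "b", "a", "a"]
classes = ["a", "b"]
# Hypothetical sparse features: (instance index, class index) -> value.
features = [{(0, 0): 1, (1, 0): 1, (2, 0): 1}, {(1, 1): 1}]
expects = empirical_expects_sketch(xs, ys, classes, features)
```

The entry (1, 0) in the first feature does not count, because instance 1 was observed with class "b" rather than "a".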
_calc_f_sharp
_calc_f_sharp(N, nclasses, features) -> matrix of f sharp values.
The f sharp value for a training instance and class is the sum of all
feature values for that pair.
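In the usual Improved Iterative Scaling formulation, f sharp for an (instance, class) pair is the sum of all feature values for that pair. A minimal sketch under that assumption follows; calc_f_sharp_sketch is a hypothetical name, nested lists stand in for the matrix, and the features are assumed to be sparse dictionaries keyed by (training set index, class index).

```python
def calc_f_sharp_sketch(n, nclasses, features):
    """Sum all feature values for each (instance, class) pair."""
    f_sharp = [[0] * nclasses for _ in range(n)]
    for feature in features:
        for (i, j), value in feature.items():
            f_sharp[i][j] += value
    return f_sharp


# Hypothetical sparse features for 2 instances and 2 classes.
features = [{(0, 0): 1, (1, 1): 1}, {(0, 0): 1}]
f_sharp = calc_f_sharp_sketch(2, 2, features)
```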
_calc_model_expects
_calc_model_expects(xs, classes, features, alphas) -> list of expectations.
Calculate the expectation of each feature under the model. This is
not used in maximum entropy training, but is useful for debugging.
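As a rough illustration, the model expectation of a feature can be computed by weighting each feature value with the model's P(y|x) and averaging over the training instances. The sketch below takes the probability matrix as an argument instead of recomputing it from the alphas; that signature and the name model_expects_sketch are assumptions made for brevity.

```python
def model_expects_sketch(prob_yx, features):
    """Weight each feature value by P(y|x) and average over instances."""
    n = len(prob_yx)
    expects = []
    for feature in features:
        total = 0.0
        for (i, j), value in feature.items():
            total += prob_yx[i][j] * value
        expects.append(total / n)
    return expects


# Hypothetical P(y|x) matrix: one row per instance, one column per class.
prob_yx = [[0.75, 0.25], [0.5, 0.5]]
features = [{(0, 0): 1, (1, 0): 1}, {(1, 1): 1}]
expects = model_expects_sketch(prob_yx, features)
```

Comparing these values with the empirical expectations is one way to check how close training is to satisfying the constraints.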
_calc_p_class_given_x
_calc_p_class_given_x(xs, classes, features, alphas) -> matrix
Calculate P(y|x), where y is the class and x is an instance from
the training set. Return a matrix of probabilities with one row per
training instance and one column per class.
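For a maximum entropy model, P(y|x) is conventionally the exponential of the alpha-weighted feature sum, normalized over the classes. The sketch below follows that convention; it is not taken from the module. The name p_class_given_x_sketch, the use of counts instead of the xs and classes lists, and the nested-list matrix are all assumptions.

```python
import math


def p_class_given_x_sketch(n, nclasses, features, alphas):
    """Return an n x nclasses list of lists holding P(y|x)."""
    # Accumulate the weighted feature sum for every (instance, class) pair.
    scores = [[0.0] * nclasses for _ in range(n)]
    for feature, alpha in zip(features, alphas):
        for (i, j), value in feature.items():
            scores[i][j] += alpha * value
    # Exponentiate, then normalize each row so it sums to 1.
    probs = []
    for row in scores:
        weights = [math.exp(s) for s in row]
        z = sum(weights)
        probs.append([w / z for w in weights])
    return probs


# One hypothetical feature that fires for instance 0 with class 0.
features = [{(0, 0): 1}]
probs = p_class_given_x_sketch(2, 2, features, [math.log(3.0)])
```

Instance 1 has no active features, so its row comes out uniform.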
_eval_feature_fn
_eval_feature_fn(fn, xs, classes) -> dict of values
Evaluate a feature function on every combination of training instance
and class. fn is a callback function that takes two parameters: a
training instance and a class. Return a dictionary of (training
set index, class index) -> non-zero value. Values of 0 are not
stored in the dictionary.
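The sparse dictionary described above might be built as follows. This is a sketch only; eval_feature_fn_sketch and the example feature are hypothetical.

```python
def eval_feature_fn_sketch(fn, xs, classes):
    """Map (training set index, class index) to each non-zero feature value."""
    values = {}
    for i, x in enumerate(xs):
        for j, cls in enumerate(classes):
            value = fn(x, cls)
            if value != 0:  # zeros are left out to keep the dictionary sparse
                values[(i, j)] = value
    return values


def large_and_high(x, cls):
    """Hypothetical feature: fires when x exceeds 4 and the class is 'high'."""
    return 1 if x > 4 and cls == "high" else 0


values = eval_feature_fn_sketch(large_and_high, [1, 5, 10], ["low", "high"])
```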
_iis_solve_delta
_iis_solve_delta(N, feature, f_sharp, empirical, prob_yx)
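The source gives no description for this function. In the standard Improved Iterative Scaling update, the step delta for one feature is the root of: empirical - (1/N) * SUM over (x, y) of P(y|x) * f(x, y) * exp(delta * f#(x, y)) = 0. The sketch below solves that equation with Newton's method. It is an assumption about what the function computes, not a copy of it; the name iis_solve_delta_sketch and the max_iterations and tolerance parameters are hypothetical, and the feature is assumed to have at least one non-zero entry.

```python
import math


def iis_solve_delta_sketch(n, feature, f_sharp, empirical, prob_yx,
                           max_iterations=100, tolerance=1e-10):
    """Solve the IIS update equation for one feature by Newton's method."""
    delta = 0.0
    for _ in range(max_iterations):
        g = dg = 0.0
        for (i, j), value in feature.items():
            term = prob_yx[i][j] * value * math.exp(delta * f_sharp[i][j])
            g += term
            dg += term * f_sharp[i][j]
        g = empirical - g / n   # value of the equation at the current delta
        dg = -dg / n            # its derivative with respect to delta
        step = g / dg
        delta -= step
        if abs(step) < tolerance:
            return delta
    raise RuntimeError("Newton's method did not converge")


# With f sharp equal to 1 everywhere the root is log(empirical / model).
feature = {(0, 0): 1, (1, 0): 1}
f_sharp = [[1, 1], [1, 1]]
prob_yx = [[0.5, 0.5], [0.5, 0.5]]
delta = iis_solve_delta_sketch(2, feature, f_sharp, 0.75, prob_yx)
```

In this example the model expectation of the feature is 0.5 and the empirical expectation is 0.75, so the solution is log(1.5).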
_train_iis
_train_iis(xs, classes, features, f_sharp, alphas, e_empirical)
Do one iteration of hill climbing to find better alphas.
This is a good function to parallelize.
calculate
calculate(me, observation) -> list of log probs
Calculate the log of the probability for each class. me is a
MaxEntropy object that has been trained. observation is a vector
representing the observed data. The return value is a list of
unnormalized log probabilities for each class.
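One way to read the description above: the unnormalized log probability of a class is the sum of each feature's value for (observation, class), weighted by that feature's alpha. The sketch below passes the classes, feature functions and alphas explicitly instead of reading them from a MaxEntropy object; that signature and the name calculate_sketch are assumptions.

```python
def calculate_sketch(classes, feature_fns, alphas, observation):
    """Return one unnormalized log probability per class."""
    scores = []
    for cls in classes:
        score = 0.0
        for fn, alpha in zip(feature_fns, alphas):
            score += alpha * fn(observation, cls)
        scores.append(score)
    return scores


classes = ["yes", "no"]
# Hypothetical binary feature functions and weights.
feature_fns = [
    lambda obs, cls: 1 if obs[0] == 1 and cls == "yes" else 0,
    lambda obs, cls: 1 if obs[1] == 1 and cls == "no" else 0,
]
alphas = [2.0, 0.5]
scores = calculate_sketch(classes, feature_fns, alphas, [1, 1])
```

Because the scores are unnormalized, they can be compared across classes for one observation but are not probabilities until exponentiated and divided by their sum.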
classify
classify(me, observation) -> class
Classify an observation into a class.
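The source does not say how the class is chosen. A natural reading is that it is the class with the highest score returned by calculate. A minimal sketch under that assumption, with the hypothetical name classify_sketch and the scores supplied directly:

```python
def classify_sketch(classes, scores):
    """Return the class whose score is largest (first one wins on ties)."""
    best = max(range(len(classes)), key=lambda j: scores[j])
    return classes[best]


label = classify_sketch(["yes", "no"], [2.0, 0.5])
```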
train
train(training_set, results, feature_fns, update_fn=None) -> MaxEntropy object
Train a maximum entropy classifier on a training set.
training_set is a list of observations. results is a list of the
class assignments for each observation. feature_fns is a list of
the features. These are callback functions that take an
observation and class and return a 1 or 0. update_fn is a
callback function that's called at each training iteration. It is
passed a MaxEntropy object that encapsulates the current state of
the training.
Exceptions
"IIS did not converge"
ValueError, "No data in the training set."
ValueError, "training_set and results should be parallel lists."
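To make the expected inputs concrete, the sketch below builds a small training set, parallel class assignments and two binary feature callbacks, then applies the two input checks implied by the ValueError messages above. It does not call the module; check_training_inputs and the feature functions are hypothetical names used for illustration only.

```python
def check_training_inputs(training_set, results):
    """Raise ValueError for the two input problems listed above."""
    if not training_set:
        raise ValueError("No data in the training set.")
    if len(training_set) != len(results):
        raise ValueError("training_set and results should be parallel lists.")


# Each observation is a vector; results holds one class per observation.
training_set = [[1, 0], [0, 1], [1, 1]]
results = ["yes", "no", "yes"]


def first_on_and_yes(observation, cls):
    """Hypothetical feature: 1 if the first element is set and class is 'yes'."""
    return 1 if observation[0] == 1 and cls == "yes" else 0


def second_on_and_no(observation, cls):
    """Hypothetical feature: 1 if the second element is set and class is 'no'."""
    return 1 if observation[1] == 1 and cls == "no" else 0


feature_fns = [first_on_and_yes, second_on_and_no]
check_training_inputs(training_set, results)  # passes silently
```

With inputs shaped like these, the call would take the form train(training_set, results, feature_fns).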
Classes
MaxEntropy
Holds information for a Maximum Entropy classifier.