Class: SeqMat | Bio/SubsMat/__init__.py
---|---
A generic sequence matrix class.

The key is a 2-tuple containing the letter indices of the matrix. These
should be sorted in the tuple as (low, high), because each matrix is
handled as a half-matrix.

5/2001 added the following:
Methods for subtraction, addition and multiplication of matrices.
Generation of an expected frequency table from an observed frequency
  matrix.
Calculation of the linear correlation coefficient between two matrices.
  Needs Bio.Tools.statfns.
Calculation of relative entropy is now done using the
  _make_relative_entropy method and is stored in the member
  self.relative_entropy.
Calculation of entropy is now done using the _make_entropy method and
  is stored in the member self.entropy.
Jensen-Shannon distance between the distributions from which the
  matrices are derived. This is a distance function based on the
  distributions' entropies.

Substitution matrix routines
Iddo Friedberg idoerg@cc.huji.ac.il
Biopython license applies (http://biopython.org)

General:
You should have python 2.0 or above (http://www.python.org).
You should have biopython (http://biopython.org) installed.

This module provides a class and a few routines for generating
substitution matrices, similar to BLOSUM or PAM matrices, but based on
user-provided data.
The class used for these matrices is SeqMat.

Matrices are implemented as a user dictionary. Each key is a 2-tuple
holding the two residue/nucleotide types replaced. The value differs
according to the matrix's purpose: e.g. in a log-odds frequency matrix,
the value would be log(Pij/(Pi*Pj)) where:
Pij: frequency of substitution of letter (residue/nucleotide) i by j
Pi, Pj: expected frequencies of i and j, respectively.

Usage:
The following section is laid out in the order in which most people
generate a log-odds matrix. Interim matrices can also be generated and
investigated, but most people just want the log-odds matrix.

Generating an Accepted Replacement Matrix:
Initially, you should generate an accepted replacement matrix
(ARM) from your data. The values in ARM are the counted number of
replacements according to your data. The data could be a set of pairs
or multiple alignments. So for instance if Alanine was replaced by
Cysteine 10 times, and Cysteine by Alanine 12 times, the corresponding
ARM entries would be:
[
If you provide a full matrix, the constructor will create a half-matrix
automatically.
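
A minimal sketch of what collapsing a full matrix into a half-matrix could look like, using the Alanine/Cysteine counts given above. The helper name `to_half_matrix` is hypothetical and is not part of Bio.SubsMat, and the sketch assumes that the two mirrored entries of a pair are summed under one sorted key.

```python
def to_half_matrix(full):
    """Collapse a full matrix into a half-matrix keyed by sorted (low, high) tuples.

    Hypothetical helper for illustration; not the Bio.SubsMat constructor.
    """
    half = {}
    for (i, j), count in full.items():
        # Put the two letters in (low, high) order so (C, A) maps onto (A, C).
        key = (i, j) if i <= j else (j, i)
        # Assumption: counts of mirrored entries are added together.
        half[key] = half.get(key, 0) + count
    return half


# Counts from the text: A -> C observed 10 times, C -> A observed 12 times.
full_arm = {("A", "C"): 10, ("C", "A"): 12}
half_arm = to_half_matrix(full_arm)
print(half_arm)
```

With these inputs the single remaining key is `("A", "C")`, holding the combined count of both directions.
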
If you provide a half-matrix, make sure the keys are in sorted
(low, high) order: there should only be
a (

Internal functions:

Generating the observed frequency matrix (OFM):
Use: OFM = _build_obs_freq_mat(ARM)
The OFM is generated from the ARM, only instead of replacement counts,
it contains replacement frequencies.

Generating an expected frequency matrix (EFM):
Use: EFM = _build_exp_freq_mat(OFM,exp_freq_table)
exp_freq_table: should be a freqTableC instantiation. See freqTable.py
for detailed information. Briefly, the expected frequency table has the
frequencies of appearance for each member of the alphabet.

Generating a substitution frequency matrix (SFM):
Use: SFM = _build_subs_mat(OFM,EFM)
Accepts an OFM and an EFM. Returns each OFM value divided by the
corresponding EFM value.

Generating a log-odds matrix (LOM):
Use: LOM=_build_log_odds_mat(SFM[,logbase=10,factor=10.0,roundit=1])
Accepts an SFM.
logbase: base of the logarithm used to generate the log-odds values.
factor: factor used to multiply the log-odds values.
roundit: default - true. Whether to round the values.
Each entry is generated by log(SFM[key])*factor and rounded if required.

External:
In most cases, users will want to generate a log-odds matrix only,
without explicitly calling the OFM --> EFM --> SFM stages. The function
build_log_odds_matrix does that. The user provides an ARM and an
expected frequency table. The function returns the log-odds matrix.
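
The ARM --> OFM --> EFM --> SFM --> LOM stages described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Bio.SubsMat implementation: the function names are hypothetical, matrices are plain dictionaries, the expected frequency of a pair is taken as the simple product Pi*Pj from the log-odds formula, every count is assumed to be positive, and the counts and letter frequencies are made-up example values.

```python
import math


def obs_freq(arm):
    """OFM: each replacement count divided by the total number of counts."""
    total = float(sum(arm.values()))
    return {key: count / total for key, count in arm.items()}


def exp_freq(ofm, letter_freqs):
    """EFM: expected pair frequency, taken here as Pi * Pj (an assumption)."""
    return {(i, j): letter_freqs[i] * letter_freqs[j] for (i, j) in ofm}


def subs_freq(ofm, efm):
    """SFM: observed frequency divided by expected frequency, entry by entry."""
    return {key: ofm[key] / efm[key] for key in ofm}


def log_odds(sfm, logbase=10, factor=10.0, roundit=True):
    """LOM: log(SFM[key]) in the given base, times factor, optionally rounded."""
    lom = {}
    for key, ratio in sfm.items():
        value = math.log(ratio, logbase) * factor
        lom[key] = round(value) if roundit else value
    return lom


# Made-up half-matrix of counts and letter frequencies, for illustration only.
arm = {("A", "A"): 50, ("A", "C"): 22, ("C", "C"): 28}
letter_freqs = {"A": 0.6, "C": 0.4}

ofm = obs_freq(arm)
efm = exp_freq(ofm, letter_freqs)
sfm = subs_freq(ofm, efm)
lom = log_odds(sfm)
print(lom)
```

Keeping each stage as its own function mirrors the description above and makes it easy to inspect an interim matrix, for example printing `sfm` before the logarithm is taken.
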
|