Table of Contents

Class: SeqMat Bio/SubsMat/__init__.py

A generic sequence matrix class. The key is a 2-tuple containing the letter indices of the matrix. These should be sorted in the tuple (low, high), because each matrix is handled as a half-matrix.

5/2001 added the following:
Methods for subtraction, addition and multiplication of matrices.
Generation of an expected frequency table from an observed frequency matrix.
Calculation of the linear correlation coefficient between two matrices. Needs Bio.Tools.statfns.
Calculation of relative entropy is now done using the _make_relative_entropy method and is stored in the member self.relative_entropy.
Calculation of entropy is now done using the _make_entropy method and is stored in the member self.entropy.
Jensen-Shannon distance between the distributions from which the matrices are derived. This is a distance function based on the distributions' entropies.

Substitution matrix routines
Iddo Friedberg idoerg@cc.huji.ac.il
Biopython license applies (http://biopython.org)

General: You should have Python 2.0 or above (http://www.python.org). You should have Biopython (http://biopython.org) installed.

This module provides a class and a few routines for generating substitution matrices, similar to BLOSUM or PAM matrices, but based on user-provided data. The class used for these matrices is SeqMat.

Matrices are implemented as a user dictionary. Each key is a 2-tuple holding the two residue/nucleotide types replaced. The value differs according to the matrix's purpose: e.g. in a log-odds frequency matrix, the value would be log(Pij/(Pi*Pj)), where:
Pij: frequency of substitution of letter (residue/nucleotide) i by j
Pi, Pj: expected frequencies of i and j, respectively.
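As a worked illustration of the log-odds value described above, the following sketch computes log(Pij/(Pi*Pj)) for a single pair. The helper name log_odds and the frequencies are illustrative assumptions and are not part of this module.

```python
import math

def log_odds(p_ij, p_i, p_j, logbase=10):
    """Return log(Pij / (Pi * Pj)) in the given base (hypothetical helper)."""
    return math.log(p_ij / (p_i * p_j), logbase)

# Made-up frequencies: the pair is observed twice as often as chance predicts.
score = log_odds(p_ij=0.02, p_i=0.1, p_j=0.1)
print(round(score, 3))
```

A positive value means the pair occurs more often than expected by chance, a negative value less often.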

Usage: The following section is laid out in the order in which most people would generate a log-odds matrix. Interim matrices can be generated and investigated along the way, but most people just want the log-odds matrix.

Generating an Accepted Replacement Matrix: Initially, you should generate an accepted replacement matrix (ARM) from your data. The values in the ARM are the counted number of replacements according to your data. The data could be a set of pairs or multiple alignments. So for instance, if Alanine was replaced by Cysteine 10 times, and Cysteine by Alanine 12 times, the corresponding ARM entries would be:
('A','C'): 10, ('C','A'): 12
As order doesn't matter, the user can provide only one entry instead:
('A','C'): 22
A SeqMat instance may be initialized with either a full matrix (the first method of counting: 10, 12) or a half-matrix (the latter method: 22). A full protein alphabet matrix would be of size 20x20 = 400. A half-matrix of that alphabet would be of size 20x20/2 + 20/2 = 210. That is because same-letter entries (the matrix diagonal) don't change. Given an alphabet size of N:
Full matrix size: N*N
Half matrix size: N(N+1)/2

If you provide a full matrix, the constructor will create a half-matrix automatically. If you provide a half-matrix, make sure the keys are in (low, high) sorted order: there should only be an ('A','C'), not a ('C','A').
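The folding of a full matrix into a half-matrix can be sketched with plain dictionaries. This is a minimal illustration of the idea only; the function names full_to_half and half_matrix_size are assumptions for this example, and the class's own _full_to_half method may differ in detail.

```python
def full_to_half(full):
    """Fold a full matrix (dict keyed by 2-tuples) into a half-matrix.

    Keys are sorted to (low, high); counts for (i, j) and (j, i) are summed.
    """
    half = {}
    for (i, j), count in full.items():
        key = (i, j) if i <= j else (j, i)
        half[key] = half.get(key, 0) + count
    return half

def half_matrix_size(n):
    """Number of entries in a half-matrix over an alphabet of n letters."""
    return n * (n + 1) // 2

# Counts from the example above, plus made-up diagonal entries.
full = {("A", "C"): 10, ("C", "A"): 12, ("A", "A"): 30, ("C", "C"): 25}
half = full_to_half(full)
print(half)                  # {('A', 'C'): 22, ('A', 'A'): 30, ('C', 'C'): 25}
print(half_matrix_size(20))  # 210
```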

Internal functions:

Generating the observed frequency matrix (OFM):
Use: OFM = _build_obs_freq_mat(ARM)
The OFM is generated from the ARM, only instead of replacement counts, it contains replacement frequencies.

Generating an expected frequency matrix (EFM):
Use: EFM = _build_exp_freq_mat(OFM,exp_freq_table)
exp_freq_table: should be a freqTableC instantiation. See freqTable.py for detailed information. Briefly, the expected frequency table has the frequencies of appearance for each member of the alphabet.

Generating a substitution frequency matrix (SFM):
Use: SFM = _build_subs_mat(OFM,EFM)
Accepts an OFM and an EFM. Returns the quotient of the corresponding values (each OFM entry divided by its EFM entry).

Generating a log-odds matrix (LOM):
Use: LOM=_build_log_odds_mat(SFM[,logbase=10,factor=10.0,roundit=1])
Accepts an SFM.
logbase: base of the logarithm used to generate the log-odds values.
factor: factor used to multiply the log-odds values.
roundit: default - true. Whether to round the values.
Each entry is generated by log(SFM[key])*factor and rounded if required.
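The ARM --> OFM --> EFM --> SFM --> LOM stages can be sketched with plain dictionaries as follows. This is an independent illustration, not the module's code: the functions below only mirror the names and parameters listed above, the expected frequency table is a plain dict rather than a freqTableC instance, and the doubling of off-diagonal expected frequencies is an assumption that follows from the half-matrix representation.

```python
import math

def build_obs_freq_mat(arm):
    """OFM: each replacement count divided by the total count."""
    total = float(sum(arm.values()))
    return {key: count / total for key, count in arm.items()}

def build_exp_freq_mat(ofm, exp_freq_table):
    """EFM: chance frequency of each pair present in the OFM.

    Assumption: an off-diagonal half-matrix entry stands for both (i, j)
    and (j, i), so its expected frequency is 2 * Pi * Pj.
    """
    efm = {}
    for (i, j) in ofm:
        p = exp_freq_table[i] * exp_freq_table[j]
        efm[(i, j)] = p if i == j else 2.0 * p
    return efm

def build_subs_mat(ofm, efm):
    """SFM: observed frequency divided by expected frequency."""
    return {key: ofm[key] / efm[key] for key in ofm}

def build_log_odds_mat(sfm, logbase=10, factor=10.0, roundit=True):
    """LOM: log(SFM[key]) * factor, rounded if requested.

    Zero-valued SFM entries are not handled in this sketch.
    """
    lom = {}
    for key, value in sfm.items():
        score = factor * math.log(value, logbase)
        lom[key] = round(score) if roundit else score
    return lom

# Made-up half-matrix of counts and made-up letter frequencies.
arm = {("A", "A"): 60, ("A", "C"): 22, ("C", "C"): 18}
exp_freq_table = {"A": 0.7, "C": 0.3}

ofm = build_obs_freq_mat(arm)
efm = build_exp_freq_mat(ofm, exp_freq_table)
sfm = build_subs_mat(ofm, efm)
lom = build_log_odds_mat(sfm)
print(lom)
```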

External: In most cases, users will want to generate a log-odds matrix only, without explicitly calling the OFM --> EFM --> SFM stages. The function build_log_odds_matrix does that. The user provides an ARM and an expected frequency table. The function returns the log-odds matrix.

Base Classes   
UserDict.UserDict
Methods   
__init__
__mul__
__sub__
__sum__
_alphabet_from_matrix
_correct_matrix
_full_to_half
_init_zero
all_letters_sum
letter_sum
make_entropy
make_relative_entropy
print_full_mat
print_mat
  __init__ 
__init__ (
        self,
        data=None,
        alphabet=None,
        mat_type=NOTYPE,
        mat_name='',
        build_later=0,
        )

  __mul__ 
__mul__ ( self,  other )

Returns a matrix in which each entry is the product of the corresponding entries of the two matrices passed.

  __sub__ 
__sub__ ( self,  other )

Returns a number which is the subtraction product of the two matrices.

  __sum__ 
__sum__ ( self,  other )

  _alphabet_from_matrix 
_alphabet_from_matrix ( self )

  _correct_matrix 
_correct_matrix ( self )

  _full_to_half 
_full_to_half ( self )

Convert a full-matrix to a half-matrix

  _init_zero 
_init_zero ( self )

  all_letters_sum 
all_letters_sum ( self )

  letter_sum 
letter_sum ( self,  letter )

  make_entropy 
make_entropy ( self )

  make_relative_entropy 
make_relative_entropy ( self,  obs_freq_mat )

If this matrix is a log-odds matrix, return its entropy. Needs the observed frequency matrix for that.

Exceptions   
TypeError, "entropy: substitution or log-odds matrices only"
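As a rough illustration of the quantity involved, the sketch below uses the usual definition of the relative entropy of a log-odds matrix: the sum over all pairs of the observed frequency times the log-odds value, expressed in bits. This is an assumption about the definition, not a copy of make_relative_entropy; the function name relative_entropy and the data are hypothetical, and no matrix-type check is performed.

```python
import math

def relative_entropy(obs_freq_mat, log_odds_mat, logbase=2):
    """Sum of q_ij * s_ij over all keys, converted to bits.

    Assumes log_odds_mat holds unscaled, unrounded log-odds values
    taken in base `logbase`.
    """
    to_bits = math.log(logbase, 2)
    return sum(obs_freq_mat[key] * log_odds_mat[key] * to_bits
               for key in log_odds_mat)

# Made-up observed and expected pair frequencies (half-matrix keys).
ofm = {("A", "A"): 0.60, ("A", "C"): 0.22, ("C", "C"): 0.18}
efm = {("A", "A"): 0.49, ("A", "C"): 0.42, ("C", "C"): 0.09}
lom = {key: math.log(ofm[key] / efm[key], 2) for key in ofm}

h = relative_entropy(ofm, lom)
print("relative entropy: %.3f bits" % h)
```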
  print_full_mat 
print_full_mat (
        self,
        f=sys.stdout,
        format="%4d",
        topformat="%4s",
        alphabet=None,
        factor=1,
        non_sym=None,
        )

  print_mat 
print_mat (
        self,
        f=sys.stdout,
        format="%4d",
        bottomformat="%4s",
        alphabet=None,
        factor=1,
        )

Print a nice half-matrix. Use f=sys.stdout to see it on the screen. The user may pass their own alphabet, which should contain all letters in the alphabet of the matrix, but may be in a different order. This order will be the order of the letters on the axes.
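The layout of a printed half-matrix can be sketched as below. This is a stand-alone approximation for illustration: the function print_half_matrix is hypothetical, it borrows only the f, format and bottomformat ideas from the signature above, and the exact spacing produced by print_mat may differ.

```python
import io
import sys

def print_half_matrix(mat, alphabet, f=sys.stdout, fmt="%4d", bottomfmt="%4s"):
    """Write the lower triangle of a half-matrix, one row per letter.

    Keys of `mat` are (low, high) tuples; `alphabet` sets the axis order.
    """
    for row, a in enumerate(alphabet):
        cells = []
        for b in alphabet[: row + 1]:
            # Look up the sorted key regardless of the axis order.
            key = (a, b) if a <= b else (b, a)
            cells.append(fmt % mat[key])
        f.write(a + " " + "".join(cells) + "\n")
    # Column labels along the bottom axis.
    f.write("  " + "".join(bottomfmt % b for b in alphabet) + "\n")

# Made-up log-odds values for a two-letter alphabet.
lom = {("A", "A"): 1, ("A", "C"): -3, ("C", "C"): 3}
buf = io.StringIO()
print_half_matrix(lom, "AC", f=buf)
print(buf.getvalue(), end="")
```

Passing the alphabet in a different order, for example "CA", changes only the order of the rows and columns, not the values looked up.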


This document was automatically generated on Mon Jul 1 12:02:57 2002 by HappyDoc version 2.0.1