Represent a clustalw multiple alignment command line.
This is meant to make it easy to set up, in code, the command line options you
want to submit to clustalw.
Clustalw has many options and modes, but this class is set up to
represent only a clustalw multiple alignment.
Warning: I don't use all of these options personally, so if you find
one to be broken for any reason, please let us know!
Methods
__init__
__str__
set_dna_matrix
set_guide_tree
set_new_guide_tree
set_output
set_protein_matrix
set_type
__init__
__init__ ( self, sequence_file, command='clustalw' )
Initialize some general parameters that can be set as attributes.
Arguments:
o sequence_file - The file to read the sequences for alignment from.
o command - The command used to run clustalw. This defaults to
just clustalw (i.e., it assumes clustalw is somewhere on your path).
General attributes that can be set:
o is_quick - if set to 1, will use a fast algorithm to create
the alignment guide tree.
o allow_negative - allow negative values in the alignment matrix.
Multiple alignment attributes that can be set:
o gap_open_pen - Gap opening penalty
o gap_ext_pen - Gap extension penalty
o is_no_end_pen - A flag as to whether or not there should be a gap
separation penalty for the ends.
o gap_sep_range - The gap separation penalty range.
o is_no_pgap - A flag to turn off residue specific gaps
o is_no_hgap - A flag to turn off hydrophilic gaps
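o h_gap_residues - A list of residues to count as hydrophilic
o max_div - A percent identity to use for delay; clustalw postpones
aligning the most divergent sequences (those below this identity to all
others) until the more closely related ones have been aligned.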
o trans_weight - The weight to use for transitions
__str__
__str__ ( self )
Write out the command line as a string.
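The idea of rendering stored attributes as a command line can be sketched as follows. This is a minimal illustration, not the library's implementation: `SketchCommandLine` and the file name `seqs.fasta` are hypothetical, only three of the attributes are handled, and the exact string the real class produces may differ. The flags used (-INFILE, -QUICKTREE, -GAPOPEN, -GAPEXT) are clustalw options.

```python
class SketchCommandLine:
    """Hypothetical stand-in: holds a few options and renders them as a string."""

    def __init__(self, sequence_file, command="clustalw"):
        self.sequence_file = sequence_file
        self.command = command
        # Optional settings start unset; only the ones that are set
        # end up on the command line.
        self.is_quick = None
        self.gap_open_pen = None
        self.gap_ext_pen = None

    def __str__(self):
        parts = [self.command, "-INFILE=" + self.sequence_file]
        if self.is_quick == 1:
            parts.append("-QUICKTREE")
        if self.gap_open_pen is not None:
            parts.append("-GAPOPEN=" + str(self.gap_open_pen))
        if self.gap_ext_pen is not None:
            parts.append("-GAPEXT=" + str(self.gap_ext_pen))
        return " ".join(parts)


cline = SketchCommandLine("seqs.fasta")  # hypothetical input file
cline.is_quick = 1
cline.gap_open_pen = 10
print(cline)
```

Keeping unset options as None means the string contains only what was explicitly requested, so clustalw's own defaults apply to everything else.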
set_dna_matrix
set_dna_matrix ( self, dna_matrix )
Set the type of DNA matrix to use.
The dna_matrix can either be one of the defined types (iub or clustalw)
or a file with the matrix to use.
Exceptions
ValueError("Invalid matrix %s. Options are %s or a file." %( dna_matrix, self.DNA_MATRIX ) )
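The "defined type or file" rule can be sketched as below. This is only an illustration under stated assumptions: `resolve_matrix` is a hypothetical helper, the list assigned to `DNA_MATRIX` is assumed from the names mentioned above, and case-insensitive matching of the built-in names is a guess rather than confirmed behaviour.

```python
import os

# Assumed contents; the real values live in the class attribute DNA_MATRIX.
DNA_MATRIX = ["IUB", "CLUSTALW"]


def resolve_matrix(matrix, allowed):
    """Return a built-in matrix name (upper-cased) or an existing file path."""
    if matrix.upper() in allowed:
        return matrix.upper()
    if os.path.exists(matrix):
        # Anything that is not a built-in name must be a readable matrix file.
        return matrix
    raise ValueError("Invalid matrix %s. Options are %s or a file."
                     % (matrix, allowed))


print(resolve_matrix("iub", DNA_MATRIX))
```

The same check would apply to set_protein_matrix with a different list of allowed names.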
set_guide_tree
set_guide_tree ( self, tree_file )
Provide a file to use as the guide tree for alignment.
Raises:
o IOError - If the tree_file doesn't exist.
Exceptions
IOError( "Could not find the guide tree file %s." % tree_file )
set_new_guide_tree
set_new_guide_tree ( self, tree_file )
Set the name of the guide tree file generated in the alignment.
set_output
set_output ( self, output_file, output_type=None, output_order=None, change_case=None, add_seqnos=None )
Set the output parameters for the command line.
Exceptions
ValueError( "Add SeqNos only valid for CLUSTAL output." )
ValueError( "Change case only valid for GDE output." )
ValueError("Invalid change case %s. Valid choices are %s" %( change_case, self.CHANGE_CASE ) )
ValueError("Invalid output order %s. Valid choices are %s" %( output_order, self.OUTPUT_ORDER ) )
ValueError("Invalid output type %s. Valid choices are %s" %( output_type, self.OUTPUT_TYPES ) )
ValueError("Invalid seqnos option %s. Valid choices: %s" %( add_seqnos, self.OUTPUT_SEQNOS ) )
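The exceptions listed above imply that some options are only valid in combination with a particular output type. A rough sketch of that cross-checking follows. `check_output_options` is a hypothetical helper, the three choice lists are assumed from clustalw's documented option values, and treating an unset output_type as the default CLUSTAL format is also an assumption. The output_order check is omitted for brevity.

```python
# Assumed choice lists for illustration; the real values are the class
# attributes OUTPUT_TYPES, CHANGE_CASE and OUTPUT_SEQNOS.
OUTPUT_TYPES = ["GCG", "GDE", "PHYLIP", "PIR", "NEXUS"]
CHANGE_CASE = ["LOWER", "UPPER"]
OUTPUT_SEQNOS = ["OFF", "ON"]


def check_output_options(output_type=None, change_case=None, add_seqnos=None):
    """Validate the options together and return them upper-cased."""
    if output_type is not None:
        output_type = output_type.upper()
        if output_type not in OUTPUT_TYPES:
            raise ValueError("Invalid output type %s. Valid choices are %s"
                             % (output_type, OUTPUT_TYPES))
    if change_case is not None:
        change_case = change_case.upper()
        if output_type != "GDE":
            raise ValueError("Change case only valid for GDE output.")
        if change_case not in CHANGE_CASE:
            raise ValueError("Invalid change case %s. Valid choices are %s"
                             % (change_case, CHANGE_CASE))
    if add_seqnos is not None:
        add_seqnos = add_seqnos.upper()
        # Assumption: no explicit output type means the default CLUSTAL format.
        if output_type is not None:
            raise ValueError("Add SeqNos only valid for CLUSTAL output.")
        if add_seqnos not in OUTPUT_SEQNOS:
            raise ValueError("Invalid seqnos option %s. Valid choices: %s"
                             % (add_seqnos, OUTPUT_SEQNOS))
    return output_type, change_case, add_seqnos


print(check_output_options("gde", change_case="lower"))
```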
set_protein_matrix
set_protein_matrix ( self, protein_matrix )
Set the type of protein matrix to use.
The protein_matrix can be either one of the defined types (blosum, pam,
gonnet or id) or a file with your own defined matrix.
Exceptions
ValueError("Invalid matrix %s. Options are %s or a file." %( string.upper( protein_matrix ), self.PROTEIN_MATRIX ) )
set_type
set_type ( self, residue_type )
Set the type of residues within the file.
Clustal tries to guess whether the sequences are protein or DNA based on
how many of the characters are G, A, T or C, but the guess can be wrong
for unusual sequences (for example, a protein rich in those letters), so
this method lets you set the type explicitly.
Exceptions
ValueError("Invalid residue type %s. Valid choices are %s" %( residue_type, self.RESIDUE_TYPES ) )
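To see why a letter-counting guess can go wrong, consider the toy heuristic below. It is not clustalw's code: `guess_residue_type` is a hypothetical function, and both the 85% cutoff and the set of nucleotide letters are assumptions chosen for illustration. The point is only that G, A, T and C are also valid amino acid codes.

```python
def guess_residue_type(sequence, threshold=0.85):
    """Guess 'DNA' when most letters are nucleotide codes, else 'PROTEIN'."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("No residues to inspect.")
    # Assumed nucleotide alphabet; every one of these except U is also an
    # amino acid code or ambiguity code that can appear in a protein.
    nucleotide_like = sum(1 for c in letters if c in "ACGTUN")
    if nucleotide_like / len(letters) >= threshold:
        return "DNA"
    return "PROTEIN"


# A glycine/alanine-rich peptide is misread as DNA by this kind of guess.
print(guess_residue_type("GAGAGSGAAG"))
print(guess_residue_type("MKVLHW"))
```

Calling set_type with an explicit residue type avoids depending on any such guess.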