NCBIStandalone.py
This module provides code to work with the standalone version of
BLAST, either blastall or blastpgp, provided by the NCBI.
http://www.ncbi.nlm.nih.gov/BLAST/
Classes:
LowQualityBlastError     Exception that indicates low quality query sequences.
BlastParser              Parses output from blast.
BlastErrorParser         Parses output and tries to diagnose possible errors.
PSIBlastParser           Parses output from psi-blast.
Iterator                 Iterates over a file of blast results.
_Scanner                 Scans output from standalone BLAST.
_BlastConsumer           Consumes output from blast.
_PSIBlastConsumer        Consumes output from psi-blast.
_HeaderConsumer          Consumes header information.
_DescriptionConsumer     Consumes description information.
_AlignmentConsumer       Consumes alignment information.
_HSPConsumer             Consumes hsp information.
_DatabaseReportConsumer  Consumes database report information.
_ParametersConsumer      Consumes parameters information.
Functions:
blastall                 Execute blastall.
blastpgp                 Execute blastpgp.
Imported modules
from Bio import File
from Bio.Blast import Record
from Bio.ParserSupport import *
import os
import popen2
import re
import string
from types import *
Functions
_get_cols
_re_search
_safe_float
_safe_int
blastall
blastpgp
_get_cols
_get_cols(line, cols_to_get, ncols=None, expected={})
Exceptions
SyntaxError, "I expected %d columns (got %d) in line\n%s" % (ncols, len(cols), line)
SyntaxError, "I expected '%s' in column %d in line\n%s" % (expected[k], k, line)
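The signature and the two error messages above suggest that _get_cols splits a line into whitespace-separated columns, optionally checks the column count and the contents of particular columns, and returns the requested ones. The following is a minimal sketch under that assumption, not the module's actual code. The name get_cols, the sample input line, and the use of None instead of a mutable {} default are illustrative choices.

```python
def get_cols(line, cols_to_get, ncols=None, expected=None):
    """Return the columns of `line` whose indices are in `cols_to_get`.

    Hypothetical re-implementation for illustration only; the behavior is
    inferred from the documented signature and error messages.
    """
    expected = expected or {}
    cols = line.split()

    # Optionally verify the total number of columns.
    if ncols is not None and len(cols) != ncols:
        raise SyntaxError("I expected %d columns (got %d) in line\n%s"
                          % (ncols, len(cols), line))

    # Optionally verify that particular columns hold particular tokens.
    for k, value in expected.items():
        if cols[k] != value:
            raise SyntaxError("I expected '%s' in column %d in line\n%s"
                              % (value, k, line))

    return [cols[c] for c in cols_to_get]


# Hypothetical input line: pick column 2 after validating columns 0 and 1.
length, = get_cols("Length = 250 letters", (2,), ncols=4,
                   expected={0: "Length", 1: "="})
print(length)
```

Validating the fixed tokens before extracting values lets a parser fail early with a message that shows the offending line.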
_re_search
_re_search(regex, line, error_msg)
_safe_float
_safe_float(str)
Thomas Rosleff Soerensen (rosleff@mpiz-koeln.mpg.de) noted that
float('e-172') does not produce an error on his platform. Thus,
we need to check the string for this condition.
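The note above says only that the string must be checked before it is handed to float(). One way to do that is sketched below. It assumes that a value with a missing mantissa, such as 'e-172', is meant as '1e-172'. That interpretation and the name safe_float are assumptions for illustration, not taken from the documentation above.

```python
def safe_float(text):
    """Convert `text` to a float, handling a missing mantissa explicitly.

    Assumption (not stated in the documentation): 'e-172' means '1e-172'.
    """
    text = text.strip()
    # Do not rely on float() to reject a bare exponent; normalize it first.
    if text[:1] in ("e", "E"):
        text = "1" + text
    return float(text)


print(safe_float("e-172"))
print(safe_float("2.5e-10"))
```

Normalizing the string first makes the result independent of how the platform's float() treats a bare exponent.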
_safe_int
_safe_int(str)
blastall
blastall(blastcmd, program, database, infile, **keywds)
blastall(blastcmd, program, database, infile, **keywds) -> read, error UndoHandles

Execute and retrieve data from blastall. blastcmd is the command
used to launch the blastall executable. program is the blast program
to use, e.g. blastp, blastn, etc. database is the path to the database
to search against. infile is the path to the file containing
the sequence to search with.

You may pass more parameters to **keywds to change the behavior of
the search. Otherwise, optional values will be chosen by blastall.

Scoring
    matrix              Matrix to use.
    gap_open            Gap open penalty.
    gap_extend          Gap extension penalty.
    nuc_match           Nucleotide match reward. (BLASTN)
    nuc_mismatch        Nucleotide mismatch penalty. (BLASTN)
    query_genetic_code  Genetic code for Query.
    db_genetic_code     Genetic code for database. (TBLAST[NX])
Algorithm
    gapped              Whether to do a gapped alignment. T/F (not for TBLASTX)
    expectation         Expectation value cutoff.
    wordsize            Word size.
    strands             Query strands to search against database. ([T]BLAST[NX])
    keep_hits           Number of best hits from a region to keep.
    xdrop               Dropoff value (bits) for gapped alignments.
    hit_extend          Threshold for extending hits.
    region_length       Length of region used to judge hits.
    db_length           Effective database length.
    search_length       Effective length of search space.
Processing
    filter              Filter query sequence? T/F
    believe_query       Believe the query defline? T/F
    restrict_gi         Restrict search to these GI's.
    nprocessors         Number of processors to use.
Formatting
    html                Produce HTML output? T/F
    descriptions        Number of one-line descriptions.
    alignments          Number of alignments.
    align_view          Alignment view. Integer 0-6.
    show_gi             Show GI's in deflines? T/F
    seqalign_file       seqalign file to output.
Exceptions
ValueError, "blastall does not exist at %s" % blastcmd
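To illustrate how the keyword parameters listed above could end up on a command line, here is a minimal sketch that only builds the argument list and does not run BLAST. The helper name build_blastall_command and the BLASTALL_FLAGS table are hypothetical. The flag letters are recalled from the legacy blastall help text and should be checked against the help output of your own blastall. The executable path, database name, and query file are placeholders.

```python
import os

# Assumed mapping from the keyword names documented above to legacy
# blastall flags. Only a subset is shown; verify against your blastall.
BLASTALL_FLAGS = {
    "matrix": "-M",
    "gap_open": "-G",
    "gap_extend": "-E",
    "expectation": "-e",
    "wordsize": "-W",
    "filter": "-F",
    "descriptions": "-v",
    "alignments": "-b",
    "align_view": "-m",
}


def build_blastall_command(blastcmd, program, database, infile, **keywds):
    """Return the argument list for a blastall run (hypothetical helper)."""
    if not os.path.exists(blastcmd):
        raise ValueError("blastall does not exist at %s" % blastcmd)
    cmd = [blastcmd, "-p", program, "-d", database, "-i", infile]
    # Sort the keywords so the resulting command line is deterministic.
    for name in sorted(keywds):
        if name not in BLASTALL_FLAGS:
            raise TypeError("keyword not covered by this sketch: %s" % name)
        cmd += [BLASTALL_FLAGS[name], str(keywds[name])]
    return cmd


try:
    # Placeholder path and file names; adjust to your installation.
    print(build_blastall_command("/usr/local/bin/blastall", "blastp", "nr",
                                 "query.fasta", expectation=0.001))
except ValueError as err:
    print(err)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.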
blastpgp
blastpgp(blastcmd, database, infile, **keywds)
blastpgp(blastcmd, database, infile, **keywds) -> read, error UndoHandles

Execute and retrieve data from blastpgp. blastcmd is the command
used to launch the blastpgp executable. database is the path to the
database to search against. infile is the path to the file containing
the sequence to search with.

You may pass more parameters to **keywds to change the behavior of
the search. Otherwise, optional values will be chosen by blastpgp.

Scoring
    matrix              Matrix to use.
    gap_open            Gap open penalty.
    gap_extend          Gap extension penalty.
    window_size         Multiple hits window size.
    npasses             Number of passes.
    passes              Hits/passes. Integer 0-2.
Algorithm
    gapped              Whether to do a gapped alignment. T/F
    expectation         Expectation value cutoff.
    wordsize            Word size.
    keep_hits           Number of best hits from a region to keep.
    xdrop               Dropoff value (bits) for gapped alignments.
    hit_extend          Threshold for extending hits.
    region_length       Length of region used to judge hits.
    db_length           Effective database length.
    search_length       Effective length of search space.
    nbits_gapping       Number of bits to trigger gapping.
    pseudocounts        Pseudocounts constants for multiple passes.
    xdrop_final         X dropoff for final gapped alignment.
    xdrop_extension     Dropoff for blast extensions.
    model_threshold     E-value threshold to include in multipass model.
    required_start      Start of required region in query.
    required_end        End of required region in query.
Processing
    XXX should document default values
    program             The blast program to use. (PHI-BLAST)
    filter              Filter query sequence with SEG? T/F
    believe_query       Believe the query defline? T/F
    nprocessors         Number of processors to use.
Formatting
    html                Produce HTML output? T/F
    descriptions        Number of one-line descriptions.
    alignments          Number of alignments.
    align_view          Alignment view. Integer 0-6.
    show_gi             Show GI's in deflines? T/F
    seqalign_file       seqalign file to output.
    align_outfile       Output file for alignment.
    checkpoint_outfile  Output file for PSI-BLAST checkpointing.
    restart_infile      Input file for PSI-BLAST restart.
    hit_infile          Hit file for PHI-BLAST.
    matrix_outfile      Output file for PSI-BLAST matrix in ASCII.
    align_infile        Input alignment file for PSI-BLAST restart.
Exceptions
ValueError, "blastpgp does not exist at %s" % blastcmd
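The documented return value is a pair of handles, one for the program's output and one for its error stream. The module imports popen2 for this, which modern Python no longer provides, so the sketch below uses subprocess instead. It is an illustration under stated assumptions, not the module's implementation: the names start and run_blastpgp are hypothetical, plain file objects stand in for UndoHandles, and -j as the flag for npasses is recalled from the legacy blastpgp help text and should be verified locally.

```python
import os
import subprocess


def start(cmd):
    """Launch `cmd` and return (read, error) text handles."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    return proc.stdout, proc.stderr


def run_blastpgp(blastcmd, database, infile, npasses=None):
    """Start blastpgp and return its output and error handles.

    Hypothetical helper; only one optional keyword is shown.
    """
    if not os.path.exists(blastcmd):
        raise ValueError("blastpgp does not exist at %s" % blastcmd)
    cmd = [blastcmd, "-d", database, "-i", infile]
    if npasses is not None:
        cmd += ["-j", str(npasses)]  # assumed flag for the number of passes
    return start(cmd)
```

A caller would read the first handle for the report and the second for diagnostics, for example out, err = run_blastpgp(path, "nr", "query.fasta", npasses=3) with placeholder arguments. Reading one pipe to the end before touching the other can block if the program writes a large amount to the unread pipe, so for large outputs subprocess's communicate() is the safer choice.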
Classes