[Biodevelopers] forcing a full sequence comparison in blast

Phil Princely phil.princely at gmail.com
Fri Apr 6 22:47:46 EDT 2007


Hi all,

I'm working on a script to compare all genes in a genome against a
full sequence in a BLAST database. Both have around 2000 genes. My
script takes the test genome, extracts one amino acid sequence, and
runs it through BLAST. It then filters the output to grab only the
name of the gene with the best match and the similarity (in percent).
For example, from these lines:

>Contig 165-147: 171558..172979 (reverse), 474 amino acids
 Identities = 471/473 (99%), Positives = 471/473 (99%)

it grabs the text Contig 165-147 and the percent 99%.
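
The filtering step could be sketched roughly as below. This is only an
illustration, assuming the plain-text report format shown in the two sample
lines; the function name best_hit and both regular expressions are
hypothetical and are not taken from the actual script.

```python
import re

# Hypothetical patterns, written against the two sample lines above.
# Header line: ">Contig 165-147: 171558..172979 (reverse), 474 amino acids"
HEADER_RE = re.compile(r"^>(.+?):")
# Stats line: " Identities = 471/473 (99%), Positives = 471/473 (99%)"
IDENT_RE = re.compile(r"Identities = (\d+)/(\d+) \((\d+)%\)")


def best_hit(report_text):
    """Return (name, percent_identity) for the first hit in the report.

    Assumes hits are listed best-first, so the first header line and the
    first Identities line after it describe the best match.
    Returns None if no hit is found.
    """
    name = None
    for line in report_text.splitlines():
        header = HEADER_RE.match(line)
        if header and name is None:
            name = header.group(1)
            continue
        ident = IDENT_RE.search(line)
        if ident and name is not None:
            return name, int(ident.group(3))
    return None


sample = """>Contig 165-147: 171558..172979 (reverse), 474 amino acids
 Identities = 471/473 (99%), Positives = 471/473 (99%)
"""
print(best_hit(sample))  # ('Contig 165-147', 99)
```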

My problem comes when sequences have lower similarity and BLAST
aligns only a section of the input gene. For example:

>Contig 158-62: 61482..62750 (direct), 423 amino acids
 Identities = 15/46 (32%), Positives = 27/46 (58%), Gaps = 2/46 (4%)

Here, it has used only 46 of the amino acids, whereas the full gene
sequence has 347.
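
To put a number on that gap, here is a small sketch of the arithmetic using
the figures from the example (46 alignment columns, 15 identities, a
347-residue gene). The function names are made up for illustration, and
dividing by the full gene length is just one rough way to express how little
of the gene was compared; it is not something BLAST reports in these lines.

```python
def alignment_fraction(alignment_length, query_length):
    """Fraction of the query spanned by the alignment.

    Note: the alignment length counts gap columns too (2 of the 46 in the
    example), so this slightly overstates the query residues covered.
    """
    return alignment_length / query_length


def identity_over_query(identical, query_length):
    """Identical positions divided by the full query length.

    A rough lower bound: residues outside the local alignment are
    treated as if they did not match.
    """
    return identical / query_length


print(f"{alignment_fraction(46, 347):.1%}")   # 13.3%
print(f"{identity_over_query(15, 347):.1%}")  # 4.3%
```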

Is there a way I can force BLAST to use the full 347 amino acids for
the comparison? The researchers in my lab are most interested in
regions with low similarity, since they are trying to find the
portions that make this organism virulent.

Thanks again

Phil P.

