[Biodevelopers] forcing a full sequence comparison in blast

Fri Apr 6 22:47:46 EDT 2007

Hi all,

I'm working on a script to compare all genes in a genome against a
full sequence in a blast database. Both have around 2000 genes. My
script takes the test genome, extracts one amino acid sequence, and
runs it through blast. It then filters the output to grab only the
name of the gene with the best match and the similarity (in percent).
For example, from these lines:

>Contig 165-147: 171558..172979 (reverse), 474 amino acids
 Identities = 471/473 (99%), Positives = 471/473 (99%)

it grabs the text Contig 165-147 and the percent 99%.
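
To make the filtering step concrete, here is a rough sketch of how those
two lines could be parsed. It assumes the blast text output always looks
like the example above; the function name parse_best_hit and both regular
expressions are made up for illustration and are not taken from my real
script.

```python
import re

# Hypothetical patterns, written against the two example lines only.
# Header line, e.g. ">Contig 165-147: 171558..172979 (reverse), ..."
HEADER_RE = re.compile(r"^>\s*(?P<name>[^:]+):")
# Score line, e.g. " Identities = 471/473 (99%), Positives = ..."
IDENT_RE = re.compile(
    r"Identities\s*=\s*(?P<matches>\d+)/(?P<length>\d+)\s*\((?P<pct>\d+)%\)"
)

def parse_best_hit(blast_text):
    """Return (name, identities, alignment_length, percent) for the first
    hit in the text, or None if no header/Identities pair is found."""
    name = None
    for line in blast_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # Remember the most recent hit name seen.
            name = header.group("name").strip()
            continue
        ident = IDENT_RE.search(line)
        if ident and name is not None:
            return (
                name,
                int(ident.group("matches")),
                int(ident.group("length")),
                int(ident.group("pct")),
            )
    return None
```

Fed the two lines above, this should return the name "Contig 165-147"
together with 471, 473 and 99. It assumes the best match is listed first
in the output.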

My problem comes when sequences have a lower similarity and blast
uses only a section of the input gene. For example:

>Contig 158-62: 61482..62750 (direct), 423 amino acids
 Identities = 15/46 (32%), Positives = 27/46 (58%), Gaps = 2/46 (4%)

Here, it has used only 46 of the amino acids, whereas the full gene
sequence has 347.
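
To put numbers on the gap, here is a small sketch of the arithmetic. It
does not change what blast aligns; it only rescales the reported figures
against the full query length. The function names are hypothetical, and
it treats the alignment length (46) as the number of query residues
covered, which is only approximate because that length also counts gap
columns.

```python
def query_coverage(alignment_length, query_length):
    """Approximate percent of the query spanned by the alignment.

    Assumption: alignment_length is the denominator from the blast
    'Identities = x/y' line, which includes gap columns.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return 100.0 * alignment_length / query_length

def identity_over_query(identities, query_length):
    """Identical residues as a percent of the whole query sequence."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return 100.0 * identities / query_length
```

With the example above, query_coverage(46, 347) is about 13.3 and
identity_over_query(15, 347) is about 4.3, compared with the 32% that
blast reports for the 46-residue section alone.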

Is there a way I can force blast to use the full 347 amino acids for
comparison? The researchers in my lab are most interested in places
with low similarities, since they are trying to find the portions
which make this organism virulent.

Thanks again

Phil P.