[Biodevelopers] forcing a full sequence comparison in blast

Phil Princely phil.princely at gmail.com
Thu Apr 12 22:47:04 EDT 2007

Thanks to everyone for your help.

Minky, thanks for the help. I'm already using the -F F option; it was
the solution to another blast problem I asked about on this list.

Martin, thanks for the advice about needle. I'd never heard of it, but
I'm going to have to concentrate on blast. I'm still a beginner with
blast, and I'd like to get a decent knowledge of it before I try
something new.

Michael, thank you too. The output I've gotten from blast hasn't given
me any results below 20% identity. I'm not exactly sure why it cuts
off at that point, but there seems to be some setting, since out of
2000 comparisons there are no matches with 19% identity, but 6 with
20%, 14 with 21%, and so on.
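To see exactly where the cutoff falls, one option is to tally the reported identity percentages straight from the saved blast text output. This is a minimal Python sketch, assuming the report contains lines in the "Identities = 7/11 (63%)" form quoted in the result further down; the function names are hypothetical, not part of blast.

```python
import re
from collections import Counter

# Matches lines such as "Identities = 7/11 (63%)". Assumes the report uses
# exactly this format; other output formats would need a different pattern.
IDENTITY_RE = re.compile(r"Identities = (\d+)/(\d+) \((\d+)%\)")


def identity_histogram(report_text):
    """Count how many alignments were reported at each identity percentage."""
    counts = Counter()
    for match in IDENTITY_RE.finditer(report_text):
        counts[int(match.group(3))] += 1
    return counts


def lowest_reported_identity(report_text):
    """Return the smallest identity percentage seen, or None if there are no hits."""
    counts = identity_histogram(report_text)
    return min(counts) if counts else None
```

Printing `sorted(identity_histogram(text).items())` for the whole report gives the distribution, and `lowest_reported_identity` shows where it starts.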

I've changed my filter to record the gene lengths, so I can compare the
database gene's length with the original gene's length. Here's a result:

Gene Length: 32
>Contig 130-11: 11052..12368 (reverse), 439 amino acids
Identities = 7/11 (63%), Positives = 10/11 (90%)

Here, the original is 32 AAs long and the database gene is 439 AAs
long. Blast took a section 11 AAs long and matched it to a section of
the database gene with 63% identity. I'd prefer this to be expressed as
7/32 (21%) instead of 7/11 (63%).
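One way to get the 7/32 figure is to divide the identities count by the original gene's length instead of by the alignment length. Below is a minimal Python sketch, assuming the filter output looks like the result above (a "Gene Length: N" line followed by an "Identities = a/b (p%)" line). It rescores only the first alignment in the result; summing several separate alignments for the same pair would need more care. The function names are hypothetical.

```python
import re

# Assumes the filter output format shown in the result above.
LENGTH_RE = re.compile(r"Gene Length:\s*(\d+)")
IDENTITY_RE = re.compile(r"Identities = (\d+)/(\d+)")


def percent_of_query(identities, query_length):
    """Identical residues as a percentage of the whole original gene.

    Truncates to an integer, which matches how the blast figures above
    appear to be rounded (7/11 -> 63%, 10/11 -> 90%).
    """
    return identities * 100 // query_length


def rescore(result_text):
    """Return (identities, query_length, percent) for one filtered result."""
    query_length = int(LENGTH_RE.search(result_text).group(1))
    identities = int(IDENTITY_RE.search(result_text).group(1))
    return identities, query_length, percent_of_query(identities, query_length)


result = """Gene Length: 32
>Contig 130-11: 11052..12368 (reverse), 439 amino acids
Identities = 7/11 (63%), Positives = 10/11 (90%)"""

identities, query_length, percent = rescore(result)
print(f"{identities}/{query_length} ({percent}%)")  # prints "7/32 (21%)"
```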

My solution is going to be to compare the gene lengths first for big
differences, and not to rely on any percentage below 95%. A quick
glance at the output data tells me that there's either a very good
match or a bad one, probably because the organisms are so close
genetically.
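The two-step check described above could be sketched in Python as follows. The 95% cutoff comes from the plan above; the 10% allowed length difference is purely an assumed value to tune against real data, and the function names are hypothetical. Identity is measured against the whole original gene, not just the aligned section.

```python
# Assumed thresholds: 95% is from the plan above; the 10% length
# tolerance is a hypothetical value, not something blast provides.
MAX_LENGTH_DIFFERENCE = 0.10
MIN_IDENTITY_PERCENT = 95


def lengths_compatible(query_length, subject_length,
                       max_difference=MAX_LENGTH_DIFFERENCE):
    """Step 1: reject pairs whose lengths differ by more than
    max_difference, as a fraction of the longer sequence."""
    longer = max(query_length, subject_length)
    return abs(query_length - subject_length) <= max_difference * longer


def is_good_match(query_length, subject_length, identities,
                  min_identity=MIN_IDENTITY_PERCENT):
    """Step 2: among length-compatible pairs, accept only those whose
    identical residues cover at least min_identity percent of the query."""
    if not lengths_compatible(query_length, subject_length):
        return False
    # Integer comparison avoids rounding issues near the cutoff.
    return identities * 100 >= min_identity * query_length


# The 32-AA gene against the 439-AA contig fails at the length step.
print(is_good_match(32, 439, 7))  # prints False
```

Checking lengths first means a short, high-identity fragment like the 7/11 alignment is thrown out before its percentage is even considered.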

I'm sorry if there are any mistakes in this. My background is more in
computers than genetics, so I'm still struggling to understand a lot
of this.

Thanks again to everyone.

Phil P.

