Hi all,

I'm working on a script to compare all genes in a genome against a full sequence in a BLAST database. Both have around 2000 genes. My script takes the test genome, extracts one amino acid sequence, and runs it through BLAST. It then filters the output to grab only the name of the gene with the best match and the similarity (in percent). For example, from these lines:

>Contig 165-147: 171558..172979 (reverse), 474 amino acids
Identities = 471/473 (99%), Positives = 471/473 (99%)

it grabs the text "Contig 165-147" and the percentage 99%.

My problem comes when sequences have lower similarity and BLAST aligns only a section of the input gene. For example:

>Contig 158-62: 61482..62750 (direct), 423 amino acids
Identities = 15/46 (32%), Positives = 27/46 (58%), Gaps = 2/46 (4%)

Here, it has aligned only 46 amino acids, whereas the full gene sequence has 347. Is there a way I can force BLAST to use the full 347 amino acids for the comparison? The researchers in my lab are most interested in regions of low similarity, since they are trying to find the portions that make this organism virulent.

Thanks again,
Phil P.
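P.S. In case it helps to see what I mean, here is a minimal Python sketch of the filtering step described above. It is not my actual script: the function name `best_hit` and the dictionary keys are made up for illustration, and it assumes the plain-text report lists the best match first and formats its lines like the two examples. It also reports how much of the query the alignment covers, which is the number I am worried about. Note that the denominator in the "Identities" line counts alignment columns (gaps included), so the coverage figure is only approximate.

```python
import re

# Hit header, e.g. ">Contig 165-147: 171558..172979 (reverse), 474 amino acids"
HIT_RE = re.compile(r"^>\s*(?P<name>[^:]+):")

# Identity line, e.g. "Identities = 15/46 (32%), Positives = 27/46 (58%)"
IDENT_RE = re.compile(
    r"Identities\s*=\s*(?P<identical>\d+)/(?P<aligned>\d+)\s*\((?P<pct>\d+)%\)"
)


def best_hit(report_text, query_length):
    """Return name, percent identity and coverage for the first hit in a report.

    Assumes (hypothetically) that the first hit listed is the best match and
    that query_length is the length of the input protein in amino acids.
    Returns None if no hit with an identity line is found.
    """
    name = None
    for raw_line in report_text.splitlines():
        line = raw_line.strip()
        header = HIT_RE.match(line)
        if header:
            name = header.group("name").strip()
            continue
        ident = IDENT_RE.search(line)
        if ident and name is not None:
            identical = int(ident.group("identical"))
            aligned = int(ident.group("aligned"))
            return {
                "hit": name,
                "percent_identity": int(ident.group("pct")),
                "aligned_length": aligned,
                # Fraction of the query spanned by the alignment (approximate,
                # because aligned_length includes gap columns).
                "query_coverage": aligned / query_length,
                # Identical residues as a fraction of the whole query.
                "identity_over_query": identical / query_length,
            }
    return None


sample = """>Contig 158-62: 61482..62750 (direct), 423 amino acids
 Identities = 15/46 (32%), Positives = 27/46 (58%), Gaps = 2/46 (4%)"""

result = best_hit(sample, query_length=347)
print(result["hit"], result["percent_identity"], round(result["query_coverage"], 3))
```

For the second example, the alignment covers only about 13% of the 347-residue gene (46/347), so the reported 32% says very little about the gene as a whole.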