segmenting blast databases (was Re: [Bioclusters] Details on a local blast cluster question)

Sergio Ahumada N bioclusters@bioinformatics.org
Thu, 30 Jan 2003 15:49:20 -0300


> This is why you will get different results/scores if you search a
> sequence against a full database and then repeat the same search against
> a segment or smaller piece of the full database.

Yeah, I understand.

> Back in the old days of bioinformatics :) getting around this problem
> used to be a giant pain in the ass and involved manually parsing out and
> correcting all of the scores from your segmented blast results. It was
> doable but the process was open to parsing and statistical errors and
> was just Not Fun.

Oops, I forgot about this; I'm a newbie. In fact, I'm a computer science
*student* doing some "research" during my summer break (yes, it's summer
here) without any background in biology, so I don't know about the old days
of bioinformatics ;-) ... I'm learning by falling down, but I really want to
learn more. What am I doing here? Just learning ... [you may take this as a
joke, but it's true]

I wrote some Perl scripts to parse the results/scores from each piece of
the split "nt" database. I take the best hits from each result and compare
them with each other to choose the best of the best (I hope that makes
sense).

Example (dummy values), three pieces of the nt database, one query sequence:

result1 -> VVCCSSA01.g -> Score = 12, Score = 10, etc.
result2 -> VVCCSSA01.g -> Score = 16, Score = 14, etc.
result3 -> VVCCSSA01.g -> Score = 20, Score = 18, etc.

So I take Score = 12 from result 1 (the best, i.e. the maximum), Score = 16
from result 2 and Score = 20 from result 3 ... then I compare them and
choose Score = 20 from result 3 as my result.
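For readers who want to see the "best of the best" step spelled out, here is a minimal Python sketch (not the actual Perl scripts). It assumes each segment's report has already been parsed into a dict mapping query name to its list of raw scores; that input shape and the name `best_across_segments` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical parsed input: for one database segment, a mapping from
# query name to the raw alignment scores found in that segment.
SegmentHits = Dict[str, List[int]]


def best_across_segments(segments: List[SegmentHits], query: str) -> Tuple[int, int]:
    """Return (segment_index, score) of the highest raw score for `query`.

    Step 1: take the maximum score within each segment.
    Step 2: compare those per-segment maxima and keep the overall best.
    Segments with no hits for the query are skipped.
    """
    per_segment_best = [
        (i, max(hits[query]))
        for i, hits in enumerate(segments)
        if hits.get(query)
    ]
    if not per_segment_best:
        raise KeyError(f"no hits for {query!r} in any segment")
    return max(per_segment_best, key=lambda pair: pair[1])


if __name__ == "__main__":
    # The dummy values from the example, one dict per nt segment.
    segments = [
        {"VVCCSSA01.g": [12, 10]},
        {"VVCCSSA01.g": [16, 14]},
        {"VVCCSSA01.g": [20, 18]},
    ]
    idx, score = best_across_segments(segments, "VVCCSSA01.g")
    print(f"best: result{idx + 1}, Score = {score}")  # best: result3, Score = 20
```

Comparing raw scores across segments this way is sound, because the raw score of an alignment does not depend on database size. The per-segment E-values are what go wrong.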

But the statistics are not right, because I am reducing my search space, as
you say.
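Why the statistics drift: under the Karlin-Altschul model used by BLAST, E = K·m·n·e^(−λS), where m is the query length, n the database length and S the raw score. S does not change when the database is split, but E is proportional to n. So the same hit looks more significant against one segment than against the whole of nt. Below is a minimal Python sketch of the correction. It assumes effective lengths can be approximated by plain letter counts; real BLAST applies edge-effect corrections, so the result is only approximate. `rescale_evalue` and the lengths in the usage line are hypothetical.

```python
def rescale_evalue(e_segment: float, segment_letters: int, total_letters: int) -> float:
    """Approximate the E-value a hit would have had against the full database.

    Since E = K * m * n * exp(-lambda * S) is linear in the database length n,
    multiplying by total/segment converts a per-segment E-value. This ignores
    BLAST's effective-length (edge-effect) corrections.
    """
    if segment_letters <= 0 or total_letters <= 0:
        raise ValueError("lengths must be positive")
    return e_segment * (total_letters / segment_letters)


if __name__ == "__main__":
    # Hypothetical: three equal segments of 1,000,000 letters each.
    print(f"{rescale_evalue(1e-5, 1_000_000, 3_000_000):.1e}")  # 3.0e-05
```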

>   o Added the "-z" option to explicitly override the effective length
> of the database (added in blast release 2.0.4)
>   o XML output option (added in blast release 2.1.2)
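With the -z route, each segment search is told the size of the whole, unsplit database instead of its own segment size. The Python sketch below computes that number from the split FASTA files. It assumes the figure you want is simply the total residue count across all segments. The function names and the file names in the usage comment are hypothetical.

```python
import sys
from typing import Iterable


def count_residues(lines: Iterable[str]) -> int:
    """Count sequence letters in FASTA text, skipping '>' header lines."""
    total = 0
    for line in lines:
        if line.startswith(">"):
            continue
        total += len(line.strip())
    return total


def total_letters(paths: Iterable[str]) -> int:
    """Sum residues over every segment file of a split database."""
    total = 0
    for path in paths:
        with open(path) as handle:
            total += count_residues(handle)
    return total


if __name__ == "__main__":
    # Usage (hypothetical file names):
    #   python count_letters.py nt.00.fa nt.01.fa nt.02.fa
    print(total_letters(sys.argv[1:]))
```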

Aha! I didn't know about the -z option; maybe I have an older version of
blast. XML? I don't like it :)

[... -z option versus XML]

So, as you can see, I have to read and learn more. I will test your ideas
and comments to improve my scripts.

Thanks a lot!

> --Chris

--
Sergio Ahumada N
san@inf.utfsm.cl