> -----Original Message-----
> From: bioclusters-admin@bioinformatics.org
> [mailto:bioclusters-admin@bioinformatics.org] On Behalf Of chris dagdigian
> Sent: Friday 7 June 2002 13:56
> To: bioclusters@bioinformatics.org
> Subject: Re: [Bioclusters] Parallel blast
>
>
> Hi Wim,
>
> This will be a quickie response...
>
> With newer versions of ncbi-blast there are 2 things that have made the
> process of splitting up the target databases so that your query can be
> multiplexed across multiple searches and machines far easier:
>
> o The "-z" option switch (used to be undocumented I think?) allows you
> to override/tell the blastall binary the effective size of the database.
> If you feed the original (large) value to the blastall binary while
> searching against the small slice you will at least get back the correct
> scores and statistics. This is a huge time and accuracy saver as trying
> to parse and adjust these values after the fact is a giant error-prone
> exercise in pain.
>

I've been messing around with blast for quite a while now. I'm using the
-z flag and the XML output, with the goal of getting a result that is
identical to the normal blast output (without partitioning). What few
people know is that the -z parameter alone is not enough. The blast
statistics are also based on the number of sequences in the database,
and blastall has no parameter for that value. Likewise, I understand
that some people divide their query sequences as well. If you want your
results to be the same, you have to let blast know just how big your
'original' query was, so it can calculate its statistics correctly. I'm
working on a solution that does both of these things and merges the
output files, but I'm afraid it's not as easy as it sounds.
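To make the bookkeeping above concrete, here is a minimal sketch in Python. It is an illustration only, not the solution mentioned above. Assumptions: the value passed to -z is the total residue count of the unsplit database; the flags -p, -d, -i, -o and -m 7 (XML output) are the legacy blastall options as I understand them; the helper names and the slice/file names are made up for the example. The sketch also records the original sequence count, but it does not solve the problem that blastall has no parameter to receive it.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def database_totals(records: List[Record]) -> Dict[str, int]:
    """Residue count and sequence count of the unsplit database."""
    return {
        "letters": sum(len(seq) for _, seq in records),
        "sequences": len(records),
    }


def split_round_robin(records: List[Record], n_slices: int) -> List[List[Record]]:
    """Deal records into n_slices groups, one record at a time."""
    slices: List[List[Record]] = [[] for _ in range(n_slices)]
    for i, record in enumerate(records):
        slices[i % n_slices].append(record)
    return slices


def blastall_command(program: str, slice_db: str, query_file: str,
                     out_file: str, full_db_letters: int) -> List[str]:
    """Build (but do not run) a blastall call against one database slice.

    -z carries the size of the ORIGINAL database, so the statistics are
    computed as if the whole database had been searched.
    """
    return [
        "blastall", "-p", program, "-d", slice_db,
        "-i", query_file, "-o", out_file,
        "-m", "7",                      # XML output (assumed flag value)
        "-z", str(full_db_letters),
    ]


if __name__ == "__main__":
    db = parse_fasta(">a\nACGT\nAC\n>b\nGG\n>c\nTTTTT\n")
    totals = database_totals(db)
    for n, part in enumerate(split_round_robin(db, 2)):
        # Hypothetical slice and output names, for illustration only.
        cmd = blastall_command("blastn", f"slice{n}", "query.fa",
                               f"out{n}.xml", totals["letters"])
        print(len(part), "sequences:", " ".join(cmd))
```

Each slice would still have to be formatted as a blast database before searching, and the per-slice XML files merged afterwards; neither step is shown here.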
I'm just wondering if there are any other parallel blast solutions, so I
can spare myself the hassle of building this on my own.

> o XML output of results
>
> Having the scores and statistics correct while getting the results back
> in a way that is far easier to parse than the human readable version is
> 95% of the battle. Everything else is fairly simple.

The XML output (although it has a pretty strange format) helps, that's
for sure!

Wim

>
> -Chris
>
>
> Wim Glassee wrote:
>
> ><snip>
> >
> >I've noticed some people cut their databases and query sequences to
> >smaller pieces, with or without overlap, and perform separate blasts.
> >But how do you put them back together again? And are the results the
> >same?
> >
> >Wim
> >
>
> --
> Chris Dagdigian, <dag@sonsorol.org>
> Life Science IT & Research Computing Consultant
> Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
> Work: http://bioteam.net PGP KeyID: 83D4310E Yahoo IM: craffi
>
> _______________________________________________
> Bioclusters maillist - Bioclusters@bioinformatics.org
> http://bioinformatics.org/mailman/listinfo/bioclusters