[Bioclusters] Re: Help on BLAST

Mario Belluardo bioclusters@bioinformatics.org
Tue, 27 Aug 2002 14:19:19 +0200


Hi Wim,
thanks for looking at the source.
Anyway, I think someone could be interested in the author's notes.

------------------------------------

Seqsplit & Blastunsplit

        - Programs for running blastx on very long queries
          Source code and executables for SunOS

A brief documentation:

Seqsplit cuts up a query in n smaller, partly overlapping chunks and
Blastunsplit recombines the output of each chunk into one file and
recalculates the positions in the original query sequence.  This is
necessary for DNA queries longer than about 100000 bases since most
machines run out of memory in such cases.  These programs enable the
user to run smaller jobs and also to split them up over several
machines, if wanted.  The only drawback is that matches to the
overlapping sequence between two chunks may be duplicated or
fragmented, but we haven't found this a serious problem.
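
The split-and-remap idea described above can be sketched as follows.
This is a minimal illustration in Python, not the actual Seqsplit or
Blastunsplit source: the function names, the chunk length and the
overlap are assumptions chosen for the example. Hit coordinates are
assumed to be 1-based within each chunk.

```python
from typing import Iterator, Tuple


def split_with_overlap(seq: str, chunk_len: int, overlap: int) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs; offset is the 0-based start of the chunk in seq."""
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")
    step = chunk_len - overlap  # consecutive chunks share `overlap` bases
    start = 0
    while True:
        yield start, seq[start:start + chunk_len]
        if start + chunk_len >= len(seq):
            break  # this chunk already reaches the end of the query
        start += step


def to_original_coords(offset: int, hit_start: int, hit_end: int) -> Tuple[int, int]:
    """Map 1-based hit coordinates within a chunk back to the full query."""
    return offset + hit_start, offset + hit_end


if __name__ == "__main__":
    query = "ACGTACGTAC"
    for offset, chunk in split_with_overlap(query, chunk_len=4, overlap=2):
        print(offset, chunk)
```

Each chunk would be blasted separately, and every reported query
position shifted by that chunk's offset when the outputs are merged.
A hit lying inside an overlap can still be reported by both
neighbouring chunks, which is the duplication mentioned above.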

NOTE: Instead of running Blastunsplit, it is also possible to blast
the individual chunks with the option -qoffset, which should change
the coordinates the same way as Blastunsplit would.

Seqsplit & Blastunsplit were created by Chris Lee and modernised by
Erik Sonnhammer at the Sanger Centre, Cambridge UK

For comments and questions, contact Erik Sonnhammer at
esr@sanger.ac.uk

Cambridge,
950922


> Message: 3
> From: "Wim Glassee" <wim.glassee@ua.ac.be>
> To: <bioclusters@bioinformatics.org>
> Subject: RE: [Bioclusters] Re: Help on BLAST
> Date: Mon, 26 Aug 2002 15:34:04 +0200
> Reply-To: bioclusters@bioinformatics.org
> 
> Hi,
> 
> I had a quick look at the sources for seqsplit and blastunsplit, and
> there doesn't seem to be any statistics recalculation of any kind in
> there. If you blast smaller pieces of a query sequence against a db, the
> statistics will not be the same as for the original blast, so when
> merging the output files, you won't end up with the same results. In a
> lot of cases even the number of hits and/or HSPs will NOT be the same.
> 
> Wim
> 
> > -----Original Message-----
> > From: bioclusters-admin@bioinformatics.org [mailto:bioclusters-
> > admin@bioinformatics.org] On Behalf Of Mario Belluardo
> > Sent: maandag 26 augustus 2002 15:04
> > To: bioclusters@bioinformatics.org
> > Subject: [Bioclusters] Re: Help on BLAST
> >
> > Hi Sylvain,
> > I've found and am testing seqsplit (and blastunsplit), which you can
> > download from here
> >
> > ftp://ftp.cgr.ki.se/pub/prog/MSPcrunch+Blixem/
> >
> > Here is the web documentation:
> > http://www.cgr.ki.se/cgr/groups/sonnhammer/MSPcrunch.html
> >
> > Unfortunately it seems to work with only a single sequence at a time,
> > which means that you cannot submit multi-sequence queries, but you can
> > modify the source code yourself. I would like to do it, so if you modify
> > it before me, let me know!
> >
> > Mario
> >
> >
> >
> > > Message: 2
> > > Date: Fri, 23 Aug 2002 14:51:14 -0400
> > > From: Sylvain Foisy <sylvain.foisy@bioneq.qc.ca>
> > > To: bioclusters@bioinformatics.org
> > > Subject: [Bioclusters] Re: Help on BLAST
> > > Reply-To: bioclusters@bioinformatics.org
> > >
> > > Hi
> > >
> > > On Friday, August 23, 2002, at 12:01 PM, bioclusters-
> > > request@bioinformatics.org wrote:
> > >
> > > > I read your posts saying "splitting the query sequence into small
> > > > fragments and BLASTing each of those fragments against the (entire)
> > > > database is super-easy to implement." Could you please tell me how to
> > > > combine the results, or a link to the solution would be very helpful?
> > >
> > > Add me to the list of interested parties to that subject. I would like
> > > to know how to write an app that would do these three steps:
> > >
> > > -Split a sequence in multiples of, let's say, 100 nucleotides;
> > > -Send each of them to a node for BLASTing;
> > > -Reassemble the different results into a single report for the users.
> > >
> > > Any web links that would help us in our quest?
> > >
> > > Cordially
> > >
> > > Sylvain
> > >
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Sylvain Foisy, Ph. D.
> > > Directeur-Operations / Project Manager
> > > BioNEQ - Le Reseau quebecois de bioinformatique
> > > Genome-Quebec
> > > Tel.: (514) 878-9911
> > > E-mail: sylvain.foisy@bioneq.qc.ca
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >
> >
> > --
> >
> > Dr. Mario Belluardo
> > Institute for Cancer Research and Treatment
> > http://www.ircc.it
> > _______________________________________________
> > Bioclusters maillist  -  Bioclusters@bioinformatics.org
> > https://bioinformatics.org/mailman/listinfo/bioclusters
> 