<div dir="ltr">Be sure to include -num_threads<div><br></div><div>I still think that this will be slower overall, but it will be interesting to hear your results!</div><div><br></div><div>Marty</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 5, 2015 at 9:33 PM, Zain A Alvi <span dir="ltr">&lt;<a href="mailto:zain.alvi@student.shu.edu" target="_blank">zain.alvi@student.shu.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Everyone,<br>

<br>

Thank you for all the great and helpful recommendations, especially Tim, Tony, Dr. Beall, and Andreas. I am trying to do exactly what Tim showed: have BLASTX run on each FASTA file one at a time, not all at the same time. It should run BLASTX on one FASTA file, then move on to the next, until there are no FASTA files left in the folder.<br>

<br>

Would something like this work as well:<br>

<br>

for input in *.fa; do blastx -db /path_to_db -query "$input" -out "$input.blastx_output"; done<br>

<br>

Then concatenate all *.blastx_output &gt; Final_BlastxOutput.blastx_output<br>
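<br>
A slightly fuller sketch of the loop above, assuming bash. The function names, the default of 4 threads, and the database path are placeholders, not anything taken from Tim's script. It adds -num_threads and merges the per-file outputs under a name the glob does not match, because on a re-run "cat *.blastx_output" would otherwise try to read Final_BlastxOutput.blastx_output as one of its inputs:<br>

```shell
#!/usr/bin/env bash
# Sketch only: run blastx on every *.fa file in a directory, one file at a
# time, then merge the per-file results. Function names and the default
# thread count are illustrative assumptions.

run_blastx_serially() {
    local dir="$1" db="$2" threads="${3:-4}"
    local input
    for input in "$dir"/*.fa; do
        [ -e "$input" ] || continue   # no *.fa files: skip the unexpanded pattern
        blastx -db "$db" -query "$input" \
               -num_threads "$threads" \
               -out "$input.blastx_output" || return 1
    done
}

merge_blastx_outputs() {
    local dir="$1" merged="$2"
    # Only files ending in .fa.blastx_output are read, so choose a merged
    # file name that does not end that way.
    cat "$dir"/*.fa.blastx_output > "$merged"
}
```

Usage would be something like: run_blastx_serially . /path_to_db 8, followed by merge_blastx_outputs . Final_BlastxOutput.tsv<br>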

<br>

Thank you for the very interesting information about parallel on Bio-Linux. Would parallel also work well for de novo assemblers such as Velvet and SPAdes? I am asking about Velvet in particular after reading: <a href="https://www.biostars.org/p/86907/" target="_blank">https://www.biostars.org/p/86907/</a><br>

<br>

Also, would creating multiple copies of the same database under different names/titles get around the problem of several BLASTX runs accessing the same database, and the memory problems that come with it? I know it is not recommended, but would this workaround be of any benefit? Or should I just stick with the script above or the script that Tim kindly shared?<br>

<br>

For example:<br>

<br>

Folder A<br>

blastx -db /path_to_db01 -query input_seq_001-100 -out output_seq_001-100.blastx_output<br>

blastx -db /path_to_db01 -query input_seq_101-200 -out output_seq_101-200.blastx_output<br>

etc to<br>

blastx -db /path_to_db01 -query input_seq_401-499 -out output_seq_401-499.blastx_output<br>

<br>

Folder B:<br>

blastx -db /path_to_db02 -query input_seq_501-600 -out output_seq_501-600.blastx_output<br>

blastx -db /path_to_db02 -query input_seq_601-700 -out output_seq_601-700.blastx_output<br>

etc<br>

blastx -db /path_to_db02 -query input_seq_901-1000 -out output_seq_901-1000.blastx_output<br>

<br>

On a side note, how does BLASTX from the BLAST+ package compare with MPI-BLAST? I thought MPI-BLAST is based on an older version of BLAST and hence might return fewer results. This is our major concern, as I am going for tabular output format with all sequence titles and information (-outfmt 6 salltitles). This will be helpful for filtering the viral genome by using some simple grep -w filtering techniques for the contigs.<br>
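<br>
The kind of grep -w filtering meant here could look roughly like the sketch below. The function name and the idea of keeping the wanted IDs in a separate file are assumptions for illustration. Note that -w treats anything other than letters, digits and underscore as a word boundary, so an ID such as contig_1 would not match contig_10 but would still match contig_1.2:<br>

```shell
#!/usr/bin/env bash
# Sketch only: keep the lines of a tabular BLAST result that mention any of
# the IDs listed (one per line) in a separate file.

filter_hits_by_id() {
    local ids_file="$1" blast_table="$2"
    # -w: match whole words only
    # -F: treat each ID as a fixed string, not a regular expression
    # -f: read the IDs from a file
    grep -w -F -f "$ids_file" "$blast_table"
}
```

For example: filter_hits_by_id wanted_ids.txt Final_BlastxOutput.tsv (both file names are placeholders).<br>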

<br>

Also, there are some interesting points about using xargs to parallelize BLAST+ (the last example): <a href="https://www.biostars.org/p/76009/" target="_blank">https://www.biostars.org/p/76009/</a> Has anyone tried this?<br>
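<br>
For reference, a minimal sketch of what an xargs-based version might look like. This is not the example from the linked post. The function name and the defaults (4 jobs, 1 thread each) are assumptions, and it relies on the -0 and -P options of xargs and -maxdepth/-print0 of find as found on Linux. Whether several blastx processes at once beat a single blastx with -num_threads is exactly the open question in this thread:<br>

```shell
#!/usr/bin/env bash
# Sketch only: start one blastx process per *.fa file, with at most $jobs
# of them running at the same time.

run_blastx_with_xargs() {
    local dir="$1" db="$2" jobs="${3:-4}" threads="${4:-1}"
    # -print0 / -0 keep file names with spaces intact;
    # -P limits the number of simultaneous processes;
    # -I{} substitutes each file name into the command line.
    find "$dir" -maxdepth 1 -name '*.fa' -print0 |
        xargs -0 -P "$jobs" -I{} \
            blastx -db "$db" -query {} -num_threads "$threads" -out {}.blastx_output
}
```

With 32 hardware threads, jobs multiplied by threads should probably stay at or below 32, for example run_blastx_with_xargs . /path_to_db 4 8<br>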

<br>

Thank you, Prash, for the recommendation of mpich. It&#39;s definitely interesting how it works. My mentor and I are trying to accomplish this on a 32-thread workstation (Intel Xeon E5-2640v2 (16 cores)) with 128 GB of RAM. For the viral genome, I am planning to run BLASTX against the viral RefSeq protein sequences from NCBI.<br>

<br>

Thank you, Dr. Beall. If you don&#39;t mind sharing, I would definitely be interested in taking a look at the script to see what it is like. Many thanks. If I am able to successfully hack the script, I am more than willing to share it with the rest of the community.<br>

<br>

Thank you again Andreas, Tim, Tony, Dr. Beall, and Prash. I really appreciate all the suggestions and help.<br>

<br>

Kind regards,<br>

<br>

Zain<br>

<br>

________________________________________<br>

From: Tony Travis &lt;<a href="mailto:tony.travis@minke-informatics.co.uk">tony.travis@minke-informatics.co.uk</a>&gt;<br>

Sent: Tuesday, May 5, 2015 12:19 PM<br>

To: <a href="mailto:bio-linux@nebclists.nerc.ac.uk">bio-linux@nebclists.nerc.ac.uk</a><br>

<span class="im HOEnZb">Subject: Re: [Bio-Linux] Blasting Multiple Fasta Files<br>

<br>

</span><div class="HOEnZb"><div class="h5">On 05/05/15 16:08, Tim Booth wrote:<br>

&gt; [...]<br>

&gt; You want to run:<br>

&gt;<br>

&gt; blastx -db foo -query seqs_000000_to_000999.fsa -out seqs_000000_to_000999.blastx<br>

&gt; ...then...<br>

&gt; blastx -db foo -query seqs_001000_to_001999.fsa -out seqs_001000_to_001999.blastx<br>

&gt; ...then...<br>

&gt; blastx -db foo -query seqs_002000_to_002999.fsa -out seqs_002000_to_002999.blastx<br>

&gt; ...then...<br>

&gt; blastx -db foo -query seqs_003000_to_003999.fsa -out seqs_003000_to_003999.blastx<br>

&gt; ...etc<br>

&gt; [...]<br>

<br>

Hi, Tim.<br>

<br>

It&#39;s not good to run multiple instances of BLAST on the same machine<br>

because each invocation of BLAST will have a copy of the same database<br>

stored in memory. MPI-BLAST avoids this by loading different parts of<br>

the database into each worker process.<br>

<br>

The time-consuming part of BLAST is the initial exact word match and<br>

both the old and new versions of BLAST allow you to specify how many<br>

threads to run to speed this up:<br>

<br>

  BLAST  uses &quot;-a nn&quot;<br>

  BLAST+ uses &quot;-num_threads nn&quot;<br>
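<br>
As an illustration of the BLAST+ form, a small wrapper could fill in the thread count from the machine itself. This is only a sketch: the wrapper name is made up, and it assumes the nproc utility from GNU coreutils is available, which reports the number of processing units the current process may use:<br>

```shell
#!/usr/bin/env bash
# Sketch only: call blastx with -num_threads set to the CPU count reported
# by nproc; all other arguments are passed through unchanged.

blastx_all_threads() {
    local threads
    threads="$(nproc)"
    blastx -num_threads "$threads" "$@"
}
```

For example: blastx_all_threads -db foo -query seqs.fsa -out seqs.blastx (file names are placeholders).<br>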

<br>

I compared &quot;blastall&quot;, &quot;blastn&quot;, &quot;blat&quot;, &quot;pblat&quot; and &quot;bowtie&quot; for<br>

mapping microRNA and mRNA to a custom database in:<br>

<br>

Travis, A. J., Moody, J., Helwak, A., Tollervey, D., &amp; Kudla, G. (2013).<br>

Hyb: A bioinformatics pipeline for the analysis of CLASH (crosslinking,<br>

ligation and sequencing of hybrids) data. Methods (San Diego, Calif.).<br>

<a href="http://doi.org/10.1016/j.ymeth.2013.10.015" target="_blank">http://doi.org/10.1016/j.ymeth.2013.10.015</a><br>

<br>

[&quot;pblat&quot; is a parallel/multi-threaded version of BLAT]<br>

<br>

You will need a script like this one by Jonathan Moody to convert<br>

&quot;bowtie2&quot; alignments to equivalent tabular BLAST output:<br>

<br>

  <a href="https://github.com/gkudla/hyb/blob/master/bin/sam2blast" target="_blank">https://github.com/gkudla/hyb/blob/master/bin/sam2blast</a><br>

<br>

Bye,<br>

<br>

  Tony.<br>

<br>

--<br>

Minke Informatics Limited, Registered in Scotland - Company No. SC419028<br>

Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)<br>

tel. <a href="tel:%2B44%280%2919755%2063548" value="+441975563548">+44(0)19755 63548</a>                    <a href="http://minke-informatics.co.uk" target="_blank">http://minke-informatics.co.uk</a><br>

mob. <a href="tel:%2B44%280%297985%20078324" value="+447985078324">+44(0)7985 078324</a>        mailto:<a href="mailto:tony.travis@minke-informatics.co.uk">tony.travis@minke-informatics.co.uk</a><br>

_______________________________________________<br>

Bio-Linux mailing list<br>

<a href="mailto:Bio-Linux@nebclists.nerc.ac.uk">Bio-Linux@nebclists.nerc.ac.uk</a><br>

<a href="http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux" target="_blank">http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux</a><br>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">-- <br>Martin Gollery<br>Senior Bioinformatics Scientist<br>Tahoe Informatics<br><a href="http://www.bioinformaticist.biz" target="_blank">www.bioinformaticist.biz</a><br><a href="http://www.hiddenmarkovmodels.com" target="_blank">www.hiddenmarkovmodels.com</a><br><br></div>

</div>