<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">I have a bash script, written by a former colleague, that splits up the query file, generates BLAST commands for the pieces, and runs them in parallel through xargs.<div class=""><br class=""></div><div class="">It speeds up the process a lot, with the gain depending on how many cores you have.</div><div class=""><br class=""></div><div class="">It would need some hacking for your use case, though: the splitting is somewhat idiosyncratic, it runs a nucleotide BLAST rather than BLASTX, and we post-process the BLAST results in a way you would not need.</div><div class=""><br class=""></div><div class="">So you might be better off starting from scratch, but let me know if you want to take a look at it.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><div class="">
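<div class="">If you do start from scratch, the same idea (split the queries, build one BLAST command per chunk, run the commands in parallel, then combine the output) might look roughly like the Python sketch below. It is not my colleague's script: it assumes BLAST+ <code class="">blastx</code> is on your PATH, the function names, file names, and chunk size are hypothetical, and a thread pool stands in for <code class="">xargs -P</code> or GNU parallel (the threads only wait on the external blastx processes). Tabular output (<code class="">-outfmt 6</code>) is used so that combining the results is a plain file concatenation.</div>
<pre class="">
```python
"""Minimal sketch: split a FASTA file, run blastx on each chunk in parallel,
then concatenate the tabular results.

Assumptions: BLAST+ `blastx` is on PATH; all names here are hypothetical.
"""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fasta(text, chunk_size):
    """Group FASTA records into lists of at most chunk_size records."""
    records, current = [], []
    for line in text.splitlines():
        # A header line starts a new record; flush the previous one first.
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        # Keep non-blank lines, ignoring anything before the first header.
        if line.strip() and (current or line.startswith(">")):
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def write_chunks(chunks, out_dir):
    """Write each chunk to out_dir/chunk_NNN.fasta and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(chunks, start=1):
        path = out_dir / f"chunk_{n:03d}.fasta"
        path.write_text("".join(chunk))
        paths.append(path)
    return paths


def blastx_command(query, db, threads=1):
    """Build one blastx argument list writing tabular (-outfmt 6) output."""
    query = Path(query)
    out = query.parent / (query.stem + ".blastx.tsv")
    return ["blastx", "-query", str(query), "-db", db, "-outfmt", "6",
            "-num_threads", str(threads), "-out", str(out)]


def run_all(commands, jobs):
    """Run commands with at most `jobs` running at once (like xargs -P)."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))


def concatenate(paths, combined):
    """Write the contents of each file in `paths`, in order, into `combined`."""
    with open(combined, "w") as out:
        for path in paths:
            out.write(Path(path).read_text())
```
</pre>
<div class="">Typical use: read the assembly (say, a hypothetical contigs.fasta) into split_fasta with chunk_size=1000, which gives about 100 chunk files for 100,000 sequences; pass them to write_chunks; build one command per chunk with blastx_command; call run_all with jobs set to your core count; and finally concatenate the *.blastx.tsv files. Keeping the chunks reasonably large, rather than one sequence per job, limits the per-job startup overhead, which is Marty's point in the thread. Whether many single-threaded jobs beat one blastx run with a larger -num_threads is worth timing on your own machine.</div>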

<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Clifford Beall, PhD, MSc</div><div class=""><a href="mailto:cliffbeall@gmail.com" class="">cliffbeall@gmail.com</a></div><div class=""><a href="mailto:beall.3@osu.edu" class="">beall.3@osu.edu</a></div><div class="">Research Assistant Professor</div><div class="">Division of Biosciences</div><div class="">Ohio State U. College of Dentistry</div></div><br class="Apple-interchange-newline"><br class="Apple-interchange-newline">

</div>

<br class=""><div><blockquote type="cite" class=""><div class=""><br class=""></div><div class="">Message: 4<br class="">Date: Tue, 5 May 2015 16:54:59 +0200<br class="">From: Andreas Leimbach &lt;<a href="mailto:aleimba@gwdg.de" class="">aleimba@gwdg.de</a>&gt;<br class="">To: Bio-Linux help and discussion &lt;<a href="mailto:bio-linux@nebclists.nerc.ac.uk" class="">bio-linux@nebclists.nerc.ac.uk</a>&gt;<br class="">Subject: Re: [Bio-Linux] Blasting Multiple Fasta Files<br class="">Message-ID: &lt;5548D9C3.7050501@gwdg.de&gt;<br class="">Content-Type: text/plain; charset="windows-1252"<br class=""><br class="">Hey,<br class=""><br class="">BLAST+ is not parallelized all that well, so you might want to try<br class="">GNU parallel to speed up your calculations somewhat, depending on your<br class="">machine. Here are some links:<br class=""><br class=""><a href="https://www.biostars.org/p/63816/" class="">https://www.biostars.org/p/63816/</a><br class=""><a href="https://www.biostars.org/p/76009/" class="">https://www.biostars.org/p/76009/</a><br class=""><br class="">Cheers,<br class="">Andreas<br class=""><br class=""><br class="">Andreas Leimbach<br class="">Universität Münster<br class="">Institut für Hygiene<br class="">Mendelstr. 7<br class="">D-48149 Münster<br class="">Germany<br class=""><br class="">Tel.: +49 (0)551 39 33843<br class="">E-Mail: aleimba@gwdg.de<br class=""><br class="">On 05.05.2015 16:31, Zain A Alvi wrote:<br class=""><blockquote type="cite" class="">Hi Marty,<br class=""><br class="">I apologize for the confusion. I am splitting a FASTA file that contains approximately 100,000 sequences into 100 FASTA files that contain 1000 sequences each. 
&nbsp;I am hoping this will expedite the BLASTX process.<br class=""><br class=""><br class="">Kind regards,<br class=""><br class=""><br class="">Zain<br class=""><br class="">________________________________<br class="">From: Martin Gollery &lt;mgollery@unr.edu&gt;<br class="">Sent: Tuesday, May 5, 2015 10:23 AM<br class="">To: Bio-Linux help and discussion<br class="">Subject: Re: [Bio-Linux] Blasting Multiple Fasta Files<br class=""><br class="">Running a million BLASTX jobs on one sequence each is not going to save you time. It is better to run one BLASTX job on a million sequences.<br class=""><br class="">-Marty<br class=""><br class=""><br class=""><br class="">On Tue, May 5, 2015 at 7:00 AM, Zain A Alvi &lt;zain.alvi@student.shu.edu&gt; wrote:<br class=""><br class="">Dear Sir or Madam,<br class=""><br class=""><br class="">I hope everything is well. I have downloaded all the viral protein sequences from the NCBI RefSeq database using their script from their E-book. &nbsp;I have de novo assembled some viral genomes, and I know BLASTX takes a long time if the FASTA file is large. &nbsp;I have been able to split the large FASTA file based on a user-specified number of contigs in each new FASTA file.<br class=""><br class=""><br class="">I was wondering whether there is a method to run BLASTX automatically on each of the FASTA files, one at a time, so that it will complete in a "shorter" amount of time than BLASTing the whole large de novo assembled FASTA file. 
&nbsp;Then I was hoping to concatenate all the results into one file.<br class=""><br class=""><br class="">Sincerely,<br class=""><br class=""><br class="">Zain<br class=""><br class="">_______________________________________________<br class="">Bio-Linux mailing list<br class="">Bio-Linux@nebclists.nerc.ac.uk<br class="">http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux<br class=""><br class=""><br class=""><br class=""><br class="">--<br class="">Martin Gollery<br class="">Senior Bioinformatics Scientist<br class="">Tahoe Informatics<br class="">www.bioinformaticist.biz<br class="">www.hiddenmarkovmodels.com<br class=""><br class=""></blockquote><br class=""><br class="">------------------------------<br class=""><br class="">End of Bio-Linux Digest, Vol 80, Issue 3<br class="">****************************************<br class=""></div></blockquote></div><br class=""></div></body></html>