From graham.thomas.mres at gmail.com Mon May 4 05:50:24 2015
From: graham.thomas.mres at gmail.com (Graham Thomas)
Date: Mon, 4 May 2015 10:50:24 +0100
Subject: [Bio-Linux] hmmer tutorial help
Message-ID:

Hi All,

I am just getting started with biolinux.

I am trying to follow the 'HMMER users guide (Eddy 2003)' tutorial (pg 20).
The first command is:

> hmmbuild globin.hmm globins50.msf

But I get this message:

> Alignment input open failed.
couldn't determine alignment input format
while reading file globins50.msf

The only online reference to this problem I could find was from
laurelslabnotebook.blogspot.co.uk:

"At first I was aligning the sequences with ClustalOmega and trying to
put the sequences in .msf format, as many HMMER tutorials show.
However, I could not for the life of me figure out what was wrong and
kept getting the following error:
hmmbuild vno.hmm vno_trainerSeqs.msf
Alignment input open failed.
couldn't determine alignment input format
while reading file vno_trainerSeqs.msf
So I gave up on .msf"

Can anyone advise on what causes this problem and how to fix it?

Many thanks
Graham

From tbooth at ceh.ac.uk Tue May 5 05:11:14 2015
From: tbooth at ceh.ac.uk (Tim Booth)
Date: Tue, 5 May 2015 10:11:14 +0100
Subject: [Bio-Linux] hmmer tutorial help
In-Reply-To:
References:
Message-ID: <1430817074.6885.10.camel@wllt1771.nerc-wallingford.ac.uk>

Hi Graham,

This tutorial is out of date and refers to hmmer2. hmmer2 is still
available on Bio-Linux for backwards compatibility, but you have to call
it explicitly. This works:

$ hmm2build globin.hmm globins50.msf

Similarly, add a 2 to the other command names. But I'd recommend working with HMMER3.
It's much improved over version 2 and there is a newer tutorial:

ftp://selab.janelia.org/pub/software/hmmer/CURRENT/Userguide.pdf

We really do need to update the documentation for tools on Bio-Linux;
I've just been too short of time to work through it all.

Cheers,

TIM

--
Tim Booth
NERC Environmental Bioinformatics Centre

Centre for Ecology and Hydrology
Maclean Bldg, Benson Lane
Crowmarsh Gifford
Wallingford, England
OX10 8BB

http://environmentalomics.org/bio-linux
+44 1491 69 2297

From zain.alvi at student.shu.edu Tue May 5 10:00:54 2015
From: zain.alvi at student.shu.edu (Zain A Alvi)
Date: Tue, 5 May 2015 14:00:54 +0000
Subject: [Bio-Linux] Blasting Multiple Fasta Files
Message-ID: <1430834454858.90974@student.shu.edu>

Dear Sir or Madam,

I hope everything is well. I have downloaded all the viral protein sequences from the NCBI refseq database using their script from their E-book.
I have de-novo assembled some viral genomes and I know BLASTX takes a long time if the fasta is large. I have been able to split the large fasta file into smaller fasta files, each holding a user-specified number of contigs.

I was wondering whether there is a way to run BLASTX automatically on each of the fasta files, one at a time, so that it completes in a "shorter" amount of time than BLASTing the whole large de-novo assembled fasta file. Then I was hoping to concatenate all the results into one file.

Sincerely,

Zain

From mgollery at unr.edu Tue May 5 10:23:39 2015
From: mgollery at unr.edu (Martin Gollery)
Date: Tue, 5 May 2015 07:23:39 -0700
Subject: [Bio-Linux] Blasting Multiple Fasta Files
In-Reply-To: <1430834454858.90974@student.shu.edu>
References: <1430834454858.90974@student.shu.edu>
Message-ID:

Running a million BLASTX jobs on one sequence each is not going to save you time. It is better to run one BLASTX job on a million sequences.

-Marty

On Tue, May 5, 2015 at 7:00 AM, Zain A Alvi wrote:

> Dear Sir or Madam,
>
>
> I hope everything is well. I have downloaded all the viral protein
> sequences from the NCBI refseq database using their script from their
> E-book. I have de-novo assembled some viral genomes and I know BLASTX
> takes a long time if the fasta is large. I have been able to split the
> large fasta file based on an user specified contig number in each new fasta
> file.
>
>
> I was wondering is there a method to run BLASTX automatically on each of
> the fasta files one at a time so that it will be able to complete in a
> "shorter" amount of time as compared to BLASTing the whole large de-novo
> assembled fasta file. Then I was hoping to concatenate all the results
> into one file.
> > > Sincerely, > > > Zain > > _______________________________________________ > Bio-Linux mailing list > Bio-Linux at nebclists.nerc.ac.uk > http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux > > -- -- Martin Gollery Senior Bioinformatics Scientist Tahoe Informatics www.bioinformaticist.biz www.hiddenmarkovmodels.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From zain.alvi at student.shu.edu Tue May 5 10:31:14 2015 From: zain.alvi at student.shu.edu (Zain A Alvi) Date: Tue, 5 May 2015 14:31:14 +0000 Subject: [Bio-Linux] Blasting Multiple Fasta Files In-Reply-To: References: <1430834454858.90974@student.shu.edu>, Message-ID: <1430836275034.45926@student.shu.edu> Hi Marty, I apologize for the confusion. I am splitting a fasta file that contains approximately 100,000 fasta sequences to 100 fasta files that contains 1000 sequences each. I am hoping this will expedite the BLASTx process. Kind regards, Zain ________________________________ From: Martin Gollery Sent: Tuesday, May 5, 2015 10:23 AM To: Bio-Linux help and discussion Subject: Re: [Bio-Linux] Blasting Multiple Fasta Files Running a million BLASTX jobs on one sequence each is not going to save you time. It is better to run one BLASTX job on a million sequences. -Marty On Tue, May 5, 2015 at 7:00 AM, Zain A Alvi > wrote: Dear Sir or Madam, I hope everything is well. I have downloaded all the viral protein sequences from the NCBI refseq database using their script from their E-book. I have de-novo assembled some viral genomes and I know BLASTX takes a long time if the fasta is large. I have been able to split the large fasta file based on an user specified contig number in each new fasta file. I was wondering is there a method to run BLASTX automatically on each of the fasta files one at a time so that it will be able to complete in a "shorter" amount of time as compared to BLASTing the whole large de-novo assembled fasta file. 
Then I was hoping to concatenate all the results into one file.

Sincerely,

Zain

_______________________________________________
Bio-Linux mailing list
Bio-Linux at nebclists.nerc.ac.uk
http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux

--
--
Martin Gollery
Senior Bioinformatics Scientist
Tahoe Informatics
www.bioinformaticist.biz
www.hiddenmarkovmodels.com

From aleimba at gwdg.de Tue May 5 10:54:59 2015
From: aleimba at gwdg.de (Andreas Leimbach)
Date: Tue, 5 May 2015 16:54:59 +0200
Subject: [Bio-Linux] Blasting Multiple Fasta Files
In-Reply-To: <1430836275034.45926@student.shu.edu>
References: <1430834454858.90974@student.shu.edu>, <1430836275034.45926@student.shu.edu>
Message-ID: <5548D9C3.7050501@gwdg.de>

Hey,

blast+ is not parallelized all that well. Thus, you might want to try
GNU parallel to speed up your calculations somewhat, depending on your
machine. Here are some links:

https://www.biostars.org/p/63816/
https://www.biostars.org/p/76009/

Cheers,
Andreas

Andreas Leimbach
Universität Münster
Institut für Hygiene
Mendelstr. 7
D-48149 Münster
Germany

Tel.: +49 (0)551 39 33843
E-Mail: aleimba at gwdg.de

On 05.05.2015 16:31, Zain A Alvi wrote:
> Hi Marty,
>
> I apologize for the confusion. I am splitting a fasta file that contains approximately 100,000 fasta sequences to 100 fasta files that contains 1000 sequences each. I am hoping this will expedite the BLASTx process.
>
>
> Kind regards,
>
>
> Zain
>
> ________________________________
> From: Martin Gollery
> Sent: Tuesday, May 5, 2015 10:23 AM
> To: Bio-Linux help and discussion
> Subject: Re: [Bio-Linux] Blasting Multiple Fasta Files
>
> Running a million BLASTX jobs on one sequence each is not going to save you time. It is better to run one BLASTX job on a million sequences.
>
> -Marty
>
>
>
> On Tue, May 5, 2015 at 7:00 AM, Zain A Alvi wrote:
>
> Dear Sir or Madam,
>
>
> I hope everything is well.
I have downloaded all the viral protein sequences from the NCBI refseq database using their script from their E-book. I have de-novo assembled some viral genomes and I know BLASTX takes a long time if the fasta is large. I have been able to split the large fasta file based on an user specified contig number in each new fasta file. > > > I was wondering is there a method to run BLASTX automatically on each of the fasta files one at a time so that it will be able to complete in a "shorter" amount of time as compared to BLASTing the whole large de-novo assembled fasta file. Then I was hoping to concatenate all the results into one file. > > > Sincerely, > > > Zain > > _______________________________________________ > Bio-Linux mailing list > Bio-Linux at nebclists.nerc.ac.uk > http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux > > > > > -- > -- > Martin Gollery > Senior Bioinformatics Scientist > Tahoe Informatics > www.bioinformaticist.biz > www.hiddenmarkovmodels.com > > > > > _______________________________________________ > Bio-Linux mailing list > Bio-Linux at nebclists.nerc.ac.uk > http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux > From prash at bioclues.org Tue May 5 10:57:28 2015 From: prash at bioclues.org (Prash) Date: Tue, 5 May 2015 16:57:28 +0200 Subject: [Bio-Linux] Blasting Multiple Fasta Files In-Reply-To: <1430836275034.45926@student.shu.edu> References: <1430834454858.90974@student.shu.edu> <1430836275034.45926@student.shu.edu> Message-ID: Dear Zain It would still take time. Should you use queuing or mpich, it should make your tasks done easy. Above all, it all depends on how good the configuration is. Regards Prash On Tuesday, May 5, 2015, Zain A Alvi wrote: > Hi Marty, > > I apologize for the confusion. I am splitting a fasta file that contains > approximately 100,000 fasta sequences to 100 fasta files that contains > 1000 sequences each. I am hoping this will expedite the BLASTx process. 
From tbooth at ceh.ac.uk Tue May 5 11:08:20 2015
From: tbooth at ceh.ac.uk (Tim Booth)
Date: Tue, 5 May 2015 16:08:20 +0100
Subject: [Bio-Linux] Blasting Multiple Fasta Files
In-Reply-To: <1430836275034.45926@student.shu.edu>
References: <1430834454858.90974@student.shu.edu> , <1430836275034.45926@student.shu.edu>
Message-ID: <1430838500.6885.41.camel@wllt1771.nerc-wallingford.ac.uk>

Hi Zain,

So, I think you are saying that if you have a directory of files like
this:

seqs_000000_to_000999.fsa
seqs_001000_to_001999.fsa
seqs_002000_to_002999.fsa
seqs_003000_to_003999.fsa
...etc

You want to run:

blastx -db foo -query seqs_000000_to_000999.fsa -out seqs_000000_to_000999.blastx
...then...
blastx -db foo -query seqs_001000_to_001999.fsa -out seqs_001000_to_001999.blastx
...then...
blastx -db foo -query seqs_002000_to_002999.fsa -out seqs_002000_to_002999.blastx
...then...
blastx -db foo -query seqs_003000_to_003999.fsa -out seqs_003000_to_003999.blastx
...etc

This can be done with a shell loop. The tricky bit is generating the
output file name:

$ for f in *.fsa ; do
>   outname=$(basename "$f" .fsa).blastx
>   blastx -db foo -query "$f" -out "$outname"
> done

A nifty way of running jobs like this is with 'parallel', which is
pre-installed on Bio-Linux 8 and can run multiple jobs at once and even
send them to other remote machines for you. Here's the basic invocation
(yes, it's a bit cryptic - it's based on the xargs tool):

$ ls *.fsa | parallel --res out blastx -db foo -query

Then, to see what files were output:

$ find out -name stdout

Hope that helps. (Just before sending this, I see that Andreas
recommended parallel too!)

TIM

On Tue, 2015-05-05 at 15:31 +0100, Zain A Alvi wrote:
> Hi Marty,
>
>
> I apologize for the confusion. I am splitting a fasta file that
> contains approximately 100,000 fasta sequences to 100 fasta files that
> contains 1000 sequences each. I am hoping this will expedite the
> BLASTx process.
>
>
> Sincerely,
>
>
> Zain
>
>
> _______________________________________________
> Bio-Linux mailing list
> Bio-Linux at nebclists.nerc.ac.uk
> http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux
>
>

--
Tim Booth
NERC Environmental Bioinformatics Centre

Centre for Ecology and Hydrology
Maclean Bldg, Benson Lane
Crowmarsh Gifford
Wallingford, England
OX10 8BB

http://environmentalomics.org/bio-linux
+44 1491 69 2297

From cliffbeall at gmail.com Tue May 5 11:17:10 2015
From: cliffbeall at gmail.com (Clifford Beall)
Date: Tue, 5 May 2015 11:17:10 -0400
Subject: [Bio-Linux] Blasting Multiple Fasta Files
In-Reply-To:
References:
Message-ID: <01C56676-0E51-4954-8D92-0F56044B060B@gmail.com>

I have a bash script, written by a previous colleague, that splits up queries, then generates blast commands and parallelizes them through xargs. It does speed up the process a lot, depending on how many cores you have. It would require some hacking for your use case, since the splitting is kind of idiosyncratic, it's doing a nucleotide blast, and we then post-process the blast results, which you would not need. So you might be better off starting from scratch, but let me know if you want to take a look at it.

Clifford Beall, PhD, MSc
cliffbeall at gmail.com
beall.3 at osu.edu
Research Assistant Professor
Division of Biosciences
Ohio State U. College of Dentistry

>
> Message: 4
> Date: Tue, 5 May 2015 16:54:59 +0200
> From: Andreas Leimbach
> To: Bio-Linux help and discussion
> Subject: Re: [Bio-Linux] Blasting Multiple Fasta Files
> Message-ID: <5548D9C3.7050501 at gwdg.de>
> Content-Type: text/plain; charset="windows-1252"
>
> Hey,
>
> blast+ is not parallelized all that well. Thus, you might want to try
> GNU parallel to speed up your calculations somewhat, depending on your
> machine.
>> >> >> Sincerely, >> >> >> Zain >> >> _______________________________________________ >> Bio-Linux mailing list >> Bio-Linux at nebclists.nerc.ac.uk >> http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux >> >> >> >> >> -- >> -- >> Martin Gollery >> Senior Bioinformatics Scientist >> Tahoe Informatics >> www.bioinformaticist.biz >> www.hiddenmarkovmodels.com >> >> >> >> >> _______________________________________________ >> Bio-Linux mailing list >> Bio-Linux at nebclists.nerc.ac.uk >> http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux >> > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > Bio-Linux mailing list > Bio-Linux at nebclists.nerc.ac.uk > http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux > > > ------------------------------ > > End of Bio-Linux Digest, Vol 80, Issue 3 > **************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From tony.travis at minke-informatics.co.uk Tue May 5 12:19:05 2015 From: tony.travis at minke-informatics.co.uk (Tony Travis) Date: Tue, 5 May 2015 17:19:05 +0100 Subject: [Bio-Linux] Blasting Multiple Fasta Files In-Reply-To: <1430838500.6885.41.camel@wllt1771.nerc-wallingford.ac.uk> References: <1430834454858.90974@student.shu.edu> , <1430836275034.45926@student.shu.edu> <1430838500.6885.41.camel@wllt1771.nerc-wallingford.ac.uk> Message-ID: <5548ED79.4040401@minke-informatics.co.uk> On 05/05/15 16:08, Tim Booth wrote: > [...] > You want to run: > > blastx -db foo -infile seqs_000000_to_000999.fsa -out seqs_000000_to_000999.blastx > ...then... > blastx -db foo -infile seqs_001000_to_001999.fsa -out seqs_001000_to_001999.blastx > ...then... > blastx -db foo -infile seqs_002000_to_002999.fsa -out seqs_002000_to_002999.blastx > ...then... > blastx -db foo -infile seqs_003000_to_003999.fsa -out seqs_003000_to_003999.blastx > ...etc > [...] Hi, Tim. 
It's not good to run multiple instances of BLAST on the same machine
because each invocation of BLAST will have a copy of the same database
stored in memory. MPI-BLAST avoids this by loading different parts of
the database into each worker process.

The time-consuming part of BLAST is the initial exact word match and
both the old and new versions of BLAST allow you to specify how many
threads to run to speed this up:

BLAST uses "-a nn"
BLAST+ uses "-num_threads nn"

I compared "blastall", "blastn", "blat", "pblat" and "bowtie" for
mapping microRNA and mRNA to a custom database in:

Travis, A. J., Moody, J., Helwak, A., Tollervey, D., & Kudla, G. (2013).
Hyb: A bioinformatics pipeline for the analysis of CLASH (crosslinking,
ligation and sequencing of hybrids) data. Methods (San Diego, Calif.).
http://doi.org/10.1016/j.ymeth.2013.10.015

["pblat" is a parallel/multi-threaded version of BLAT]

You will need a script like this one by Jonathan Moody to convert
"bowtie2" alignments to equivalent tabular BLAST output:

https://github.com/gkudla/hyb/blob/master/bin/sam2blast

Bye,

Tony.

--
Minke Informatics Limited, Registered in Scotland - Company No. SC419028
Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)
tel. +44(0)19755 63548 http://minke-informatics.co.uk
mob. +44(0)7985 078324 mailto:tony.travis at minke-informatics.co.uk

From zain.alvi at student.shu.edu Wed May 6 00:33:28 2015
From: zain.alvi at student.shu.edu (Zain A Alvi)
Date: Wed, 6 May 2015 04:33:28 +0000
Subject: [Bio-Linux] Blasting Multiple Fasta Files
In-Reply-To: <5548ED79.4040401@minke-informatics.co.uk>
References: <1430834454858.90974@student.shu.edu> , <1430836275034.45926@student.shu.edu> <1430838500.6885.41.camel@wllt1771.nerc-wallingford.ac.uk>, <5548ED79.4040401@minke-informatics.co.uk>
Message-ID: <1430886807815.56649@student.shu.edu>

Hi Everyone,

Thank you for all the great and helpful recommendations, especially Tim, Tony, Dr. Beall, and Andreas.
I am trying to do exactly what Tim has shown: have BLASTx run on each fasta file in turn, not all at the same time. It should take each fasta file, run BLASTX on it, and then move on to the next fasta file until there are no fasta files left in the folder.

Would something like this work as well:

for input in *.fa; do -blastx -db /path_to_db -query $input -out $input.blastx_output; done

Then concatenate all *.blastx_output > Final_BlastxOutput.blastx_output

Thank you for the very interesting information about parallel on Bio-Linux. Would parallel work well for de-novo assemblers like Velvet and SPAdes (as examples)? I am asking especially about Velvet, after reading: https://www.biostars.org/p/86907/

Also, what about creating multiple copies of the same database, each with a different name/title? Would that get around the problem of several BLASTx runs accessing the same database, and the memory problems that come with it? I know it is not recommended, but would this workaround bring any benefit? Or should I just stick with the script above or the script that Tim kindly shared?

For example:

Folder A:
blastx -db /path_to_db01 -query input_seq_001-100 -out output_seq_001-100.blastx_output
blastx -db /path_to_db01 -query input_seq_101-200 -out output_seq_101-200.blastx_output
etc. to
blastx -db /path_to_db01 -query input_seq_401-499 -out output_seq_401-499.blastx_output

Folder B:
blastx -db /path_to_db02 -query input_seq_501-600 -out output_seq_501-600.blastx_output
blastx -db /path_to_db02 -query input_seq_601-700 -out output_seq_601-700.blastx_output
etc.
blastx -db /path_to_db02 -query input_seq_901-1000 -out output_seq_901-1000.blastx_output

On a side note, how does BLASTX from the BLAST+ package compare with MPI-BLAST? I thought MPI-BLAST is based on the older version of BLAST, hence it might return fewer results.
This is our major concern, as I am going for tabular output format with all sequence titles and information (-outfmt 6 salltitles). This will be helpful for filtering the contigs of the viral genome with some simple grep -w techniques.

Also, there are some interesting points about using xargs to parallelize BLAST+ (the last example): https://www.biostars.org/p/76009/ Has anyone tried this?

Thank you, Prash, for the recommendation of mpich. It's definitely interesting how it works. My mentor and I are trying to accomplish this on a 32-thread workstation (Intel Xeon E5-2640v2 (16 cores)) with 128 GB of RAM. I am planning to run BLASTX for the viral genome against the viral refseq protein sequences from NCBI.

Thank you, Dr. Beall. If you don't mind sharing, I would definitely be interested in taking a look to see what the script is like. Many thanks. If I am able to successfully hack the script, I am more than willing to share it with the rest of the community.

Thank you again Andreas, Tim, Tony, Dr. Beall, and Prash. I really appreciate all the suggestions and help.

Kind regards,

Zain

________________________________________
From: Tony Travis
Sent: Tuesday, May 5, 2015 12:19 PM
To: bio-linux at nebclists.nerc.ac.uk
Subject: Re: [Bio-Linux] Blasting Multiple Fasta Files

On 05/05/15 16:08, Tim Booth wrote:
> [...]
> You want to run:
>
> blastx -db foo -infile seqs_000000_to_000999.fsa -out seqs_000000_to_000999.blastx
> ...then...
> blastx -db foo -infile seqs_001000_to_001999.fsa -out seqs_001000_to_001999.blastx
> ...then...
> blastx -db foo -infile seqs_002000_to_002999.fsa -out seqs_002000_to_002999.blastx
> ...then...
> blastx -db foo -infile seqs_003000_to_003999.fsa -out seqs_003000_to_003999.blastx
> ...etc
> [...]

Hi, Tim.

It's not good to run multiple instances of BLAST on the same machine because each invocation of BLAST will have a copy of the same database stored in memory.
MPI-BLAST avoids this by loading different parts of the database into each worker process. The time-consuming part of BLAST is the initial exact word match and both the old and new versions of BLAST allow you to specify how many threads to run to speed this up: BLAST uses "-a nn" BLAST+ uses "-num_threads nn" I compared "blastall", "blastn", "blat", "pblat" and "bowtie" for mapping microRNA and mRNA to a custom database in: Travis, A. J., Moody, J., Helwak, A., Tollervey, D., & Kudla, G. (2013). Hyb: A bioinformatics pipeline for the analysis of CLASH (crosslinking, ligation and sequencing of hybrids) data. Methods (San Diego, Calif.). http://doi.org/10.1016/j.ymeth.2013.10.015 ["pblat" is a parallel/multi-threaded version of BLAT] You will need a script like this one by Jonathan Moody to convert "bowtie2" alignments to equivalent tabular BLAST output: https://github.com/gkudla/hyb/blob/master/bin/sam2blast Bye, Tony. -- Minke Informatics Limited, Registered in Scotland - Company No. SC419028 Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK) tel. +44(0)19755 63548 http://minke-informatics.co.uk mob. +44(0)7985 078324 mailto:tony.travis at minke-informatics.co.uk _______________________________________________ Bio-Linux mailing list Bio-Linux at nebclists.nerc.ac.uk http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux From mgollery at unr.edu Wed May 6 00:37:55 2015 From: mgollery at unr.edu (Martin Gollery) Date: Tue, 5 May 2015 21:37:55 -0700 Subject: [Bio-Linux] Blasting Multiple Fasta Files In-Reply-To: <1430886807815.56649@student.shu.edu> References: <1430834454858.90974@student.shu.edu> <1430836275034.45926@student.shu.edu> <1430838500.6885.41.camel@wllt1771.nerc-wallingford.ac.uk> <5548ED79.4040401@minke-informatics.co.uk> <1430886807815.56649@student.shu.edu> Message-ID: Be sure to include -num_threads I still think that this will be slower overall, but it will be interesting to hear your results! 
Marty On Tue, May 5, 2015 at 9:33 PM, Zain A Alvi wrote: > Hi Everyone, > > Thank you for all the great and helpful recommendations, especially Tim, > Tony, Dr. Beall, and Andreas. I am trying to do exactly what Tim has > showed and having BLASTx run on each fasta file one at a time, but not at > the same time. It should go through each fasta file one at a time and do > BLASTX and then move onto next fasta file until there are no fasta files > left in the folder. > > Would something like this work as well: > > for input in *.fa; do -blastx -db /path_to_db -query $input -out > $input.blastx_output; done > > Then concantentate all *.blastx_output > Final_BlastxOutput.blastx_output > > Thank you for the very interesting information about parallel on Bio > Linux. Would parallel work well for de-novo assemblers like Velvet and > Spades (as examples)? Especially Velvet after reading about: > https://www.biostars.org/p/86907/ > > Also would creating multiple databases of the same database with a > different name/title. Will that go around the problem of accessing the same > database and memory problems when trying to run multiple BLASTx. I know it > is not recommended, would this quasi method be any beneficial to do. Should > I just stick with the script above or the script that Tim kindly shared? > > For example: > > Folder A > blastx -db /path_to_db01 -infile input_seq_001-100 -out > ouput_seq_001-100.blastx_output > blastx -db /path_to_db01 -infile input_seq_101-200 -out > ouput_seq_101-200.blastx_output > etc to > blasts -db /path_to_db01 -infile input_seq_401-499 -out > ouput_seq_401-499.blastx_output > > Folder B: > blastx -db /path_to_db02 -infile input_seq_501-600 -out > ouput_seq_501-600.blastx_output > blastx -db /path_to_db02 -infile input_seq_601-700 -out > ouput_seq_601-700.blastx_output > etc > blastx -db /path_to_db02 -infile input_seq_901-1000 -out > ouput_seq_901-1000.blastx_output > > On a side note how is BLASTX from BLAST+ package compared MPI-BLAST? 
>
> It's not good to run multiple instances of BLAST on the same machine
> because each invocation of BLAST will have a copy of the same database
> stored in memory. MPI-BLAST avoids this by loading different parts of
> the database into each worker process.
>
> The time-consuming part of BLAST is the initial exact word match and
> both the old and new versions of BLAST allow you to specify how many
> threads to run to speed this up:
>
> BLAST uses "-a nn"
> BLAST+ uses "-num_threads nn"
>
> I compared "blastall", "blastn", "blat", "pblat" and "bowtie" for
> mapping microRNA and mRNA to a custom database in:
>
> Travis, A. J., Moody, J., Helwak, A., Tollervey, D., & Kudla, G. (2013).
> Hyb: A bioinformatics pipeline for the analysis of CLASH (crosslinking,
> ligation and sequencing of hybrids) data. Methods (San Diego, Calif.).
> http://doi.org/10.1016/j.ymeth.2013.10.015
>
> ["pblat" is a parallel/multi-threaded version of BLAT]
>
> You will need a script like this one by Jonathan Moody to convert
> "bowtie2" alignments to equivalent tabular BLAST output:
>
> https://github.com/gkudla/hyb/blob/master/bin/sam2blast
>
> Bye,
>
> Tony.
>
> --
> Minke Informatics Limited, Registered in Scotland - Company No. SC419028
> Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)
> tel. +44(0)19755 63548 http://minke-informatics.co.uk
> mob. +44(0)7985 078324 mailto:tony.travis at minke-informatics.co.uk
> _______________________________________________
> Bio-Linux mailing list
> Bio-Linux at nebclists.nerc.ac.uk
> http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux

--
Martin Gollery
Senior Bioinformatics Scientist
Tahoe Informatics
www.bioinformaticist.biz
www.hiddenmarkovmodels.com

From aleimba at gwdg.de Wed May 6 03:58:39 2015
From: aleimba at gwdg.de (Andreas Leimbach)
Date: Wed, 6 May 2015 09:58:39 +0200
Subject: [Bio-Linux] Blasting Multiple Fasta Files
In-Reply-To: <1430886807815.56649@student.shu.edu>
References: <1430834454858.90974@student.shu.edu> , <1430836275034.45926@student.shu.edu> <1430838500.6885.41.camel@wllt1771.nerc-wallingford.ac.uk>, <5548ED79.4040401@minke-informatics.co.uk> <1430886807815.56649@student.shu.edu>
Message-ID: <5549C9AF.9030409@gwdg.de>

Hi,

if you lose the "-" before blastx it will work:

for input in *.fa; do blastx -db /path_to_db -query "$input" -out "$input.blastx_output"; done

And as Tony/Martin recommended, you should really use '-num_threads'. The blast+ routines should be faster than legacy blast and you want the extra output option anyway. Still, parallel will be faster than the loop.

Having different databases won't make a difference; all will be held in memory anyway.

I don't think you can run a single assembly through parallel. The assembler has to look at the whole data. In any case, assemblers are designed for multi-threaded use; they all have an option for how many threads you want to use (in the case of Velvet through OpenMP). For Illumina data I'd recommend SPAdes; it has a nice workflow (including error correction etc.) and thus is quite user-friendly.

The xargs example won't give you anything that parallel can't do.

mpiBLAST is mainly meant for clustered computers (i.e. several servers being used for a single program run). IMO, it won't give you a speed advantage on a single computer with several cores in comparison to the aforementioned possibilities.

HTH,
Andreas

--
Andreas Leimbach
Universität Münster
Institut für Hygiene
Mendelstr. 7
D-48149 Münster
Germany

Tel.: +49 (0)551 39 33843
E-Mail: aleimba at gwdg.de

On 06.05.2015 06:33, Zain A Alvi wrote:
> [...]

From zain.alvi at student.shu.edu Wed May 6 12:16:02 2015
From: zain.alvi at student.shu.edu (Zain A Alvi)
Date: Wed, 6 May 2015 16:16:02 +0000
Subject: [Bio-Linux] Blasting Multiple Fasta Files
In-Reply-To: <5549C9AF.9030409@gwdg.de>
References: <1430834454858.90974@student.shu.edu> , <1430836275034.45926@student.shu.edu> <1430838500.6885.41.camel@wllt1771.nerc-wallingford.ac.uk>, <5548ED79.4040401@minke-informatics.co.uk> <1430886807815.56649@student.shu.edu>,<5549C9AF.9030409@gwdg.de>
Message-ID: <1430928961635.59008@student.shu.edu>

Hi Everyone,

Thank you for the great explanation, Andreas. I apologize for my typographical mistake with the '-' in front of blastx. So would the two options be something like this?

The script route:

for input in *.fa; do blastx -db /path_to_db -query $input -num_threads 30 -evalue 0.001 -outfmt 6 salltitles -out $input.blastx_output; done

If I go for the parallel route, which I have never tried before:

cat input.fa | time parallel -j+0 --eta --progress --block 100 --recstart '>' --pipe blastx -evalue 0.001 -outfmt 6 salltitles -db path_to_db -query - > final_results.blastx_output

This will break the input into sets of 100 sequences. How would I use -j+0 to make sure it only uses 30 of the 32 threads? Currently, -j+0 will use all 32 threads. Would something like -j+2 work?

I saw this in the GNU parallel videos on YouTube: https://www.youtube.com/watch?v=OpaiGYxkSuQ&list=PL284C9FF2488BC6D1&index=1&spfreload=10 and in the command breakdown from the Biostars link shared by Martin, Andreas, and Tim: https://www.biostars.org/p/63816/

But the worrisome part is the report of parallel losing sequences here: http://seqanswers.com/forums/showthread.php?t=48879 Has anyone here experienced this? Would I still use -num_threads with parallel? I have never used parallel before, hence all these questions while I try to teach myself the tools.

The second option with parallel, as kindly shared by Tim, which I slightly modified to what I am hoping to do:

ls *.fasta | time parallel -j+0 --eta --progress --res out blastx -evalue 0.001 -outfmt 6 salltitles -db path_to_db -query

>Then to see what files were outputted:
>$ find out -name stdout

I was wondering what --res after parallel indicates. Is there an easier method to concatenate all the files by giving them some common ending, and where would that go? Would that be something like this?

ls *.fasta | time parallel -j+0 --eta --progress --res out.blastx_output blastx -evalue 0.001 -outfmt 6 salltitles -db path_to_db -query

Then concatenate the *.blastx_output to final_results.blastx_output

On a smaller note, I received "zsh: command not found" when I typed parallel --version. Do I need to reinstall parallel, or do I need to add the location where parallel is pre-installed to ~/.zshrc? Where is this location? I have checked /usr/bin and there is no parallel, but there are files for parallel-fasts and parallel-fastq.

Sorry for all these novice questions. I am trying to teach myself all these tools and strategies such as parallel. Many thanks to everyone. I sincerely appreciate it.

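A minimal sketch of the loop route discussed in this message, not a tested recipe. It assumes a POSIX shell and BLAST+ blastx on the PATH; the run helper, the blast_all function and the DB, THREADS and RUN variables are names made up for this illustration. By default it only prints the commands it would run. Two details differ from the commands as typed in the thread: the -outfmt value is passed as a single quoted argument, and $input is quoted.

```shell
#!/bin/sh
# Sketch only. DB, THREADS, RUN, run and blast_all are hypothetical names.
DB=${DB:-/path_to_db}        # placeholder database path from the thread
THREADS=${THREADS:-30}       # leaves 2 of the 32 hardware threads free

# Print the command (default) or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# One blastx per FASTA file, strictly one after another.
blast_all() {
    for input in *.fa; do
        [ -e "$input" ] || continue   # no *.fa files: skip the literal pattern
        run blastx -db "$DB" -query "$input" -num_threads "$THREADS" \
            -evalue 0.001 -outfmt "6 salltitles" -out "$input.blastx_output"
    done
}

blast_all
# Afterwards, if wanted (all_hits.tsv is a hypothetical name chosen so that
# the output file does not match the input glob):
#   cat ./*.fa.blastx_output > all_hits.tsv
```

Run it as is for a dry run, or with RUN=1 in the environment to execute the searches. For the parallel route, as far as the GNU parallel manual describes it, -j 30 caps the number of simultaneous jobs at 30, -j -2 means the CPU count minus two, and -j +2 adds two rather than subtracting; --res is an abbreviation of --results.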
Kind regards,

Zain

________________________________________
From: Andreas Leimbach
Sent: Wednesday, May 6, 2015 3:58 AM
To: Bio-Linux help and discussion
Subject: Re: [Bio-Linux] Blasting Multiple Fasta Files

[...]

From simon at calmblue.net Tue May 12 09:51:29 2015
From: simon at calmblue.net (Simon)
Date: Tue, 12 May 2015 14:51:29 +0100
Subject: [Bio-Linux] Bio-Linux 8 Repositories
Message-ID: <55520561.2050009@calmblue.net>

Hi,

I have a server with Ubuntu 14.04 LTS installed and would like to add the Bio-Linux repositories for Bio-Linux 8.

I can see instructions on the Bio-Linux website for adding the Bio-Linux 7 repos to Ubuntu 12.04 but not for Bio-Linux 8 with 14.04.

Could someone point me in the right direction or let me know the URLs I need to add to apt's configuration in order to install Bio-Linux 8's packages on a base Ubuntu 14.04 install.

Many thanks,

Simon

From tbooth at ceh.ac.uk Tue May 12 11:02:54 2015
From: tbooth at ceh.ac.uk (Tim Booth)
Date: Tue, 12 May 2015 16:02:54 +0100
Subject: [Bio-Linux] Bio-Linux 8 Repositories
In-Reply-To: <55520561.2050009@calmblue.net>
References: <55520561.2050009@calmblue.net>
Message-ID: <1431442974.16741.13.camel@wllt1771.nerc-wallingford.ac.uk>

Hi Simon,

The instructions for setting up APT stuff got quite complex since there are multiple repositories and associated GPG keys to handle.

Most of the goodies are in the PPA:

https://launchpad.net/~nebc/+archive/ubuntu/bio-linux/+packages

You can add just this repo and the packages it provides should work fine, but if you really want all of Bio-Linux then you should run the upgrade8.sh script as detailed here (section 4):

http://environmentalomics.org/bio-linux-installation

...and trust to my scripting to set APT up properly to pull all the packages.

(If you don't trust my scripting and want to inspect it, run

wget -qO- http://nebc.nerc.ac.uk/downloads/bl8_only/upgrade8.sh | env UNPACK_ONLY=1 sh

and look at the comments in upgrade_to_8.sh in the temporary folder it creates)

Hope that helps,

TIM

On Tue, 2015-05-12 at 14:51 +0100, Simon wrote:
> [...]

--
Tim Booth
NERC Environmental Bioinformatics Centre
Centre for Ecology and Hydrology
Maclean Bldg, Benson Lane
Crowmarsh Gifford
Wallingford, England
OX10 8BB
http://environmentalomics.org/bio-linux
+44 1491 69 2297

From simon at calmblue.net Wed May 13 10:01:07 2015
From: simon at calmblue.net (Simon)
Date: Wed, 13 May 2015 15:01:07 +0100
Subject: [Bio-Linux] Bio-Linux 8 Repositories
In-Reply-To: <1431442974.16741.13.camel@wllt1771.nerc-wallingford.ac.uk>
References: <55520561.2050009@calmblue.net> <1431442974.16741.13.camel@wllt1771.nerc-wallingford.ac.uk>
Message-ID: <55535923.2020903@calmblue.net>

Hi Tim, that's brilliant, thanks.

Script looks good to me, although I think the PPA will be best for our situation as we're not running a GUI and the less we vary from our standard server build the better.

On 12/05/15 16:02, Tim Booth wrote:
> [...]

From tony.travis at minke-informatics.co.uk Wed May 13 10:14:08 2015
From: tony.travis at minke-informatics.co.uk (Tony Travis)
Date: Wed, 13 May 2015 15:14:08 +0100
Subject: [Bio-Linux] Bio-Linux 8 Repositories
In-Reply-To: <55535923.2020903@calmblue.net>
References: <55520561.2050009@calmblue.net> <1431442974.16741.13.camel@wllt1771.nerc-wallingford.ac.uk> <55535923.2020903@calmblue.net>
Message-ID: <55535C30.2010604@minke-informatics.co.uk>

On 13/05/15 15:01, Simon wrote:
> Hi Tim, that's brilliant, thanks.
>
> Script looks good to me, although I think the PPA will be best for our
> situation as we're not running a GUI and the less we vary from our
> standard server build the better.

Hi, Simon.

I've run Bio-Linux 8 as a terminal server on many servers. You need most of the Ubuntu 'desktop' installed to run a remote MATE desktop.

I don't think it's an issue to run a GUI on a server, but you might find it useful to use the MATE desktop locally if the server graphics hardware is not up to running Unity.

The overhead of running a GUI on the server is quite low, especially if you run the MATE desktop locally. The default remote desktop is MATE.

HTH,

Tony.

--
Minke Informatics Limited, Registered in Scotland - Company No. SC419028
Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)
tel. +44(0)19755 63548 http://minke-informatics.co.uk
mob. +44(0)7985 078324 mailto:tony.travis at minke-informatics.co.uk

From simon at calmblue.net Wed May 13 10:24:13 2015
From: simon at calmblue.net (Simon)
Date: Wed, 13 May 2015 15:24:13 +0100
Subject: [Bio-Linux] Bio-Linux 8 Repositories
In-Reply-To: <55535C30.2010604@minke-informatics.co.uk>
References: <55520561.2050009@calmblue.net> <1431442974.16741.13.camel@wllt1771.nerc-wallingford.ac.uk> <55535923.2020903@calmblue.net> <55535C30.2010604@minke-informatics.co.uk>
Message-ID: <55535E8D.2070402@calmblue.net>

On 13/05/15 15:14, Tony Travis wrote:
> [...]

Good to know, thanks, I like MATE. If it turns out that a desktop is required then I'll definitely go down that route.

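For the PPA-only route Simon settles on in this exchange, a minimal sketch, assuming a stock Ubuntu 14.04 with add-apt-repository installed. The short name ppa:nebc/bio-linux is inferred from the Launchpad URL Tim posted rather than quoted from the thread, and run, add_biolinux_ppa, PPA and RUN are illustrative names. It prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only. PPA, RUN, run and add_biolinux_ppa are hypothetical names.
# The PPA short name is an assumption derived from
# https://launchpad.net/~nebc/+archive/ubuntu/bio-linux
PPA=${PPA:-ppa:nebc/bio-linux}

# Print the command (default) or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Register the PPA and refresh the package index.
add_biolinux_ppa() {
    run sudo add-apt-repository -y "$PPA"
    run sudo apt-get update
}

add_biolinux_ppa
```

Individual packages can then be installed with apt-get as usual. This adds only the PPA and none of the other repositories that upgrade8.sh configures.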
From raonyguimaraes at gmail.com Wed May 13 19:19:57 2015
From: raonyguimaraes at gmail.com (Raony Guimaraes Corrêa Do Carmo Lisboa Cardenas)
Date: Wed, 13 May 2015 20:19:57 -0300
Subject: [Bio-Linux] Bio-Linux 8 Repositories
In-Reply-To: <55535E8D.2070402@calmblue.net>
References: <55520561.2050009@calmblue.net> <1431442974.16741.13.camel@wllt1771.nerc-wallingford.ac.uk> <55535923.2020903@calmblue.net> <55535C30.2010604@minke-informatics.co.uk> <55535E8D.2070402@calmblue.net>
Message-ID:

Hi Tim,

Thank you for sharing this command:

wget -qO- http://nebc.nerc.ac.uk/downloads/bl8_only/upgrade8.sh | env UNPACK_ONLY=1 sh

Now I can finally see what you are doing in the file "upgrade_to_8.sh" to transform Ubuntu 14.04 LTS into Biolinux 8.

I recently upgraded my system to Ubuntu 15.10 Wily Werewolf (Kernel 4.1rc2) and even after that most of the packages from the Biolinux 8 repositories are still working pretty well. I think there is a conflict between the old MATE packages from Biolinux on Launchpad and the fresh new MATE packages from the official Ubuntu repositories. Keep up the good work!

Interesting to see the packages in pseudo_orphans.txt as well!

Here are two bugs I'm hitting while running your script.

This one, probably because of the change from upstart to systemd:

initctl: Unable to connect to Upstart: Failed to connect to socket /com/ubuntu/upstart: Connection refused
insserv: warning: script 'galaxy' missing LSB tags and overrides
insserv: Default-Start undefined, assuming empty start runlevel(s) for script `galaxy'
insserv: Default-Stop undefined, assuming empty stop runlevel(s) for script `galaxy'

And this second one:

E: Unable to locate package bio-linux-jalview
Not all packages installed properly - exiting.

I believe it's something related to the package "jalview" from the Biolinux repo. I could install this package from Ubuntu sources without a problem:

http://archive.ubuntu.com/ubuntu/ wily/universe jalview all 2.7.dfsg-4 [3.382 kB]

Now I have a question: after changing the name of the package in the file "bl_master_package_list.txt" from bio-linux-jalview to jalview, how could I "pack" everything again into the file upgrade8.sh, or in which order should I execute your scripts bl_install_master_list.sh, pick_cran_mirror.py and upgrade_to_8.sh in order to finally complete the upgrade to Biolinux 8?

I believe I could help this project by packaging some new software to add to the Biolinux repositories, or even by trying to upgrade a package from python2 to python3. Let me know how I could help. :)

Kind Regards,
_____________________________________________
Raony Guimarães Corrêa Do Carmo Lisboa Cardenas
PhD Student in Bioinformatics

email: raonyguimaraes at gmail.com
skype/gtalk: raonyguimaraes
phone: +55 31 93404152

Laboratory of Clinical Genomics
UFMG School of Medicine
Federal University of Minas Gerais - UFMG
Av. Prof. Alfredo Balena, 190, Sala 321
Belo Horizonte, Brazil 30130-100
_____________________________________________

On Wed, May 13, 2015 at 11:24 AM, Simon wrote:
> [...]

From tbooth at ceh.ac.uk Thu May 14 11:47:54 2015
From: tbooth at ceh.ac.uk (Tim Booth)
Date: Thu, 14 May 2015 16:47:54 +0100
Subject: [Bio-Linux] Bio-Linux 8 Repositories
In-Reply-To:
References: <55520561.2050009@calmblue.net> <1431442974.16741.13.camel@wllt1771.nerc-wallingford.ac.uk> <55535923.2020903@calmblue.net> <55535C30.2010604@minke-informatics.co.uk> <55535E8D.2070402@calmblue.net>
Message-ID: <1431618474.9801.21.camel@wllt1771.nerc-wallingford.ac.uk>

Hi Raony,

The packing thing was never supposed to obfuscate the upgrade process, just to make distribution easier for me. If you or anyone else on the list would find the packer script useful it is here, along with some other bits and pieces I wrote:

http://bazaar.launchpad.net/~tbooth/+junk/misc_scripts/files

(packit.perl is the script; testpack.sh is a test case for it)

But you don't need to repack anything, just cd to the temporary directory and run "sudo upgrade_to_8.sh".

For MATE, you should definitely use the official Wily packages. When I write the upgrade script for 16.04 I'll ensure it replaces my packages with the mainstream ones.

Jalview was a bug - I'd changed the package name and didn't update the script. I've fixed it in the same way you did.

Galaxy is not going to work on Wily without some significant work on my part. As you said, it needs converting to systemd, but I think it needs a bit more than that in addition.

As for getting involved in the project, I'll answer that in a second e-mail.

Cheers,

TIM

On Wed, 2015-05-13 at 20:19 -0300, Raony Guimaraes Corrêa Do Carmo Lisboa Cardenas wrote:
> Hi Tim,
>
> Thank you for sharing this command:
>
> wget -qO- http://nebc.nerc.ac.uk/downloads/bl8_only/upgrade8.sh | env UNPACK_ONLY=1 sh
>
> Now I can finally see what you are doing in file "upgrade_to_8.sh" to
> transform Ubuntu 14.04 LTS into Biolinux 8.
>
> I recently upgraded my system to Ubuntu 15.04 Wily Werewolf (Kernel
> 4.1rc2) and even after that most of the packages from Biolinux 8
> repositories are still working pretty well. I think there is a
> conflict between the old mate packages from Biolinux in launchpad and
> the fresh new mate packages from the official Ubuntu repositories.
> Keep up the good work!
>
> Interesting to see the packages in pseudo_orphans.txt as well!
>
> Here are a two bugs I'm receiving while running your script:
>
> This one, probably because of the change from upstart to systemd:
>
> initctl: Unable to connect to Upstart: Failed to connect to socket /com/ubuntu/upstart: Connection refused
> insserv: warning: script 'galaxy' missing LSB tags and overrides
> insserv: Default-Start undefined, assuming empty start runlevel(s) for script `galaxy'
> insserv: Default-Stop undefined, assuming empty stop runlevel(s) for script `galaxy'
>
> And this second one:
>
> E: Unable to locate package bio-linux-jalview
> Not all packages installed properly - exiting.
>
> I believe it's something related with the package "jalview" from
> biolinux repo.
> I could install this package from Ubuntu sources
> without a problem:
>
> http://archive.ubuntu.com/ubuntu/ wily/universe jalview all 2.7.dfsg-4 [3.382 kB]
>
> Now I have a question, after changing the name of the package in file
> "bl_master_package_list.txt" from bio-linux-jalview to jalview, how I
> could "pack" everything again to the file upgrade8.sh or in which
> order I should execute your scripts bl_install_master_list.sh,
> pick_cran_mirror.py and upgrade_to_8.sh in order to finally complete
> the upgrade to Biolinux 8?
>
> I believe I could help this project by packaging some new software to
> add to the biolinux repositories or even trying to upgrade a package
> from python2 to python3.
>
> Let me know how I could help. :)
>
> Kind Regards,
>
> _____________________________________________
>
> Raony Guimarães Corrêa Do Carmo Lisboa Cardenas
> PhD Student in Bioinformatics
>
> email: raonyguimaraes at gmail.com
> skype/gtalk: raonyguimaraes
> phone: +55 31 93404152
>
> Laboratory of Clinical Genomics
> UFMG School of Medicine
> Federal University of Minas Gerais - UFMG
> Av. Prof. Alfredo Balena, 190, Sala 321
> Belo Horizonte, Brazil 30130-100
> _____________________________________________
>
> On Wed, May 13, 2015 at 11:24 AM, Simon wrote:
> > On 13/05/15 15:14, Tony Travis wrote:
> >> On 13/05/15 15:01, Simon wrote:
> >>> Hi Tim, that's brilliant, thanks.
> >>>
> >>> Script looks good to me, although I think the PPA will be best for our
> >>> situation as we're not running a GUI and the less we vary from our
> >>> standard server build the better.
> >> Hi, Simon.
> >>
> >> I've run Bio-Linux 8 as a terminal server on many servers. You need most
> >> of the Ubuntu 'desktop' installed to run a remote MATE desktop.
> >>
> >> I don't think it's an issue to run a GUI on a server, but you might find
> >> it useful to use the MATE desktop locally if the server graphics
> >> hardware is not up to running Unity.
> >>
> >> The overhead of running a GUI on the server is quite low, especially if
> >> you run the MATE desktop locally. The default remote desktop is MATE.
> >>
> >> HTH,
> >>
> >> Tony.
> >
> > Good to know, thanks, I like MATE. If it turns out that a desktop is
> > required then I'll definitely go down that route.

--
Tim Booth
NERC Environmental Bioinformatics Centre

Centre for Ecology and Hydrology
Maclean Bldg, Benson Lane
Crowmarsh Gifford
Wallingford, England
OX10 8BB

http://environmentalomics.org/bio-linux
+44 1491 69 2297

From paumarc at gmail.com Sun May 24 07:40:13 2015
From: paumarc at gmail.com (Pau Marc Muñoz Torres)
Date: Sun, 24 May 2015 13:40:13 +0200
Subject: [Bio-Linux] problems with ssh
Message-ID:

Hello marko

I have some problems with my ssh and I don't know what I should do.
Recently I moved to another room. After reading some forums I realized
that my IP has changed and maybe the old address (paumarc@.zmm.irb.hr)
should also be changed. Do you know what I should do to re-establish my
ssh? I installed my biolinux at my old location.

My new IP is 193.198.xxx.xxx. Do I have to reconfigure something? I have
the ssh port open.

thanks

Pau Marc Muñoz Torres
skype: pau_marc
http://www.linkedin.com/in/paumarc
http://www.researchgate.net/profile/Pau_Marc_Torres3/info/

From M.D.Sharma at exeter.ac.uk Sun May 24 07:58:07 2015
From: M.D.Sharma at exeter.ac.uk (Sharma, M D)
Date: Sun, 24 May 2015 11:58:07 +0000
Subject: [Bio-Linux] problems with ssh
In-Reply-To:
References:
Message-ID:

Pau,

If I am understanding this correctly:
1) You installed BioLinux and configured an IP on your BioLinux machine
   (or got it via DHCP) when you were in the old office.

2) Assuming the above, if your new location has a different network /
   subnet, then the IP of your biolinux machine would have changed.

I suggest that you check your new ip configuration by running the
ifconfig command on the console of your biolinux machine, check your
routing to make sure that this machine is reachable from the internet /
has internet access, and then use that IP (or the corresponding FQDN) to
establish an SSH connection.

Best,
MD

Dr. M D Sharma
Research Fellow
Centre for Ecology & Conservation
College of Life and Environmental Sciences
University of Exeter
Cornwall Campus
Penryn
TR10 9FE

M.D.Sharma at Exeter.ac.uk
http://www.publicationslist.org/MD.Sharma
http://www.researcherid.com/rid/F-8530-2013

Shared Tel: (+44) 1326 259384
Mob: (+44) 7919 242450

From: Pau Marc Muñoz Torres [mailto:paumarc at gmail.com]
Sent: 24 May 2015 12:40
To: Bio-Linux help and discussion
Subject: [Bio-Linux] problems with ssh

Hello marko

I have some problems with my ssh and I don't know what I should do.
Recently I moved to another room. After reading some forums I realized
that my IP has changed and maybe the old address (paumarc@.zmm.irb.hr)
should also be changed.
Do you know what I should do to re-establish my ssh? I installed my
biolinux at my old location.

My new IP is 193.198.xxx.xxx. Do I have to reconfigure something? I have
the ssh port open.

thanks

Pau Marc Muñoz Torres
skype: pau_marc
http://www.linkedin.com/in/paumarc
http://www.researchgate.net/profile/Pau_Marc_Torres3/info/

From paumarc at gmail.com Tue May 26 05:19:19 2015
From: paumarc at gmail.com (Pau Marc Muñoz Torres)
Date: Tue, 26 May 2015 11:19:19 +0200
Subject: [Bio-Linux] problems with ssh
In-Reply-To:
References:
Message-ID:

Thank you for your answer. Everything is fixed now: the IT guys had some
ports closed for my new IP.

p

Pau Marc Muñoz Torres
skype: pau_marc
http://www.linkedin.com/in/paumarc
http://www.researchgate.net/profile/Pau_Marc_Torres3/info/

2015-05-24 13:58 GMT+02:00 Sharma, M D :

> Pau,
>
> If I am understanding this correctly:
>
> 1) You installed BioLinux and configured an IP on your BioLinux
>    machine (or got it via DHCP) when you were in the old office.
>
> 2) Assuming the above, if your new location has a different network
>    / subnet, then the IP of your biolinux machine would have changed.
>
> I suggest that you check your new ip configuration by running the ifconfig
> command on the console of your biolinux machine, check your routing to make
> sure that this machine is reachable from the internet / has internet
> access, and then use that IP (or the corresponding FQDN) to establish an
> SSH connection.
>
> Best,
> MD
>
> Dr. M D Sharma
> Research Fellow
> Centre for Ecology & Conservation
> College of Life and Environmental Sciences
> University of Exeter
> Cornwall Campus
> Penryn
> TR10 9FE
>
> M.D.Sharma at Exeter.ac.uk
> http://www.publicationslist.org/MD.Sharma
> http://www.researcherid.com/rid/F-8530-2013
>
> Shared Tel: (+44) 1326 259384
> Mob: (+44) 7919 242450
>
> *From:* Pau Marc Muñoz Torres [mailto:paumarc at gmail.com]
> *Sent:* 24 May 2015 12:40
> *To:* Bio-Linux help and discussion
> *Subject:* [Bio-Linux] problems with ssh
>
> Hello marko
>
> I have some problems with my ssh and I don't know what I should do.
> Recently I moved to another room. After reading some forums I realized that
> my IP has changed and maybe the old address (paumarc@.zmm.irb.hr)
> should also be changed. Do you know what I should do to re-establish my
> ssh? I installed my biolinux at my old location.
>
> My new IP is 193.198.xxx.xxx. Do I have to reconfigure something? I have
> the ssh port open.
>
> thanks
>
> Pau Marc Muñoz Torres
> skype: pau_marc
> http://www.linkedin.com/in/paumarc
> http://www.researchgate.net/profile/Pau_Marc_Torres3/info/

From chiara.cocconcelli1 at gmail.com Wed May 27 04:55:14 2015
From: chiara.cocconcelli1 at gmail.com (chiara cocconcelli)
Date: Wed, 27 May 2015 10:55:14 +0200
Subject: [Bio-Linux] Biolinux 8, pymol and Tkinter error using the builder function
Message-ID:

Hi all,

I'm using Biolinux 8 with pymol from the repositories but I am
encountering a tkinter-related bug. Steps to reproduce:

1) open pymol
2) in the list of buttons on the right click the Builder one
3) in the panel that opens click Protein on the left

the function is not called and this error is raised:

Error: 1
Exception in Tk callback
  Function: > (type: )
  Args: ()
Traceback (innermost last):
  File "/usr/lib/python2.7/dist-packages/Pmw/Pmw_1_3/lib/PmwBase.py", line 1747, in __call__
    return apply(self.func, args)
  File "/usr/lib/python2.7/dist-packages/pmg_tk/skins/normal/builder.py", line 1458, in toggleChemProtein
    if self.chemFrame.grid_info():
  File "/usr/lib/python2.7/lib-tk/Tkinter.py", line 2000, in grid_info
    self.tk.call('grid', 'info', self._w))
: coercing to Unicode: need string or buffer, _tkinter.Tcl_Obj found

I can provide debugging on request and I have a bit of python
understanding, please let me know how I can best diagnose this problem

thanks in advance,

regards

From tbooth at ceh.ac.uk Wed May 27 09:39:09 2015
From: tbooth at ceh.ac.uk (Tim Booth)
Date: Wed, 27 May 2015 14:39:09 +0100
Subject: [Bio-Linux] Biolinux 8, pymol and Tkinter error using the builder function
In-Reply-To:
References:
Message-ID: <1432733949.25170.15.camel@wllt1771.nerc-wallingford.ac.uk>

Hi Chiara,

Confirmed this on my machine. This is a bad bug as it makes a whole
chunk of Pymol functionality unavailable. As you say, it seems to be an
issue with tkinter, but very specific to calling grid_info(), which is
only used once in the Pymol code.

Therefore, rather than attempt to debug tkinter (which is probably
beyond me!) I've added a workaround patch. I've also upgraded the Pymol
package from 1.7.0 to 1.7.2.1 while I was at it. The new package should
appear in the software updater for you now.

If you are interested, here is the patch:

tbooth at balisaur$ cat 32_workaround_grid_info_bug.patch
--- a/modules/pmg_tk/skins/normal/builder.py
+++ b/modules/pmg_tk/skins/normal/builder.py

     def toggleChemProtein(self):
-        if self.chemFrame.grid_info():
-            self.chemB.configure(relief=RAISED)
-            self.chemFrame.grid_forget()
-            self.protB.configure(relief=SUNKEN)
-            self.protFrame.grid(row=1, column=2, rowspan=4, sticky=W)
-        else:
-            self.chemB.configure(relief=SUNKEN)
-            self.chemFrame.grid(row=1, column=2, rowspan=4, sticky=W)
-            self.protB.configure(relief=RAISED)
-            self.protFrame.grid_forget()
+        #Nasty workaround for grid_info bug in Ubuntu 14.04.
+        try:
+            if self.chemFrame.grid_info(): raise TypeError()
+            #If we get to here it means the chemFrame is not active
+            self.protFrame.grid_forget()
+            self.chemFrame.grid(row=1, column=2, rowspan=4, sticky=W)
+            self.protB.configure(relief=RAISED)
+            self.chemB.configure(relief=SUNKEN)
+        except TypeError:
+            self.chemFrame.grid_forget()
+            self.protFrame.grid(row=1, column=2, rowspan=4, sticky=W)
+            self.chemB.configure(relief=RAISED)
+            self.protB.configure(relief=SUNKEN)

 ############################################################

Hopefully that sorts it.

Cheers,

TIM

On Wed, 2015-05-27 at 10:55 +0200, chiara cocconcelli wrote:
> Hi all,
>
> I'm using Biolinux 8 with pymol from the repositories but I am
> encountering a tkinter-related bug.
> Steps to reproduce:
>
> 1) open pymol
> 2) in the list of buttons on the right click the Builder one
> 3) in the panel that opens click Protein on the left
>
> the function is not called and this error is raised:
>
> Error: 1
> Exception in Tk callback
>   Function: > (type: )
>   Args: ()
> Traceback (innermost last):
>   File "/usr/lib/python2.7/dist-packages/Pmw/Pmw_1_3/lib/PmwBase.py", line 1747, in __call__
>     return apply(self.func, args)
>   File "/usr/lib/python2.7/dist-packages/pmg_tk/skins/normal/builder.py", line 1458, in toggleChemProtein
>     if self.chemFrame.grid_info():
>   File "/usr/lib/python2.7/lib-tk/Tkinter.py", line 2000, in grid_info
>     self.tk.call('grid', 'info', self._w))
> : coercing to Unicode: need string or buffer, _tkinter.Tcl_Obj found
>
> I can provide debugging on request and I have a bit of python
> understanding, please let me know how I can best diagnose this problem
>
> thanks in advance,
>
> regards

--
Tim Booth
NERC Environmental Bioinformatics Centre

Centre for Ecology and Hydrology
Maclean Bldg, Benson Lane
Crowmarsh Gifford
Wallingford, England
OX10 8BB

http://environmentalomics.org/bio-linux
+44 1491 69 2297

From chiara.cocconcelli1 at gmail.com Wed May 27 09:47:07 2015
From: chiara.cocconcelli1 at gmail.com (chiara cocconcelli)
Date: Wed, 27 May 2015 15:47:07 +0200
Subject: [Bio-Linux] Biolinux 8, pymol and Tkinter error using the builder function
In-Reply-To: <1432733949.25170.15.camel@wllt1771.nerc-wallingford.ac.uk>
References: <1432733949.25170.15.camel@wllt1771.nerc-wallingford.ac.uk>
Message-ID:

Hi Tim,

thanks for the quick reply! I have just updated the pymol package and I
can confirm that the bug is no longer there!

Do you think it might be worth to push this bug to upstream?
from what I can understand googling the issue there seems to be almost
nobody else experiencing it...let me know what you think, I can open the
bugreport if you wish.

thanks again a lot,

Cheers,

2015-05-27 15:39 GMT+02:00 Tim Booth :

> Hi Chiara,
>
> Confirmed this on my machine. This is a bad bug as it makes a whole
> chunk of Pymol functionality unavailable. As you say, it seems to be an
> issue with tkinter, but very specific to calling grid_info(), which is
> only used once in the Pymol code.
>
> Therefore, rather than attempt to debug tkinter (which is probably
> beyond me!) I've added a workaround patch. I've also upgraded the Pymol
> package from 1.7.0 to 1.7.2.1 while I was at it. The new package should
> appear in the software updater for you now.
>
> If you are interested, here is the patch:
>
> tbooth at balisaur$ cat 32_workaround_grid_info_bug.patch
> --- a/modules/pmg_tk/skins/normal/builder.py
> +++ b/modules/pmg_tk/skins/normal/builder.py
>
>      def toggleChemProtein(self):
> -        if self.chemFrame.grid_info():
> -            self.chemB.configure(relief=RAISED)
> -            self.chemFrame.grid_forget()
> -            self.protB.configure(relief=SUNKEN)
> -            self.protFrame.grid(row=1, column=2, rowspan=4, sticky=W)
> -        else:
> -            self.chemB.configure(relief=SUNKEN)
> -            self.chemFrame.grid(row=1, column=2, rowspan=4, sticky=W)
> -            self.protB.configure(relief=RAISED)
> -            self.protFrame.grid_forget()
> +        #Nasty workaround for grid_info bug in Ubuntu 14.04.
> +        try:
> +            if self.chemFrame.grid_info(): raise TypeError()
> +            #If we get to here it means the chemFrame is not active
> +            self.protFrame.grid_forget()
> +            self.chemFrame.grid(row=1, column=2, rowspan=4, sticky=W)
> +            self.protB.configure(relief=RAISED)
> +            self.chemB.configure(relief=SUNKEN)
> +        except TypeError:
> +            self.chemFrame.grid_forget()
> +            self.protFrame.grid(row=1, column=2, rowspan=4, sticky=W)
> +            self.chemB.configure(relief=RAISED)
> +            self.protB.configure(relief=SUNKEN)
>
>  ############################################################
>
> Hopefully that sorts it.
>
> Cheers,
>
> TIM
>
> On Wed, 2015-05-27 at 10:55 +0200, chiara cocconcelli wrote:
> > Hi all,
> >
> > I'm using Biolinux 8 with pymol from the repositories but I am
> > encountering a tkinter-related bug. Steps to reproduce:
> >
> > 1) open pymol
> > 2) in the list of buttons on the right click the Builder one
> > 3) in the panel that opens click Protein on the left
> >
> > the function is not called and this error is raised:
> >
> > Error: 1
> > Exception in Tk callback
> >   Function: > (type: )
> >   Args: ()
> > Traceback (innermost last):
> >   File "/usr/lib/python2.7/dist-packages/Pmw/Pmw_1_3/lib/PmwBase.py", line 1747, in __call__
> >     return apply(self.func, args)
> >   File "/usr/lib/python2.7/dist-packages/pmg_tk/skins/normal/builder.py", line 1458, in toggleChemProtein
> >     if self.chemFrame.grid_info():
> >   File "/usr/lib/python2.7/lib-tk/Tkinter.py", line 2000, in grid_info
> >     self.tk.call('grid', 'info', self._w))
> > : coercing to Unicode: need string or buffer, _tkinter.Tcl_Obj found
> >
> > I can provide debugging on request and I have a bit of python
> > understanding, please let me know how I can best diagnose this problem
> >
> > thanks in advance,
> >
> > regards
> >
> > _______________________________________________
> > Bio-Linux mailing list
> > Bio-Linux at nebclists.nerc.ac.uk
> > http://nebclists.nerc.ac.uk/mailman/listinfo/bio-linux
>
> --
> Tim Booth
> NERC Environmental Bioinformatics Centre
>
> Centre for Ecology and Hydrology
> Maclean Bldg, Benson Lane
> Crowmarsh Gifford
> Wallingford, England
> OX10 8BB
>
> http://environmentalomics.org/bio-linux
> +44 1491 69 2297

From tbooth at ceh.ac.uk Wed May 27 10:14:57 2015
From: tbooth at ceh.ac.uk (Tim Booth)
Date: Wed, 27 May 2015 15:14:57 +0100
Subject: [Bio-Linux] Biolinux 8, pymol and Tkinter error using the builder function
In-Reply-To:
References: <1432733949.25170.15.camel@wllt1771.nerc-wallingford.ac.uk>
Message-ID: <1432736097.25170.34.camel@wllt1771.nerc-wallingford.ac.uk>

Hi,

> thanks for the quick reply! I have just updated the pymol package and
> I can confirm that the bug is no longer there!

Cool.

> Do you think it might be worth to push this bug to upstream? from what
> I can understand googling the issue there seems to be almost nobody
> else experiencing it...let me know what you think, I can open the
> bugreport if you wish.

I'm not sure. In all likelihood this issue is specific to Ubuntu 14.04
because of the particular version of Python used. I strongly suspect it
is fixed if you upgrade to the latest Ubuntu/Debian.
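
Since the failure looks tied to one particular Python and Tk combination, a bug report is easier to act on if it states the versions involved. The following is only a minimal sketch for collecting them; the helper name environment_report is made up for this illustration and is not part of Pymol or Bio-Linux, and it is written so that it should run on Python 3 as well as on the Python 2.7 discussed in this thread.

```python
import platform


def environment_report():
    """Collect interpreter and Tk version details for a bug report.

    Hypothetical helper for illustration; not part of Pymol or Bio-Linux.
    """
    lines = [
        "Python: " + platform.python_version(),
        "Platform: " + platform.platform(),
    ]
    try:
        import tkinter as tk  # module name on Python 3
    except ImportError:
        try:
            import Tkinter as tk  # module name on Python 2, as in the traceback
        except ImportError:
            tk = None
    if tk is None:
        lines.append("Tk: module not importable")
    else:
        # TkVersion and TclVersion are module-level constants, so no
        # display or Tk() root window is needed to read them.
        lines.append("Tk: %s, Tcl: %s" % (tk.TkVersion, tk.TclVersion))
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

The three lines it prints can be pasted into a report alongside the traceback.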

There are four places you could report it:

1) The Pymol developers, but they might just say "upgrade your Python"
2) The Debian maintainers, but they won't really care about Ubuntu bugs
3) The Ubuntu MOTU maintainers (for Pymol), but they have a lot to deal
   with
4) The Ubuntu core developers (for Python2)

If you want to make a report I'd suggest you do (4) since this is where
the bug really lies, and I'd file a bug here:

https://bugs.launchpad.net/ubuntu/+source/python2.7

Here is a nice simple test case that demonstrates the bug:

#!/usr/bin/python2.7
from Tkinter import Tk, Entry
root = Tk()
e = Entry(root)
e.grid(column=1, row=1)
print e.grid_info()

Cheers,

TIM

From chiara.cocconcelli1 at gmail.com Wed May 27 11:44:22 2015
From: chiara.cocconcelli1 at gmail.com (chiara cocconcelli)
Date: Wed, 27 May 2015 17:44:22 +0200
Subject: [Bio-Linux] Biolinux 8, pymol and Tkinter error using the builder function
In-Reply-To: <1432736097.25170.34.camel@wllt1771.nerc-wallingford.ac.uk>
References: <1432733949.25170.15.camel@wllt1771.nerc-wallingford.ac.uk> <1432736097.25170.34.camel@wllt1771.nerc-wallingford.ac.uk>
Message-ID:

2015-05-27 16:14 GMT+02:00 Tim Booth :

> If you want to make a report I'd suggest you do (4) since this is where
> the bug really lies, and I'd file a bug here:
>
> https://bugs.launchpad.net/ubuntu/+source/python2.7
>

Thanks for the info, I am saddened by the fact that you absolutely need
a launchpad account to file a bugreport for ubuntu, as soon as I recover
the credentials I am going to file it.

> Here is a nice simple test case that demonstrates the bug:
>
> #!/usr/bin/python2.7
> from Tkinter import Tk, Entry
> root = Tk()
> e = Entry(root)
> e.grid(column=1, row=1)
> print e.grid_info()
>

Thanks, I hope it will be useful.

cheers
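
The idea behind the workaround patch in this thread can be pulled out into a small helper, which may be easier to reuse than an inline try/except. What follows is a sketch under the same assumption the patch makes: on the affected Tkinter build, grid_info() only raises TypeError when there is grid information to parse, so a TypeError is read as "the widget is gridded". The names is_gridded and FakeWidget are invented for this illustration; FakeWidget merely stands in for a Tk widget so the logic can be exercised without a display.

```python
def is_gridded(widget):
    """Return True if `widget` appears to be managed by the grid manager.

    Hypothetical helper. Assumes, as the patch in this thread does, that a
    TypeError from grid_info() means the widget *is* gridded and only the
    parsing of the result failed.
    """
    try:
        return bool(widget.grid_info())
    except TypeError:
        return True


class FakeWidget(object):
    """Stand-in for a Tk widget, used only to exercise is_gridded()."""

    def __init__(self, outcome):
        # Either the value grid_info() should return, or an exception
        # instance that it should raise.
        self.outcome = outcome

    def grid_info(self):
        if isinstance(self.outcome, Exception):
            raise self.outcome
        return self.outcome


if __name__ == "__main__":
    print(is_gridded(FakeWidget({"row": "1", "column": "2"})))  # healthy build, gridded
    print(is_gridded(FakeWidget({})))                           # not gridded
    print(is_gridded(FakeWidget(TypeError("coercing to Unicode"))))  # affected build
```

With such a helper, a toggle like toggleChemProtein could branch on is_gridded(self.chemFrame) instead of raising TypeError by hand, though that is a suggestion rather than what the packaged patch does.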