[Bioclusters] Re: new on using clusters: problem running mpiblast (2)

Zhiliang Hu hu at animalgenome.org
Fri Sep 14 04:23:46 EDT 2007


Thanks Aaron,

Indeed, I had it compiled before (and now again, without my previously 
reported "CC/CPP" exports, and with or without the plain "export CC=mpicc" 
and "export CXX=mpicxx" suggested by Zhao Xu).

The problem is that when I run mpiblast with:
   /opt/openmpi.gcc/bin/mpirun -np 16 -machinefile ./machines \
        /home/local/bin/mpiblast -p blastp -i ./bait.fasta -d ecoli.aa

I get the following errors, and I don't have a clue where to look for 
the cause:

1       0.095628        Bailing out with signal 11
[node001:13406] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD 
with errorcode 0
0       0.101815        Bailing out with signal 15
[node001:13405] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 0
15      0.157852        Bailing out with signal 15
[node001:13420] MPI_ABORT invoked on rank 15 in communicator 
MPI_COMM_WORLD with errorcode 0
2       0.105103        Bailing out with signal 15
[node001:13407] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD 
with errorcode 0
3       0.109706        Bailing out with signal 15
[node001:13408] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD 
with errorcode 0
4       0.114032        Bailing out with signal 15
[node001:13409] MPI_ABORT invoked on rank 4 in communicator MPI_COMM_WORLD 
with errorcode 0
5       0.117891        Bailing out with signal 15
[node001:13410] MPI_ABORT invoked on rank 5 in communicator MPI_COMM_WORLD 
with errorcode 0
6       0.122292        Bailing out with signal 15
[node001:13411] MPI_ABORT invoked on rank 6 in communicator MPI_COMM_WORLD 
with errorcode 0
7       0.125675        Bailing out with signal 15
[node001:13412] MPI_ABORT invoked on rank 7 in communicator MPI_COMM_WORLD 
with errorcode 0
8       0.129363        Bailing out with signal 15
[node001:13413] MPI_ABORT invoked on rank 8 in communicator MPI_COMM_WORLD 
with errorcode 0
9       0.134528        Bailing out with signal 15
[node001:13414] MPI_ABORT invoked on rank 9 in communicator MPI_COMM_WORLD 
with errorcode 0
10      0.138087        Bailing out with signal 15
[node001:13415] MPI_ABORT invoked on rank 10 in communicator 
MPI_COMM_WORLD with errorcode 0
11      0.141622        Bailing out with signal 15
[node001:13416] MPI_ABORT invoked on rank 11 in communicator 
MPI_COMM_WORLD with errorcode 0
12      0.145868        Bailing out with signal 15
[node001:13417] MPI_ABORT invoked on rank 12 in communicator 
MPI_COMM_WORLD with errorcode 0
13      0.149375        Bailing out with signal 15
[node001:13418] MPI_ABORT invoked on rank 13 in communicator 
MPI_COMM_WORLD with errorcode 0
14      0.152966        Bailing out with signal 15
[node001:13419] MPI_ABORT invoked on rank 14 in communicator 
MPI_COMM_WORLD with errorcode 0

[As related information, mpirun itself works fine when tested 
with a small "hello" program that shows responses from all nodes.]

--
Zhiliang


On Sun, 9 Sep 2007, Aaron Darling wrote:

> Date: Sun, 09 Sep 2007 08:04:14 +1000
> From: Aaron Darling <darling at cs.wisc.edu>
> Reply-To: HPC in Bioinformatics <bioclusters at bioinformatics.org>
> To: HPC in Bioinformatics <bioclusters at bioinformatics.org>
> Subject: Re: [Bioclusters] Re: new on using clusters: problem running mpiblast
>      (2)
> 
> Hi Zhiliang
>
> For reasons that are beyond me, the version of autoconf that we used to
> package mpiBLAST 1.4.0 does not approve of setting CC and/or CXX to
> mpicc or mpicxx.  Doing so results in the autoconf error you have
> observed.  For that reason we added the --with-mpi=/path/to/mpi
> configure option.  It should be sufficient to use that option alone to
> set the preferred compiler path.  If not, then it's a bug in the
> mpiblast configure system.
>
> In response to your other query, I personally have not used mpiblast
> with OpenMPI but I believe others have.  The 1.4.0 release was tested
> against mpich1/2 and LAM.
>
> Regards,
> -Aaron
>
> Zhiliang Hu wrote:
>> Hi Zhao - Thanks for providing more clues to solve my problem!
>>
>> I have actually gone through all these steps: matching the number of
>> formatted sequence database segments to the number of processes to use,
>> putting the executables and the sequence database on shared NFS paths
>> mounted on all nodes, using the correct paths in ~/.ncbirc, trying runs
>> with only the fewest necessary options, and so on.
>>
>> While the previous trials used an mpiblast that seemed to have compiled
>> correctly, to make sure the compile and the run use the same MPI
>> installation I tried to re-compile with
>>
>>> export CC=/opt/openmpi.gcc/bin/mpicc
>>> export CPP=/opt/openmpi.gcc/bin/mpicxx
>>> ./configure --with-ncbi=/home/share/src/ncbi --prefix=/opt/local \
>>>    --with-mpi=/opt/openmpi.gcc
>>
>> -- this time the compilation failed with errors:
>> http://www.animalgenome.org/~hu/share/tmp/config.log
>> http://www.animalgenome.org/~hu/share/tmp/config2.log
>>
>> The trials were with OpenMPI 1.2.1 and 1.2.3 -- are these version
>> compatibility problems?  Is there a known MPI version that works with
>> mpiBLAST 1.4.0?
>>
>> Zhiliang
>>
>>
>> On Thu, 6 Sep 2007, Zhao Xu wrote:
>>
>>> Date: Thu, 6 Sep 2007 09:08:54 +0800
>>> From: Zhao Xu <xuzh.fdu at gmail.com>
>>> Reply-To: HPC in Bioinformatics <bioclusters at bioinformatics.org>
>>> To: bioclusters at bioinformatics.org
>>> Subject: [Bioclusters] Re: new on using clusters: problem running
>>> mpiblast (2)
>>>
>>> Hi Zhiliang,
>>> Firstly, I suggest that you run mpiBLAST with the default parameters,
>>> using only the -p, -i, -d and -o options.  If the defaults work fine,
>>> you should check the system environment: is there any memory limit?
>>> Write access?  Etc.
>>>
>>> If the defaults do not work either, I think you should check:
>>> 1. Into how many segments did you format the database?  Is the number
>>> 12?  Make sure you run mpiBLAST with two more processes than that.
>>> 2. Can every node read the input sequences?
>>> 3. Check your ~/.ncbirc: make sure each node has ~/.ncbirc and can
>>> access the directories written there.
>>>
>>> Regards,
>>> XU Zhao
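[Addendum on Zhao's point 1 above: if I read the mpiBLAST 1.4 docs right,
the fragment/process relationship amounts to something like the following,
where mpiformatdb's -N option sets the number of database fragments:

   # split the protein database into 14 fragments ...
   mpiformatdb -N 14 -p T -i ecoli.aa
   # ... then run with 14 + 2 = 16 processes (two of them are non-workers)
   /opt/openmpi.gcc/bin/mpirun -np 16 -machinefile ./machines \
        /home/local/bin/mpiblast -p blastp -i ./bait.fasta -d ecoli.aa

i.e. the -np value should exceed the fragment count by two.]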


More information about the Bioclusters mailing list