[Bioclusters] Re: new on using clusters: problem running mpiblast (2)

Zhiliang Hu hu at animalgenome.org
Fri Sep 21 11:06:35 EDT 2007


Thanks Joe!

I tested with 'which' and 'whereis'; both find 'orted' on my system.
So I tried again with the full path to "mpirun" (I am sorry, I should have
done this earlier):

> /opt/openmpi.gcc/bin/mpirun -np 3 -machinefile machines \
    /home/local/bin/mpiblast -p blastp -i ./bait.fasta -d ecoli.aa

which produced these errors:
----------------------
1       0.0799131       Bailing out with signal 11
[node002:19427] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 0
0       0.0862  Bailing out with signal 15
[node001:24948] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 0
2       0.0861399       Bailing out with signal 15
[node003:15941] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 0
----------------------
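
For reference, the signal numbers in this output can be decoded with the
shell's 'kill -l'. The sketch below assumes a Linux system, where 11 is
SIGSEGV (a segmentation fault) and 15 is SIGTERM. That would be consistent
with rank 1 crashing and the other ranks being terminated afterwards, though
the output alone does not prove it. 'signal_name' is a hypothetical helper
name used only for illustration.

```shell
# Print the name of a signal given its number, using the shell's kill -l.
# signal_name is a hypothetical helper, not part of mpiBLAST or Open MPI.
signal_name() {
    kill -l "$1"
}

# Decode the numbers from "Bailing out with signal N" in the mpiblast output.
for sig in 11 15; do
    printf 'signal %s = SIG%s\n' "$sig" "$(signal_name "$sig")"
done
# On Linux this prints:
#   signal 11 = SIGSEGV
#   signal 15 = SIGTERM
```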


I have another openMPI installation, so I also tried:

> /opt/openmpi121.gcc/bin/mpirun -np 3 -machinefile machines \
    /home/local/bin/mpiblast -p blastp -i ./bait.fasta -d ecoli.aa

which gives different errors:
----------------------
[host.ansci.iastate.edu:07014] mca: base: component_find: unable to open ras tm: file not found (ignored)
[host.ansci.iastate.edu:07014] mca: base: component_find: unable to open pls tm: file not found (ignored)
[node001:24985] mca: base: component_find: unable to open ras tm: file not found (ignored)
[node001:24985] mca: base: component_find: unable to open pls tm: file not found (ignored)
[node003:15979] mca: base: component_find: unable to open ras tm: file not found (ignored)
[node002:19464] mca: base: component_find: unable to open ras tm: file not found (ignored)
[node003:15979] mca: base: component_find: unable to open pls tm: file not found (ignored)
[node002:19464] mca: base: component_find: unable to open pls tm: file not found (ignored)
1       0.0736248       Bailing out with signal 11
[node002:19464] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 0
0       0.0795131       Bailing out with signal 15
[node001:24985] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 0
2       0.0794392       Bailing out with signal 15
[node003:15979] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 0
----------------------

By the way, from the head node, 'ssh node001 which orted' does not
find it, but 'ssh node001 whereis orted' does (for both MPI
installations).  Also, after I log in with 'ssh node001', both 'which' and
'whereis' can find it for the two MPI installations.

I do have '/opt/openmpi121.gcc/bin' and '/opt/openmpi.gcc/bin' on
my path (I am using bash; trying 'tcsh' instead gave more errors).
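
One possible explanation for 'which' failing over 'ssh node001 ...' but
working after an interactive login is that the PATH is set in a startup file
that a non-interactive shell does not read. The sketch below is one way to
check this, under that assumption. 'path_has_dir' and 'check_node' are
hypothetical helper names; the node name and directory are the ones from
this message.

```shell
# Return 0 if directory $2 appears in the colon-separated PATH string $1.
# path_has_dir is a hypothetical helper, not part of Open MPI.
path_has_dir() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Fetch the PATH that a non-interactive ssh command sees on a node and
# report whether the given directory is on it.
check_node() {
    node=$1
    dir=$2
    # Single quotes keep $PATH from being expanded on the local side.
    remote_path=$(ssh "$node" 'echo $PATH') || return 2
    if path_has_dir "$remote_path" "$dir"; then
        echo "$node: $dir is on the non-interactive PATH"
    else
        echo "$node: $dir is MISSING from the non-interactive PATH"
    fi
}

# Usage (requires ssh access to the node):
#   check_node node001 /opt/openmpi.gcc/bin
```

If the directory turns out to be missing, setting PATH in a file that
non-interactive shells read, or passing the installation directory to mpirun
with its '--prefix' option, may help; which startup files are read depends on
the shell and how it was built.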

I hope this provides more useful clues for digging further.

Zhiliang


On Thu, 20 Sep 2007, Joe Landman wrote:

> Date: Thu, 20 Sep 2007 16:13:50 -0400
> From: Joe Landman <landman at scalableinformatics.com>
> To: HPC in Bioinformatics <bioclusters at bioinformatics.org>
> Subject: Re: [Bioclusters] Re: new on using clusters: problem running mpiblast
>      (2)
> 
> Zhiliang Hu wrote:
>
>> ---------------------------------------
>> bash: orted: command not found
>> bash: orted: command not found
>
>
> Ah-hah!
>
> Could you do a
>
> 	which orted
>
> on the head node from where you launch the mpiblast, and then
>
> 	ssh node001 which orted
>
> and report that back?
>
>> [ansci.iastate.edu:03916] ERROR: A daemon on node node001 failed to start as expected.
>
> This suggests that a) orted wasn't found, and b) since that is required
> to let OpenMPI set up the remote process, the remote process doesn't get
> started.
>
>> [ansci.iastate.edu:03916] ERROR: There may be more information available from
>> [ansci.iastate.edu:03916] ERROR: the remote shell (see above).
>> [ansci.iastate.edu:03916] ERROR: The daemon exited unexpectedly with status 127.
>
> If you don't see orted on the remote system, you might need to contact
> your systems administrator to make sure the right path is mounted on the
> remote node.
>
> If you built OpenMPI yourself, you need to make sure your path variable
> includes the $openmpi/bin  directory.
>
> Basically this looks like OpenMPI is not in your path, which is why it
> can't find orted, and this is why mpiblast isn't booting up on the node.

