Thanks Joe!

I tested with 'which' and 'whereis'; they do find 'orted' on my system. So I
tried again with the full path to "mpirun" (I am sorry, I should have done
this earlier):

> /opt/openmpi.gcc/bin/mpirun -np 3 -machinefile machines /home/local/bin/mpiblast -p blastp -i ./bait.fasta -d ecoli.aa

which produced this error:
----------------------
1 0.0799131 Bailing out with signal 11
[node002:19427] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 0
0 0.0862 Bailing out with signal 15
[node001:24948] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 0
2 0.0861399 Bailing out with signal 15
[node003:15941] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 0
----------------------

I have another OpenMPI installation, so I also tried:

> /opt/openmpi121.gcc/bin/mpirun -np 3 -machinefile machines /home/local/bin/mpiblast -p blastp -i ./bait.fasta -d ecoli.aa

which gives different errors:
----------------------
[host.ansci.iastate.edu:07014] mca: base: component_find: unable to open ras tm: file not found (ignored)
[host.ansci.iastate.edu:07014] mca: base: component_find: unable to open pls tm: file not found (ignored)
[node001:24985] mca: base: component_find: unable to open ras tm: file not found (ignored)
[node001:24985] mca: base: component_find: unable to open pls tm: file not found (ignored)
[node003:15979] mca: base: component_find: unable to open ras tm: file not found (ignored)
[node002:19464] mca: base: component_find: unable to open ras tm: file not found (ignored)
[node003:15979] mca: base: component_find: unable to open pls tm: file not found (ignored)
[node002:19464] mca: base: component_find: unable to open pls tm: file not found (ignored)
1 0.0736248 Bailing out with signal 11
[node002:19464] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 0
0 0.0795131 Bailing out with signal 15
[node001:24985] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 0
2 0.0794392 Bailing out with signal 15
[node003:15979] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD with errorcode 0
----------------------

By the way, from the head node, 'ssh node001 which orted' does not find it,
but 'ssh node001 whereis orted' does (for both MPI installations). Also,
after I do 'ssh node001', both 'which' and 'whereis' can find it for the two
MPI installations.

I do have '/opt/openmpi121.gcc/bin' and '/opt/openmpi.gcc/bin' on my path (I
am using bash; I tried using 'tcsh', with more errors).

I hope this provides more useful clues to dig further.

Zhiliang


On Thu, 20 Sep 2007, Joe Landman wrote:

> Date: Thu, 20 Sep 2007 16:13:50 -0400
> From: Joe Landman <landman at scalableinformatics.com>
> To: HPC in Bioinformatics <bioclusters at bioinformatics.org>
> Subject: Re: [Bioclusters] Re: new on using clusters: problem running mpiblast
>     (2)
>
> Zhiliang Hu wrote:
>
>> ---------------------------------------
>> bash: orted: command not found
>> bash: orted: command not found
>
>
> Ah-hah!
>
> Could you do a
>
> 	which orted
>
> on the head node from where you launch the mpiblast, and then
>
> 	ssh node001 which orted
>
> and report that back?
>
>> [ansci.iastate.edu:03916] ERROR: A daemon on node node001 failed to
>> start as expected.
>
> This suggests that a) orted wasn't found, and b) since that is required
> to let OpenMPI set up the remote process, the remote process doesn't get
> started.
>
>> [ansci.iastate.edu:03916] ERROR: There may be more information available
>> from
>> [ansci.iastate.edu:03916] ERROR: the remote shell (see above).
>> [ansci.iastate.edu:03916] ERROR: The daemon exited unexpectedly with
>> status 127.
>
> If you don't see orted on the remote system, you might need to contact
> your systems administrator to make sure the right path is mounted on the
> remote node.
>
> If you built OpenMPI yourself, you need to make sure your path variable
> includes the $openmpi/bin directory.
>
> Basically this looks like OpenMPI is not in your path, which is why it
> can't find orted, and this is why mpiblast isn't booting up on the node.
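
A rough sketch of how the path question above could be checked from the head node. The helper name path_contains_dir is made up for illustration, and it only does a plain string match on a colon-separated PATH value (so a trailing slash or a symlinked directory would not match); it is not part of OpenMPI or mpiBLAST.

```shell
# Hypothetical helper: succeed if a colon-separated PATH string ($1)
# contains the directory ($2) as an exact entry.
path_contains_dir() {
    case ":$1:" in
        *":$2:"*) return 0 ;;   # directory is one of the PATH entries
        *)        return 1 ;;   # not listed
    esac
}

# Hypothetical wrapper: print a one-line verdict for a PATH string.
report_path() {
    if path_contains_dir "$1" "$2"; then
        echo "found: $2"
    else
        echo "missing: $2"
    fi
}
```

It could be fed the PATH that a non-interactive ssh command sees, for example report_path "$(ssh node001 'echo $PATH')" /opt/openmpi121.gcc/bin, using the node name and directory from this thread. That is the same kind of environment 'ssh node001 which orted' runs in, and it can differ from the PATH seen after logging in interactively with 'ssh node001', which would be consistent with 'which' finding orted only in the second case.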