[Bioclusters] Re: new on using clusters: problem running mpiblast (2)

Zhiliang Hu hu at animalgenome.org
Wed Sep 26 15:53:52 EDT 2007


For the same mpiblast problem -- our cluster machine vendor's support came 
on board to help fix the Open MPI problems with more tests, and we now 
get the following errors [NOTE: mpirun with a hello-world program works fine]:

> /opt/openmpi.gcc/bin/mpirun -np 3 --mca btl openib,self  --mca 
mpi_abort_print_stack 1 --mca mpi_abort_delay 1 -machinefile ./machines 
/home/local/bin/mpiblast -p blastp -i ./bait.fasta  -d ecoli.aa

1       0.0763218       Bailing out with signal 11
[node002:19342] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD 
with errorcode 0
[node002:19342] [0] 
func:/opt/openmpi.gcc/lib/libopen-pal.so.0(opal_backtrace_buffer+0x2e) 
[0x2aaaab1f430e]
[node002:19342] [1] 
func:/opt/openmpi.gcc/lib/libmpi.so.0(ompi_mpi_abort+0x21d) 
[0x2aaaaad16b8d]
[node002:19342] [2] func:/lib64/libc.so.6 [0x3d47c30070]
[node002:19342] [3] 
func:/home/local/bin/mpiblast(_ZN22PrecopySchedulerPolicy13getAssignmentEiRiS0_+0x215) 
[0x44a505]
[node002:19342] [4] 
func:/home/local/bin/mpiblast(_ZN8MpiBlast9schedulerEv+0x8df) [0x45f59f]
[node002:19342] [5] 
func:/home/local/bin/mpiblast(_ZN8MpiBlast4mainEiPPc+0x2405) [0x466325]
[node002:19342] [6] func:/home/local/bin/mpiblast(main+0x232) [0x468252]
[node002:19342] [7] func:/lib64/libc.so.6(__libc_start_main+0xf4) 
[0x3d47c1d8a4]
[node002:19342] [8] 
func:/home/local/bin/mpiblast(__gxx_personality_v0+0x469) [0x449429]
[node002:19342] Delaying for 1 seconds before aborting
0       1.08467 Bailing out with signal 15
[node001:19432] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 0
[node001:19432] [0] 
func:/opt/openmpi.gcc/lib/libopen-pal.so.0(opal_backtrace_buffer+0x2e) 
[0x2aaaab1f430e]
[node001:19432] [1] 
func:/opt/openmpi.gcc/lib/libmpi.so.0(ompi_mpi_abort+0x21d) 
[0x2aaaaad16b8d]
[node001:19432] [2] func:/lib64/libc.so.6 [0x3882030070]
[node001:19432] [3] func:/lib64/libc.so.6(nanosleep+0x10) [0x3882094550]
[node001:19432] [4] func:/lib64/libc.so.6(usleep+0x34) [0x38820c7094]
[node001:19432] [5] 
func:/home/local/bin/mpiblast(_ZN8MpiBlast6writerEv+0x5c9) [0x4614f9]
[node001:19432] [6] 
func:/home/local/bin/mpiblast(_ZN8MpiBlast4mainEiPPc+0x1d13) [0x465c33]
[node001:19432] [7] func:/home/local/bin/mpiblast(main+0x232) [0x468252]
[node001:19432] [8] func:/lib64/libc.so.6(__libc_start_main+0xf4) 
[0x388201d8a4]
[node001:19432] [9] 
func:/home/local/bin/mpiblast(__gxx_personality_v0+0x469) [0x449429]
[node001:19432] Delaying for 1 seconds before aborting
2       1.08461 Bailing out with signal 15
[node007:19350] MPI_ABORT invoked on rank 2 in communicator MPI_COMM_WORLD 
with errorcode 0
[node007:19350] [0] 
func:/opt/openmpi.gcc/lib/libopen-pal.so.0(opal_backtrace_buffer+0x2e) 
[0x2aaaab1f430e]
[node007:19350] [1] 
func:/opt/openmpi.gcc/lib/libmpi.so.0(ompi_mpi_abort+0x21d) 
[0x2aaaaad16b8d]
[node007:19350] [2] func:/lib64/libc.so.6 [0x3860e30070]
[node007:19350] [3] func:/usr/local/ofed/lib64/libmthca-rdmav2.so 
[0x2aaab19eeac2]
[node007:19350] [4] func:/opt/openmpi.gcc/lib/openmpi/mca_btl_openib.so 
[0x2aaab055e0ab]
[node007:19350] [5] 
func:/opt/openmpi.gcc/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x2a) 
[0x2aaab035101a]
[node007:19350] [6] 
func:/opt/openmpi.gcc/lib/libopen-pal.so.0(opal_progress+0x4a) 
[0x2aaaab1dc3aa]
[node007:19350] [7] 
func:/opt/openmpi.gcc/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_probe+0x3c5) 
[0x2aaab0142d55]
[node007:19350] [8] func:/opt/openmpi.gcc/lib/libmpi.so.0(PMPI_Probe+0xd8) 
[0x2aaaaad3aef8]
[node007:19350] [9] 
func:/home/local/bin/mpiblast(_ZN8MpiBlast6workerEv+0x6c1) [0x461ef1]
[node007:19350] [10] 
func:/home/local/bin/mpiblast(_ZN8MpiBlast4mainEiPPc+0x1af1) [0x465a11]
[node007:19350] [11] func:/home/local/bin/mpiblast(main+0x232) [0x468252]
[node007:19350] [12] func:/lib64/libc.so.6(__libc_start_main+0xf4) 
[0x3860e1d8a4]
[node007:19350] [13] 
func:/home/local/bin/mpiblast(__gxx_personality_v0+0x469) [0x449429]
[node007:19350] Delaying for 1 seconds before aborting
2 processes killed (possibly by Open MPI)
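
Since rank 1 dies with signal 11 (a segfault) inside mpiBLAST's scheduler 
while the openib btl is active, one way to narrow this down might be to 
rerun the identical job over TCP, taking InfiniBand out of the picture. 
This is only a sketch reusing the paths and machinefile from the command 
above, not something we have confirmed fixes it:

```shell
# Same job, but forcing the tcp btl instead of openib.
# If this runs cleanly, the problem more likely lies in the
# openib/OFED layer than in mpiBLAST itself.
/opt/openmpi.gcc/bin/mpirun -np 3 --mca btl tcp,self \
    -machinefile ./machines \
    /home/local/bin/mpiblast -p blastp -i ./bait.fasta -d ecoli.aa
```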

Does this ring a bell to anyone?

Zhiliang