[Bioclusters] weird MPI problem

Jeremy Mann bioclusters@bioinformatics.org
Thu, 22 May 2003 15:43:36 -0500 (CDT)


Maybe somebody on this list can help me figure out a weird MPI problem. As
most of you know, we run mpiblast, which had been working fine until this
afternoon. We ran several BLAST jobs this morning, went to lunch, came back,
and started more BLAST jobs. Now mpiblast is erroring out, and frankly I
don't know why, since it was working perfectly maybe an hour before. Here
is the error I get:

<snip>

jeremy@bioinf:~/dnaseqs$ /usr/local/lam/bin/mpirun -np 22 mpiblast \
    -f /usr/local/mpiblast.conf -p blastn -d nr -i nr-protein.fasta -o out2

[blastall] ERROR: Threshold for extending hits, default if zero
      blastp 11, blastn 0, blastx 12, tblastn 13
      tblastx 13, megablast 0 [/usr/local/mpiblast.conf] is bad or out of
range [? to ?]
0       0.056177        Bailing out with signal 11
-----------------------------------------------------------------------------

One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 5236 failed on node n1 with exit status 1.
-----------------------------------------------------------------------------
MPI_Ssend: process in local group is dead (rank 11, MPI_COMM_WORLD)
Rank (11, MPI_COMM_WORLD): Call stack within LAM:
Rank (11, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (11, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 5, MPI_COMM_WORLD)
Rank (5, MPI_COMM_WORLD): Call stack within LAM:
Rank (5, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (5, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 13, MPI_COMM_WORLD)
Rank (13, MPI_COMM_WORLD): Call stack within LAM:
Rank (13, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (13, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 6, MPI_COMM_WORLD)
Rank (6, MPI_COMM_WORLD): Call stack within LAM:
MPI_Ssend: process in local group is dead (rank 12, MPI_COMM_WORLD)
Rank (12, MPI_COMM_WORLD): Call stack within LAM:
MPI_Ssend: process in local group is dead (rank 21, MPI_COMM_WORLD)
Rank (21, MPI_COMM_WORLD): Call stack within LAM:
Rank (21, MPI_COMM_WORLD):  - MPI_Ssend()
MPI_Ssend: process in local group is dead (rank 14, MPI_COMM_WORLD)
Rank (14, MPI_COMM_WORLD): Call stack within LAM:
MPI_Ssend: process in local group is dead (rank 9, MPI_COMM_WORLD)
Rank (9, MPI_COMM_WORLD): Call stack within LAM:
Rank (9, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (9, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 15, MPI_COMM_WORLD)
Rank (15, MPI_COMM_WORLD): Call stack within LAM:
Rank (15, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (15, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 17, MPI_COMM_WORLD)
Rank (17, MPI_COMM_WORLD): Call stack within LAM:
Rank (17, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (17, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 7, MPI_COMM_WORLD)
Rank (7, MPI_COMM_WORLD): Call stack within LAM:
Rank (7, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (7, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 19, MPI_COMM_WORLD)
Rank (19, MPI_COMM_WORLD): Call stack within LAM:
Rank (19, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (19, MPI_COMM_WORLD):  - main()
Rank (21, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 10, MPI_COMM_WORLD)
Rank (10, MPI_COMM_WORLD): Call stack within LAM:
Rank (10, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (10, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 18, MPI_COMM_WORLD)
Rank (18, MPI_COMM_WORLD): Call stack within LAM:
Rank (18, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (18, MPI_COMM_WORLD):  - main()
Rank (14, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (14, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 16, MPI_COMM_WORLD)
Rank (16, MPI_COMM_WORLD): Call stack within LAM:
Rank (16, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (16, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 8, MPI_COMM_WORLD)
Rank (8, MPI_COMM_WORLD): Call stack within LAM:
Rank (8, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (8, MPI_COMM_WORLD):  - main()
Rank (6, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (6, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 4, MPI_COMM_WORLD)
Rank (4, MPI_COMM_WORLD): Call stack within LAM:
Rank (4, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (4, MPI_COMM_WORLD):  - main()
MPI_Ssend: process in local group is dead (rank 20, MPI_COMM_WORLD)
Rank (20, MPI_COMM_WORLD): Call stack within LAM:
Rank (20, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (20, MPI_COMM_WORLD):  - main()
Rank (12, MPI_COMM_WORLD):  - MPI_Ssend()
Rank (12, MPI_COMM_WORLD):  - main()
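Read literally, the first error says blastall received `/usr/local/mpiblast.conf` as the value of the option it describes as "Threshold for extending hits", which is blastall's own numeric `-f` option. One possible reading, not a confirmed diagnosis, is that the `-f` flag on the mpirun line reached blastall instead of being consumed by mpiblast. The Python sketch below only illustrates why a numeric check would reject a file path. `parse_threshold` and `value_after_flag` are hypothetical helpers, not NCBI or mpiblast code, and the idea that the argument list was forwarded unchanged is an assumption.

```python
# Hypothetical helpers for illustration only; this is NOT NCBI blastall or
# mpiblast source code.


def parse_threshold(value):
    """Convert the -f value to a number, failing the way a numeric option check would."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # A file path is not a number, so the check rejects it.
        raise ValueError(f"[{value}] is bad or out of range") from None


def value_after_flag(argv, flag):
    """Return the token that immediately follows `flag` in argv, or None if absent."""
    for i, token in enumerate(argv[:-1]):
        if token == flag:
            return argv[i + 1]
    return None


if __name__ == "__main__":
    # Arguments from the failing mpirun command, ASSUMING mpiblast passed
    # -f through to blastall unchanged.
    argv = ["-f", "/usr/local/mpiblast.conf", "-p", "blastn",
            "-d", "nr", "-i", "nr-protein.fasta", "-o", "out2"]
    try:
        parse_threshold(value_after_flag(argv, "-f"))
    except ValueError as err:
        # prints: ERROR: [/usr/local/mpiblast.conf] is bad or out of range
        print("ERROR:", err)
```

This does not explain why the same command worked earlier in the day; it only shows how a message like the blastall one can arise.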

Can somebody clue me in on what is going on here? Thanks a bunch!




-- 
Jeremy Mann
jeremy@biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672