[Bioclusters] mpiBLAST recovery?
Iddo Friedberg
idoerg at gmail.com
Fri Nov 9 03:48:10 EST 2007
Hi,
I am using mpiblast via the Sun Grid Engine. Is there a way to recover and
rerun mpiblast once a node goes down (and subsequently comes back up)? I have
a downed node, and it seems that everything has frozen since it went down. It
will probably not be up until tomorrow, when our sysadmin comes in. I'd hate
to lose whatever work I have already accumulated.
Just to apprise you of the situation: the downed node is called ikelite-3-8.
I ssh'd to one of the working nodes (ikelite-3-5) and did the following:
[idoerg@ikelite-3-5 ~]$ ps -lef | grep mpiblast
0 S idoerg 768 767 0 78 0 - 13774 rt_sig Nov08 ? 00:00:00
tcsh -c /opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 \-p4amslave
\-p4yourname ikelite-3-5 \-p4rmrank 14
0 R idoerg 835 768 98 85 0 - 77560 - Nov08 ? 12:13:36
/opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 4amslave -p4yourname
ikelite-3-5 -p4rmrank 14
1 S idoerg 838 835 0 76 0 - 66503 - Nov08 ? 00:00:00
/opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 4amslave -p4yourname
ikelite-3-5 -p4rmrank 14
0 S idoerg 842 841 0 78 0 - 13774 rt_sig Nov08 ? 00:00:00
tcsh -c /opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 \-p4amslave
\-p4yourname ikelite-3-5 \-p4rmrank 15
0 S idoerg 909 842 99 85 0 - 77592 - Nov08 ? 12:16:27
/opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 4amslave -p4yourname
ikelite-3-5 -p4rmrank 15
1 S idoerg 910 909 0 76 0 - 66503 - Nov08 ? 00:00:00
/opt/Bio/mpiblast/bin/mpiblast ikelite-3-8 34397 4amslave -p4yourname
ikelite-3-5 -p4rmrank 15
So as you can see, there is an attempt to ssh to ikelite-3-8, but of course
it cannot since ikelite-3-8 is down.
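To double-check which host the stuck processes are waiting on, something like the sketch below could be used. It assumes that, in the MPICH p4 slave command line, the argument right after the mpiblast binary is the master host, as it appears to be in the listing above. The function name p4_master_hosts is made up for illustration, and the ping flags in the usage comment are the Linux ones.

```shell
#!/bin/sh
# List the distinct hosts that mpiblast slave processes name as their master.
# Assumption: the argument immediately after the mpiblast binary path is the
# master host (ikelite-3-8 in the listing above).
# p4_master_hosts is a hypothetical helper name, not part of mpiBLAST or MPICH.
p4_master_hosts() {
    awk '{
        # Scan each command line for a token ending in "/mpiblast"
        # and print the token that follows it.
        for (i = 1; i < NF; i++)
            if ($i ~ /\/mpiblast$/) { print $(i + 1); break }
    }' | sort -u
}

# Possible usage on a working node (not run here):
#   ps -eo args | p4_master_hosts
#   ps -eo args | p4_master_hosts | while read -r host; do
#       ping -c 1 -W 2 "$host" >/dev/null 2>&1 || echo "$host is unreachable"
#   done
```

This only reports which master host the slaves were started against and whether it answers a ping; it does not recover the job.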
Thanks for any help!
Iddo
--
I. Friedberg
"The only problem with troubleshooting is that
sometimes trouble shoots back."