[Bioclusters] SGE jobs staying in dr state
Chris Dagdigian
dag at sonsorol.org
Wed Mar 30 13:45:10 EST 2005
Some questions -
- Which <major> version of Grid Engine are you running? (5.x vs 6.x)
- If using 6.x, are you configured with Berkeley DB spooling or
"classic" spooling?
{In SGE 5, or in SGE 6 with classic-mode spooling, state and
configuration information for running/active jobs is stored in text
files. If you can live with shutting down the SGE master for a few
seconds, you can make some "big hammer" type changes by editing or
deleting the spool and active job data files. Not recommended for
novice users/admins, though.}
- Is sge_execd running on the nodes where the phantom jobs are, or did
you shut SGE down? You may have to fire the daemons back up just to
allow the job deletion state messages to pass back and forth.
- Any interesting logfile messages?
Look in $SGE_ROOT/$SGE_CELL/spool/qmaster/messages as well as
$SGE_ROOT/$SGE_CELL/spool/<nodename>/messages to see if anything obvious
shows up.
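If there are a lot of nodes to check, something like the following
might save some typing. It is only a rough sketch: it assumes the spool
directories are laid out exactly as in the paths above and are readable
from wherever you run it (for example over a shared filesystem). The
function names, the node names and the job ID are made up for
illustration and are not part of Grid Engine.

```python
import re
from pathlib import Path


def message_paths(sge_root, sge_cell, nodes):
    """Build the qmaster and per-node 'messages' paths (layout is an assumption)."""
    spool = Path(sge_root) / sge_cell / "spool"
    return [spool / "qmaster" / "messages"] + [spool / n / "messages" for n in nodes]


def find_job_lines(message_files, job_id):
    """Return (path, line) pairs for log lines that mention job_id."""
    # Match the job ID only when it is not part of a longer number,
    # so that job 1234 does not also match 12345.
    pattern = re.compile(r"(?<!\d)" + re.escape(str(job_id)) + r"(?!\d)")
    hits = []
    for path in message_files:
        path = Path(path)
        if not path.is_file():
            continue  # spool directory for this host is not visible from here
        with path.open(errors="replace") as fh:
            for line in fh:
                if pattern.search(line):
                    hits.append((str(path), line.rstrip("\n")))
    return hits
```

To use it, call message_paths() with the values of $SGE_ROOT and
$SGE_CELL plus the names of the hosts where the phantom jobs were
running, then pass the result to find_job_lines() along with the job ID
shown by qstat. It does nothing more than a plain text search, so it
will also pick up unrelated lines that happen to contain the same
number.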
Also, there is a dedicated Grid Engine mailing list
(users at gridengine.sunsource.net) with an active crowd of experts willing
to help with issues like this. The list is worth monitoring if you are a
heavy Grid Engine user and it is worth searching the list archives if
you experience odd problems. More info is here:
http://gridengine.sunsource.net/servlets/ProjectMailingListList
-Chris
Shane Brubaker wrote:
>
> Hi, I have some SGE jobs which stay in a "dr" state and will not go
> away. I have issued a qdel command on these jobs, so they are in a
> "deleted, running"
> state. Usually such jobs will go away after a few minutes, but these
> won't. I also can't delete that queue now because it has jobs in it.
>
> These happened to be fairly long jobs that ran a day or two. Also,
> these jobs do not show up on the actual nodes, so they aren't really
> running anymore. They only
> appear in qstat.
>
> Any help would be much appreciated.
>