[Bioclusters] SGE jobs staying in dr state
A.P. Jason de Koning
apjdk at albany.edu
Wed Mar 30 13:43:07 EST 2005
Hi Shane,
This happens to us sometimes when the SGE daemons on the nodes in
question have terminated for some reason. I would check that they are
running on those nodes and re-launch them if not. That usually works
for me.
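
To find which nodes to check, a minimal sketch along these lines could
pull the stuck jobs and their hosts out of saved qstat output. It
assumes an SGE 6.x-style listing in which the fifth column is the state
and the eighth is queue@host; the function name and the sample listing
are made up for illustration and are not from the original report.

```python
# Hypothetical sample of `qstat` output (SGE 6.x-style columns assumed).
SAMPLE_QSTAT = """\
job-ID  prior   name       user         state submit/start at     queue            slots ja-task-ID
---------------------------------------------------------------------------------------------------
   1234 0.55500 longjob    user1        dr    03/28/2005 09:15:02 all.q@node07         1
   1240 0.55500 longjob    user1        r     03/30/2005 08:01:44 all.q@node02         1
   1236 0.55500 longjob    user1        dr    03/28/2005 09:15:40 all.q@node09         1
"""


def stuck_jobs(qstat_text, state="dr"):
    """Return (job_id, host) pairs for jobs whose state column equals `state`."""
    found = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job lines start with a numeric job ID; skip header and separator lines.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        if fields[4] != state:
            continue
        # Assumed layout: column 8 is "queue@host".
        queue = fields[7]
        host = queue.split("@", 1)[1] if "@" in queue else ""
        found.append((fields[0], host))
    return found


if __name__ == "__main__":
    for job_id, host in stuck_jobs(SAMPLE_QSTAT):
        print(job_id, host)
```

Each host it reports is a candidate for checking whether the sge_execd
process is still alive.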
Cheers,
- Jason de Koning,
University at Albany
On Mar 30, 2005, at 1:20 PM, Shane Brubaker wrote:
>
> Hi, I have some SGE jobs which stay in a "dr" state and will not go
> away. I have issued a qdel command on these jobs, so they are in a
> "deleted, running" state. Usually such jobs will go away after a few
> minutes, but these won't. I also can't delete that queue now because
> it has jobs in it.
>
> These happened to be fairly long jobs that ran a day or two. Also,
> these jobs do not show up on the actual nodes, so they aren't really
> running anymore. They only appear in qstat.
>
> Any help would be much appreciated.
>
>
> Thank You,
> Shane Brubaker
> JGI
>
> At 09:09 AM 3/30/2005, you wrote:
>> Today's Topics:
>>
>> 1. Re: alternative DHCP implementations? (jason.calvert at novartis.com)
>> 2. Re: alternative DHCP implementations? (Lars G. T. Jorgensen)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 30 Mar 2005 11:43:17 -0400
>> From: jason.calvert at novartis.com
>> Subject: Re: [Bioclusters] alternative DHCP implementations?
>> To: "Clustering, compute farming & distributed computing in life science informatics" <bioclusters at bioinformatics.org>
>> Message-ID: <OFE1C386E6.4CD361C1-ON85256FD4.00560E7A-85256FD4.00565C5B at EU.novartis.net>
>>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> There are scripts within the OSCAR release to do this for you. You can
>> start the scripts, power on the nodes in the order you wish, and then
>> assign them to auto-generated hostnames. The scripts output the
>> dhcpd.conf file.
>>
>> I would think you could pull them out of OSCAR pretty easily.
>>
>> Jason
>>
>>
>>
>>
>> Chris Dagdigian <dag at sonsorol.org>
>> Sent by: bioclusters-bounces+jason.calvert=pharma.novartis.com at bioinformatics.org
>> 03/29/2005 02:27 PM
>> Please respond to "Clustering, compute farming & distributed computing in life science informatics"
>>
>> To: adamm at menlo.com, "Clustering, compute farming & distributed computing in life science informatics" <bioclusters at bioinformatics.org>
>> cc: (bcc: Jason Calvert/PH/Novartis)
>> Subject: Re: [Bioclusters] alternative DHCP implementations?
>>
>> Agreed. It was just a shortcut. We already do allocation of IP based
>> on MAC address but that only works when you know the MAC address
>> information ahead of time. This is rare especially on whitebox
>> cluster projects where people don't put the MAC on the product
>> packaging or on the chassis itself. Some vendors do a good job of
>> making the data easy to find and others simply don't bother.
>>
>> A dhcp server handing out dynamic-range leases in a predictable manner
>> is what allowed us to easily map MAC address to node position and
>> nodename simply by powering on the nodes for PXE boot in the order in
>> which they are racked and stacked. Once this was done we had the
>> MAC->Node mapping data we needed to generate the static allocation
>> entries.
>>
>> A workaround for non-predictable allocation is to simply power on the
>> cluster in the order in which you want things named, then parse the
>> dhcpd leases file for both the MAC address *and* the timestamp
>> representing the lease handout. That would allow you to map MAC ->
>> Node without having to care about hostnames for the first pass MAC
>> collection phase. Then you build the static-by-mac entries into the
>> conf file and problem solved. If we stick with ISC DHCP this is a
>> possibility...
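
The leases-file workaround described above can be sketched roughly as
follows. This is only an illustration under stated assumptions: it
expects ISC dhcpd.leases entries carrying "starts" and "hardware
ethernet" statements, and the function names, node-name prefix, and
sample leases are hypothetical, not taken from anyone's actual setup.

```python
import re
from datetime import datetime

# One "lease <ip> { ... }" entry; lease bodies are assumed not to nest braces.
LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
# e.g. "starts 3 2005/03/30 10:01:05;" (weekday number, date, time).
STARTS_RE = re.compile(r"starts\s+\d+\s+(\d{4}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2});")
MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]{17});")

# Made-up sample: entries are out of order and one MAC appears twice.
SAMPLE_LEASES = """
lease 10.0.0.52 {
  starts 3 2005/03/30 10:02:10;
  ends 3 2005/03/30 22:02:10;
  hardware ethernet 00:11:22:33:44:02;
}
lease 10.0.0.57 {
  starts 3 2005/03/30 10:01:05;
  ends 3 2005/03/30 22:01:05;
  hardware ethernet 00:11:22:33:44:01;
}
lease 10.0.0.52 {
  starts 3 2005/03/30 16:02:10;
  ends 4 2005/03/31 04:02:10;
  hardware ethernet 00:11:22:33:44:02;
}
lease 10.0.0.50 {
  starts 3 2005/03/30 10:03:30;
  ends 3 2005/03/30 22:03:30;
  hardware ethernet 00:11:22:33:44:03;
}
"""


def macs_in_boot_order(leases_text):
    """Return MAC addresses sorted by the earliest lease start seen for each."""
    first_seen = {}
    for _ip, body in LEASE_RE.findall(leases_text):
        starts = STARTS_RE.search(body)
        mac = MAC_RE.search(body)
        if not (starts and mac):
            continue
        when = datetime.strptime(
            starts.group(1) + " " + starts.group(2), "%Y/%m/%d %H:%M:%S"
        )
        key = mac.group(1).lower()
        # Renewals append new entries, so keep only the earliest start per MAC.
        if key not in first_seen or when < first_seen[key]:
            first_seen[key] = when
    return sorted(first_seen, key=first_seen.get)


def name_nodes(macs, prefix="node"):
    """Pair each MAC with a sequential node name (naming scheme is assumed)."""
    return [(f"{prefix}{i:02d}", mac) for i, mac in enumerate(macs, start=1)]


if __name__ == "__main__":
    for name, mac in name_nodes(macs_in_boot_order(SAMPLE_LEASES)):
        print(name, mac)
```

The resulting name-to-MAC pairs are what would feed the static
per-host entries in the conf file.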
>>
>> c
>>
>> Adam S. Moskowitz wrote:
>> > Chris,
>> >
>> >
>> >>We are thinking about trying to find a replacement DHCP server that
>> >>has a predictable method of allocating dynamic IP addresses (even if
>> >>only for the initial cluster deployment)
>> >
>> >
>> > I think it's a bad idea to rely on such behavior. I don't remember
>> > what the RFC says, but in general, unless the RFC guarantees an
>> > implementation should behave a particular way, you are asking for
>> > trouble to rely on specific behavior.
>> >
>> > A great example of this is how round-robin DNS used to work and
>> > then how it changed and lots of things broke.
>> >
>> > DHCP isn't meant to do what you're asking it to do, so I strongly
>> > suggest you not use it to solve that particular problem.
>> >
>> > That said, DHCP supports a mechanism for binding specific IP
>> > addresses to specific MAC addresses, even though the assignment is
>> > still done dynamically. Yes, this is a bit more work, but at least
>> > it's guaranteed behavior.
>> _______________________________________________
>> Bioclusters maillist - Bioclusters at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters
>>
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 30 Mar 2005 14:55:55 +0200
>> From: "Lars G. T. Jorgensen" <lars at binf.ku.dk>
>> Subject: Re: [Bioclusters] alternative DHCP implementations?
>> To: "Clustering, compute farming & distributed computing in life science informatics" <bioclusters at bioinformatics.org>
>> Message-ID: <424AA1DB.8000702 at binf.ku.dk>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> Chris Dagdigian wrote:
>>
>> >
>> >
>> > Agreed. It was just a shortcut. We already do allocation of IP based
>> > on MAC address but that only works when you know the MAC address
>> > information ahead of time. This is rare especially on whitebox
>> > cluster projects where people don't put the MAC on the product
>> > packaging or on the chassis itself. Some vendors do a good job of
>> > making the data easy to find and others simply don't bother.
>> >
>> > A dhcp server handing out dynamic-range leases in a predictable
>> > manner is what allowed us to easily map MAC address to node position
>> > and nodename simply by powering on the nodes for PXE boot in the
>> > order in which they are racked and stacked. Once this was done we
>> > had the MAC->Node mapping data we needed to generate the static
>> > allocation entries.
>> >
>> > A workaround for non-predictable allocation is to simply power on
>> > the cluster in the order in which you want things named, then parse
>> > the dhcpd leases file for both the MAC address *and* the timestamp
>> > representing the lease handout. That would allow you to map MAC ->
>> > Node without having to care about hostnames for the first pass MAC
>> > collection phase. Then you build the static-by-mac entries into the
>> > conf file and problem solved. If we stick with ISC DHCP this is a
>> > possibility...
>>
>> Or buy a switch with a bit of intelligence that can dump the machines'
>> MAC addresses based on ports.
>>
>> >
>> > c
>> >
>> > Adam S. Moskowitz wrote:
>> >
>> >> Chris,
>> >>
>> >>
>> >>> We are thinking about trying to find a replacement DHCP server
>> >>> that has a predictable method of allocating dynamic IP addresses
>> >>> (even if only for the initial cluster deployment)
>> >>
>> >>
>> >>
>> >> I think it's a bad idea to rely on such behavior. I don't remember
>> >> what the RFC says, but in general, unless the RFC guarantees an
>> >> implementation should behave a particular way, you are asking for
>> >> trouble to rely on specific behavior.
>> >>
>> >> A great example of this is how round-robin DNS used to work and
>> >> then how it changed and lots of things broke.
>> >>
>> >> DHCP isn't meant to do what you're asking it to do, so I strongly
>> >> suggest you not use it to solve that particular problem.
>> >>
>> >> That said, DHCP supports a mechanism for binding specific IP
>> >> addresses to specific MAC addresses, even though the assignment is
>> >> still done dynamically. Yes, this is a bit more work, but at least
>> >> it's guaranteed behavior.
>> >
>> >
>>
>>
>> --
>> Mvh|Regards, Lars
>> System Administrator, Phone: 3532 1349, Room: 318
>>
>> Bioinformatics Centre, University of Copenhagen,
>> Universitetsparken 15
>> DK-2100 Copenhagen
>> DENMARK
>>
>>
>>
>>
>>
>> ------------------------------
>>
>>
>>
>> End of Bioclusters Digest, Vol 5, Issue 28
>> ******************************************
>