[Bioclusters] SGE jobs staying in dr state

A.P. Jason de Koning apjdk at albany.edu
Wed Mar 30 13:43:07 EST 2005


Hi Shane,

This happens to us sometimes when the SGE daemons on the node(s) in
question have terminated for some reason.  I would make sure they are
running on those nodes and re-launch them if not.  That usually works
for me.
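
In case it helps, here is a rough Python sketch of that check.  It
assumes the execution daemon shows up in the process list as
"sge_execd", that you have passwordless ssh to the nodes, and that
"ps -e -o comm=" works there.  The function names are just made up for
illustration, so treat it as a starting point rather than a finished
tool.

```python
import subprocess

EXECD_NAME = "sge_execd"  # process name of the SGE execution daemon


def execd_running(ps_output: str) -> bool:
    """Return True if any line of `ps -e -o comm=` output names sge_execd.

    Some systems print the full path of the binary, so only the last
    path component of each line is compared.
    """
    for line in ps_output.splitlines():
        if line.strip().rsplit("/", 1)[-1] == EXECD_NAME:
            return True
    return False


def check_node(host: str) -> bool:
    """Run ps on a node over ssh and report whether sge_execd is alive.

    Assumption: passwordless ssh to `host` is set up.
    """
    result = subprocess.run(
        ["ssh", host, "ps", "-e", "-o", "comm="],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.returncode == 0 and execd_running(result.stdout)
```

You would call check_node() for each node that still lists a "dr" job
and restart the daemon with your site's usual startup script wherever
it returns False.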

Cheers,
- Jason de Koning,
   University at Albany

On Mar 30, 2005, at 1:20 PM, Shane Brubaker wrote:

>
> Hi, I have some SGE jobs which stay in a "dr" state and will not go
> away.  I have issued a qdel command on these jobs, so they are in a
> "deleted, running" state.  Usually such jobs will go away after a few
> minutes, but these won't.  I also can't delete that queue now because
> it has jobs in it.
>
> These happened to be fairly long jobs that ran a day or two.  Also,
> these jobs do not show up on the actual nodes, so they aren't really
> running anymore.  They only appear in qstat.
>
> Any help would be much appreciated.
>
>
> Thank You,
> Shane Brubaker
> JGI
>
> At 09:09 AM 3/30/2005, you wrote:
>>
>> Today's Topics:
>>
>>    1. Re: alternative DHCP implementations? (jason.calvert at novartis.com)
>>    2. Re: alternative DHCP implementations? (Lars G. T. Jorgensen)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 30 Mar 2005 11:43:17 -0400
>> From: jason.calvert at novartis.com
>> Subject: Re: [Bioclusters] alternative DHCP implementations?
>> To: "Clustering, compute farming & distributed computing in life science informatics" <bioclusters at bioinformatics.org>
>> Message-ID: <OFE1C386E6.4CD361C1-ON85256FD4.00560E7A-85256FD4.00565C5B at EU.novartis.net>
>>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> There are scripts within the OSCAR release to do this for you.  You
>> can start the scripts, power on the nodes in the order you wish, and
>> then assign them to auto generated hostnames.  The scripts output the
>> dhcp.conf file.
>>
>> I would think you could pull them out of oscar pretty easily.
>>
>> Jason
>>
>>
>> Chris Dagdigian <dag at sonsorol.org>
>> Sent by: bioclusters-bounces+jason.calvert=pharma.novartis.com at bioinformatics.org
>> 03/29/2005 02:27 PM
>> Please respond to "Clustering, compute farming & distributed computing in life science informatics"
>>
>>         To:     adamm at menlo.com, "Clustering, compute farming & distributed computing in life science informatics" <bioclusters at bioinformatics.org>
>>         cc:     (bcc: Jason Calvert/PH/Novartis)
>>         Subject:        Re: [Bioclusters] alternative DHCP implementations?
>>
>> Agreed. It was just a shortcut. We already do allocation of IP based
>> on MAC address but that only works when you know the MAC address
>> information ahead of time.  This is rare especially on whitebox
>> cluster projects where people don't put the MAC on the product
>> packaging or on the chassis itself. Some vendors do a good job of
>> making the data easy to find and others simply don't bother.
>>
>> A dhcp server handing out dynamic-range leases in a predictable manner
>> is what allowed us to easily map MAC address to node position and
>> nodename simply by powering on the nodes for PXE boot in the order in
>> which they are racked and stacked. Once this was done we had the
>> MAC->Node mapping data we needed to generate the static allocation
>> entries.
>>
>> A workaround for non-predictable allocation is to simply power on the
>> cluster in the order in which you want things named, then parse the
>> dhcpd leases file for both the MAC address *and* the timestamp
>> representing the lease handout. That would allow you to map MAC ->
>> Node without having to care about hostnames for the first pass MAC
>> collection phase.  Then you build the static-by-mac entries into the
>> conf file and problem solved. If we stick with ISC DHCP this is a
>> possibility...
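
The leases-file workaround described above could be sketched roughly
like this in Python.  It assumes the ISC dhcpd.leases layout, where
each "lease <ip> { ... }" block carries a "starts <weekday>
<YYYY/MM/DD HH:MM:SS>;" line and a "hardware ethernet <mac>;" line.
The function names and the "node" prefix are made up for illustration,
and the parser is deliberately naive (it would be confused by a "}"
inside a quoted string).

```python
import re
from datetime import datetime

# One lease block: "lease <ip> { ... }"
LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
# "starts <weekday> <YYYY/MM/DD HH:MM:SS>;"
STARTS_RE = re.compile(r"starts\s+\d+\s+(\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2});")
# "hardware ethernet xx:xx:xx:xx:xx:xx;"
MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]{17});")


def first_seen_by_mac(leases_text):
    """Map each MAC address to the earliest lease start time seen for it."""
    first_seen = {}
    for _ip, body in LEASE_RE.findall(leases_text):
        starts = STARTS_RE.search(body)
        mac = MAC_RE.search(body)
        if not (starts and mac):
            continue  # skip blocks without both fields
        when = datetime.strptime(starts.group(1), "%Y/%m/%d %H:%M:%S")
        key = mac.group(1).lower()
        if key not in first_seen or when < first_seen[key]:
            first_seen[key] = when
    return first_seen


def name_nodes(leases_text, prefix="node"):
    """Assign node names to MACs in the order the leases were handed out."""
    first_seen = first_seen_by_mac(leases_text)
    ordered = sorted(first_seen, key=first_seen.get)
    return {f"{prefix}{i:02d}": mac for i, mac in enumerate(ordered, start=1)}
```

Lease timestamps only have one-second resolution, so the nodes need to
be powered on far enough apart for the ordering to be unambiguous.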
>>
>> c
>>
>> Adam S. Moskowitz wrote:
>> > Chris,
>> >
>> >
>> >>We are thinking about trying to find a replacement DHCP server that
>> >>has a predictable method of allocating dynamic IP addresses (even if
>> >>only for the initial cluster deployment)
>> >
>> >
>> > I think it's a bad idea to rely on such behavior. I don't remember
>> > what the RFC says, but in general, unless the RFC guarantees an
>> > implementation should behave a particular way, you are asking for
>> > trouble to rely on specific behavior.
>> >
>> > A great example of this is how round-robin DNS used to work and then
>> > how it changed and lots of things broke.
>> >
>> > DHCP isn't meant to do what you're asking it to do, so I strongly
>> > suggest you not use it to solve that particular problem.
>> >
>> > That said, DHCP supports a mechanism for binding specific IP
>> > addresses to specific MAC addresses, even though the assignment is
>> > still done dynamically. Yes, this is a bit more work, but at least
>> > it's guaranteed behavior.
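
With ISC dhcpd, one way to express that MAC-to-IP binding is a "host"
declaration containing "hardware ethernet" and "fixed-address"
statements.  The Python sketch below generates such declarations from a
list of (hostname, MAC, IP) tuples.  The function name and the example
values are hypothetical, and other DHCP servers use different syntax.

```python
def host_entries(assignments):
    """Render ISC dhcpd.conf host declarations.

    `assignments` is an iterable of (hostname, mac, ip) tuples.
    """
    blocks = []
    for hostname, mac, ip in assignments:
        blocks.append(
            f"host {hostname} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f"}}"
        )
    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Example values only; substitute your own nodes.
    print(host_entries([
        ("node01", "00:11:22:33:44:aa", "10.0.0.11"),
        ("node02", "00:11:22:33:44:bb", "10.0.0.12"),
    ]))
```

The output would be pasted or included into the server's configuration
file, and the server restarted so it picks up the new entries.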
>> _______________________________________________
>> Bioclusters maillist  -  Bioclusters at bioinformatics.org
>> https://bioinformatics.org/mailman/listinfo/bioclusters
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 30 Mar 2005 14:55:55 +0200
>> From: "Lars G. T. Jorgensen" <lars at binf.ku.dk>
>> Subject: Re: [Bioclusters] alternative DHCP implementations?
>> To: "Clustering, compute farming & distributed computing in life science informatics" <bioclusters at bioinformatics.org>
>> Message-ID: <424AA1DB.8000702 at binf.ku.dk>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> Chris Dagdigian wrote:
>>
>> >
>> >
>> > Agreed. It was just a shortcut. We already do allocation of IP based
>> > on MAC address but that only works when you know the MAC address
>> > information ahead of time.  This is rare especially on whitebox
>> > cluster projects where people don't put the MAC on the product
>> > packaging or on the chassis itself. Some vendors do a good job of
>> > making the data easy to find and others simply don't bother.
>> >
>> > A dhcp server handing out dynamic-range leases in a predictable
>> > manner is what allowed us to easily map MAC address to node position
>> > and nodename simply by powering on the nodes for PXE boot in the
>> > order in which they are racked and stacked. Once this was done we had
>> > the MAC->Node mapping data we needed to generate the static
>> > allocation entries.
>> >
>> > A workaround for non-predictable allocation is to simply power on the
>> > cluster in the order in which you want things named, then parse the
>> > dhcpd leases file for both the MAC address *and* the timestamp
>> > representing the lease handout. That would allow you to map MAC ->
>> > Node without having to care about hostnames for the first pass MAC
>> > collection phase.  Then you build the static-by-mac entries into the
>> > conf file and problem solved. If we stick with ISC DHCP this is a
>> > possibility...
>>
>> Or buy a switch with a bit of intelligence that can dump the machines'
>> MAC addresses by port.
>>
>>
>> --
>> Mvh|Regards, Lars
>> System Administrator, Phone: 3532 1349, Room: 318
>>
>> Bioinformatics Centre, University of Copenhagen,
>> Universitetsparken 15
>> DK-2100 Copenhagen
>> DENMARK


