Dear friend you can use SGE ( sun grid engine) this may solve your problems best wishes Bony De Kumar National facility for transgenic and gene knock out mice Lab#N301 CCMB, Uppal Road Hyderabad 500 007 Phone91-040-27192899 91-040-27160222-41(20 lines) Bony@ccmb.res.in Inbo@rediffmail.com On Mon, 16 Feb 2004 bioclusters-request@bioinformatics.org wrote: > When replying, PLEASE edit your Subject line so it is more specific > than "Re: Bioclusters digest, Vol..." And, PLEASE delete any > unrelated text from the body. > > > Today's Topics: > > 1. Using semi-public PCs for heavy computation jobs (Arnon Klein) > 2. Re: Using semi-public PCs for heavy computation jobs (Chris Dwan (CCGB)) > 3. Re: Using semi-public PCs for heavy computation jobs (Ron Chen) > 4. Re: Using semi-public PCs for heavy computation jobs (Arnon Klein) > 5. RE: Using semi-public PCs for heavy computation jobs (John Van Workum) > 6. Re: Using semi-public PCs for heavy computation jobs (Dan Bolser) > > --__--__-- > > Message: 1 > Date: Sun, 15 Feb 2004 20:49:26 +0200 > From: Arnon Klein <klein@pob.huji.ac.il> > To: Bioclusters <bioclusters@bioinformatics.org> > Subject: [Bioclusters] Using semi-public PCs for heavy computation jobs > Reply-To: bioclusters@bioinformatics.org > > As part of my graduate research, I need to run a job of a genome-wide > scale. Using all of the computers available to me at my lab, this can > take about 6 months. We don't have a cluster... > I am already making use of a students computer lab, after-hours. Those > computers run linux, and it was a no-brainer : just hacked some scripts > to rshell into the machines, activated by the crond. It's not enough, > though. > While I'm looking at the option of getting or buying CPU time on a > cluster, I am also tempted to make use of other public PCs at the > campus. The ideal thing here is to have something like SETI@home or > Fold@home, but I would go for anything that will allow me to have my > jobs running on as many PCs as possible here, while not making me the > enemy of the system admins... > We're talking about Windows based PCs (mostly 2000 or XP), at least some > of them are managed using a central image. > Right now it looks like the simplest option is to install sshd or telnet > service on them, and have a script that logs in after hours, and > execute some binary. However, I'm not sure this would go well with the > sys-admins (security implications?). > I think it would be best if I can approach the authorities with a > sensible plan - first impressions are very important... > > I would like to hear anything about this subject: configuration > suggestions, past experience, encouragements, discouragements, etc. > > Arnon > > --__--__-- > > Message: 2 > Date: Sun, 15 Feb 2004 14:10:13 -0600 (CST) > From: "Chris Dwan (CCGB)" <cdwan@mail.ahc.umn.edu> > To: Bioclusters <bioclusters@bioinformatics.org> > Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs > Reply-To: bioclusters@bioinformatics.org > > > > As part of my graduate research, I need to run a job of a genome-wide > > scale. Using all of the computers available to me at my lab, this can > > take about 6 months. We don't have a cluster... > > That must be a pretty impressive job. I'd love to hear more about your > research that takes so much CPU time. That, however, is probably a > different thread. I have some experience with the situation you describe. > > I've built up, by hook and crook, semi-integrated access to a moderate > number of compute resources (a few hundred CPUs) spread over several > campuses. I suspect we're in something of the same boat, resource wise. > You could call this hodgepodge a "grid," but that lends a dignity and > maturity to a system whose only really positive attribute is that it > functions to get real work done. > > Below is a list, easiest to hardest, of systems I've hooked in: > > * My very own cluster that I admin > * The cluster maintained by our local supercomputing center > * Clusters maintained by collaborators at other institutions > * Lab workstations that I admin, running Linux or OS X > * Lab workstations maintained by a someone else, running Linux or OS-X > * Lab workstations which usually run Windows, maintained by someone else, > which can be rebooted into Linux or OS X at night > > Then, of course, there are the systems which I decided would be too much > trouble, particularly given the number of CPUs in question: > > * Lab workstations running OS 9 or Windows, which I can't get rebooted > into Linux. > > > I am already making use of a students computer lab, after-hours. Those > > computers run linux, and it was a no-brainer : just hacked some scripts > > to rshell into the machines, activated by the crond. It's not enough, > > though. > > The major queuing systems for clusters (LSF, PBS-Pro (Torque?), and SGE) > each have facilities for cycle stealing from workstations. The very > best way to approach this situation is to convince the lab admin to set up > a queuing system to run jobs on those machines only in certain hours, or > (better) when there is nobody logged in at the terminal, the load is below > the number of CPUs, the mouse hasn't moved and no key has been struck in > 15 minutes, or whatever they're most comfortable with. > > Many folks on this list (myself included) have written our very own rsh / > cron based remote job execution system. Unless there's a really good > reason to do it (infinite CPUs available, but there's no chance of getting > a queueing system installed so you have to hack) experience says that it's > better to use an established package with a user support base and code > maintained by somebody else. > > A system which has been around a long time, is quite mature, and doesn't > get nearly enough credit is Condor (from the University of Wisconsin). > It's explicitly designed as a cycle scavanger. I know some cluster admins > who run condor to "backfill" their tightly scheduled cluster. > > Integrating a set of queuing systems across domains remains tricky. I've > found that, despite unlimited hype, grid software (including the globus > package) remains best suited to a single administrative domain, single > administrator setup. Of course, I haven't tried all of the offerings, and > I haven't installed globus this week...so things might have changed. I > encourage you to try all the options available and see what > works for you. Exclude Sun's "Grid Engine" from the above statement, as > they have a slightly different definition of a "grid" than we're talking > about here. > > Anyway, I've got a horrible, hacked together "metascheduler" that has > nothing going for it except the fact that it works. I would happily throw > it away if someone came out with a product or tool that did the same > thing. It, on a user by user basis maintains a list of the resources to > which that user can connect. It loops over a queue of jobs, checking to > see which resource is not overloaded and sending jobs out as appropriate. > > I've used both PBS and SGE's faclity for "sloughing off" jobs from one > queue to another. These are neat, unless you need to hand off jobs for > only some users, but not others, to go between SGE and PBS, or to really > keep track of errors. Probably best suited to a setup with several > queues maintained by a single admin or single team. > > > While I'm looking at the option of getting or buying CPU time on a > > cluster, I am also tempted to make use of other public PCs at the > > campus. The ideal thing here is to have something like SETI@home or > > Fold@home, but I would go for anything that will allow me to have my > > jobs running on as many PCs as possible here, while not making me the > > enemy of the system admins... > > United Devices sells a software package to do this, and some very large > corporate installations (2000+ CPUs) have been brough online. The trick > with systems like this is getting enough systems to offset the high > latencies and (generally) low performance CPUs. If you can convince your > University IT department to make a campus wide resource of this sort, it > will be terrific. On the other hand, you'll probably have to share it. > > In any solution you build, data motion and error detection / correction > will be the biggest time-sinks. > > I've found that, for jobs requiring a moderate size dataset (my core set > of BLAST targets is around 14GB) data motion should be decoupled from CPU > finding. I.e: I have one process that pushes data out to compute > resources on a regular basis, and jobs are only scheduled onto nodes that > have the needed data. This means that I have to ask my partners not just > for access to their CPUs, but for a bit of storage dedicated to me, as > close to the compute nodes as possible. > > > I think it would be best if I can approach the authorities with a > > sensible plan - first impressions are very important... > > The social aspects of this sort of distributed computation are, by far, > the most important. If there is trust between the administrative domains, > the rest is really just technical work. Without the trust, it's nearly > impossible to make even the best plan succeed. > > > I would like to hear anything about this subject: configuration > > suggestions, past experience, encouragements, discouragements, etc. > > Me too. I've got my experiences and opinions, and I'm always interested > in other takes on similar problems. On a totally selfish note, if anyone > wants to share CPUs with me, we can expand each other's grids. > > Any takers? > > -Chris Dwan > The University of Minnesota > > --__--__-- > > Message: 3 > Date: Sun, 15 Feb 2004 16:53:29 -0800 (PST) > From: Ron Chen <ron_chen_123@yahoo.com> > Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs > To: bioclusters@bioinformatics.org > Reply-To: bioclusters@bioinformatics.org > > GridEngine (SGE) 6.0 will integrate with JXTA, > offering JxGrid, to provide P2P workload management > like SETI@home. > > http://gridengine.sunsource.net/project/gridengine/workshop22-24.09.03/proceedings.html > "Resource Discovery in Sun Grid Engine using JXTA" > > However, SGE 6.0 is not available until May 2004. So I > suggest another package called "BOINC". > > http://boinc.berkeley.edu > > BOINC is free+opensource, supports multiple platforms > (Windows, Linux, Solaris, MacOSX). > > Your approach of installing sshd/telnetd is OK, but > the sys.admins will not like opening a port, since > hackers can get in easiler. BOINC does not leave a > port open, and it uses http to get the workload (so > easiler to go through firewalls). Moreover, it allows > suspending the work when users access the machine, and > allow better scheduling. Further it has better file > transfer than home-made solutions. > > I would suggest you to look at the link above as I do > not fully know all the features! > > -Ron > > --- Arnon Klein <klein@pob.huji.ac.il> wrote: > > As part of my graduate research, I need to run a job > > of a genome-wide > > scale. Using all of the computers available to me at > > my lab, this can > > take about 6 months. We don't have a cluster... > > I am already making use of a students computer lab, > > after-hours. Those > > computers run linux, and it was a no-brainer : just > > hacked some scripts > > to rshell into the machines, activated by the crond. > > It's not enough, > > though. > > While I'm looking at the option of getting or buying > > CPU time on a > > cluster, I am also tempted to make use of other > > public PCs at the > > campus. The ideal thing here is to have something > > like SETI@home or > > Fold@home, but I would go for anything that will > > allow me to have my > > jobs running on as many PCs as possible here, while > > not making me the > > enemy of the system admins... > > We're talking about Windows based PCs (mostly 2000 > > or XP), at least some > > of them are managed using a central image. > > Right now it looks like the simplest option is to > > install sshd or telnet > > service on them, and have a script that logs in > > after hours, and > > execute some binary. However, I'm not sure this > > would go well with the > > sys-admins (security implications?). > > I think it would be best if I can approach the > > authorities with a > > sensible plan - first impressions are very > > important... > > > > I would like to hear anything about this subject: > > configuration > > suggestions, past experience, encouragements, > > discouragements, etc. > > > > Arnon > > _______________________________________________ > > Bioclusters maillist - > > Bioclusters@bioinformatics.org > > > https://bioinformatics.org/mailman/listinfo/bioclusters > > > __________________________________ > Do you Yahoo!? > Yahoo! Finance: Get your refund fast by filing online. > http://taxes.yahoo.com/filing.html > > --__--__-- > > Message: 4 > Date: Mon, 16 Feb 2004 17:36:30 +0200 > From: Arnon Klein <klein@pob.huji.ac.il> > To: bioclusters@bioinformatics.org > Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs > Reply-To: bioclusters@bioinformatics.org > > Thanks Chris and Ron for the responses. I've found BOINC and Condor as > very interesting and possible solutions for my problem. Since my > software is built around java RMI (using Master/Worker paradigm), they > also feel the most natural transitions (The heavy calculations are done > in C, if anyone is worried about optimization...) . > I'll try to pull this off, and I'll come back to this list with the > story of how it went. > > Arnon > > > --__--__-- > > Message: 5 > From: "John Van Workum" <jdvw@tticluster.com> > To: <bioclusters@bioinformatics.org> > Subject: RE: [Bioclusters] Using semi-public PCs for heavy computation jobs > Date: Mon, 16 Feb 2004 11:10:36 -0500 > Reply-To: bioclusters@bioinformatics.org > > Arnon, > > You may want to look at GreenTea. It is a pure Java "grid" platform that may mesh well with your java RMI. > http://www.greenteatech.com/ > > Regards, > > John > TTI > > > -----Original Message----- > > From: bioclusters-admin@bioinformatics.org > > [mailto:bioclusters-admin@bioinformatics.org]On Behalf Of Arnon Klein > > Sent: Monday, February 16, 2004 10:37 AM > > To: bioclusters@bioinformatics.org > > Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation > > jobs > > > > > > Thanks Chris and Ron for the responses. I've found BOINC and Condor as > > very interesting and possible solutions for my problem. Since my > > software is built around java RMI (using Master/Worker paradigm), they > > also feel the most natural transitions (The heavy calculations are done > > in C, if anyone is worried about optimization...) . > > I'll try to pull this off, and I'll come back to this list with the > > story of how it went. > > > > Arnon > > > > _______________________________________________ > > Bioclusters maillist - Bioclusters@bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bioclusters > > > > > --__--__-- > > Message: 6 > Date: Mon, 16 Feb 2004 16:12:37 +0000 (GMT) > From: Dan Bolser <dmb@mrc-dunn.cam.ac.uk> > To: bioclusters@bioinformatics.org > Subject: Re: [Bioclusters] Using semi-public PCs for heavy computation jobs > Reply-To: bioclusters@bioinformatics.org > > On Mon, 16 Feb 2004, Arnon Klein wrote: > > > Thanks Chris and Ron for the responses. I've found BOINC and Condor as > > very interesting and possible solutions for my problem. Since my > > software is built around java RMI (using Master/Worker paradigm), they > > also feel the most natural transitions (The heavy calculations are done > > in C, if anyone is worried about optimization...) . > > What kind of calculation are you doing? > Cheers, > Dan. > > > I'll try to pull this off, and I'll come back to this list with the > > story of how it went. > > > > Arnon > > > > _______________________________________________ > > Bioclusters maillist - Bioclusters@bioinformatics.org > > https://bioinformatics.org/mailman/listinfo/bioclusters > > > > > > --__--__-- > > _______________________________________________ > Bioclusters maillist - Bioclusters@bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bioclusters > > > End of Bioclusters Digest >