John,

There is already a 50-node dual-Xeon cluster at MSKCC using Sun Grid
Engine 6.0u1 and SystemImager with SuSE 9.1 Linux. I'm sure you know
those folks already, so see what they think about SGE :)

Your notes are complete and it sounds as though you have a plan. Some
comments:

"Most" people doing similar work use a dedicated switch, both for
performance and to reduce administrative burden. The VLAN should work
as long as the performance is reasonable.

SystemImager works great. I think it was originally developed on
Debian, although they've since changed the codebase so that you don't
need to be running Debian to work as a developer on it. For your goal
of installing Linux and syncing updates easily and with little effort,
I'd recommend SystemImager as one path-of-least-resistance option.

If your users really want transparency without having to think much
about the cluster, then maybe you should try the MOSIX approach first
to see if transparent distribution will suit your needs. You may find
that you want, need or require the features that a batch scheduler or
distributed resource management layer will give you. In that case the
clear winner is Grid Engine over OpenPBS (no contest really, in my
opinion).

The "problem" with Grid Engine or any similar product (when compared
to your stated requirements) is that your users will have to think at
least a *tiny* bit about the cluster and how to run jobs. At the bare
minimum they'll need to use the qsub program to submit their jobs for
execution. After that, Grid Engine handles the load balancing, remote
execution and policy-based resource allocation.

This is why I recommend experimenting with MOSIX first -- it may be
more "transparent" for your users at first brush. If MOSIX does not
work out well for you, then you'll need to spend a bit of time
educating your users on how to use Grid Engine to get their work done.
Not terribly difficult, and there are some SGE-savvy users at MSKCC by
now anyway.
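To give a sense of how small that "tiny bit" of thinking is, here is a
sketch of what an R user's Grid Engine job script might look like. The
script and file names (run-analysis.sh, analysis.R) and the job name
are hypothetical placeholders; qsub and the `#$` directives are
standard Grid Engine.

```shell
#!/bin/sh
# run-analysis.sh -- minimal Grid Engine job script (hypothetical names)
#$ -S /bin/sh       # shell to interpret the job under
#$ -N r-analysis    # job name as it appears in qstat
#$ -cwd             # run the job from the submission directory
#$ -j y             # merge stdout and stderr into one output file

# Run R non-interactively; analysis.R stands in for the user's own code.
R CMD BATCH --no-save analysis.R
```

The user submits it with `qsub run-analysis.sh`, Grid Engine picks an
idle node, and `qstat` shows where the job landed. Submitting twenty
such jobs is all it takes to keep all twenty CPUs busy.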
Speakman, John H./Epidemiology-Biostatistics wrote:

> Hello
>
> If anyone can review the below and suggest a way to go, or even
> better something I have gotten completely wrong, it would be much
> appreciated!
>
> Thanks
> John
>
> Hardware:
>
> Ten HP ProLiant nodes, one DL380 and nine DL140. Each node has two
> 3.2GHz Xeon processors. They do not have a dedicated switch; the
> infrastructure folks say they want to implement this using a VLAN.
> We have some performance concerns here but have agreed to give it a
> try.
>
> User characteristics:
>
> The users are biostatisticians who typically program in R; they
> often use plug-in R modules like Bioconductor. They always want the
> newest version of R right away. They may also write programs in C
> or Fortran. Data files are usually small. Nothing fancy like BLAST,
> etc.
>
> User concerns:
>
> Users require a Linux clustering environment which enables them to
> interact with the cluster as though it were a single system (via
> ssh or X) but which will distribute compute-intensive jobs across
> nodes. As the code is by and large not multithreaded, it is
> expected that each job will be farmed out to an idle compute node
> and will probably stay there until it is done. That's fine. In
> other words, to use all twenty CPUs we will need twenty concurrent
> jobs.
>
> Administration concerns:
>
> The cluster must require the absolute minimum of configuration and
> maintenance, because I've got to do it and I'm hardly ever around
> these days.
>
> Other concerns:
>
> Users and administrators alike have a preference for Debian Linux
> over other distributions. Users also have an aversion to non-free
> software. Either or both of these considerations could be
> overridden if the reasons were pressing.
> Cluster software requirements:
>
> (1) The cluster must have a means of deploying Linux to the nodes
> and keeping their configurations (including updates to the
> operating system and applications, lists of users, printers, etc.)
> in synchronization.
>
> (2) The cluster must have a means of transparently distributing
> jobs to idle CPUs. It's not necessary to actively rebalance once a
> job has started - it's okay if, once tied to a node, it stays
> there.
>
> Potential solutions:
>
> We like the look of NPACI Rocks but its non-Debian-ness makes it a
> last resort only. What we would really like to try is a Debian
> version of NPACI Rocks; in its absence we will probably have to use
> two separate packages to fulfil the requirements of #1 and #2
> above.
>
> Sensible options for #1 seem to be:
>
> (1) SystemImager (http://www.systemimager.org/)
> (2) FAI (http://www.informatik.uni-koeln.de/fai/), maybe also
> involving the use of cfengine2 (http://www.iu.hio.no/cfengine/)
>
> SystemImager is the better-established product and looks to be
> simpler to set up than FAI and/or cfengine2, in both of which the
> learning curve looks steep. However, FAI seems more elegant and
> more like the idea of "NPACI Rocks Debian" that we're looking for,
> implying that once set up, FAI/cfengine2 will require less ongoing
> maintenance.
>
> Sensible options for #2 seem to be:
>
> (1) OpenMosix
> (2) OpenPBS
> (3) Sun Grid Engine N1
>
> Note: all of the above have commercial versions; we'd be reluctant
> to consider those unless it means big savings in administration
> time and effort. We get the impression that OpenMosix (and, to a
> lesser extent, OpenPBS) have question marks over how much time and
> resources the people maintaining these products have, suggesting
> bugs, instability and not keeping up with kernel/library updates,
> etc. Sun Grid Engine seems more robust but does not seem to have a
> big Debian user base.
>
> What do you all think we should try first?
> Thanks!
> John

-- 
Chris Dagdigian, <dag at sonsorol.org>
BioTeam - Independent life science IT & informatics consulting
Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E  iChat/AIM: bioteamdag  Web: http://bioteam.net