Mike Coleman wrote:
>
> I've been thinking about setting up a Beowulf and giving these issues a lot
> of thought. The advantage of going "diskless" is primarily administrative,
> I think. Basically, you need some way of keeping your clients in sync
> as you make changes over time. There seem to be only two good ways to
> do this: share everything over NFS, or replicate changes. SystemImager
> is an interesting take on the latter, and I like the core rsync idea,
> but it looks more complicated than I was hoping for, and so I'm leaning
> towards "diskless".
>
> (By "diskless", I mean not that the hosts have no disks, but that no (or
> almost no) state is stored there. So, for example, the disks could be wiped
> and mounted on /tmp at each boot.)
>

Diskless in the true diskless sense is good for several reasons:

- One less component that can fail and cause downtime
- Less initial cost per system
- You can set up one centralized image on your controlling node;
  keeping systems synced is as simple as keeping the image updated

There are of course situations where you might want this alternative
implementation of diskless, such as wanting to cache application data
locally, or to have scratch space for large data output. In either
form, "diskless" makes management a bit easier. But in most cases,
disks in each node are just wasted space, especially now that 40+ GB
drives come standard in these nodes. I suppose you could always buy
machines with disks in them, and then take the disks out and put them
in a SAN array that hangs off the controller node and serves your NFS.

Getting back to Danny's original inquiry (whether to use Beowulf or
MOSIX), I think it's important to understand the differences between
the available clustering solutions and methodologies. There are
numerous ways to implement Beowulf-style clusters, with MOSIX(1) as
one of them.
MOSIX falls into the Single System Image (SSI) model, where
processes start on a node (be it the controlling node or a compute
node) and are transparently migrated back and forth between systems.
Migration is driven by metrics such as load average, or by manual
intervention.

Whether to use MOSIX depends on your needs. Is the code you're running
parallel? Threaded? Serial? In theory, MOSIX can handle all of these
situations pretty well, but if your needs fall into one specific
category, there might be more optimized solutions for you.

The Scyld Beowulf software(2) (from the guys that started this whole
Beowulf trend) also falls into the SSI model of clustering. The
advantage of using their software is that they tie in some of the
other aspects that most people need or want, such as batching and
queuing, and provide a web-based management interface for it all.

I've set up and managed several clusters over the last few years. The
most recent was a 32-node cluster I designed and implemented for the
Univ. of Chicago Geophysics department. I chose Debian as the Linux
distribution, and used a package called Fully Automatic Installation
(FAI)(3) to deal with building the nodes. Essentially, one can add a
new, fully functioning node to the cluster inside of 10 minutes. A new
node will netboot, and the FAI scripts take care of partitioning,
installing a set of packages based on cfengine classes (you can base
classes on just about anything, and it includes hooks for setting up
diskless clients), and copying over any custom configs or other setup
that is necessary. This is similar to Kickstart under Red Hat, or
JumpStart under Solaris.

I don't think MOSIX provides any means of building the nodes
initially, but Scyld probably does (it provides a system image,
similar to the way SystemImager is set up, I think).

Note however that this only handles one aspect of setting up a
cluster--the initial software installation.
It does nothing for the longer-term management aspects, and you're
left figuring out how to deal with that on your own. This also says
nothing about the actual hardware to pick, or the type of interconnect
(10/100/1000 base-T, Myrinet, Scali, etc.).

I could go on and on about why I picked Debian, or why to use one
distribution over the other, or which software to pick and why, but
that would require writing a book. There are already some really good
books out there on the subject.

The best advice I can give to anyone who is thinking about setting up
a cluster is to do a lot of research and first gain an understanding
of what's available to you. Don't rush. Define your present and
potential future needs, and pick the solution that best solves your
problems. Clusters can't solve everything though, so make sure it's a
problem that warrants the time and effort you'll spend in putting one
together. If you're not the one doing the actual implementation, be
nice to the sysadmin(s) who are (buy them beer when they have it
working). :)

(1) http://www.mosix.org/
(2) http://www.scyld.com/
(3) http://www.informatik.uni-koeln.de/fai/

Some other general links:

http://www.linuxnetworx.com/
(I'd recommend this solution for anyone who's serious about a cluster.
These guys are extremely professional.)

http://foundries.sourceforge.net/clusters/
(lots of links & software)

-phillip

------------------------------------
Phillip Smith
UNIX Systems Administrator
Center for Genomics & Bioinformatics
Jordan Hall 153
Indiana University, Bloomington
Phone: 812-856-5081
E-mail: psmith@bio.indiana.edu
------------------------------------