Hi Hershel:

We use an alternate GPFS running on SuSE and tweak the hardware a bit, because we find that the number of channels (not necessarily their type, but their number) can significantly alter performance through better data scheduling/routing without adding overhead. Just curious: what apps are you trying to run on these small clusters, and how many nodes/processors/channels do you have?

Cheers,
Kathleen

-----Original Message-----
From: Guy Coates [mailto:gmpc at sanger.ac.uk]
Sent: Thursday, January 26, 2006 2:28 AM
To: Clustering, compute farming & distributed computing in life science informatics
Subject: Re: [Bioclusters] gpfs overload on ibm bladecenter cluster

On Thu, 26 Jan 2006, Hershel Safer wrote:

> We're running two small IBM BladeCenter clusters under SuSE, with GPFS
> for (we hope) fast file I/O. It seems to us that when user processes
> on a blade are particularly memory intensive, and GPFS needs to
> compete for a resource (memory in this case), GPFS most likely won't
> survive the competition and will die.

Recent kernels have an entry in /proc/<PID>/oom_adj. If you echo a low number in there (google for sensible values), it will protect processes (e.g. the GPFS ones) from being zapped by the out-of-memory killer. You can also put a high number in there for user processes, so those are the first against the wall, come the revolution.

You can also enforce per-process memory limits via /etc/security/limits.conf, or with your job scheduler if you run one. You might also consider not running jobs on the machines that are GPFS NSD servers.

We primarily use job-scheduler-enforced limits, which seem to work well for us.

Cheers,

Guy

> This may happen on one or more nodes of the cluster. The GPFS daemon
> 'mmfsd' will lose its connection to other members of the cluster and
> lose its GPFS filesystem mounts, and consequently any services that
> reside on GPFS will fail. The blade will not necessarily crash after
> that; it may stay afloat and may even be accessible via ssh.
>
> Have others encountered this situation? How can we prevent this
> behavior? More generally, what kinds of limits do you impose on
> consumption of resources such as memory and CPU? Thanks,
>
> Hershel
>
> ______________________________________________________________________
> Hershel M. Safer, Ph.D.
> Chair, 5th European Conference on Computational Biology (ECCB '06)
> Head, Bioinformatics Core Facility
> Weizmann Institute of Science
> PO Box 26, Rehovot 76100, Israel
> tel: +972-8-934-3456 | fax: +972-8-934-6006
> e-mail: hershel.safer at weizmann.ac.il | hsafer at alum.mit.edu
> url: http://bioportal.weizmann.ac.il
>
> ***************************************************
> Plan now for ECCB '06!
> 5th European Conference on Computational Biology
> Eilat, Israel, Sept 10 -- 13, 2006
> Visit www.eccb06.org for details
>

--
Dr. Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 494919
_______________________________________________
Bioclusters maillist - Bioclusters at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters
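For anyone wanting to script Guy's oom_adj suggestion, here is a minimal Python sketch. It assumes a Linux 2.6 kernel of that era, where /proc/<PID>/oom_adj accepts -16 to 15 and the special value -17 exempts a process from the OOM killer entirely (later kernels add /proc/<PID>/oom_score_adj, range -1000 to 1000, and keep oom_adj as a deprecated alias). The helper names (check_oom_adj, process_name, pids_named, set_oom_adj) are hypothetical and not part of GPFS or any IBM tool, and the daemon name 'mmfsd' is taken from Hershel's message, so confirm it with ps on your own nodes.

```python
from pathlib import Path

# Hypothetical helpers (not part of GPFS): steer the Linux OOM killer away
# from (negative values) or toward (positive values) a process through the
# legacy /proc/<PID>/oom_adj interface.
OOM_DISABLE = -17              # special value: never pick this process
OOM_ADJ_MIN, OOM_ADJ_MAX = -16, 15


def check_oom_adj(value: int) -> int:
    """Return value if the legacy oom_adj interface accepts it, else raise."""
    if value == OOM_DISABLE or OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        return value
    raise ValueError(
        f"oom_adj must be {OOM_DISABLE} or in [{OOM_ADJ_MIN}, {OOM_ADJ_MAX}]"
    )


def process_name(status_text: str) -> str:
    """Extract the command name from the text of /proc/<PID>/status."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return ""


def pids_named(name: str, proc_root: Path = Path("/proc")) -> list[int]:
    """List PIDs whose /proc/<PID>/status reports the given command name."""
    pids = []
    for entry in proc_root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            if process_name((entry / "status").read_text()) == name:
                pids.append(int(entry.name))
        except OSError:
            continue  # process exited or is unreadable; ignore it
    return sorted(pids)


def set_oom_adj(pid: int, value: int, proc_root: Path = Path("/proc")) -> None:
    """Write value to /proc/<PID>/oom_adj.

    Lowering the value needs root (CAP_SYS_RESOURCE).
    """
    (proc_root / str(pid) / "oom_adj").write_text(f"{check_oom_adj(value)}\n")
```

Run as root, a loop such as `for pid in pids_named("mmfsd"): set_oom_adj(pid, OOM_DISABLE)` is the scripted equivalent of echoing -17 into each daemon's oom_adj, and a positive value such as 10 on user job PIDs makes those jobs the OOM killer's preferred targets. The setting is per process (children inherit it), so it has to be reapplied whenever mmfsd restarts.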
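The per-process memory limits Guy mentions come down to the kernel's RLIMIT_AS (address-space) limit. A line such as `@users hard as 4194304` in /etc/security/limits.conf (hypothetical group; values there are in KiB, so this is 4 GiB) applies it at login via pam_limits. The sketch below is a hypothetical launcher, not any job scheduler's actual mechanism, that sets the same limit for a single job from Python, so an oversized job fails its own allocations rather than exhausting the node's memory. It assumes Linux; some other systems do not enforce RLIMIT_AS.

```python
import resource
import subprocess


def run_with_memory_cap(argv, max_bytes):
    """Run argv with its virtual address space capped at max_bytes.

    This sets RLIMIT_AS, the same per-process limit that an "as" entry in
    /etc/security/limits.conf applies at login via pam_limits.
    """
    def apply_cap():
        # Runs in the child after fork() and before exec(), so only the job
        # is limited, not this launcher. Soft and hard limits are both set.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(
        argv, preexec_fn=apply_cap, capture_output=True, text=True
    )
```

Because preexec_fn runs in the forked child, this is not safe to call from a multi-threaded launcher; a job scheduler's own memory limits, as Guy says, are the more robust route on a shared cluster.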