[Bioclusters] Login & home directory strategies for PVM?
Tim Cutts
tjrc at sanger.ac.uk
Fri Feb 4 04:12:45 EST 2005
On 4 Feb 2005, at 6:46 am, Michael Gutteridge wrote:
>
> I don't believe this problem to be specific to PVM, but could be an
> issue with any parallel machine using large node sets. I'm curious as
> to strategies anyone else has used to mitigate the problem I've
> described, especially for circumstances such as this, where the slave
> nodes are merely compute donors.
Most very large clusters in the HPC world don't allow NFS at all, or
minimise it.
Our 1000-node cluster does allow some NFS, but only to scratch
directories and, in general, *not* to users' home directories.
Even then, we are in the process of replacing our NFS scratch
directories with true cluster filesystems (GPFS and/or Lustre), largely
for performance reasons. NFS really does suck, and NFS abuse by users
is the primary cause of cluster failure here.
But to answer your question: it sounds like you're automounting your
users' home directories. We rapidly found that automount really
doesn't work on clusters. Although it's easy to administer, you get
the behaviour you're seeing: large numbers of simultaneous mount
requests, which overwhelm the NFS server.
Consequently, the few NFS filesystems we allow our farm nodes to see,
we mount statically in /etc/fstab. We don't automount anything.
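As a rough illustration of what a static entry looks like, here is a
minimal Python sketch that renders one /etc/fstab line for an NFS
mount, so the same line can be pushed to every node. The server name,
export path, mount point, and mount options are all hypothetical
placeholders, not our actual configuration.

```python
def fstab_entry(server, export, mount_point, options=("rw", "hard", "bg")):
    """Return one /etc/fstab line for a statically mounted NFS filesystem.

    The six fields are: device, mount point, filesystem type, options,
    dump flag, and fsck pass number. The default options are an
    assumption for illustration only.
    """
    device = f"{server}:{export}"
    fields = [device, mount_point, "nfs", ",".join(options), "0", "0"]
    return "\t".join(fields)


# Hypothetical server and paths, for illustration only.
print(fstab_entry("nfs1.example.org", "/export/scratch", "/scratch"))
```

Because the entry is in /etc/fstab, the filesystem is mounted once at
boot rather than on demand whenever a job first touches it.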
You still get the multiple mount requests problem when you switch the
cluster on (say, after a power failure), so on the rare occasions we
have to power cycle the whole cluster we are careful to switch on only
a few dozen machines at a time until they're all up.
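The staggered power-on could be sketched roughly as below. This is an
illustration under assumptions, not a description of our tooling: the
power_on callback (standing in for whatever remote power control you
have), the batch size of 24, and the five-minute pause are all
hypothetical.

```python
import time


def batches(nodes, size):
    """Yield consecutive slices of `nodes`, each at most `size` long."""
    for start in range(0, len(nodes), size):
        yield nodes[start:start + size]


def staggered_power_on(nodes, power_on, batch_size=24, pause_seconds=300,
                       sleep=time.sleep):
    """Power on nodes a batch at a time, pausing between batches.

    `power_on` is a hypothetical callback taking one node name. The
    pause gives each batch time to boot and finish its NFS mounts
    before the next batch starts. Returns the number of batches.
    """
    groups = list(batches(nodes, batch_size))
    for index, group in enumerate(groups):
        for node in group:
            power_on(node)
        # No need to wait after the final batch.
        if index < len(groups) - 1:
            sleep(pause_seconds)
    return len(groups)
```

Passing sleep as a parameter makes it possible to dry-run the sequence
without actually waiting, for example by passing a function that only
records the requested pauses.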
Tim
--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233