On 4 Feb 2005, at 6:46 am, Michael Gutteridge wrote:

> I don't believe this problem to be specific to PVM, but could be an
> issue with any parallel machine using large node sets. I'm curious as
> to strategies anyone else has used to mitigate the problem I've
> described, especially for circumstances such as this, where the slave
> nodes are merely compute donors.

Most very large clusters in the HPC world don't allow NFS at all, or minimise it. Our 1000-node cluster does allow some NFS, but only to scratch directories, and *not*, in general, to all users' home directories. Even then, we are in the process of replacing our NFS scratch directories with true cluster filesystems (GPFS and/or Lustre), largely for performance reasons. NFS really does suck, and NFS abuse by users is the primary cause of cluster failure here.

But to answer your question: it sounds like you're automounting your users' home directories. We rapidly found that automount really doesn't work on clusters. Although it's easy to administer, you get exactly the behaviour you're seeing: large numbers of simultaneous mount requests, which overwhelm the NFS server.

Consequently, the few NFS filesystems we allow our farm nodes to see, we mount statically in /etc/fstab. We don't automount anything. You still get the multiple-mount-request problem when you switch the cluster on (say, after a power failure), so on the rare occasions we have to power-cycle the whole cluster, we have to be careful to switch on only a few dozen machines at a time until they're all up.

Tim

--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233  FE3D 6C73 BBD6 726A  A3F5 860B 3CDD 3F56  E313 4233