On 13 Jul 2005, at 7:01 pm, M. Michael Barmada wrote:

> Hi Carlos,
>
> If it's any help, we also had similar problems with our cluster. Our
> solution was to train the users to include code in their scripts that
> would create local directories (on the compute node - in /tmp) and copy
> the files they needed to those directories, then do their computing
> locally and copy back the results.

Absolutely. And preferably do the copying with something other than NFS
too - rcp or rsync work well, or the scheduler's built-in mechanism.
Most batch schedulers have built-in facilities for this - LSF certainly
does, in the form of lsrcp and various options to bsub. I don't know
about SGE - I'm not familiar with it - but I imagine the same sort of
features are available.

It really is quite amazing how badly NFS scales. I remember having
serious problems with it on the first Linux cluster I built at Incyte's
UK office about 6 years ago, and that was just 7 dual-CPU nodes talking
to a Sun E3000 NFS server. It didn't crash, but it got *really* slow -
and that was with the data being deliberately cached locally (I wrote
wrapper scripts around blastall and other applications to cache the
databases locally, blowing them away on a least-recently-used basis if
there wasn't room).

Sanger's current 1100-node cluster still has NFS in places, and it
regularly causes us grief. Our medium-term aim is to remove pretty much
all NFS from the cluster altogether, with the possible exception of
automounted home directories, and to use cluster filesystems like
Lustre for shared data.

Tim

--
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233
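
P.S. In case a concrete starting point is useful, the staging pattern
Michael describes boils down to something like the script below. It's
only a rough sketch - the hostname, paths and database name are made up
for illustration, and $LSB_JOBID is LSF's job ID variable (SGE users
would substitute $JOB_ID):

    #!/bin/sh
    # Stage data to node-local scratch, compute there, copy results back.
    set -e

    SCRATCH=/tmp/blastjob.$LSB_JOBID     # LSF sets LSB_JOBID; SGE sets JOB_ID
    mkdir -p "$SCRATCH"
    trap 'rm -rf "$SCRATCH"' EXIT        # always remove the local copy on exit

    # Pull the formatted BLAST database over rsync rather than reading it
    # repeatedly across NFS; small files like the query can stay on the
    # automounted home directory.
    rsync -a 'fileserver:/data/blastdb/mydb.n*' "$SCRATCH/"

    cd "$SCRATCH"
    blastall -p blastn -d mydb -i "$HOME/query.fa" -o results.out

    # Push only the results back to shared storage.
    rsync -a results.out fileserver:/data/results/

Under LSF you can also let the scheduler do the staging for you: for
example, bsub -f "/data/blastdb/mydb.nin > mydb.nin" copies the file
from the submission host to the execution host before the job starts.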
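
The least-recently-used eviction from those old Incyte wrappers is also
only a few lines of shell. Again just a sketch with made-up paths - the
idea is to keep deleting whichever cached database was used longest ago
until the new one fits:

    CACHE=/tmp/blastdb-cache
    need_kb=500000                       # space required, in KB (illustrative)

    free_kb() { df -Pk "$CACHE" | awk 'NR==2 {print $4}'; }

    while [ "$(free_kb)" -lt "$need_kb" ]; do
        # ls -tu sorts by access time, newest first, so the last entry
        # is the least recently used file in the cache.
        victim=$(ls -tu "$CACHE" | tail -1)
        [ -n "$victim" ] || break        # cache is empty and still no room
        rm -rf "$CACHE/$victim"
    done

One caveat: this relies on atime, so it only behaves as true LRU if the
local filesystem isn't mounted noatime.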