[Bioclusters] OS X and NFS

Fri Jul 15 09:07:22 EDT 2005

Hi Juan:

   A good rule of thumb is that, in aggregate, local I/O is almost always 
fastest (with very few exceptions).  If you can move your data set off 
the NFS server to local disk as part of the job's staging step, you will 
usually get better performance per node.  This is not always easy or 
practical.

   You simply need to look at local disk not as a storage medium, but as 
a cache medium, and then handle the loading and clearing of cache before 
and after your run.
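
   As a rough illustration of the load/run/clear pattern, here is a 
minimal Python sketch.  The names staged_copy and count_lines are 
hypothetical, and it assumes the scheduler exposes a node-local scratch 
directory through the TMPDIR environment variable (SGE typically sets 
this per job; check your own configuration).  If TMPDIR is unset, it 
falls back to the system default temp directory.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def staged_copy(nfs_path, scratch_root=None):
    """Copy a file from NFS to local scratch, yield the local path, then clear it."""
    # scratch_root=None lets tempfile choose the default temp directory.
    workdir = tempfile.mkdtemp(prefix="stage_", dir=scratch_root)
    try:
        local_path = os.path.join(workdir, os.path.basename(nfs_path))
        # Load the cache: one sequential read over NFS instead of many small ones.
        shutil.copy2(nfs_path, local_path)
        yield local_path
    finally:
        # Clear the cache, even if the job raised an exception.
        shutil.rmtree(workdir, ignore_errors=True)


def count_lines(nfs_path):
    """Stand-in for the real job: all reads go to the local copy."""
    with staged_copy(nfs_path, os.environ.get("TMPDIR")) as local_path:
        with open(local_path) as fh:
            return sum(1 for _ in fh)
```

   The same idea works from a shell wrapper around a Perl or Java job: 
copy in, run against the local path, remove the copy on exit.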

Joe

Juan Carlos Perin wrote:
> 
> I had a question to see if anyone had any knowledge of a problem we've 
> been encountering.  It seems our Apple cluster is crashing due to NFS.  
> When we run large batch jobs that frequently access an NFS mount, the 
> system ends up accumulating 'stuck' processes.  If the job is able to 
> finish, it eventually cleans up the 'stuck' processes, and all is well.  
> But if a job runs long enough and the stuck processes keep 
> accumulating, the system slowly deteriorates and becomes less and 
> less responsive, eventually freezing up and not allowing anything to 
> function at all.
> 
> We started the maximum number of NFS servers (20) and this improved 
> things, but didn't fix them.  We also limited the jobs to 10 nodes (20 
> processors) to theoretically allow one node to access one NFS pipeline 
> at any given time.  I'm not sure if anyone has run into this before, or 
> if anyone has ideas on how to approach fixing this problem.  The only 
> errors we're seeing otherwise are in the system log, complaining about 
> PasswordService not matching the client's response.
> 
> We're still running OS X 10.3.8, and our jobs run through SGE 
> 5.3.  We have a 16-node (32-processor) G5 system with at least 2 GB of 
> RAM per node.  The programs are a mixture of text-mining 
> algorithms in both Perl and Java, both of which require frequent reads 
> of large .txt files residing on NFS-shared directories.
> 
> Thanks in advance for any ideas or suggestions.
> 
> Juan Perin
> Children's Hospital of Philadelphia

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615