I'm hoping someone here has seen a problem we've been encountering. Our Apple cluster appears to be crashing because of NFS.

When we run large batch jobs that frequently access an NFS mount, the system accumulates 'stuck' processes. If the job is able to finish, the 'stuck' processes are eventually cleaned up and all is well. But if a job runs long enough and the stuck processes keep accumulating, the system slowly deteriorates and becomes less and less responsive, eventually freezing up completely.

We started the maximum number of NFS servers (20), which improved things but didn't fix them. We also limited the jobs to 10 nodes (20 processors) so that, in theory, each node can access one NFS pipeline at any given time.

The only other errors we're seeing are in the system log, which complains about PasswordService not matching the client's response.

Our setup:

- Mac OS X 10.3.8
- Jobs submitted through SGE 5.3
- 16-node (32-processor) G5 system with at least 2 GB RAM per node
- The programs are a mixture of text mining algorithms in Perl and Java, both of which require frequent reads of large .txt files residing on NFS shared directories

Has anyone run into this before, or does anyone have ideas on how to approach fixing it?

Thanks in advance for any ideas or suggestions.

Juan Perin
Children's Hospital of Philadelphia