Hi Peter:

On Fri, 27 May 2005, peter_webb at agilent.com wrote:

> Every time I run it, two nodes crash. Not the same two nodes every
> time, so doesn't look like a hardware problem.
>
> I'm going to drill down and see if I can find a small sample that
> reliably takes down the machine. Meantime, I thought I'd ask, have
> others seen this? We are running on SuperMicro dual Xeon nodes, the O/S
> is RHE4 WS.

Sounds suspiciously like memory. We had a case recently with 6
motherboards (expensive beasts at that) unable to drive memory at the
rated specs when running under load. Took a few hours of memtest86 to
catch it, or a few good gaussian runs.

I might suggest running memtest over the weekend anyway. Odds are it
won't find anything, but still ...

How was megablast built? How did you compile it?

Also, what does your swap space look like? swapon -s will tell us.

What do the logs say right before the crash (if anything)? Is it a
kernel panic? a hard lock?

How did you build the cluster OS?

Joe

_______________________________________________
Bioclusters maillist - Bioclusters at bioinformatics.org
https://bioinformatics.org/mailman/listinfo/bioclusters