On 4 Nov 2004, at 4:02 pm, Alan Kilian wrote:

>
> Chris,
>
> I don't have any answers, but I think restating this might help
> people spot the problem.
>
>> When I turn off GANGLIA's gmon daemon, the load drops down to ordinary
>> rest states (0.1-ish). After some debugging to isolate the behavior,
>> there's clearly a causal link between gmond on the portal and these
>> high loads.

We saw load explode on our cluster at one point, with gmond processes
using 99% CPU. Network performance was awful. Some judicious use of
tcpdump revealed that a farm node had got itself into a strange state
where the machine had crashed, and was spewing multicast packets onto
the network at the rate of one a millisecond. Unsurprisingly, the gmond
processes on all the other nodes had a hard time coping with this.

So: check what's going on on your network as well...

Tim

-- 
Dr Tim Cutts
Informatics Systems Group, Wellcome Trust Sanger Institute
GPG: 1024D/E3134233 FE3D 6C73 BBD6 726A A3F5 860B 3CDD 3F56 E313 4233
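
For anyone wanting to repeat the tcpdump check described above, here is a rough sketch of one way to spot a single host flooding the network. It is not taken from the original investigation. It assumes the usual one-line-per-packet text output of `tcpdump -n` for IPv4 UDP traffic, and the function names `count_by_source` and `noisy_sources` are made up for illustration.

```python
import re
from collections import Counter

# Pulls the source IPv4 address out of a `tcpdump -n` text line such as:
#   1099584120.000100 IP 10.0.0.7.8649 > 239.2.11.71.8649: UDP, length 132
# (assumed output format; adjust the pattern if your tcpdump prints differently)
LINE_RE = re.compile(r"\bIP (\d+\.\d+\.\d+\.\d+)\.\d+ > ")


def count_by_source(lines):
    """Count captured packets per source address (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def noisy_sources(lines, threshold):
    """Return (source, count) pairs with more than `threshold` packets,
    busiest first (hypothetical helper)."""
    return [(src, n)
            for src, n in count_by_source(lines).most_common()
            if n > threshold]
```

One way to use it is to save a bounded capture, for example the output of `tcpdump -n -c 10000 udp and dst host 239.2.11.71`, and pass the lines to `noisy_sources`. The address 239.2.11.71 is assumed here to be the gmond multicast group; check your own gmond configuration. A healthy node should account for only a small share of the packets, so one address dominating the count is the thing to look for.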