Glen,

Thanks for the info. As it happens, I've just discovered what the problem is with ssh. I configured the cluster while it was installed inside the company network. Subsequently we moved into a DMZ with highly restrictive iptables rules. The cluster has no access to DNS, but I did configure nameservers for the frontend/compute nodes, so name lookups hang until they time out. If I ssh to nodes directly by IP address the connection completes quickly (likewise if I do "ssh -4 c0-0"). So it seems I must either relax the DMZ firewall rules or turn off DNS lookups.

I also discovered that mpirun spawns the tasks via ssh, but intertask communication doesn't appear to use ssh (maybe I'm not correct here -- one of my Linpack jobs at a small problem size made only two ssh connections during its run: one to spawn the job, and another at the end).

Anyway, it appears I'd made an unwarranted conclusion that the ssh slowness is directly related to my HPL weirdness. They MAY be related, but they may not be...

Thanks,
Bill

On Fri, 2003-08-29 at 11:31, Glen Otero wrote:
> Bill-
>
> You've come across two common slowdowns on the cluster: 1) the use of
> ssh, and 2) mpirun using ssh to start jobs. This is a real pain, and
> something I will change in the future, but it takes a whole
> reorganization of MPI. One thing I can recommend trying is to add rsh
> functionality to the cluster so jobs will launch faster. Instructions
> for doing so are in the Rocks User's Guide.
>
> WRT HPL, I'm not surprised by the weirdness you're seeing; I hear
> about it all the time. But I'm not an expert at tuning HPL, so I
> can't offer any hints. If I come across anything, I'll let you know.
>
> Glen

-- 
Bill Barnard <bill at barnard-engineering.com>
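
P.S. In case it helps anyone else on the list, here's a sketch of the "turn off DNS lookups" workaround I'm trying. I haven't verified which OpenSSH release the nodes are running, so treat the keyword as an assumption: current OpenSSH sshd supports "UseDNS", while some older releases have only the related "VerifyReverseMapping" setting.

```
# /etc/ssh/sshd_config on each node -- tell sshd not to do reverse
# DNS lookups on connecting clients (the lookups are what hang when
# the configured nameservers are unreachable from the DMZ):
UseDNS no

# Then restart sshd on the node for the change to take effect.

# An alternative I'm considering: remove the "nameserver" lines from
# /etc/resolv.conf on the nodes, so the resolver fails immediately
# (falling back to /etc/hosts) instead of waiting for a timeout.
```

This doesn't fix mpirun's per-job ssh startup cost, of course; it just removes the DNS timeout from each connection.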