[Bioclusters] LSF 4.2 w/ Redhat 7.2 (2.4.X kernel with glibc2.2) install problem

Chris Dagdigian bioclusters@bioinformatics.org
Wed, 01 May 2002 08:07:41 -0400


Kristian,

Every LSF-based system I've had a chance to work on has had an 
architecture similar to the one you describe (a master node with multiple 
NICs multihomed to public and private networks, etc.).

I've seen what you describe before, but I'm fuzzy on what we did to 
get around the issue (hey, it's morning here). LSF is certainly sensitive 
to the order in which machines are listed in hosts files, so you may gain 
something by swapping entries around. I also believe LSF can make 
use of a custom hosts file that you drop into the local 
configuration directory. You may be able to create this file and list 
only the address of the interface you want the daemons to bind to.
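Since the symptom hinges on which address the master's hostname resolves to (public vs. private), a quick resolver check on each node can show what LSF is likely to see. The sketch below is a generic diagnostic, not an LSF tool: it uses only Python's standard `socket` module, and the host names you pass on the command line (e.g. the master's public and private names) are your own.

```python
import socket
import sys


def resolve_addresses(hostname):
    """Return the IPv4 addresses the local resolver gives for hostname."""
    # gethostbyname_ex returns (canonical_name, aliases, ipv4_addresses)
    _canonical, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


if __name__ == "__main__":
    # Default to this machine's own hostname, which is what an installer
    # querying "the hostname" would typically start from.
    names = sys.argv[1:] or [socket.gethostname()]
    for name in names:
        try:
            print(f"{name}: {resolve_addresses(name)}")
        except OSError as exc:  # gaierror/herror are OSError subclasses
            print(f"{name}: does not resolve ({exc})")
```

Running it on the master and on a worker with both the public and private names shows whether the nodes agree on which address belongs to the master.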

Since I'm now working on a similar system, I'll drop you an email 
off-list if I see the same issues. I already have one bug report to 
submit to Platform regarding the installer script.


-Chris



Kristian Vlahovicek wrote:

> WRT the installation, it went smoothly, with no problems with glibc. However, 
> we did not manage to get the daemons to bind to the correct interface on 
> the master. We might be missing something (anyone with more experience in 
> LSF administration is welcome to suggest a solution), or could this be the 
> glibc incompatibility? OTOH, the workers seem to be working quite fine...
> 
> Basically, our master has 2 interfaces, eth0 (external net, public IP) 
> and eth1 (cluster net, private IP), each with a different name. When we run 
> hostsetup after the install, it queries the hostname on the master and gets 
> the public one, and then either a) it complains that the host is not listed 
> in the servers list (if we put in the private interface name), or b) it sets 
> everything up without complaints, but then the master sees only itself and 
> the rest of the cluster sees each other but not the master (in lshosts and 
> bhosts). Any ideas? As I said, we just started the eval, so there might be 
> a trivial solution buried somewhere in the documentation, but the 
> out-of-the-box setup does not work. We tried to see if there were any 
> switches to the LIM daemon to direct it to the proper interface, but it 
> seems there aren't...