[Bioclusters] Xserve/iNquiry mpi/ssh issues

Rodney Dyer rjdyer at vcu.edu
Fri Feb 11 10:20:25 EST 2005


Getting passwordless ssh access for non-root users is relatively easy.
I have a short write-up on this from when I was setting it up on my
cluster for LAM.  You can find the write-up at:
http://dyerlab.bio.vcu.edu/documents/getaddrinfo.html

Also, you MUST turn off the firewall (System Prefs -> Sharing ->
Firewall), or just open the right ports, so that the nodes can
communicate.  This gave me a really tough time because I have the
cluster hooked up to my office machine through a second PCI card on my
desktop.  It appears that you cannot configure multiple interfaces using
System Prefs.  YMMV.
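
One quick way to confirm whether passwordless ssh works for a given
(non-root) user is sketched below.  It is only a sketch under stated
assumptions: the function names (host_of, check_node, check_all) are
made up for illustration, and it assumes a machines file in the
"host" or "host:N" format quoted further down, passed as the first
argument.  BatchMode=yes makes ssh fail instead of prompting for a
password, so a node that still wants a password shows up as FAIL.

```shell
#!/bin/sh
# Sketch: test non-interactive ssh from the head node to every host
# listed in a machines file. Helper names here are hypothetical.

# Strip the optional ":N" CPU-count suffix from a machines-file entry.
host_of() {
    printf '%s\n' "${1%%:*}"
}

# Attempt a login that must not prompt. BatchMode=yes disables password
# prompts; ConnectTimeout bounds the wait if a firewall drops packets.
check_node() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Read the machines file line by line and report each host.
check_all() {
    while IFS= read -r entry; do
        # Skip blank lines and comment lines.
        case "$entry" in ''|'#'*) continue ;; esac
        host=$(host_of "$entry")
        # Redirect stdin so ssh cannot swallow the rest of the file.
        if check_node "$host" </dev/null; then
            echo "ok   $host"
        else
            echo "FAIL $host"
        fi
    done < "$1"
}

# Run only when a machines file is given on the command line.
if [ "$#" -gt 0 ]; then
    check_all "$1"
fi
```

Run it as the user who will launch mpirun, not as root, since the two
can behave differently.  A FAIL here only says that a non-interactive
login did not succeed; it does not distinguish a key problem from a
blocked port.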

Rodney J. Dyer, PhD

Department of Biology                          Assistant Professor
Virginia Commonwealth University    http://dyerlab.bio.vcu.edu
office: (804) 828-0874                           lab: (804) 828-0837
email: rjdyer at vcu.edu                          chat: rodneydyer at mac.com

On Feb 10, 2005, at 5:48 AM, Dan Swan wrote:

> Hi all,
>
> I originally posted this to the iNquiry list but I'm not sure people
> are using it that heavily.  I'm new to Xserve/iNquiry; my background is
> Linux/Condor.
>
> I wish to run the MPI version of MrBayes.  I followed the MPICH
> install instructions from:
>
> http://bioteam.net/faq/index.php?sid=&lang=en&action=artikel&cat=4&id=33&artlang
>
>
> which are very straightforward.  The problem is that when I execute the
> cpi test program with mpirun and -np > 1, mpirun hangs for about 5 minutes
> before spitting back:
>
> -su-2.05b# /usr/local/mpich-1.2.6/ch_p4/bin/mpirun -np 2 /usr/local/mpich-1.2.6/ch_p4/examples/cpi
> p0_3247:  p4_error: Timeout in making connection to remote process on node01.cluster.private: 0
> Killed by signal 2.
>
> my machines.* file looks like this:
>
> node01.cluster.private
> node02.cluster.private
> node03.cluster.private
> node04.cluster.private
> node05.cluster.private
> node06.cluster.private
> node07.cluster.private
> node08.cluster.private
> node09.cluster.private
> node10.cluster.private
> node11.cluster.private:2
> node12.cluster.private:2
> node13.cluster.private:2
> node14.cluster.private:2
> node15.cluster.private
> node16.cluster.private:2
>
> If I start mpirun and log into node01 a ps -ax shows:
>
> 1162  ??  Ss     0:00.01 /usr/local/mpich-1.2.6/ch_p4/examples/cpi biocluster
>
> so the job appears to arrive (at least on one node).  I'm a bit stumped.
>
> Googling for that error, people suggest it might be to do with ssh
> requiring passwordless authentication (although I am currently doing this
> as root and am not sure this is the issue yet).  But I started looking
> into it and that threw up another problem :/
>
> Passwordless ssh works fine for the root user from the head node to the
> cluster nodes, but not for any other user on the machine.  And no matter
> how I try, I cannot get passwordless authentication working for a
> non-root user on the cluster.
>
> Is there some Mac OS X voodoo I am not aware of?  I cannot ssh to the
> cluster nodes as a non-root user, so I assume there is some "deeper"
> authentication issue here (passwordless ssh is normally trivial to set
> up)?  User home directories are being exported from the head node to the
> nodes, although a root user on the node seems unable to ls -l any user's
> ~/.ssh/.  Any hints gratefully appreciated.
>
> Cheers!
>
> Dan
>
> -- 
> Dr Daniel Swan - Bioinformatics Support Unit
> 924 Claremont Tower, University of Newcastle, Newcastle, NE1 7RU
> Tel: +44 (0)191 222 7856
> http://www.ncl.ac.uk/bioinformatics/support || d.c.swan at ncl.ac.uk