[Bioclusters] MPI-HMMER segmentation fault mpi error

Joe Landman landman at scalableinformatics.com
Thu Feb 12 08:45:26 EST 2009


Greetings AbDdU

   You probably want to subscribe to the mpihmmer list and post 
questions there, as this is where the developers tend to hang out.  You 
can find it here ...

http://lists.scalableinformatics.com/mailman/listinfo/mpihmmer

AbDdU! wrote:
> Dear all
> 
> I have compiled mpi-hmmer recently openmpi (if not mistaken) ... when
> i run it using a small dataset it works fine ... but when i input a
> big dataset to search against .. like about 200MB ... it crashes on
> the first node, exits with "segmentation fault" and mpi-related error
> "errno=111" which is a "connection refused" type of error ... any clue
> on this? i dont think its mpi related, nor it is an ssh issue ... or
> is it? since i'v tried running the same problem on a compute node and
> it worked fine with no errors or such.

A segmentation fault usually means the program tried to access memory 
it has no right to access.  Could you let us know:

   a) what you used for a command line

   b) what database you used for your search

   c) how much memory and what CPU type you have on the node that crashed.

   Since it worked on a compute node, this suggests library differences, 
an out-of-memory condition, or a similar problem specific to the machine 
where it crashed.
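
One way to collect the details asked for in (c), and to spot library differences, is a small shell function run on both the machine that crashed and the compute node that worked.  This is only a sketch: it assumes Linux (it reads /proc) and that ldd is installed, and node_report is a made-up name for illustration, not part of MPI-HMMER.

```shell
# node_report BINARY: print host name, memory, CPU type, resource limits,
# and the shared libraries BINARY resolves to on this machine.
# Assumes Linux (/proc/meminfo, /proc/cpuinfo) and an installed ldd.
node_report() {
    binary=$1
    echo "host: $(uname -n)"
    # Total RAM as the kernel reports it
    grep MemTotal /proc/meminfo
    # CPU model, with a count of logical CPUs reporting that model
    grep 'model name' /proc/cpuinfo | sort | uniq -c
    # Per-process limits; a small stack or memory limit can also end in a segfault
    ulimit -a
    # Shared libraries the binary picks up here
    ldd "$binary"
}
```

Call it with the path to the MPI-HMMER binary you actually run, redirect the output to a file on each machine, and diff the two files.  Differences in the ldd output or in the limits are the first things to look at.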

   To see if this is an ssh issue try this for each machine in your 
machines file

	ssh machinename hostname

where machinename is the name of the machine in the machines file.  So, 
for example, if your machines file has

	compute-1
	compute-2
	compute-3
	compute-4

then your test would look like this

	ssh compute-1 hostname
	ssh compute-2 hostname
	ssh compute-3 hostname
	ssh compute-4 hostname
	
If these all return quickly without asking for a password, it is 
unlikely that ssh is the problem.  If you didn't use a machines file, 
MPI will often try to launch on the local host, so you should also 
include

	ssh localhost hostname

as a test, and it should work, just like the others.
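
If the machines file is long, the same ssh checks can be looped.  The following is a rough sketch, not anything shipped with MPI or MPI-HMMER: check_hosts is a hypothetical helper name, and it assumes the machines file has one host per line, possibly followed by extra fields such as a slot count.

```shell
# check_hosts FILE: run a passwordless "ssh host hostname" for each host in FILE.
# Blank lines and lines starting with "#" are skipped; anything after the
# first field on a line is ignored.
check_hosts() {
    machines_file=$1
    failed=0
    while read -r host _rest || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        # ConnectTimeout limits how long an unreachable node can stall the loop.
        # -n stops ssh from reading the rest of the machines file on stdin.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" hostname >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$machines_file"
    return $failed
}
```

Run it as check_hosts followed by the path to your machines file.  Any host reported as FAIL either needs a password, is unreachable, or took longer than the timeout, and is worth testing by hand.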

> What im sure of is that mpi is working fine as other software using it
> are working as expected ...

Still not enough information to provide a meaningful answer.

> 
> any experience with such error?

111 errors?  Yes.  Usually the result of one or more of the MPI 
processes crashing: the surviving processes then get "connection 
refused" when they try to reach the one that died.

> 
> 
> Thank you guys if you ever read this ... :) and hopefully guide me to
> the solution



-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


