[Bioclusters] MPI-HMMER segmentation fault mpi error
landman at scalableinformatics.com
Thu Feb 12 08:45:26 EST 2009
You probably want to subscribe to the mpihmmer list and post
questions there, as this is where the developers tend to hang out. You
can find it here ...
> Dear all
> I have compiled mpi-hmmer recently openmpi (if not mistaken) ... when
> i run it using a small dataset it works fine ... but when i input a
> big dataset to search against .. like about 200MB ... it crashes on
> the first node, exits with "segmentation fault" and mpi-related error
> "errno=111" which is a "connection refused" type of error ... any clue
> on this? i dont think its mpi related, nor it is an ssh issue ... or
> is it? since i'v tried running the same problem on a compute node and
> it worked fine with no errors or such.
A segmentation fault is usually what you get when you run a program that
tries to access memory it doesn't have a right to access. Could you let
us know:
a) the command line you used
b) the database you searched against
c) how much memory and what CPU type you have on the node that crashed
Since it worked on a compute node, this suggests library differences,
out-of-memory issues, or similar problems on the machine where the run
crashed.
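
One way to collect the details asked for in (c), and to compare libraries between the node that crashed and one that works, might look like the sketch below. It assumes a Linux node with /proc available. The function name node_report is made up for illustration, and the binary path you pass in is whatever you built; nothing here is part of MPI-HMMER itself.

```shell
#!/bin/sh
# node_report: hypothetical helper that prints CPU, memory, resource limits
# and (optionally) the shared libraries a binary resolves to on this node.
# Assumes a Linux system; the /proc reads are skipped if /proc is missing.
node_report() {
    bin=${1:-}
    echo "host: $(uname -n)"
    echo "arch: $(uname -m)"
    if [ -r /proc/cpuinfo ]; then
        # first "model name" line only; may be empty on some architectures
        echo "cpu: $(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2-)"
    fi
    if [ -r /proc/meminfo ]; then
        grep -E '^(MemTotal|MemFree|SwapTotal):' /proc/meminfo || true
    fi
    echo "limits:"
    ulimit -a
    # ldd lists the shared libraries the binary would load on this machine
    if [ -n "$bin" ] && command -v ldd >/dev/null 2>&1; then
        echo "libraries for $bin:"
        ldd "$bin" || true
    fi
}
```

Running node_report with the path to your MPI-HMMER binary on both the machine that crashed and a compute node that works, then comparing the two outputs, should show differing library versions or a much smaller memory limit if either is the cause.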
To see if this is an ssh issue, try this for each machine in your
machines file:
    ssh machinename hostname
where machinename is the name of the machine in the machines file. So,
for example, if your machines file has
    compute-1
    compute-2
    compute-3
    compute-4
then your test would look like this
    ssh compute-1 hostname
    ssh compute-2 hostname
    ssh compute-3 hostname
    ssh compute-4 hostname
If these work quickly and without asking for a password, it is unlikely
that ssh was the problem. If you didn't use a machines file, then mpi
will often try to do this to the local host, so you should also try
    ssh localhost hostname
as a test, and it should work, just like the others.
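
If the machines file is long, the same test can be looped. The following is a minimal sketch, not anything shipped with MPI-HMMER or your MPI: check_hosts and the SSH_CMD override are hypothetical names, and it assumes the first word on each non-comment line of the machines file is the host name.

```shell
#!/bin/sh
# check_hosts: hypothetical helper that runs "ssh <host> hostname" for every
# host in a machines file and reports which ones fail.
# SSH_CMD is an illustrative override (defaults to ssh), handy for dry runs.
check_hosts() {
    machines_file=$1
    failed=0
    while read -r host _rest; do
        # skip blank lines and comment lines
        case "$host" in ''|'#'*) continue ;; esac
        # -n keeps ssh from swallowing the rest of the machines file on stdin;
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # ConnectTimeout=5 turns a slow or dead host into a visible failure.
        if ${SSH_CMD:-ssh} -n -o BatchMode=yes -o ConnectTimeout=5 \
               "$host" hostname >/dev/null 2>&1; then
            echo "$host ok"
        else
            echo "$host FAILED"
            failed=1
        fi
    done < "$machines_file"
    return $failed
}
```

Called as check_hosts followed by the path to your machines file, it prints one line per host and returns non-zero if any host could not be reached without a password.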
> What im sure of is that mpi is working fine as other software using it
> are working as expected ...
Still not enough information to provide a meaningful answer.
> any experience with such error?
111 errors? Yes. Usually the result of one or more of the mpi
> Thank you guys if you ever read this ... :) and hopefully guide me to
> the solution
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615