[Bioclusters] [Mpihmmer] MPI-HMMER segmentation fault mpi error
John Paul Walters
waltersj at buffalo.edu
Thu Feb 12 10:13:39 EST 2009
In addition to Joe's suggestions, I would also recommend that you grab the
latest Mercurial snapshot, as it contains some fixes that aren't in the
two releases that are posted. The Mercurial link is directly beneath
the release links, and has both .zip and .gz options.
On Thu, 2009-02-12 at 08:45 -0500, Joe Landman wrote:
> Greetings AbDdU
> You probably want to subscribe to the mpihmmer list and post
> questions there, as this is where the developers tend to hang out. You
> can find it here ...
> AbDdU! wrote:
> > Dear all
> > I have compiled mpi-hmmer recently with openmpi (if not mistaken) ... when
> > I run it using a small dataset it works fine ... but when I input a
> > big dataset to search against .. like about 200MB ... it crashes on
> > the first node, exits with "segmentation fault" and an mpi-related error
> > "errno=111" which is a "connection refused" type of error ... any clue
> > on this? I don't think it's mpi related, nor is it an ssh issue ... or
> > is it? since I've tried running the same problem on a compute node and
> > it worked fine with no errors or such.
> A segmentation fault is usually what you get when you run a program that
> tries to access memory it doesn't have a right to access. Could you let
> us know
> a) what you used for a command line
> b) what database you used for your search
> c) how much memory and what CPU type you have on the node that crashed.
> Since it worked on a compute node, this suggests either library
> differences, out of memory issues, or similar problems on the machine
> you have run on.
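One way to collect the details requested in (c) is a small shell function run on both the node that crashed and the compute node that worked, so the two outputs can be compared. This is only a sketch: node_summary is a made-up name, and it assumes a Linux node where /proc/cpuinfo and /proc/meminfo exist. The per-process limits are printed as well, since a low stack or virtual memory limit is one possible cause of a crash that only shows up with large inputs.

```shell
# node_summary: print host name, CPU type, total memory and per-process
# limits for the machine it runs on.
# Assumption: a Linux node; the /proc files are skipped if not readable.
node_summary() {
    echo "host: $(uname -n)"
    if [ -r /proc/cpuinfo ]; then
        # The first "model name" line is enough to identify the CPU type.
        echo "cpu: $(grep 'model name' /proc/cpuinfo | head -n 1 | cut -d: -f2- | sed 's/^ *//')"
    fi
    if [ -r /proc/meminfo ]; then
        # MemTotal is reported as "<number> kB".
        echo "mem: $(grep '^MemTotal' /proc/meminfo | awk '{print $2, $3}')"
    fi
    # Limits inherited by processes started from this shell.
    echo "stack limit (kB): $(ulimit -s)"
    echo "virtual memory limit (kB): $(ulimit -v)"
}
```

Calling node_summary on each machine and diffing the two outputs should show quickly whether the nodes differ in memory or limits.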
> To see if this is an ssh issue try this for each machine in your
> machines file
> ssh machinename hostname
> where machinename is the name of the machine in the machines file. So,
> for example, if your machines file has
> compute-1
> compute-2
> compute-3
> compute-4
> then your test would look like this
> ssh compute-1 hostname
> ssh compute-2 hostname
> ssh compute-3 hostname
> ssh compute-4 hostname
> If these work without a password, and return quickly,
> it is unlikely that ssh was the problem. If you didn't use a machines
> file, then mpi will often try to do this to the local host, so you
> should include
> ssh localhost hostname
> as a test, and it should work, just like the others.
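For a longer machines file, the ssh checks quoted above can be looped rather than typed by hand. The following is a minimal sketch, not part of MPI-HMMER or Open MPI: check_hosts and the SSH_CMD override are hypothetical names, and the file is assumed to hold one host per line, optionally followed by extra fields such as a slot count. BatchMode=yes makes ssh fail instead of prompting for a password, and ConnectTimeout keeps an unreachable host from stalling the loop.

```shell
# check_hosts FILE: run "ssh <host> hostname" for every host listed in FILE.
# Assumed format: one host per line; blank lines and lines starting with '#'
# are skipped, and anything after the first whitespace is ignored.
# SSH_CMD is a hypothetical hook for substituting another command for ssh.
check_hosts() {
    machines_file=$1
    failures=0
    # The "|| [ -n ... ]" part handles a last line without a trailing newline.
    while read -r host _rest || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        # -n stops ssh from reading stdin, which would swallow the host list.
        if ${SSH_CMD:-ssh} -n -o BatchMode=yes -o ConnectTimeout=5 \
                "$host" hostname >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failures=$((failures + 1))
        fi
    done < "$machines_file"
    # Exit status is 0 only if every host answered.
    [ "$failures" -eq 0 ]
}
```

It would be called as check_hosts followed by the path to the machines file. Any host reported as FAIL either needs a password, is unreachable, or took longer than the timeout.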
> > What I'm sure of is that mpi is working fine, as other software using it
> > is working as expected ...
> Still not enough information to provide a meaningful answer.
> > any experience with such error?
> 111 errors? Yes. Usually the result of one or more of the mpi
> processes crashing.
> > Thank you guys if you ever read this ... :) and hopefully guide me to
> > the solution