[BioBrew Users] Unable to get mpiblast running
Bastian Friedrich
bastian at bastian-friedrich.de
Thu Apr 6 06:56:53 EDT 2006
Hi,
after installing our new cluster with Rocks 4.1 and BioBrew (plus a
number of rolls, hpc roll included), I am having a hard time getting
mpiblast to run.
The cluster consists of 7 machines (head node plus 6 compute nodes),
each equipped with two dual-core Opteron CPUs and 8 GB of RAM.
These are the steps I took:
* Extended sysctl.conf to provide more shared memory (applied roughly as
sketched after this list of steps):
# Shared mem = 1 GB!!
kernel.shmmax = 1099511627776
kernel.shmall = 1099511627776
on frontend and all nodes
* Extended .bashrc to use mpich and to increase P4_GLOBMEMSIZE:
export PATH=/opt/mpich/gnu/bin:$PATH
export P4_GLOBMEMSIZE=157286400
* Put all nodes into /opt/mpich/gnu/share/machines.LINUX (hm... did I do
this manually? I don't remember)
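(A side note on the numbers above, plus a rough sketch of how such a
setting can be applied cluster-wide: kernel.shmall is counted in pages
rather than bytes, and 1099511627776 bytes is 1 TiB, not 1 GB, so the
comment in my sysctl.conf does not match the value. The shmmax/shmall
pair below is only an example of a consistent 1 GiB setting, assuming
4 KiB pages; it is not what currently sits in my files.)
======================================================================
# example of a consistent 1 GiB pair (assuming 4 KiB pages):
#   kernel.shmmax = 1073741824        # bytes
#   kernel.shmall = 262144            # pages: 1073741824 / 4096

# reload on the frontend
sysctl -p

# copy to the compute nodes and reload there
# ("compute-0-N" is the default Rocks node naming; adjust if yours differ)
for n in $(seq 0 5); do scp /etc/sysctl.conf compute-0-$n:/etc/; done
cluster-fork 'sysctl -p'

# read back what the kernel actually uses
cluster-fork 'sysctl kernel.shmmax kernel.shmall'
======================================================================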
I followed Glen's "mpiblast introduction" as published on the
Rocks-Discuss mailing list on 2005-03-24 and executed the following
command line:
mpirun -np 30 /usr/local/bin/mpiblast -p blastn -d Hs.seq.uniq -i IL2RA
-o blast_results
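(For context: the guide also fragments the database with mpiformatdb
before searching. Reproduced from memory, and with option names that may
differ between mpiBLAST versions, that step looks roughly like this:)
======================================================================
# --nfrags (or -N in older versions): number of fragments (placeholder
# value here); -p F marks a nucleotide database, as with formatdb
mpiformatdb --nfrags=28 -i Hs.seq.uniq -p F
======================================================================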
~/.ncbirc is configured like this:
======================================================================
[NCBI]
Data=/usr/share/ncbi/data/
[BLAST]
BLASTDB=/state/partition1/blastdb
BLASTMAT=/usr/share/ncbi/data/
[mpiBLAST]
Shared=/state/partition1/blastdb
Local=/tmp
======================================================================
(/state/partition1/blastdb is a symlink to the blastdb path on the
frontend and holds a local copy of the database on each compute node.
I also tried serving the database via NFS.)
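(In case it helps with diagnosis, a quick check along these lines should
confirm whether every node really sees the database under that path;
cluster-fork is the stock Rocks tool, the path is the one from .ncbirc:)
======================================================================
# do all compute nodes see the fragments under the shared path?
cluster-fork 'ls -l /state/partition1/blastdb | head'
# list through the symlink on the frontend: does it resolve and
# contain the fragments?
ls -lL /state/partition1/blastdb | head
======================================================================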
Depending on the value of P4_GLOBMEMSIZE, I get different errors - but
errors in all cases. The jobs are distributed among the nodes, though.
For "smaller" values of P4_GLOBMEMSIZE (i.e. 104857600 == 100 MB, most
of the time 200 MB) I get this error:
======================================================================
p0_8400: (23.453125) xx_shmalloc: returning NULL; requested 22880510 bytes
p0_8400: (23.453125) p4_shmalloc returning NULL; request = 22880510 bytes
You can increase the amount of memory by setting the environment variable
P4_GLOBMEMSIZE (in bytes); the current size is 104857600
p0_8400: p4_error: alloc_p4_msg failed: 0
======================================================================
For 200 MB, I sometimes (?) get the same error, sometimes this one:
======================================================================
rm_21956: p4_error: semget failed for setnum: 19
======================================================================
For 300 MB, I get this:
======================================================================
p0_20214: p4_error: exceeding max num of P4_MAX_SYSV_SHMIDS: 256
======================================================================
I tried to test my mpich installation with the sample programs included
(cpi.c, mainly). I was able to get it running with -np <small number>,
but the errors described above occurred when I increased the number of
processes.
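(Concretely, a minimal reproduction with cpi, assuming mpicc from
/opt/mpich/gnu/bin is first in PATH and cpi.c has been copied into the
home directory; the process counts below are only examples:)
======================================================================
mpicc -o cpi cpi.c
mpirun -np 4 ./cpi      # a "small number" of processes: runs fine
mpirun -np 30 ./cpi     # more processes: fails with the p4 errors above
======================================================================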
Yes, I executed "cleanipcs; cluster-fork cleanipcs" in advance in all
cases.
Frankly, I have not yet understood the relationship between possible
shmmax/shmall settings, P4_GLOBMEMSIZE and P4_MAX_SYSV_SHMIDS, or how
to tune each of them for a successful mpich parallelization.
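(As far as I understand it: P4_GLOBMEMSIZE is the amount of SysV shared
memory the p4 device tries to set aside on each node for local
communication; each individual segment it allocates is capped by
kernel.shmmax, the system-wide total is capped by kernel.shmall (in
pages), and P4_MAX_SYSV_SHMIDS is a compile-time limit in mpich's ch_p4
device on how many segments it will create, which is itself bounded by
the kernel's kernel.shmmni. The limits can at least be read back like
this:)
======================================================================
# SysV IPC limits as the kernel reports them
ipcs -l
# the relevant sysctls on every machine
cluster-fork 'sysctl kernel.shmmax kernel.shmall kernel.shmmni'
# leftover shared memory segments after a failed run
cluster-fork 'ipcs -m'
======================================================================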
Due to these mpich problems, I installed OpenMPI and compiled the
mpiblast src.rpm against OpenMPI. The errors above did not occur, but
the blast job also seemed to get stuck somewhere (no error message; the
job just seemed to run forever).
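(For completeness, a generic sketch of such a rebuild-and-run; the
src.rpm file name, RPM output path and OpenMPI install path below are
placeholders rather than the exact ones from my setup:)
======================================================================
# rebuild the mpiblast source RPM against the installed OpenMPI
rpmbuild --rebuild mpiblast-*.src.rpm
rpm -Uvh /usr/src/redhat/RPMS/x86_64/mpiblast-*.rpm

# OpenMPI's mpirun takes an explicit hostfile instead of machines.LINUX
/opt/openmpi/bin/mpirun -np 30 --hostfile ~/openmpi-hosts \
    /usr/local/bin/mpiblast -p blastn -d Hs.seq.uniq -i IL2RA -o blast_results
======================================================================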
As I am quite new to clusters, MPI and mpiblast, I feel a little lost.
Do you have any idea what the problems may be, and how to fix them?
Thx and Regards,
Bastian
--
Bastian Friedrich bastian at bastian-friedrich.de
Adress & Fon available on my HP http://www.bastian-friedrich.de/
\~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\
\ To learn more about paranoids, follow them around!