Hi,

after installing our new cluster with Rocks 4.1 and BioBrew (plus a number of rolls; the hpc roll included), I am having a hard time getting mpiblast to run. The cluster consists of 7 machines (a head node plus 6 compute nodes), each equipped with two dual-core Opteron CPUs and 8 GB of RAM.

These are the steps I took:

* Extended sysctl.conf on the frontend and all nodes to provide more shared memory (note: 1099511627776 bytes is actually 1 TB, not the 1 GB my original comment claimed; also, kernel.shmall is counted in pages, not bytes):

  # Shared mem
  kernel.shmmax = 1099511627776
  kernel.shmall = 1099511627776

* Extended .bashrc to use mpich and to increase P4_GLOBMEMSIZE:

  export PATH=/opt/mpich/gnu/bin:$PATH
  export P4_GLOBMEMSIZE=157286400

* Put all nodes into /opt/mpich/gnu/share/machines.LINUX (hm... did I do this manually? I don't remember.)

I was following Glen's "mpiblast introduction" as published on the Rocks-Discuss mailing list on 2005-03-24 and executed the following command line:

  mpirun -np 30 /usr/local/bin/mpiblast -p blastn -d Hs.seq.uniq -i IL2RA -o blast_results

~/.ncbirc is configured like this:

======================================================================
[NCBI]
Data=/usr/share/ncbi/data/

[BLAST]
BLASTDB=/state/partition1/blastdb
BLASTMAT=/usr/share/ncbi/data/

[mpiBLAST]
Shared=/state/partition1/blastdb
Local=/tmp
======================================================================

(/state/partition1/blastdb is a symlink to the blastdb path on the frontend, and contains the database on the nodes. I tried this via NFS, too.)

Depending on the value of P4_GLOBMEMSIZE, I get different errors - but errors in all cases. The jobs are distributed among the nodes, though.

For "smaller" values of P4_GLOBMEMSIZE (i.e.
104857600 == 100 MB; most of the time 200 MB) I get this error:

======================================================================
p0_8400: (23.453125) xx_shmalloc: returning NULL; requested 22880510 bytes
p0_8400: (23.453125) p4_shmalloc returning NULL; request = 22880510 bytes
You can increase the amount of memory by setting the environment variable
P4_GLOBMEMSIZE (in bytes); the current size is 104857600
p0_8400: p4_error: alloc_p4_msg failed: 0
======================================================================

For 200 MB, I sometimes (?) get the same error, sometimes this one:

======================================================================
rm_21956: p4_error: semget failed for setnum: 19
======================================================================

For 300 MB, I get this:

======================================================================
p0_20214: p4_error: exceeding max num of P4_MAX_SYSV_SHMIDS: 256
======================================================================

I tried to test my mpich installation with the included sample programs (mainly cpi.c). I was able to get it running with -np <small number>, but the errors described above occurred when I increased the process count. Yes, I executed "cleanipcs; cluster-fork cleanipcs" in advance in all cases.

I frankly have not yet understood the correlation between the shmmax/shmall settings, P4_GLOBMEMSIZE and P4_MAX_SYSV_SHMIDS, or how to tune each of them for successful mpich parallelization.

Due to these mpich problems, I installed OpenMPI and compiled the mpiblast src.rpm against OpenMPI; the errors above did not occur, but the blast job seemed to get stuck somewhere, too (no error message, but the job seemed to run forever).

As I am quite new to clusters, MPI and mpiblast, I feel a little lost. Do you have any ideas what the problems may be, and how to fix them?
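If it helps, this is how I currently picture the limits interacting. The assumed 1 MiB per-segment size below is pure guesswork on my part; I have not verified how the p4 device actually carves up P4_GLOBMEMSIZE, so please correct me if this model is wrong:

```shell
#!/bin/sh
# Back-of-the-envelope check: if P4_GLOBMEMSIZE gets carved into multiple
# SysV shared memory segments, the segment count must stay below
# P4_MAX_SYSV_SHMIDS (256 by default, per the error above).
# NOTE: the 1 MiB per-segment size is an ASSUMPTION, not a verified
# mpich/p4 constant.

mb_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

GLOBMEM=$(mb_to_bytes 300)   # the 300 MB setting that failed for me
SEG=$(mb_to_bytes 1)         # assumed per-segment size (1 MiB)
SEGMENTS=$(( (GLOBMEM + SEG - 1) / SEG ))

echo "P4_GLOBMEMSIZE=$GLOBMEM bytes -> $SEGMENTS segments"
# Under this assumption, 300 segments would exceed the default cap of 256.
```

If that picture is roughly right, the per-segment size would also be bounded by the kernel's shmmax, so I should probably double-check (e.g. with "cluster-fork sysctl kernel.shmmax") that my sysctl.conf changes actually took effect on every node.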
Thx and Regards,
Bastian

--
Bastian Friedrich                       bastian at bastian-friedrich.de
Address & phone available on my homepage: http://www.bastian-friedrich.de/
\~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\
\ To learn more about paranoids, follow them around!