[BioBrew Users] disabling non-SGE access to the cluster
Glen Otero
gotero at linuxprophet.com
Fri Aug 29 22:37:31 EDT 2003
On Friday, August 29, 2003, at 03:53 PM, Bill Barnard wrote:
> My cluster is working okay. I've tested submitting small jobs via SGE,
> which seems to work fine. I submitted a few small HPL jobs via SGE,
> which worked fine. Large HPL jobs still end up with a zombie process
> using SGE. (Will troubleshoot that later...)
Zombie processes suck because even if you kill the processes on the
frontend, they will still be running on the compute nodes, and you
have to kill them on each node individually. Here's an easy way to
clean up all the nodes:
% cluster-fork skill -KILL -u <username>
Do this for any users that have stray processes. If you run it as
yourself (not as root) it will probably give you a disconnection
message from each of the nodes, but don't worry about that. Afterward,
if you run 'ps' you shouldn't see any user processes out there.
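To double-check the whole cluster at once, something like this should
work (with <username> as a placeholder, and assuming the stock procps
'ps' on the nodes):

% cluster-fork ps -fu <username>

If a node still lists processes for that user, repeat the skill
command there.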
WRT HPL zombie processes: if the compute nodes are not Pentium 4
processors, you might see zombie process behavior. The hpl binaries
were optimized for the Pentium 4 and use instructions (SSE2) that are
not available on the Pentium III or Athlon. The solution is to
recompile the ATLAS library, install it, and rebuild hpl against it.
It is easiest to just download the prebuilt ATLAS libraries from
netlib:
http://www.netlib.org/atlas/archives/linux/
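If you're not sure whether your nodes have SSE2, one quick check
(assuming the flag shows up in /proc/cpuinfo, as it does on stock
Linux kernels) is:

% cluster-fork 'grep -c sse2 /proc/cpuinfo'

A count of 0 on a node means its CPUs lack SSE2 and the stock hpl
binaries will misbehave there.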
But if you want to rebuild atlas and hpl from scratch, you should start
by checking out a Rocks CVS source tree (make sure to get the 2_3_2
version and not the HEAD):

# cvs -d:pserver:anonymous@cvs.rocksclusters.org:/home/cvs/CVSROOT/ \
checkout -r ROCKS_2_3_2_i386 rocks-src
Rebuild and install ATLAS:
# cd rocks/src/contrib/atlas
# make rpm
# rpm -Uvh --force /usr/src/redhat/RPMS/i386/atlas*rpm
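To sanity-check that the rebuilt library actually got installed
(assuming the package is simply named 'atlas'):

# rpm -q atlas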
Rebuild HPL (no need to install it on the frontend if you don't run hpl
on the frontend):
# cd rocks/src/contrib/hpl
# make rpm
Rebuild your distribution:
# cd /home/install
# rocks-dist dist
Reinstall your compute nodes:
# shoot-node compute-0-0 compute-0-1 ...
The new hpl package will be bound into the new distribution
(rocks-dist knows to look in /usr/src/redhat/RPMS for new packages).
Then you should be able to run linpack on your cluster.
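If you want to confirm that the rebuilt package made it into the
distribution before shooting the nodes, one way (assuming the usual
Rocks tree under /home/install) is:

# find /home/install -name 'hpl*.rpm'

and check that the timestamps match your rebuild.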
************
Here is what one user did to build rpms for the Pentium III:
I had the same problem with hpl and linpack on Rocks 2.3.2. You can get
the source rpms at this location:
ftp://ftp.harddata.com/pub/rocks/athlon/SRPMS/
These binaries are compiled for the Athlon; they will not work on the
PIII. What you need to do for each source rpm file is:

# rpmbuild --rebuild --target=i386 atlas.....
# rpmbuild --rebuild --target=i386 hpl.....

(Replace atlas..... and hpl..... with the complete source rpm
filenames.)
If I remember correctly, the atlas rebuild went into a loop on a
question about a Fortran compiler. If that happens you need to edit the
spec file (specification file). To get to this file you must extract
the files from the source rpm. To achieve this, do the following:
1. "rpm -ivh atlas....."
2. Change into "/usr/src/redhat/SPECS".
3. vi the atlas.spec file. The section you want to edit is the Pentium
III section (shown below):
#Pentium III
#
export PATH=/opt/gcc32/bin:$PATH
echo "0
y
y
n
y
y    <---- This was the line that gave me trouble; I had to remove it completely.
linux
0
/opt/gcc32/bin/g77
-0
y
" | make
else
4. Save the file and exit vi.
5. Do a "rpmbuild -ba atlas.spec" from the "SPECS" directory. This will
create a new rpm file.
6. Wait for the compile to complete. (Elevator music playing)
7. Change into the "/usr/src/redhat/RPMS/i386" directory and retrieve
your new RPM file for the PIII.
8. Install the new rpm on the frontend and all compute nodes. You will
also need to reinstall hpl on all the nodes as well; one way to do this
is sketched below.
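As a rough sketch of step 8 (assumptions: /export/home/shared stands in
for any directory visible from every node, and the rpm filenames below
are placeholders):

# cp /usr/src/redhat/RPMS/i386/atlas*.rpm /usr/src/redhat/RPMS/i386/hpl*.rpm /export/home/shared/
# cluster-fork 'rpm -Uvh --force /export/home/shared/atlas*.rpm /export/home/shared/hpl*.rpm'

Or skip the hand-copying entirely and use the rocks-dist/shoot-node
procedure described above, which reinstalls the nodes with the new
packages baked in.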
**********
HTH!
Glen
>
> Before I open the cluster for use I want to set it up so all jobs are
> submitted via SGE/qsub. I can currently submit mpirun directly, so I can
> clearly bypass SGE. Has anyone done this yet? (Not to say that I'm lazy,
> but of course I am lazy...)
>
> Thanks,
>
> Bill
> --
> Bill Barnard <bill at barnard-engineering.com>
>
Glen Otero, Ph.D.
Linux Prophet
619.917.1772